Curve-fitting is a term for fitting a statistical distribution to a series of claims, in order to make analysis easier by giving a smooth representation of the claim dataset. It is a commonly used technique within the actuarial profession and the results of curve-fitting exercises are relied upon in key areas such as pricing and capital modelling. With ever-increasing scrutiny on the output and validity of actuarial models, now is an ideal time to review current methodology and highlight some of the many pitfalls that actuaries face when fitting curves in practice.

Curve-fitting can give an estimate of claims where there is no experience, and this is where it has the greatest financial impact. For example, with excess-of-loss cover, companies will often buy protection up to 10 times their maximum historical loss. Actuaries have few tools to help put a value on these contracts; curve-fitting may help, but there are common pitfalls. We address some of these issues in a reinsurance pricing context.

**The size of your curves**

First, when fitting a statistical distribution to a claim set, a suitable minimum and maximum value for the distribution must be chosen. This step is often overlooked or not given enough attention. In the reinsurance pricing world, we usually fit to claims that have been provided above an observation level, such as reported claims above £1m. The claims need to be trended to the middle of the contract period and developed to ultimate. It is important to trend the observation level as well, to ensure a consistent cohort of claims. A £1m observation level in 2001, trended forward to the contract period at 5% a year, would be around £1.5m. Therefore, if we want to use claims from 2001, we must select at least £1.5m as our minimum. If we pick a lower value but use 2001 claims, we are potentially omitting claims that would trend above our threshold but never made it into our sample because they were below the observation level at the time. On the other hand, using £1.5m we could lose some more recent claims that are in the range £1m-1.5m.
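The threshold arithmetic above can be sketched in a few lines. This is a minimal illustration, assuming whole years of trending at a constant rate and a hypothetical target year of 2010 (nine years after 2001); the function names are our own, not part of any pricing tool.

```python
def trended_level(level, annual_rate, years):
    """Trend a monetary amount forward by `years` at a constant annual rate."""
    return level * (1.0 + annual_rate) ** years


def minimum_threshold(observation_level, annual_rate, accident_years, target_year):
    """Smallest fitting threshold that keeps every selected year's cohort complete.

    Each year's observation level is trended forward to the (hypothetical)
    target year; the earliest year gives the largest trended level.
    """
    return max(
        trended_level(observation_level, annual_rate, target_year - year)
        for year in accident_years
    )


# Hypothetical set-up: £1m observation level, 5% trend, target year 2010.
threshold = minimum_threshold(1.0, 0.05, range(2001, 2011), 2010)
print(f"minimum threshold: £{threshold:.2f}m")
```

Dropping 2001 from `accident_years` lowers the minimum to 1.05^8 (about £1.48m), which is the trade-off described above: fewer years, but more of the recent claims can be kept.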

Another related issue is that the further back in time you go, the more important the inflation assumption becomes and the more likely it is that the portfolio composition or mix of business has changed. Conversely, for more recent years the loss development assumption becomes more critical.
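To see why the inflation assumption matters more for older years, a small sketch can compare two hypothetical trend rates (5% and 6% a year, chosen purely for illustration) over different trending periods. Because the factors compound, the relative gap between them widens with every extra year.

```python
def trend_factor(annual_rate, years):
    """Compound trend factor over `years` at a constant annual rate."""
    return (1.0 + annual_rate) ** years


def inflation_gap(years, base_rate=0.05, alt_rate=0.06):
    """Relative difference in trended amounts under two inflation assumptions."""
    return trend_factor(alt_rate, years) / trend_factor(base_rate, years) - 1.0


for years in (1, 5, 10):
    print(f"{years:>2} years of trending: {inflation_gap(years):.1%} difference")
```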

**Actuaries and models: a perfect fit?**

Once you have decided which years to use and set your minimum and maximum thresholds, and are happy to assume that the claims are homogeneous and independent and identically distributed, you can actually start fitting.

We consider the problem of modelling the losses on a typical UK motor excess-of-loss reinsurance programme, in this example unlimited cover with an excess of £2m. Initially, we wanted to work with a theoretical dataset and document the difficulties before moving on to ‘real’ UK motor data. In this article we talk about the theoretical study only.

We sampled 3,000 claims from a simple Pareto distribution with a fixed alpha of 1.6. This is a far greater number of claims than would typically be found in a real-world case. We then used MetaRisk Fit to test 28 statistical distributions. The first finding was that, without prior knowledge, it was not obvious that we should choose a Pareto distribution. The Pareto distribution was lurking in the top quartile of all our tests but was not a clear first choice. Our tests consisted of visually comparing the empirical cumulative distribution function with that implied by each test distribution, and looking at goodness-of-fit measures.
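The sampling step and one goodness-of-fit measure can be sketched with the Python standard library. This is not MetaRisk Fit; it is a simplified stand-in that assumes a known (hypothetical) £1m threshold, fits the Pareto alpha by maximum likelihood, fits a lognormal naively to the log-claims without allowing for the threshold, and compares both using the Kolmogorov-Smirnov statistic.

```python
import math
import random


def sample_pareto(n, alpha, theta, rng):
    # random.paretovariate draws from a Pareto with minimum 1; rescale to theta.
    return [theta * rng.paretovariate(alpha) for _ in range(n)]


def fit_pareto_alpha(claims, theta):
    # Maximum likelihood estimate of alpha when the threshold theta is known.
    return len(claims) / sum(math.log(x / theta) for x in claims)


def fit_lognormal(claims):
    # Maximum likelihood estimates: mean and population s.d. of the log-claims.
    logs = [math.log(x) for x in claims]
    mu = sum(logs) / len(logs)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / len(logs))
    return mu, sigma


def ks_statistic(claims, cdf):
    # Largest gap between the empirical CDF and a fitted CDF.
    xs = sorted(claims)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


rng = random.Random(2011)
theta = 1.0  # hypothetical threshold, in £m
claims = sample_pareto(3000, 1.6, theta, rng)

alpha_hat = fit_pareto_alpha(claims, theta)
mu, sigma = fit_lognormal(claims)


def pareto_cdf(x):
    return 1.0 - (theta / x) ** alpha_hat


def lognormal_cdf(x):
    return 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))


ks_pareto = ks_statistic(claims, pareto_cdf)
ks_lognormal = ks_statistic(claims, lognormal_cdf)
print(f"alpha_hat = {alpha_hat:.3f}")
print(f"KS statistic: Pareto {ks_pareto:.3f}, lognormal {ks_lognormal:.3f}")
```

With 3,000 claims the Pareto estimate sits close to 1.6; with a full library of 28 candidates and smaller samples, the ranking is far less clear-cut, as the study found.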

Figure 1 shows what the impact would have been had we selected another distribution, such as the lognormal or transformed beta, for our motor reinsurance structure. The red dots are the expected losses to the reinsurance contracts under the Pareto 1.6, rebased to 100%; the results from the other distributions are shown relative to this level.
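The rebasing behind a chart like Figure 1 can be sketched as follows. The expected loss per claim in a layer of limit l excess of deductible d is the integral of the survival function S(x) from d to d + l; here we evaluate it with Simpson's rule. The threshold, the layers, and the choice of lognormal (the one a maximum likelihood fit to log-claims from this Pareto would approach in large samples) are assumptions for illustration, not the layers or fits used in the study.

```python
import math


def layer_cost(survival, deductible, limit, steps=2000):
    """Expected loss per claim in `limit` xs `deductible`: integral of S(x)
    from deductible to deductible + limit, by composite Simpson's rule."""
    h = limit / steps  # steps must be even for Simpson's rule
    total = survival(deductible) + survival(deductible + limit)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * survival(deductible + i * h)
    return total * h / 3.0


ALPHA, THETA = 1.6, 1.0  # Pareto base; THETA is a hypothetical £1m threshold


def pareto_survival(x):
    return (THETA / x) ** ALPHA if x > THETA else 1.0


# log(X / THETA) is exponential with mean and s.d. 1 / ALPHA, so a large-sample
# lognormal fit to these claims tends to these parameters.
MU, SIGMA = math.log(THETA) + 1.0 / ALPHA, 1.0 / ALPHA


def lognormal_survival(x):
    return 0.5 * math.erfc((math.log(x) - MU) / (SIGMA * math.sqrt(2.0)))


def lognormal_survival_above_threshold(x):
    # Condition on the claim exceeding THETA, as all fitted claims do.
    return lognormal_survival(x) / lognormal_survival(THETA)


layers = [(3, 2), (5, 5), (15, 10), (25, 25)]  # hypothetical (limit, deductible), £m
ratios = {}
for limit, deductible in layers:
    base = layer_cost(pareto_survival, deductible, limit)
    alt = layer_cost(lognormal_survival_above_threshold, deductible, limit)
    ratios[(limit, deductible)] = alt / base
    print(f"{limit} xs {deductible}: lognormal at {alt / base:.0%} of Pareto")
```

In this sketch the lognormal's thinner tail shows up as a ratio that collapses towards zero in the top layer, which is the same direction of error as described for the higher layers.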

As expected, selecting the wrong distribution can lead to mis-pricing, and the error increases as you move up to the higher layers. Although it is widely known that the lognormal has a relatively ‘thin’ tail, it remains a commonly selected distribution in stochastic models.

The next step in our theoretical experiment is to assume that we choose the right distribution, but not the correct value for alpha. Figure 2 shows the sensitivity of the reinsurance layers to the choice of alpha. Again, the Pareto 1.6 is our base: a 10% increase in alpha (a lighter tail) can reduce the expected reinsurance loss in the higher layers by up to 35%, an under-pricing, while a 10% decrease in alpha can lead to over-pricing of 60% or more.
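For a single-parameter Pareto the layer cost has a closed form, so the alpha sensitivity can be sketched directly. The threshold and layers below are hypothetical, so the percentages will not reproduce Figure 2 exactly; the point is the pattern, with the distortion growing as the layers get higher.

```python
def pareto_layer_cost(alpha, theta, deductible, limit):
    """Expected loss per claim in `limit` xs `deductible` for a single-parameter
    Pareto(alpha, theta); assumes alpha != 1 and deductible >= theta."""
    upper = deductible + limit
    return theta ** alpha * (deductible ** (1 - alpha) - upper ** (1 - alpha)) / (alpha - 1)


BASE_ALPHA, THETA = 1.6, 1.0  # THETA is a hypothetical £1m threshold
layers = [(3, 2), (5, 5), (15, 10), (25, 25)]  # hypothetical (limit, deductible), £m

sensitivity = {}
for factor in (0.9, 1.1):  # alpha 10% lower and 10% higher than the base
    alpha = BASE_ALPHA * factor
    sensitivity[factor] = [
        pareto_layer_cost(alpha, THETA, d, l) / pareto_layer_cost(BASE_ALPHA, THETA, d, l)
        for l, d in layers
    ]
    print(f"alpha x {factor}: " + ", ".join(f"{r:.0%}" for r in sensitivity[factor]))
```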

**Parameter uncertainty**

The uncertainty in choosing the right value for the parameter(s) arises from having a limited claims dataset. If we believe the claims follow an underlying distribution, then the smaller the claims dataset, the further our estimates are likely to be from the true parameters. The more parameters required to define the distribution, the more of an issue this becomes.
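A quick simulation makes the effect of dataset size concrete. This sketch assumes the single-parameter Pareto with alpha 1.6 and a known threshold, and repeatedly refits alpha to samples of 40 and 3,000 claims (the two sample sizes mentioned in this article); the number of trials and the seed are arbitrary choices.

```python
import math
import random
import statistics


def fit_pareto_alpha(claims, theta):
    # Maximum likelihood estimate of alpha with a known threshold theta.
    return len(claims) / sum(math.log(x / theta) for x in claims)


def alpha_estimates(n_claims, alpha, theta, n_trials, rng):
    """Refit alpha to `n_trials` independent samples of `n_claims` claims."""
    estimates = []
    for _ in range(n_trials):
        claims = [theta * rng.paretovariate(alpha) for _ in range(n_claims)]
        estimates.append(fit_pareto_alpha(claims, theta))
    return estimates


rng = random.Random(42)
spread = {}
for n in (40, 3000):
    est = alpha_estimates(n, 1.6, 1.0, 200, rng)
    spread[n] = statistics.stdev(est)
    print(f"n = {n:>4}: mean {statistics.mean(est):.3f}, s.d. {spread[n]:.3f}")
```

The spread of the estimates shrinks roughly in proportion to 1/sqrt(n), so with 40 claims alpha can easily be out by the 10% that drives the layer sensitivities discussed above.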

In practice, this uncertainty can be allowed for by also specifying distributions for the parameters. This may sound like adding another degree of complexity to the modelling process, but within the ‘maximum likelihood estimator’ framework it is possible to estimate the parameter standard deviations. It is also necessary to derive correlations between parameters, and this can be done by determining the information matrix, which is the negative of the expected value of the matrix of second derivatives of the log-likelihood function; its inverse gives the approximate covariance matrix of the parameter estimates.
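The standard deviations and correlation can be sketched for a two-parameter distribution. We use the Lomax (Pareto type II) distribution purely as an example, fit it by maximising the profile log-likelihood with a golden-section search, and substitute the observed information (the negative Hessian at the estimates, computed analytically) for its expected value, a common practical approximation. The data are simulated, and all function names are hypothetical.

```python
import math
import random


def lomax_profile_alpha(claims, lam):
    # For fixed scale lam, the MLE of the shape alpha has a closed form.
    return len(claims) / sum(math.log1p(x / lam) for x in claims)


def lomax_loglik(claims, alpha, lam):
    n = len(claims)
    return (n * math.log(alpha) + n * alpha * math.log(lam)
            - (alpha + 1) * sum(math.log(lam + x) for x in claims))


def fit_lomax(claims, lo=-6.0, hi=6.0, tol=1e-8):
    """Maximise the profile log-likelihood over log(lam) by golden-section search."""
    def profile(t):
        lam = math.exp(t)
        return lomax_loglik(claims, lomax_profile_alpha(claims, lam), lam)

    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = profile(c), profile(d)
    while b - a > tol:
        if fc > fd:  # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = profile(c)
        else:        # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = profile(d)
    lam = math.exp((a + b) / 2.0)
    return lomax_profile_alpha(claims, lam), lam


def observed_information(claims, alpha, lam):
    """Negative Hessian of the Lomax log-likelihood at (alpha, lam)."""
    n = len(claims)
    s1 = sum(1.0 / (lam + x) for x in claims)
    s2 = sum(1.0 / (lam + x) ** 2 for x in claims)
    i_aa = n / alpha ** 2
    i_al = s1 - n / lam
    i_ll = n * alpha / lam ** 2 - (alpha + 1) * s2
    return i_aa, i_al, i_ll


def standard_errors_and_correlation(i_aa, i_al, i_ll):
    # Invert the 2x2 information matrix to get the approximate covariance matrix.
    det = i_aa * i_ll - i_al ** 2
    var_a, var_l, cov = i_ll / det, i_aa / det, -i_al / det
    return math.sqrt(var_a), math.sqrt(var_l), cov / math.sqrt(var_a * var_l)


rng = random.Random(7)
claims = [rng.paretovariate(1.6) - 1.0 for _ in range(2000)]  # Lomax(1.6, 1) sample
alpha_hat, lam_hat = fit_lomax(claims)
se_a, se_l, corr = standard_errors_and_correlation(
    *observed_information(claims, alpha_hat, lam_hat))
print(f"alpha {alpha_hat:.3f} (s.e. {se_a:.3f}), scale {lam_hat:.3f} "
      f"(s.e. {se_l:.3f}), correlation {corr:.2f}")
```

For this distribution the shape and scale estimates come out strongly positively correlated, which is why sampling them independently in a stochastic model would misstate the spread of outcomes.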

Assuming a distribution for the parameters allows each simulation of a stochastic model to use a different parameter set drawn from that distribution. We typically run 250,000 simulations, and the parameter uncertainty and parameter correlation matrix help ensure that the results capture the full range of potential outcomes and are not overly dependent on a single parameter set.

The question then naturally becomes: what distribution should we choose for the parameters? For now, we use the lognormal distribution for most of our models. Having a coefficient of variation for each parameter is also useful when selecting distributions. A distribution may fit the data very well but have parameters with high coefficients of variation, which may suggest it would be unstable in a stochastic model.
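Drawing a fresh parameter for each simulation, with a lognormal parameter distribution, might look like the sketch below. It assumes a single-parameter Pareto, sets the standard error of alpha to alpha/sqrt(n) (the large-sample result for this distribution) with a hypothetical n of 40, and prices a hypothetical 20 xs 5 layer with 10 claims per simulation. It also uses far fewer than 250,000 simulations to keep it quick. For multi-parameter distributions, the parameters would also need to be drawn jointly using their correlation matrix.

```python
import math
import random
import statistics


def lognormal_from_mean_cv(mean, cv):
    """(mu, sigma) of a lognormal with the given mean and coefficient of variation."""
    sigma2 = math.log(1.0 + cv ** 2)
    return math.log(mean) - sigma2 / 2.0, math.sqrt(sigma2)


def simulate_layer_losses(n_sims, alpha_hat, alpha_se, theta, deductible, limit,
                          claims_per_sim, rng):
    """Layer losses where each simulation uses its own alpha drawn from a lognormal
    with mean alpha_hat and coefficient of variation alpha_se / alpha_hat."""
    mu, sigma = lognormal_from_mean_cv(alpha_hat, alpha_se / alpha_hat)
    losses = []
    for _ in range(n_sims):
        alpha = rng.lognormvariate(mu, sigma)  # a fresh parameter for each simulation
        total = 0.0
        for _ in range(claims_per_sim):
            x = theta * rng.paretovariate(alpha)
            total += min(max(x - deductible, 0.0), limit)
        losses.append(total)
    return losses


rng = random.Random(250)
se = 1.6 / math.sqrt(40)  # large-sample s.e. of alpha for 40 claims
with_uncertainty = simulate_layer_losses(20_000, 1.6, se, 1.0, 5.0, 20.0, 10, rng)
fixed = simulate_layer_losses(20_000, 1.6, 0.0, 1.0, 5.0, 20.0, 10, rng)
print(f"mean layer loss: fixed alpha {statistics.mean(fixed):.3f}, "
      f"with parameter uncertainty {statistics.mean(with_uncertainty):.3f}")
```

The coefficient of variation passed in (here about 16%) is the same figure that, if it came out much higher for a candidate distribution, would flag it as potentially unstable in a stochastic model.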

**Do you know your curves?**

We wanted to see how many practising actuaries are aware of these three issues: sample size, selecting the right model and selecting appropriate parameters. So we came up with a test. We generated four claim sets by sampling from chosen distributions with pre-defined parameters. We then sent these claim sets to practising actuaries in the market and asked them to fit a distribution to the claims.

Uptake of the blind-test study was limited: actuaries seem to be shy of showing off their curve-fitting skills, but we will repeat the study on a wider scale (let us know if you are interested in participating). There is very little chance that actuaries will pick the right distribution or parameters, and the outcome depends partly on the software at their disposal. However, we want to measure the impact of their selections against the true underlying distributions by comparing the resulting expected losses to our motor excess-of-loss layers.

Given how easily the choice can go wrong even with sampled data, imagine the problem of trying to price an unlimited excess of £25m contract for a motor portfolio with just 40 claims, the largest of which is only £33m. Yet this is the type of problem facing pricing actuaries in the reinsurance community on a regular basis.

______________________________________________________________

Amit Parmar and Michael Cane's 2011 GIRO presentation will take this theoretical study further and talk about the practical pitfalls they encountered in their study of the UK motor market when they analysed claims that constituted approximately 60% of the market.

______________________________________________________________

*Amit Parmar and Michael Cane are actuaries working in the analytical department of Guy Carpenter in London. They will be presenting on curve-fitting at the Actuarial Profession's GIRO conference in October*