Stochastic models might help financial firms to understand risks, but do we forget the risk that the model itself fails? Andrew Smith and Parit Jakhria investigate

We often face the problem of constructing a model from limited data or information. This is entirely understandable, given that a model is only ever intended to be a representation of the real world. After all, a model that represented the world perfectly would need to be at least as big as the world itself.
We can often spot patterns; the historical data might look like a random sample from a distribution with a bell-shaped curve. In that case, there is no difficulty finding a distribution to fit.

Instead, the problem is more often that there are too many candidate distributions. Given 20 years of lapse data, for example, we might try to fit a number of different statistical distributions by matching the sample mean and standard deviation: perhaps a Gaussian distribution, a logistic distribution, an extreme value (Gumbel) distribution or a Student's t-distribution with four degrees of freedom. Figure 1 shows standardised plots of these matched distributions. For solvency purposes, we need to estimate a 99.5th percentile, which, of course, differs according to which distribution we think the data has come from and how we estimate the parameters.
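To make the moment-matching concrete, the sketch below fits the four candidates to a sample's mean and standard deviation and compares their 99.5th percentiles. It is a minimal illustration in Python using scipy, not the calculations behind Figure 1; the 20 'lapse rates' are simulated placeholders.

```python
# Minimal sketch: match four candidate distributions to a sample's mean and
# standard deviation (method of moments), then compare 99.5th percentiles.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
data = rng.normal(loc=0.10, scale=0.02, size=20)   # stand-in for 20 years of lapse data

m, s = data.mean(), data.std(ddof=1)

candidates = {
    # Gaussian: mean = loc, sd = scale
    "Gaussian": stats.norm(loc=m, scale=s),
    # Logistic: sd = scale * pi / sqrt(3)
    "Logistic": stats.logistic(loc=m, scale=s * np.sqrt(3) / np.pi),
    # Gumbel (right-skewed extreme value): sd = scale * pi / sqrt(6),
    # mean = loc + Euler-Mascheroni constant * scale
    "Gumbel": stats.gumbel_r(
        loc=m - np.euler_gamma * s * np.sqrt(6) / np.pi,
        scale=s * np.sqrt(6) / np.pi,
    ),
    # Student's t, 4 degrees of freedom: variance = scale^2 * df / (df - 2)
    "Student t(4)": stats.t(df=4, loc=m, scale=s / np.sqrt(2)),
}

for name, dist in candidates.items():
    print(f"{name:>12}: 99.5th percentile = {dist.ppf(0.995):.4f}")
```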
Parameter uncertainty
An acknowledgement of parameter uncertainty complicates the definition of a percentile, since the mathematical 99.5th percentile is a function of underlying parameters we cannot observe. Regulation, however, demands that firms pick a single number, and there are several potential ways to do this:
> Assume the parameter estimates are the exact parameters, thereby avoiding the need to hold capital for parameter error.
> Evaluate a statistically unbiased estimate of the 99.5th percentile ('unbiased' meaning that the estimator's average value equals the true percentile).
> Calculate a confidence interval for the true 99.5th percentile - for example, a one-sided 95% confidence interval.
> Construct a prediction interval - for example, an estimate that has a 99.5% probability of exceeding the next observation, allowing for randomness in both the past data and the next observation. In our example, the next observation would be the 21st year of lapse data.
> Propose a prior distribution for the parameters and use Bayes' theorem to construct a posterior distribution.

Figure 2 shows the resulting figures for samples of 20 data points, excluding the Bayesian approach, whose answer varies according to the chosen prior distribution. These measures differ only because the data is limited; as the amount of data increases, they are all consistent estimators of the 'true' percentile. None of them is the right answer; the different numbers answer different questions.
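As a concrete illustration of how these definitions diverge, the sketch below computes three of them for 20 Gaussian data points: the plug-in percentile, a one-sided 95% confidence bound for the true 99.5th percentile, and a 99.5% prediction bound for the 21st observation. It is a minimal Python/scipy sketch assuming Gaussian data, not the calculations behind Figure 2.

```python
# Minimal sketch of three percentile measures for n = 20 Gaussian data points.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 20
data = rng.normal(loc=0.10, scale=0.02, size=n)    # placeholder lapse data
m, s = data.mean(), data.std(ddof=1)
z995 = stats.norm.ppf(0.995)

# 1. Plug-in: treat the estimated parameters as exact.
plug_in = m + z995 * s

# 2. One-sided 95% confidence bound for the true 99.5th percentile,
#    via the non-central t distribution (a classical tolerance bound).
k = stats.nct.ppf(0.95, df=n - 1, nc=z995 * np.sqrt(n)) / np.sqrt(n)
conf_bound = m + k * s

# 3. Prediction bound: exceeds the next observation with 99.5% probability,
#    allowing for randomness in both the past data and the new point.
pred_bound = m + stats.t.ppf(0.995, df=n - 1) * s * np.sqrt(1 + 1 / n)

print(f"plug-in estimate       : {plug_in:.4f}")
print(f"95% confidence bound   : {conf_bound:.4f}")
print(f"99.5% prediction bound : {pred_bound:.4f}")
```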
Model uncertainty
Model uncertainty adds a further layer of risk. Goodness-of-fit tests, such as Kolmogorov-Smirnov and Anderson-Darling, have low power when data is scarce. Table 1 shows the power of these two tests with 20 data points. The chance of rejecting an incorrect model is often only marginally better than the chance of rejecting the correct model, and in a few cases the correct model is more likely to be rejected than an incorrect one.
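The sketch below illustrates, by simulation, why such tests struggle at this sample size. It is not a reproduction of Table 1: it tests a fully specified standard Gaussian null (estimating the parameters from the sample would require adjusted critical values, such as the Lilliefors variant), with standardised Student's t(4) data as the alternative.

```python
# Rough simulation of Kolmogorov-Smirnov rejection rates with 20 data points.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, n_sims, alpha = 20, 5_000, 0.05

def rejection_rate(sampler):
    """Proportion of simulated samples where the K-S test rejects the Gaussian null."""
    rejections = 0
    for _ in range(n_sims):
        x = sampler(n)
        if stats.kstest(x, "norm").pvalue < alpha:
            rejections += 1
    return rejections / n_sims

# Size: data really is standard Gaussian (should reject about 5% of the time).
size = rejection_rate(lambda size: rng.standard_normal(size))
# Power: data is Student's t(4), rescaled to unit variance.
power = rejection_rate(lambda size: rng.standard_t(4, size) / np.sqrt(2))

print(f"rejection rate, correct model : {size:.3f}")
print(f"rejection rate, t(4) data     : {power:.3f}")
```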

Even with samples of 200 or more, it is common not to reject any of these four models. This implies that model risk remains relevant for many applications, including scenario generators, longevity forecasts and estimates of reserve variability in general insurance. Are we then at risk of channelling too much energy and expense into the 'holy grail' route of modelling, justifying each parameter and component of a single model? Does our governance process consider model risk, or does board approval of one model entail rejection of all others?
Given the inevitable uncertainty in attempting to identify which model is correct, how can we make any progress at all? There are several possible ways to proceed.
> Pick a standard distribution - for example, the Gaussian, as it is not rejected. But don't confuse 'not rejected' with 'accepted'.
> Take the highest 99.5th percentile across all the models, to ensure capital adequacy at a level of at least the 99.5th percentile under each candidate.
> Build a 'hyper-model' that simulates data from a mix of the available models, although expert judgment is still needed to assess prior weights (see the sketch after this list).
> Collaboration on validation standards may lead to generally accepted practices - for example, a requirement to demonstrate at least 99.5% confidence if the data comes from a Gaussian or logistic distribution, but not if it comes from a Student's t or Gumbel distribution.
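The sketch below illustrates the second and third of these options on simulated data: the highest 99.5th percentile across the moment-matched candidates, and the 99.5th percentile of a hyper-model that mixes the candidates using purely illustrative expert weights.

```python
# Minimal sketch: highest single-model percentile versus a weighted 'hyper-model'.
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(3)
data = rng.normal(loc=0.10, scale=0.02, size=20)       # placeholder lapse data
m, s = data.mean(), data.std(ddof=1)

candidates = {
    "Gaussian": stats.norm(loc=m, scale=s),
    "Logistic": stats.logistic(loc=m, scale=s * np.sqrt(3) / np.pi),
    "Gumbel": stats.gumbel_r(loc=m - np.euler_gamma * s * np.sqrt(6) / np.pi,
                             scale=s * np.sqrt(6) / np.pi),
    "Student t(4)": stats.t(df=4, loc=m, scale=s / np.sqrt(2)),
}
# Illustrative prior weights only - in practice these are expert judgment.
weights = {"Gaussian": 0.4, "Logistic": 0.2, "Gumbel": 0.2, "Student t(4)": 0.2}

# Option 1: the most prudent single-model answer.
highest = max(dist.ppf(0.995) for dist in candidates.values())

# Option 2: the hyper-model percentile, i.e. x such that the weighted mixture CDF is 0.995.
def mixture_cdf(x):
    return sum(w * candidates[name].cdf(x) for name, w in weights.items())

lo = min(d.ppf(0.99) for d in candidates.values())
hi = max(d.ppf(0.9999) for d in candidates.values())
hyper = optimize.brentq(lambda x: mixture_cdf(x) - 0.995, lo, hi)

print(f"highest single-model 99.5th percentile: {highest:.4f}")
print(f"hyper-model 99.5th percentile         : {hyper:.4f}")
```

The hyper-model answer typically sits between the most benign and most extreme single-model figures, and it moves with the chosen weights - which is exactly where the expert judgment enters.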
Statistics provides useful tools for estimating percentiles in the presence of parameter uncertainty. However, human judgment is still a large factor, and its role is magnified when data limitations leave several possible interpretations open.
Thanks go to Stuart Jarvis for checking our calculations. Any errors and all views expressed are those of the authors, not their employers.