**Radu Popescu and Daniel Suciu compare and contrast backtesting in insurance and banking**

Solvency II requires backtesting to be performed as part of internal model validation. Backtesting of internal models has long been a requirement in banking, so we might look there for inspiration and best practice.

**Solvency II backtesting**

In the context of Solvency II internal modelling, backtesting can be thought of as a comparison of the internal model output with past experience.

For a particular risk factor, this analysis involves two steps: a visual comparison of the model output with historical experience, followed by statistical testing. Figure 1 illustrates the process using changes in the credit spreads of AAA Financial UK bonds at a particular term. We plotted historical changes in red and the model's simulated absolute changes in pink, both as probability density functions (PDFs) and as cumulative distribution functions (CDFs). We chose a credit spread as the representative risk factor because its shocks are calculated as absolute differences, whereas for other risk factors shocks are calculated as relative changes or using mixed formulae, which adds a layer of complexity we want to avoid in this simple case.
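To make the distinction concrete, the sketch below computes shocks from a series of spread levels in both forms. It is a minimal illustration, assuming levels sampled at a fixed interval and overlapping windows of `horizon` steps; the function names and the sample values are hypothetical, not the data behind Figure 1.

```python
from typing import List, Sequence


def absolute_shocks(levels: Sequence[float], horizon: int) -> List[float]:
    """Shock over `horizon` steps as an absolute difference: s[t + h] - s[t]."""
    return [levels[t + horizon] - levels[t] for t in range(len(levels) - horizon)]


def relative_shocks(levels: Sequence[float], horizon: int) -> List[float]:
    """Shock over `horizon` steps as a relative change: s[t + h] / s[t] - 1.

    Requires non-zero starting levels.
    """
    return [levels[t + horizon] / levels[t] - 1.0 for t in range(len(levels) - horizon)]


# Hypothetical spread levels in basis points at a fixed sampling interval (illustrative only).
spreads_bp = [50.0, 55.0, 48.0, 60.0, 72.0, 65.0]
print(absolute_shocks(spreads_bp, horizon=2))  # [-2.0, 5.0, 24.0, 5.0]
```

Overlapping windows reuse observations, so consecutive shocks are correlated; this matters when the historical shocks are later fed into statistical tests.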

The aim of backtesting is to decide whether the model:

- (a) generally matches past experience; and, if not, whether it
- (b) generates shocks that are conservative with respect to the past.

Note that (a) is a whole-distribution question that does not favour any particular quantile of the generated distribution, while (b) may be more relevant when there is a threshold beyond which one wants to be conservative relative to past experience (for example, around the points contributing to solvency capital).

The answer to (a) is illustrated by the PDF plots. They show not only that the historical distribution is less smooth than the simulated one (which is given by an analytical formula), but also that the historical distribution has two extra regimes, appearing at low and high values of spreads, which can be linked to particular periods in history.

For part (b), the CDF plot shows the areas in which the simulated distribution is more conservative than the historical one, ie the areas above the mid-values where the pink curve lies below or on the red one. Although capital is linked to a 99.5% worst-case scenario in terms of the impact on the change in net assets, that scenario may come from a different part of the risk factor distribution, especially if hedges are in place or there is diversification with other risks.
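A rough way to locate the conservative areas numerically is to compare the two empirical CDFs on a grid of spread changes. The sketch below assumes both distributions are available as samples and that spread widening (the upper tail) is the adverse direction; for an exposure that loses when spreads tighten, the comparison would be made below the median with the inequality reversed. Function names and data are hypothetical.

```python
from bisect import bisect_right
from statistics import median
from typing import List, Sequence


def ecdf(sorted_sample: Sequence[float], x: float) -> float:
    """Empirical CDF: fraction of the (already sorted) sample that is <= x."""
    return bisect_right(sorted_sample, x) / len(sorted_sample)


def conservative_points(historical: Sequence[float], simulated: Sequence[float],
                        grid: Sequence[float]) -> List[float]:
    """Grid points above the historical median where the simulated CDF lies at or
    below the historical CDF, ie the model puts at least as much probability above x
    as history does (the upper tail is assumed to be the adverse one)."""
    hist, sim = sorted(historical), sorted(simulated)
    mid = median(hist)
    return [x for x in grid if x > mid and ecdf(sim, x) <= ecdf(hist, x)]


# Hypothetical spread changes (bp): the simulated sample is wider than history.
historical = [-10.0, -5.0, 0.0, 5.0, 10.0]
simulated = [-20.0, -10.0, 0.0, 10.0, 20.0]
print(conservative_points(historical, simulated, grid=[-15.0, -5.0, 5.0, 15.0]))  # [5.0, 15.0]
```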

The question now becomes how to measure these differences between the historical and fitted distribution – thus transitioning from a simple visual examination to statistical analysis – and decide what a ‘good’ outcome is.

There are a number of statistical tests to choose from. For problem (a) we can use tests such as Kolmogorov-Smirnov or Cramér-von Mises, which compare whole distributions using a measure of the distance between them. Alternatively, we can zoom in and examine the differences between the two distributions quantile by quantile. Because the model is calibrated to the historical data, it may fit well on average while still leaving areas of poor fit. We use this quantile-by-quantile approach to answer part (b), for example with a one-sided binomial test to check whether the simulated distribution is conservative enough at particular points (see Table 1).
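The two kinds of test can be sketched in plain Python as follows. SciPy offers ready-made versions (`scipy.stats.ks_2samp`, `scipy.stats.cramervonmises_2samp`, `scipy.stats.binomtest`); we use the standard library only so that the example is self-contained, and we do not implement Cramér-von Mises. The quantile convention and the helper names are our assumptions. The binomial test treats historical observations as independent, which overlapping annual windows do not satisfy, so its p-values are only indicative.

```python
from bisect import bisect_right
from math import ceil, comb
from typing import Sequence, Tuple


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov distance: largest gap between the two empirical CDFs.
    The gap only changes at sample points, so checking those points is enough."""
    a_s, b_s = sorted(a), sorted(b)
    return max(abs(bisect_right(a_s, x) / len(a_s) - bisect_right(b_s, x) / len(b_s))
               for x in a_s + b_s)


def binomial_upper_pvalue(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1))


def exceedance_test(historical: Sequence[float], simulated: Sequence[float],
                    level: float) -> Tuple[float, int, float]:
    """One-sided binomial test at one quantile level of the simulated distribution.

    If the model is correct, each historical value exceeds the simulated `level`
    quantile with probability 1 - level. Too many exceedances (a small p-value)
    suggests the model is not conservative enough at that point.
    """
    sim = sorted(simulated)
    # Empirical quantile: smallest simulated value whose ECDF reaches `level`.
    q = sim[max(0, ceil(level * len(sim)) - 1)]
    k = sum(1 for x in historical if x > q)
    return q, k, binomial_upper_pvalue(k, len(historical), 1.0 - level)
```

For example, `exceedance_test(hist, sim, 0.995)` counts historical changes above the simulated 99.5th percentile and returns the probability of seeing at least that many if the model were correct.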

The choice of tests is influenced by what was used in the model calibration; tests not used in calibration will highlight other aspects of the data. Results can be reported as p-values, and if the number of tests is large, false discovery rate procedures (eg the Benjamini-Hochberg procedure) can be used to limit the number of poor results that arise by chance alone.
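The Benjamini-Hochberg step-up procedure is short enough to sketch directly. Given the m p-values from all quantile-level or risk-factor tests, it rejects the hypotheses with the k smallest p-values, where k is the largest rank whose p-value lies at or below (rank / m) × FDR. The 5% target rate is an illustrative assumption, and the procedure's guarantee assumes the tests are independent or positively dependent.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float], fdr: float = 0.05) -> List[bool]:
    """Return a rejection flag for each p-value under the Benjamini-Hochberg step-up rule."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p-value
    # Largest 1-based rank k whose p-value is <= (k / m) * fdr.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * fdr:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max, even if it missed its own threshold.
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k_max:
            rejected[i] = True
    return rejected


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))  # [True, False, False, False]
print(benjamini_hochberg([0.013, 0.02, 0.9, 0.9]))  # [True, True, False, False]
```

In the second example, the smallest p-value (0.013) misses its own threshold of 0.0125 but is still rejected, because the second-smallest (0.02) clears its threshold of 0.025. This step-up behaviour is what distinguishes the procedure from checking each p-value in isolation.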

As this process is repeated for all risk factors simulated by the internal model, decisions need to be made about what to do with areas of poor fit. In some circumstances, they can be accepted – for example, it is not important if small changes in credit spreads do not fit well around the median. Otherwise, further analysis may be recommended for the future, or a particular approach may be rejected as inadequate. This will go beyond looking at individual risk factors in a model-agnostic manner, and the interaction with other risks will need to be taken into account (and tested) since it may be the impact on the final loss distribution that is most relevant.

**Backtesting in banking**

We now move to backtesting in the banking industry, which is prescribed by the Basel regulations and supplemented by local regulations and guidance. For simplicity, we chose a single such document, the consolidated targeted review of internal models (TRIM), issued by the ECB in 2018, which covers the most relevant points.

Both market risk (MR) and counterparty credit risk (CCR) are relevant because they involve comparing historical and simulated distributions. More has been written about MR with respect to backtesting; for a comprehensive list of tests and how they can be logically linked, see Carsten Wehn’s 2008 Risk.net article ‘Looking forward to backtesting’.

CCR is closest to Solvency II in its general requirement to backtest the whole distribution. MR has recently been reviewed, with the value-at-risk (VaR) measure being replaced by an expected shortfall measure.

The fundamental conceptual difference between Solvency II backtesting and MR and CCR backtesting is that for Solvency II the hypothetical distribution is projected once over the whole year, while for MR and CCR a hypothetical distribution is recalculated at periodic intervals (daily for MR). The historical vs simulated comparison is then performed by locating each realised value on the distribution forecast for it and summarising it by its quantile at time $t$, $q_t$. For MR this is consistent with an associated risk management process that aims to understand and control risks as they happen. The model performs well if the $q_t$ follow a uniform distribution over [0, 1]. In addition, exceedances of a given threshold (in relative terms) should be independent of each other, because the model recalculates the future at each step, eliminating a possible source of bias.
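As a sketch of the MR approach, assume each day's forecast distribution is available as a Monte Carlo sample. Then $q_t$ is the empirical CDF of that sample evaluated at the realised value, and the uniformity of the collected $q_t$ can be checked with a one-sample Kolmogorov-Smirnov distance. The function names are hypothetical; an analytical forecast CDF could be used in place of the sample, and independence of exceedances would need a separate test that is not shown here.

```python
from bisect import bisect_right
from typing import Sequence


def pit_value(forecast_sample: Sequence[float], realised: float) -> float:
    """q_t: the empirical CDF of day t's forecast sample evaluated at the realised value."""
    s = sorted(forecast_sample)
    return bisect_right(s, realised) / len(s)


def ks_uniform(qs: Sequence[float]) -> float:
    """One-sample Kolmogorov-Smirnov distance between the q_t and Uniform[0, 1]."""
    u = sorted(qs)
    n = len(u)
    # The ECDF jumps from i/n to (i + 1)/n at the i-th order statistic (0-based);
    # compare both sides of the jump with the uniform CDF, F(x) = x.
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(u))


print(pit_value([1.0, 2.0, 3.0, 4.0], realised=2.5))  # 0.5
```

A large distance, such as $q_t$ bunched near 1, would indicate that realised losses sit too often in the upper tail of the forecasts.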

This is less applicable to Solvency II backtesting: with only a single one-year distribution to backtest against, historical clusters cannot be resolved, unless the historical distribution itself is used in the model.

**Models with purpose**

Backtesting provides crucial information for model validation, especially around model choices, and can point to necessary modifications such as increasing the number of parameters to better fit historical data. While backtesting aims to be as model-agnostic as possible, the interpretation of results must take model design into account, since the choice of distributions necessarily imposes limitations. The question should always be “Is the model good enough for the purposes intended?” rather than whether it passes every single backtest we can throw at it.

**Radu Popescu** is a modelling consultant and director at Model Tree

**Daniel Suciu** is an actuarial student working in modelling at EY Romania