The Actuary The magazine of the Institute & Faculty of Actuaries

Bayesian methods: mind your step

Andrew Smith asks: can Bayesian methods give better long-term interest forecasts?


cover feature

The idea of a long-term interest rate is embedded in actuarial thought and practice. While market interest rates fluctuate, we think about long-run averages driven by economic fundamentals. Tasks ranging from budgeting for pension contributions to the ultimate forward rate in Solvency II require assessments of long-run average returns.

Estimation of long-run returns involves a mix of judgment and, sometimes, intricate quantitative models. Bayesian statistics gives us a framework for combining these elements: the judgment corresponds to a prior distribution of parameters, while the forecast is based on a posterior parameter distribution given some data.  

However, that is not how judgment is typically applied in actuarial problems. The most conspicuous use of judgment comes in the selection of parameters, by choosing (or not) to modify parameters emerging from a mechanical model-fitting exercise. Less visibly, judgment is involved in the selection of data and models.

Since the 1990s, sampling-based methods such as the Metropolis–Hastings algorithm (MHA) have transformed Bayesian statistics in fields as diverse as agriculture, medicine and statistical physics. 

If we wanted to, we could now apply Bayesian methods to long-term return estimation. Do we want to? And why are so few actuaries doing this?

The interest rate modelling problem: classical solution

The Bank of England publishes a data set of yield curves, based on gilts, with month-end nominal curves available from January 1970. Figure 1 shows the history of five-, 10- and 15-year spot interest rates.

Statistically, a long-term average arises from a stationary stochastic process, such as an AR(1) model. However, the path of interest rates in many countries since the 1970s looks more like a random walk with steady downward drift, which is not stationary and has no long-term average. To cover both cases – the stationary and random walk models – we propose a model for the 10-year yield y_t in month t using a discrete Pearson process:


This model class allows for mean reversion leading to stationary models, but also includes random walks with drift. The volatility term allows for constant conditional variance, but can also allow conditional variance to fall as yields fall, behaviour associated with some popular stochastic interest rate models such as Cox–Ingersoll–Ross or Black–Karasinski. The errors ε_t are independent standard normal random variables.
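The printed equation for the process did not survive reproduction here, so the sketch below assumes one common discrete Pearson form – drift a(λ − y_t) pulling towards λ, with conditional variance quadratic in y_t – purely as an illustration. The coefficient names (a, lam, c0, c1, c2) and values are invented, not the fitted parameters.

```python
import numpy as np

def simulate_pearson(y0, a, lam, c0, c1, c2, n_months, rng):
    """Simulate one path of an assumed discrete Pearson-type process:
        y[t+1] = y[t] + a*(lam - y[t]) + sqrt(c0 + c1*y[t] + c2*y[t]**2) * eps[t]
    a = 0 recovers a driftless random walk; a > 0 gives mean reversion to lam."""
    y = np.empty(n_months + 1)
    y[0] = y0
    for t in range(n_months):
        # Guard against the quadratic dipping below zero for extreme yields.
        var = max(c0 + c1 * y[t] + c2 * y[t] ** 2, 0.0)
        y[t + 1] = y[t] + a * (lam - y[t]) + np.sqrt(var) * rng.standard_normal()
    return y

rng = np.random.default_rng(0)
# 20 illustrative 10-year paths (120 months), echoing the fan chart in Figure 2.
paths = np.array([simulate_pearson(0.05, 0.02, 0.04, 1e-6, 0.0, 1e-4, 120, rng)
                  for _ in range(20)])
```

With c2 > 0, simulated volatility rises as yields move far from zero in either direction, which is how the fitted model produces increasingly wild large negative rates.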

Fit by conditional maximum likelihood

A maximum likelihood optimiser applied to the discrete Pearson process, based on monthly data, gives the parameters in Table 1. We show 20 paths from the model in Figure 2.

Table 1
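A conditional maximum likelihood fit of this kind can be sketched as follows, assuming the same illustrative quadratic-variance form as above. The optimiser choice, starting values and simulated data are stand-ins, not the Bank of England series or the parameters in Table 1.

```python
import numpy as np
from scipy.optimize import minimize

def neg_cond_loglik(theta, y):
    """Negative conditional log-likelihood for an assumed discrete Pearson form.
    theta = (a, lam, c0, c1, c2); y is a monthly yield series."""
    a, lam, c0, c1, c2 = theta
    mean = y[:-1] + a * (lam - y[:-1])
    var = c0 + c1 * y[:-1] + c2 * y[:-1] ** 2
    if np.any(var <= 0):
        return np.inf  # reject parameter sets giving non-positive variance
    resid = y[1:] - mean
    return 0.5 * np.sum(np.log(2 * np.pi * var) + resid ** 2 / var)

# Illustrative data: a mean-reverting yield path, not the gilt yield history.
rng = np.random.default_rng(1)
y = np.empty(600)
y[0] = 0.08
for t in range(599):
    y[t + 1] = y[t] + 0.02 * (0.05 - y[t]) + 0.003 * rng.standard_normal()

# Derivative-free local search; like any local optimiser, it can stop at a
# local maximum of the likelihood without warning.
fit = minimize(neg_cond_loglik, x0=[0.01, 0.05, 1e-5, 0.0, 0.0], args=(y,),
               method="Nelder-Mead")
```

The fit is only as good as the local optimum the search settles in, which is exactly the caveat discussed below.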

Statistically, these are plausible extrapolations of the history we have. A striking feature of the fitted model is the high frequency of large negative interest rates – much more frequent than some would consider acceptable from an economic scenario generator. Furthermore, thanks to the quadratic form in the Pearson process definition, rates become more volatile as they drift into large negative territory. This is a mirror image of what happened in the 1970s and 1980s. 

The question is: are these statistically plausible paths economically credible? Even before we apply MHA, the Pearson scenarios are wilder than those produced by commercial scenario generators. We might ask if it is legitimate to use judgment to tame those negative scenarios.

All is not as it seems with the maximum likelihood fit, either. The optimiser has found a local maximum of the likelihood, but not a global maximum. Indeed, globally, the likelihood is unbounded, which you can see by setting λ equal to one of the observations and letting σ become small. The optimiser has failed to solve the problem posed, yet we have no warning because the algorithm has wrongly converged to a suboptimal but plausible set of parameters. The difficulty of an unbounded likelihood is common in financial models with non-constant volatility, and is not special to the Pearson model we have fitted here.
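The same pathology can be seen in its textbook form, a two-component Gaussian mixture: pin one component's mean on an observation and shrink its standard deviation, and the log-likelihood grows without bound, so no global maximum likelihood estimate exists. The data and mixture weights below are invented for illustration.

```python
import numpy as np

def mixture_loglik(x, mu, sigma):
    """Log-likelihood of a 50/50 mixture of N(mu, sigma^2) and N(0, 1)."""
    comp1 = np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    comp2 = np.exp(-0.5 * x ** 2) / np.sqrt(2 * np.pi)
    return np.sum(np.log(0.5 * comp1 + 0.5 * comp2))

x = np.array([0.0, 1.0, -1.0, 2.0, -2.0, 3.0])  # toy data set

# Pin the first component's mean on the observation x[0] and shrink sigma:
# the spike at x[0] contributes ~ -log(sigma), so the likelihood diverges.
ll = [mixture_loglik(x, mu=x[0], sigma=s) for s in (1e-1, 1e-2, 1e-3, 1e-4)]
```

Each tenfold reduction in sigma adds roughly log(10) to the log-likelihood, with no limit, mirroring the λ-and-σ argument for the Pearson model.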

Figure 1
Figure 2

Prior parameters and Metropolis–Hastings 

The 1990s saw the widespread use of Markov chain Monte Carlo sampling-based methods, of which MHA is one of the most popular. The idea of MHA is to build a random walk model to describe parameter values. However, each step of the random walk is subject to an acceptance test. If a proposed new parameter value is much less plausible than the previous value (as measured by a combination of data likelihood and prior distribution), the step is cancelled. The result is a constrained random walk that spends more time in more plausible parameter regions. We are free to choose the standard deviation of the random step size, but not the acceptance rule, which governs whether each proposed step is kept. There is a theorem saying that, over many steps, the distribution of random walk observations converges to the Bayesian posterior distribution.
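A minimal random-walk Metropolis sampler, in the spirit described above, might look like this. The one-parameter toy posterior (a Gaussian prior times a Gaussian pseudo-likelihood for a long-run level lam) is invented for illustration.

```python
import numpy as np

def metropolis_hastings(log_post, x0, step_sd, n_steps, rng):
    """Random-walk Metropolis: propose x' = x + N(0, step_sd^2); accept with
    probability min(1, exp(log_post(x') - log_post(x))), otherwise stay put."""
    x = np.asarray(x0, dtype=float)
    chain = np.empty((n_steps, x.size))
    lp = log_post(x)
    for i in range(n_steps):
        prop = x + step_sd * rng.standard_normal(x.size)
        lp_prop = log_post(prop)
        if np.log(rng.uniform()) < lp_prop - lp:  # acceptance test
            x, lp = prop, lp_prop
        chain[i] = x  # a rejected step repeats the current value
    return chain

# Toy target: N(0.04, 0.02^2) prior for lam times a Gaussian pseudo-likelihood
# centred on 0.05 with standard deviation 0.01 (illustrative numbers only).
def log_post(lam):
    return (-0.5 * ((lam - 0.04) / 0.02) ** 2
            - 0.5 * ((lam - 0.05) / 0.01) ** 2).item()

rng = np.random.default_rng(3)
chain = metropolis_hastings(log_post, x0=[0.04], step_sd=0.01, n_steps=5000, rng=rng)
```

For this conjugate toy case the posterior is known exactly (mean 0.048), so the chain's long-run average can be checked against it.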

There are some practical obstacles in the case of interest rate time series. Our likelihood function has a long ridge around the local maximum, which complicates how we set the step size of the random walk in MHA. In theory, we are free to choose whatever step size we want. Set the step too large and you keep falling off the ridge (and the step is rejected, meaning you are back where you started). Set it too small and you hardly move along the ridge at all. With many steps, the MHA mathematically converges to a limiting distribution, but that convergence may be so slow as to be useless in practice. We can fix the slow convergence in this case by mapping out the ridge around the maximised likelihood, then adjusting MHA to make large steps parallel to the ridge but small steps orthogonal to it.
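The ridge problem, and the fix of stepping far along the ridge but only a little across it, can be illustrated on a toy target: a two-dimensional Gaussian with correlation 0.999 stands in for the likelihood ridge, and the ridge-aligned proposal uses a Cholesky factor of its covariance. The numbers are illustrative, not the fitted model's.

```python
import numpy as np

# A 2-D Gaussian target whose log-density has a long narrow ridge.
cov = np.array([[1.0, 0.999], [0.999, 1.0]])
prec = np.linalg.inv(cov)

def log_target(x):
    return -0.5 * x @ prec @ x

def rw_metropolis(chol, n_steps, rng):
    """Random-walk Metropolis with proposal x' = x + L @ z, z ~ N(0, I).
    Returns the acceptance rate. Taking L as a scaled Cholesky factor of the
    ridge's covariance gives big moves along the ridge, small moves across it."""
    x, lp, accepts = np.zeros(2), 0.0, 0
    for _ in range(n_steps):
        prop = x + chol @ rng.standard_normal(2)
        lp_prop = log_target(prop)
        if np.log(rng.uniform()) < lp_prop - lp:
            x, lp = prop, lp_prop
            accepts += 1
    return accepts / n_steps

rng = np.random.default_rng(4)
acc_iso = rw_metropolis(0.5 * np.eye(2), 20000, rng)                   # isotropic steps
acc_ridge = rw_metropolis(0.5 * np.linalg.cholesky(cov), 20000, rng)   # ridge-aligned
```

The isotropic walker keeps falling off the ridge and is rejected; the ridge-aligned walker accepts far more often for the same nominal step scale.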

Once we got the MHA working, we found in the context of interest rates that allowing for parameter error, and imposing a judgment that yields should be stationary, had an imperceptible or only modest impact on interest rate percentiles over horizons up to 10 years.




The ‘uncertainty onion’

Uncertainty matters in modelling long-term returns. If we say we know the long-term return, we are kidding ourselves and misrepresenting long-term investment risks. We should consider different paths with different long-term returns, rather than assume all future paths are driven by the same parameters. Algorithms such as MHA can help us incorporate parameter uncertainty.

There are several layers of uncertainty, like unpeeling an onion. The outer layer is the stochastic uncertainty, as described by a model whose parameters we know. This model might be fitted mechanically to available data. What we usually call stochastic modelling does not penetrate all the layers of uncertainty, just the outer one. 

The next layer is parameter uncertainty, in which we vary our model choices, testing alternative plausible parameters. In this layer, we still assume a single set of parameters for each future path and, more significantly, we assume the same set of parameters held throughout the historic data. The constant historic parameters are important because this is how we judge what constitutes plausible future values. We can use Bayesian methods to penetrate the second layer. The advent of algorithms such as MHA has made it more feasible to unpeel this second layer, and this is common practice in other areas of statistics.
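One way to penetrate this second layer is to give each simulated future path its own parameter draw. The sketch below uses an invented stand-in for the posterior sample; in practice the rows would come from an MHA chain as described earlier.

```python
import numpy as np

rng = np.random.default_rng(5)

# Stand-in posterior sample for (mean-reversion speed a, long-run level lam):
# invented distributions for illustration, not output from a real MHA run.
posterior = np.column_stack([rng.uniform(0.005, 0.03, 1000),
                             rng.normal(0.04, 0.01, 1000)])

def simulate_path(a, lam, y0=0.05, sigma=0.003, n_months=120, rng=rng):
    """One mean-reverting yield path under a single fixed parameter set."""
    y = np.empty(n_months + 1)
    y[0] = y0
    for t in range(n_months):
        y[t + 1] = y[t] + a * (lam - y[t]) + sigma * rng.standard_normal()
    return y

# Each future path gets its own posterior draw, so long-horizon fan charts
# widen relative to a model that fixes a single "best estimate" parameter set.
draws = posterior[rng.integers(0, len(posterior), size=200)]
paths = np.array([simulate_path(a, lam) for a, lam in draws])
```

Within each path the parameters stay fixed, matching the assumption described above that a single parameter set drives each future path.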

There are further layers of uncertainty to explore. We should consider whether markets in the 1970s could have operated with different parameters to today, or whether data may have been cherry-picked or wilful blindness employed to direct us away from painful scenarios. 

Compared to many other statistical applications, actuarial problems can have a thin second layer, due to large amounts of data that may not be relevant if parameters were not constant. It would be wrong to claim that the Bayesian methods we have applied penetrate the inner layers of the uncertainty onion, and it’s not clear that actuarial judgment does, either.

Andrew Smith is an assistant professor in the School of Mathematics and Statistics at University College Dublin