
Completing the matrix

Open-access content Wednesday 4th November 2020
Authors
Dan Georgescu
Nick Higham

Correlation matrices arise in many applications to model the dependence between variables. Dan Georgescu and Nick Higham ask what happens when we have a partially specified matrix and we wish to fill in the missing elements.


In linear algebra terms, a correlation matrix is a symmetric positive semidefinite (PSD) matrix with unit diagonal. In other words, it is a symmetric matrix with ones on the diagonal whose eigenvalues are all non-negative. Eigenvalues may seem to be an unnecessary complication, but they determine whether or not a given matrix with ones on the diagonal is a correlation matrix. For example,

$$A = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 1 & 1 \\ 0 & 1 & 1 \end{pmatrix}$$

is not a correlation matrix: it has eigenvalues -0.4142, 1, 2.4142. Since a correlation matrix is a scaled covariance matrix, a negative eigenvalue would suggest that one of the variables has a negative variance, which cannot happen. A matrix having non-negative eigenvalues is the matrix equivalent of a real number being non-negative. We also need our correlation matrices to have this property because capital models reasonably expect inputs of positive variances and simulate possible future states of the world by first calculating the square root of the correlation matrix.
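To illustrate why a negative eigenvalue breaks the simulation step, here is a minimal Python sketch. It assumes the 'square root' is taken as a Cholesky factor, which is one common choice but is our assumption rather than something the article specifies. The function name `cholesky` and the second matrix `valid` are ours; `A` is the 3-by-3 example with ones on the diagonal and zeros in the corners.

```python
import math

def cholesky(m):
    """Return lower-triangular L with L * L^T = m.

    Raises ValueError if a pivot is not strictly positive, i.e. if m is
    not positive definite (this simple version also rejects singular
    PSD matrices).
    """
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                pivot = m[i][i] - s
                if pivot <= 0.0:
                    raise ValueError(f"pivot {pivot:.4f} at row {i} is not positive")
                low[i][j] = math.sqrt(pivot)
            else:
                low[i][j] = (m[i][j] - s) / low[j][j]
    return low

# The 3-by-3 example with a negative eigenvalue.
A = [[1.0, 1.0, 0.0], [1.0, 1.0, 1.0], [0.0, 1.0, 1.0]]
# A hypothetical valid correlation matrix, for contrast.
valid = [[1.0, 0.5, 0.0], [0.5, 1.0, 0.5], [0.0, 0.5, 1.0]]

try:
    cholesky(A)
    a_has_root = True
except ValueError:
    a_has_root = False

factor = cholesky(valid)
print("A has a Cholesky factor:", a_has_root)
print("Factor of the valid matrix:", [[round(v, 4) for v in row] for row in factor])
```

The factorisation stops at the second pivot for A, whereas the hypothetical matrix factorises without difficulty.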


Why we need correlation matrix completion

In practice, there is often incomplete or missing information for the variables and this may lead to missing values in the correlation matrix itself, hence the problem of how to complete the matrix. We show that some of these practical problems can be solved explicitly via simple formulae, and explain how to use mathematical tools to solve the more general problem where explicit solutions may not exist. (‘Simple’ is, of course, a relative term.)

As an example, consider the 3-by-3 matrix A above and suppose the zeroes in the top-right and bottom-left corners are omitted and these elements have to be chosen. Can we find a value for these entries that produces a valid correlation matrix? The only such value is 1, which can be seen from the requirement that a correlation matrix must have a non-negative determinant (the determinant being the product of the eigenvalues): with x in both corners, the determinant is −(1 − x)², which is negative unless x = 1.
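The determinant argument can be checked numerically. The sketch below is ours, not the authors' code: it scans candidate corner values on a grid of 0.01 and keeps those with a non-negative determinant, then confirms that the quoted eigenvalues are roots of the characteristic polynomial to the four decimal places given. The helper names `det3`, `completed` and `char_poly` are hypothetical.

```python
def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def completed(x):
    """The 3-by-3 example with the two corner entries set to x."""
    return [[1.0, 1.0, x], [1.0, 1.0, 1.0], [x, 1.0, 1.0]]

def char_poly(lam):
    """det(A - lam * I) for the original example, whose corners are zero."""
    m = completed(0.0)
    return det3([[m[r][c] - (lam if r == c else 0.0) for c in range(3)]
                 for r in range(3)])

# A non-negative determinant is necessary (not sufficient) for a
# valid correlation matrix, so this scan can only rule values out.
candidates = [k / 100 for k in range(-100, 101)]
feasible = [x for x in candidates if det3(completed(x)) >= 0.0]
print("corner values with non-negative determinant:", feasible)

# The quoted eigenvalues should be approximate roots of the characteristic polynomial.
for lam in (-0.4142, 1.0, 2.4142):
    print(lam, round(char_poly(lam), 4))
```

The single surviving value, x = 1, gives the matrix of all ones, which is PSD (it has rank one), so the necessary condition is also sufficient here.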

Correlation matrices are used in the aggregation of risk exposures required by regulations. The values of the correlations really matter because, for many insurers, aggregate diversification effects will be the most significant determinant of required capital, with a 40%-60% reduction in the capital required between risk types not being atypical for a large, well-diversified insurer or reinsurer.

Some of the correlations are known because they have been estimated from data, prescribed by regulations or assigned by expert judgment, but the other entries are not known. This could be because there is little reliable data, no specific regulations and few available experts with knowledge of both of the risks being correlated. The aim is to complete the missing entries in order to produce a valid correlation matrix to calculate a capital requirement as realistically as is possible given the absence of hard evidence. 

In general there are many possible completions (our simple 3-by-3 example is unusual in having only one solution), and the choice of which to use is important because correlations determine the diversification between capital for the various risk variables.

As an example, consider the following simplified problem, which is included to illustrate the issue but is not supposed to represent a realistic calibration. A firm is exposed to three risks, a, b and c, and is organised in two business units, BU1 and BU2. Risk a represents corporate bond spread in BU1 and BU2. Risk b reflects, say, equity risk. Risk c in BU1 is an exposure to an asset class that is not traded and therefore has no market data history from which to calibrate a correlation. All the correlation coefficients are known within BU1 and within BU2 (the upper left and bottom right of the matrix) but not between risk c in BU1 and the risks in BU2. This could be because the firm operates in two markets (say the UK and Italy, corresponding to BU1 and BU2) and UK experts were able to advise on the correlations between risk c and risks a and b by making an expert judgment. However, no expert judgment could be made on the correlation between risk c in BU1 and the risks in BU2. Typically, missing correlations arise between risks in different business units because there are few experts who understand both businesses and can make these judgments.

For the partially specified matrix given in Figure 1, a valid correlation matrix completion must lie in the dark yellow region in Figure 2. The centre of this region is the maximum determinant completion, where x is 0.72 and y is 0.64, to two decimal places. In that sense, the maximum determinant completion is unbiased. To give an easy-to-interpret number, if the capital held for the risks in BU1 was £10m, £20m and £70m respectively, and the capital held for the risks in BU2 was £50m and £30m, then using these values for the missing correlations and applying the Solvency II standard formula for capital calculation would lead to a required capital of just under £159m.

Other completion frameworks are possible. For example, we could find those completions that maximise and minimise capital requirement, in order to understand the materiality of the uncertainty around a central completion. These will depend on the framework used to calculate capital, but example values are plotted in Figure 2 to show the diversity of potential answers. For example, the least and most onerous completions would lead to a required capital of either £150m or £167m, an uncertainty interval of 10% around the £159m calculated above. Or we could complete such that the matrix is the most stable, where stability might be defined by reference to the positive semidefiniteness requirements, so that we might want to find the matrix where the smallest eigenvalue is maximised. 

In practical applications, large correlation matrices are often considered in small 3×3 subsets with only one unknown correlation. In these cases practitioners use rules of thumb to complete the subsets, such as the ‘product rule’ that a reasonable estimate of the missing correlation is the product of the two known correlations. However, these rules tend to lead to non-PSD matrices, which then have to be ‘repaired’ by computing the nearest correlation matrix. The added problem in this case is that it is not clear which 3×3 subset is most relevant for the missing correlation x, say. Two candidate subsets give a choice of two product rule solutions: x = 0.6 × 0.5 = 0.3 (which is outside the feasible bound shown in Figure 2) or x = 0.85 × 0.85 ≈ 0.72. The second value looks similar to the maximum determinant (MaxDet) completion, but this is a coincidence partly due to the rounding. An alternative approach has been to average these two values, but we can see that x = ½ × (0.3 + 0.72) = 0.51 understates the central value for the feasible region in dark yellow in Figure 2.
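For a single 3×3 subset with known correlations r12 and r23, the PSD requirement restricts the missing correlation to the interval r12·r23 ± √((1 − r12²)(1 − r23²)), which follows from requiring a non-negative determinant. The sketch below applies this to the two pairs of known correlations quoted in the products above; the function name `feasible_interval` is ours, and each interval is only a necessary condition for the full matrix, not the exact feasible region of Figure 2.

```python
import math

def feasible_interval(r12, r23):
    """Range of the missing correlation that keeps a 3x3 correlation matrix PSD.

    The centre of the interval is the product-rule value r12 * r23.
    """
    centre = r12 * r23
    half_width = math.sqrt((1.0 - r12 ** 2) * (1.0 - r23 ** 2))
    return centre - half_width, centre + half_width

# Pairs of known correlations taken from the two product-rule calculations.
subsets = {"first subset": (0.6, 0.5), "second subset": (0.85, 0.85)}
intervals = {name: feasible_interval(*pair) for name, pair in subsets.items()}

for name, (lo, hi) in intervals.items():
    print(f"{name}: product rule {subsets[name][0] * subsets[name][1]:.4f}, "
          f"x must lie in [{lo:.4f}, {hi:.4f}]")

# Each subset imposes its own bound, so x must satisfy both at once.
joint_lo = max(lo for lo, _ in intervals.values())
joint_hi = min(hi for _, hi in intervals.values())
print(f"both subsets together: [{joint_lo:.4f}, {joint_hi:.4f}]")
```

Under these assumptions the first product-rule value, 0.3, falls below the lower bound imposed by the second subset, which is consistent with it lying outside the feasible region.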

Explicit solutions in certain cases

Our peer-reviewed research shows how to complete the correlation matrix in certain cases such as the one above, for the MaxDet framework. The framework has several useful theoretical properties. 

  •  Existence and uniqueness: if PSD completions exist then there is exactly one MaxDet completion.
  •  Maximum entropy model: MaxDet is the maximum entropy completion for the multivariate normal model, where maximum entropy is a principle of favouring the simplest explanations. In the absence of other explanations, we should choose this principle for the null hypothesis in Bayesian analysis.
  •  Maximum likelihood estimation: MaxDet is the maximum likelihood estimate of the correlation matrix of the unknown underlying multivariate normal model.
  •  Analytic centre: MaxDet is the analytic centre of the feasible region described by the positive semidefiniteness constraints, where this is defined as the point that maximises the product of distances to the defining hyperplanes.

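In the simple case above, we can express the correlation matrix pattern as blocks,

[Equation image not reproduced: block partition of the correlation matrix]

where block E is unspecified. The MaxDet completion is shown to be,

[Equation image not reproduced: explicit formula for the MaxDet completion of block E]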

which is an explicit solution expressed as a matrix operation and simple to translate into an Excel calculation. Our work then extends this idea to other similar cases where the unknown entries can be grouped into particular block patterns. For other patterns, such simple solutions may not exist and the problem becomes a difficult nonlinear optimisation problem.
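As an illustration of such a matrix operation, a standard form of this result can be sketched in notation that is ours rather than the article's. If the matrix is partitioned into three groups of variables, with specified blocks A12 (first group against second), A22 (second group) and A23 (second against third), and E is the unspecified block linking the first and third groups, then the MaxDet completion is E = A12 · A22⁻¹ · A23. All numerical values below are hypothetical and are not those of Figure 1; the helper names are also ours.

```python
def matmul(p, q):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(p[i][k] * q[k][j] for k in range(len(q)))
             for j in range(len(q[0]))] for i in range(len(p))]

def inv2(m):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    dt = a * d - b * c
    return [[d / dt, -b / dt], [-c / dt, a / dt]]

def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n, d = len(a), 1.0
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(a[r][i]))
        if abs(a[p][i]) < 1e-14:
            return 0.0
        if p != i:
            a[i], a[p] = a[p], a[i]
            d = -d
        d *= a[i][i]
        for r in range(i + 1, n):
            f = a[r][i] / a[i][i]
            for c in range(i, n):
                a[r][c] -= f * a[i][c]
    return d

# Hypothetical specified blocks: one variable, then two groups of two.
A12 = [[0.3, 0.2]]
A22 = [[1.0, 0.4], [0.4, 1.0]]
A23 = [[0.7, 0.3], [0.3, 0.7]]

# Explicit MaxDet completion of the unspecified 1x2 block.
E = matmul(matmul(A12, inv2(A22)), A23)
x_star, y_star = E[0]

def assemble(x, y):
    """Full 5x5 matrix with the unspecified entries set to x and y."""
    return [
        [1.0, 0.3, 0.2, x, y],
        [0.3, 1.0, 0.4, 0.7, 0.3],
        [0.2, 0.4, 1.0, 0.3, 0.7],
        [x, 0.7, 0.3, 1.0, 0.4],
        [y, 0.3, 0.7, 0.4, 1.0],
    ]

best = det(assemble(x_star, y_star))
print("MaxDet completion:", round(x_star, 4), round(y_star, 4))
print("determinant at the completion:", round(best, 6))
```

Moving either entry away from the computed values lowers the determinant, which is a quick sanity check on the formula in a spreadsheet or script.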

While the MaxDet completion has attractive mathematical properties, the MaxDet solution coincides with the conditionally independent solution, as discussed in the research paper. MaxDet does not add any more dependence to the correlation matrix, which may be unrealistic for some risk pairs.

More general cases

To solve the problem in more general cases than the simple pattern of specified and unspecified entries above, where explicit solutions might not exist, one approach is to use an interior point algorithm that has been adapted to search through the feasible region determined by the space of PSD matrices. Many such algorithms exist (see for example the semidefinite programming solvers at bit.ly/311DmgS), and we used SDPT3. One of the drawbacks for those without a background in mathematical optimisation is that specifying the inputs to such solvers is laborious and error-prone. Fortunately, tools exist that parse the problem specified as above to produce inputs that are suitable for most free and commercial solvers. The parser we used is called YALMIP and is freely available, but we are aware of alternatives such as CVX. YALMIP has a MATLAB implementation. After installing a solver and YALMIP, the problem can be specified to YALMIP as seen in Figure 3.

In this example, we have specified two objective functions. The first objective function reproduces our MaxDet solution using SDPT3. As the code takes up to a second to run depending on the speed of the particular computer used and the precision is only six decimal places by default, there is little advantage to using approximate algorithms where explicit solutions exist.

The second objective function solves the following problem over the unspecified entries of Σ (see Figure 4):

$$\begin{aligned} &\text{minimise} && -V^{T} \Sigma V \\ &\text{subject to} && \Sigma \succeq 0 \end{aligned}$$

This second objective function is relevant to our insurance example because, under certain simplifying assumptions, capital is calculated using the formula

$$\text{Capital} = \sqrt{V^{T} \Sigma V}$$

where V is a vector of capital held for various risks and Σ is a correlation matrix. We have used the standard convention that the objective is specified as a minimisation (minimising a negative is the same as maximising), and left out the square root in the objective function (as it is not required for the optimal solution). The second line means that Σ has to be PSD.
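The aggregation formula is the square root of the quadratic form of V with Σ. The sketch below evaluates it for the capital amounts quoted earlier (£10m, £20m and £70m in BU1; £50m and £30m in BU2) under two bounding assumptions that are ours, zero correlation and perfect correlation, since the entries of Figure 1 are not reproduced here. The function name `aggregate_capital` is hypothetical.

```python
import math

def aggregate_capital(v, sigma):
    """Square root of v^T * sigma * v for a capital vector v and correlation matrix sigma."""
    n = len(v)
    return math.sqrt(sum(v[i] * sigma[i][j] * v[j]
                         for i in range(n) for j in range(n)))

# Capital per risk in GBP millions, as quoted for BU1 and BU2.
V = [10.0, 20.0, 70.0, 50.0, 30.0]
n = len(V)

# Bounding cases (assumptions for illustration only).
uncorrelated = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
fully_correlated = [[1.0] * n for _ in range(n)]

lower = aggregate_capital(V, uncorrelated)
upper = aggregate_capital(V, fully_correlated)
print(f"no correlation:      {lower:.2f}")
print(f"perfect correlation: {upper:.2f}")
```

Because the quadratic form is linear in each entry of Σ, the least and most onerous completions lie on the boundary of the feasible region, which is why a constrained solver is needed to find them.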

This article is based on an earlier version, which appeared in Bank Underground at bit.ly/3jT5J8o


Dan Georgescu is deputy chief actuary at Just

Nick Higham is a Royal Society Research Professor of Applied Mathematics at the University of Manchester

This article appeared in our November 2020 issue of The Actuary.