
Insurance claims forecasting with cluster analysis

Open-access content Wednesday 8th July 2020
Authors
Sen Hu, Adrian O’Hagan

Sen Hu and Adrian O’Hagan investigate how cluster analysis with copulas can improve insurance claims forecasting



Machine learning has increasingly become a tool for actuaries in the era of big data, and the idea of actuaries teaming up with data scientists has been continually debated by industry leaders. In a nutshell, machine learning is a branch of artificial intelligence that provides suites of algorithms and models for computers to learn from data, so that they can find patterns in the data and use them to make inferences, decisions or predictions.

Machine learning is a broad term that covers various approaches. For example, logistic regression (a generalised linear model) is a classic machine learning classification algorithm. One field of particular use for actuaries is cluster analysis. How can cluster analysis, together with the copula approach, improve insurance claims forecasting?
 

Clustering methods

Cluster analysis has long been a popular technique within statistical data analysis and machine learning, helping to uncover group structures in data. It groups objects in such a way that objects in the same group (‘cluster’) are relatively more similar to each other than to those in other groups. In an actuarial setting, it has been used in applications such as insurance product marketing, and variable annuity valuation and ratemaking.

There are many clustering algorithms available, commonly categorised as either partitional or hierarchical. Partitional methods, most notably k-means, segregate observations into a pre-specified number of clusters so as to optimise a chosen similarity measure. Hierarchical methods create a hierarchical decomposition of observations, forming a tree-like structure that splits the dataset into smaller subsets. We focus on model-based clustering, which is a partitional method. Rather than representing each cluster with a single datapoint, this method represents each cluster with a probability distribution, providing a more theoretically sound statistical framework.
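To make the idea of representing each cluster with a probability distribution concrete, here is a minimal sketch of model-based clustering in one dimension. It fits a two-component Gaussian mixture by expectation-maximisation (EM) to simulated log claim sizes. The data, the choice of Gaussian components and all names (`fit_two_component_mixture`, `log_claims`) are illustrative assumptions, not the model used in this article.

```python
import math
import random

def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def fit_two_component_mixture(data, n_iter=100):
    """EM for a two-component univariate Gaussian mixture."""
    lo, hi = min(data), max(data)
    means = [lo + 0.25 * (hi - lo), lo + 0.75 * (hi - lo)]  # crude starting values
    centre = sum(data) / len(data)
    spread = math.sqrt(sum((x - centre) ** 2 for x in data) / len(data))
    sds = [spread, spread]
    weights = [0.5, 0.5]
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability that each point belongs to each cluster
        resp = []
        for x in data:
            parts = [weights[g] * normal_pdf(x, means[g], sds[g]) for g in range(2)]
            total = sum(parts)
            resp.append([p / total for p in parts])
        # M-step: update each cluster's distribution using those probabilities
        for g in range(2):
            size = sum(r[g] for r in resp)
            weights[g] = size / len(data)
            means[g] = sum(r[g] * x for r, x in zip(resp, data)) / size
            variance = sum(r[g] * (x - means[g]) ** 2 for r, x in zip(resp, data)) / size
            sds[g] = max(math.sqrt(variance), 1e-6)
    return weights, means, sds, resp

# Hypothetical data: log claim sizes from two latent groups of policyholders
rng = random.Random(0)
log_claims = [rng.gauss(6.0, 0.5) for _ in range(300)]
log_claims += [rng.gauss(9.0, 0.5) for _ in range(200)]

weights, means, sds, resp = fit_two_component_mixture(log_claims)
labels = [0 if r[0] >= r[1] else 1 for r in resp]  # hard cluster assignment
print(weights, means, sds)
```

Unlike k-means, each observation receives a probability of membership in every cluster, and the fitted distributions can be used directly for inference.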


Risks are dependent and heterogeneous across categories

Cluster analysis is especially useful for actuaries who deal with dependent risks, although it can also be used for individual risks. Dependence means that information about one risk category provides information about the likely distribution of other risk values. Insurance companies, in particular, should investigate such dependencies between different lines of business, and the effects that an extreme loss event has across multiple lines, when assessing multiple risks simultaneously. Copulas provide a convenient approach for this.

The risk heterogeneity commonly present in an insurance portfolio also needs to be addressed. For example, in general insurance some policyholders are more prone to making multiple claims, but their claims are usually small compared with those of other policyholders who represent higher risks overall. Risk heterogeneity can also arise from differences in behaviour and attitudes, such as the differing attitudes of motor insurance policyholders towards driving. As a result, an important part of claims forecasting is risk classification, which involves grouping policies into clusters whose risk potential is more homogeneous.

One consequence of risk heterogeneity is that empirical joint claims data are usually very dispersed, so the data may show only a weak correlation overall. This could be used to justify modelling the risks independently, without considering the dependencies among them, which means the underlying claims structure is ignored. However, by segregating policyholders’ data using cluster analysis, dependence structures within clusters can be amplified, and different modelling strategies can be implemented to suit different clusters.
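A toy simulation shows how pooling heterogeneous groups can mask dependence. The sketch below assumes two hypothetical clusters of standardised risk pairs with opposite correlations (0.8 and -0.8). These values are invented for illustration and are far more extreme than real claims data are likely to be.

```python
import math
import random

def pearson(pairs):
    """Sample Pearson correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(1)

def simulate_cluster(n, rho):
    """Draw n standard bivariate normal pairs with correlation rho."""
    pairs = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        pairs.append((z1, rho * z1 + math.sqrt(1.0 - rho * rho) * z2))
    return pairs

cluster_a = simulate_cluster(2000, 0.8)   # hypothetical cluster, positive dependence
cluster_b = simulate_cluster(2000, -0.8)  # hypothetical cluster, negative dependence
pooled = cluster_a + cluster_b

corr_a = pearson(cluster_a)
corr_b = pearson(cluster_b)
corr_pooled = pearson(pooled)
print(corr_a, corr_b, corr_pooled)
```

The pooled correlation is close to zero even though each cluster is strongly dependent, which is why an independence assumption based on the pooled data alone can be misleading.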
 

[Figure 1: Clustering of joint claim severities for accidental damage and third-party property damage; image not reproduced]


Cluster analysis with copulas

For simplicity, let’s look at scenarios involving two risks. When modelling two such perils simultaneously, bivariate distributions such as the bivariate Poisson naturally come to mind. However, although bivariate distributions are a natural extension of their univariate counterparts, they can be restrictive and conceptually challenging because of their potentially complex specification and implementation (especially in higher dimensions). Only a limited number of them suit the required data characteristics. For example, bivariate gamma distributions are complex and not commonly used, even though they are well suited to bivariate claim severity modelling.
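One way to see the restrictiveness is through the common-shock construction of a bivariate Poisson distribution, sketched below. This is one standard construction and is assumed here purely for illustration. The rates and the function names are hypothetical. The two claim counts share a Poisson component with rate `lam0`, so their covariance equals `lam0` and can never be negative.

```python
import math
import random

rng = random.Random(2)

def poisson_draw(lam):
    """Knuth's multiplication method; adequate for small rates."""
    limit = math.exp(-lam)
    count, product = 0, 1.0
    while True:
        product *= rng.random()
        if product <= limit:
            return count
        count += 1

def bivariate_poisson_draw(lam1, lam2, lam0):
    """Common-shock construction: both counts share one Poisson(lam0) term."""
    shock = poisson_draw(lam0)
    return poisson_draw(lam1) + shock, poisson_draw(lam2) + shock

# Hypothetical claim-count rates for two perils
sample = [bivariate_poisson_draw(1.0, 1.5, 0.5) for _ in range(20000)]
mean1 = sum(a for a, _ in sample) / len(sample)
mean2 = sum(b for _, b in sample) / len(sample)
covariance = sum((a - mean1) * (b - mean2) for a, b in sample) / len(sample)
print(mean1, mean2, covariance)  # theory: 1.5, 2.0 and 0.5
```

Under this construction the dependence is governed by a single non-negative parameter, which leaves little room to describe the varied dependence patterns seen in claims data.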

Copulas are a popular choice for analysing the dependence between risks in joint claims modelling. A copula is a joint distribution function of random variables with uniform marginals; it contains all the information on the dependence structure between the risks represented by the marginals. The marginal distribution functions, in turn, contain all the information on the individual risks. This gives copulas their key advantage: they allow the marginals and the dependence structure to be modelled separately. Because a rich variety of copulas already exists, a wide range of flexible dependence structures is possible.
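The separation of marginals and dependence can be sketched in a few lines. The example below draws dependent uniform pairs from a Frank copula by conditional inversion, then pushes them through marginal quantile functions. Exponential marginals (a gamma with shape 1) are assumed only because their quantile function has a closed form. The parameter `theta = 5` and the mean claim sizes are hypothetical.

```python
import math
import random

rng = random.Random(3)

def frank_conditional_inverse(u, w, theta):
    """Solve C(v | u) = w for v under a Frank copula (theta != 0)."""
    a = math.exp(-theta * u)
    g = math.expm1(-theta)  # exp(-theta) - 1
    v = -math.log1p(w * g / (w + (1.0 - w) * a)) / theta
    return min(v, 1.0 - 1e-12)  # guard against rounding to exactly 1

def exponential_quantile(p, mean):
    """Inverse CDF of an exponential distribution with the given mean."""
    return -mean * math.log1p(-p)

def kendall_tau(pairs):
    """Kendall's rank correlation by direct pair counting."""
    n = len(pairs)
    score = 0
    for i in range(n):
        xi, yi = pairs[i]
        for j in range(i + 1, n):
            xj, yj = pairs[j]
            product = (xi - xj) * (yi - yj)
            score += (product > 0) - (product < 0)
    return score / (n * (n - 1) / 2)

theta = 5.0  # hypothetical dependence parameter
uniforms = []
for _ in range(1500):
    u, w = rng.random(), rng.random()
    uniforms.append((u, frank_conditional_inverse(u, w, theta)))

# Hypothetical marginals: mean claim sizes of 1,000 and 3,000
claims = [(exponential_quantile(u, 1000.0), exponential_quantile(v, 3000.0))
          for u, v in uniforms]

tau_uniform = kendall_tau(uniforms)
tau_claims = kendall_tau(claims)
mean_first = sum(x for x, _ in claims) / len(claims)
mean_second = sum(y for _, y in claims) / len(claims)
print(tau_uniform, tau_claims, mean_first, mean_second)
```

Kendall's tau is the same before and after the marginal transformation, because rank dependence is determined by the copula alone, while the claim size means are determined by the marginals alone.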

It is easy to envisage that policies in a portfolio exhibit different dependence features because of this joint risk heterogeneity. Within the finite mixture framework of model-based clustering, we can employ a finite mixture of copulas to segregate policies according to the different dependence structures in the data. Each structure is represented by its own copula, and together they give a more flexible dependence structure overall.
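As a rough sketch of what a finite mixture of copulas looks like, the code below combines a Gumbel and a Frank copula density with mixing weights, and computes the posterior probability that a pair of uniform-scale observations belongs to each component. The weights and copula parameters are invented for illustration and are not the fitted values from this study.

```python
import math

def gumbel_cdf(u, v, theta):
    """Gumbel copula C(u, v), theta >= 1."""
    x, y = -math.log(u), -math.log(v)
    return math.exp(-((x ** theta + y ** theta) ** (1.0 / theta)))

def gumbel_density(u, v, theta):
    """Gumbel copula density, the mixed second derivative of gumbel_cdf."""
    x, y = -math.log(u), -math.log(v)
    a = x ** theta + y ** theta
    return (gumbel_cdf(u, v, theta) * (x * y) ** (theta - 1.0) / (u * v)
            * a ** (2.0 / theta - 2.0)
            * (1.0 + (theta - 1.0) * a ** (-1.0 / theta)))

def frank_cdf(u, v, theta):
    """Frank copula C(u, v), theta != 0."""
    ratio = math.expm1(-theta * u) * math.expm1(-theta * v) / math.expm1(-theta)
    return -math.log1p(ratio) / theta

def frank_density(u, v, theta):
    """Frank copula density, the mixed second derivative of frank_cdf."""
    k = 1.0 - math.exp(-theta)
    numerator = theta * k * math.exp(-theta * (u + v))
    denominator = (k - (1.0 - math.exp(-theta * u)) * (1.0 - math.exp(-theta * v))) ** 2
    return numerator / denominator

def component_densities(u, v, weights, thetas):
    """Weighted density of each mixture component at (u, v)."""
    return [weights[0] * gumbel_density(u, v, thetas[0]),
            weights[1] * frank_density(u, v, thetas[1])]

def mixture_density(u, v, weights, thetas):
    return sum(component_densities(u, v, weights, thetas))

def responsibilities(u, v, weights, thetas):
    """Posterior probability of each component given the observation."""
    parts = component_densities(u, v, weights, thetas)
    total = sum(parts)
    return [p / total for p in parts]

weights = [0.6, 0.4]  # hypothetical mixing proportions
thetas = [2.0, 5.0]   # hypothetical Gumbel and Frank parameters
posterior = responsibilities(0.4, 0.6, weights, thetas)
print(mixture_density(0.4, 0.6, weights, thetas), posterior)
```

In an EM fit, responsibilities of this kind play the same role as in a standard finite mixture model: they softly assign each policy to a cluster before the copula parameters are updated.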

Because copulas are distribution functions indexed by one or more parameters, a finite mixture of copulas can be expressed in a fashion similar to a standard finite mixture model. For our claims modelling using parametric methods, estimation of the copula depends on estimation of the marginal distributions. Furthermore, predictors (covariates), such as characteristics of the policies, can be incorporated in the marginals via generalised linear model (GLM) frameworks to improve marginal and copula estimation. This is called copula regression. In the finite mixture of copulas setting, we can use a GLM framework to further allow the mixing proportions to depend on covariates, in order to better identify which cluster an observation belongs to. In machine learning, such a model is called a mixture of experts.
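The two places where covariates enter can be sketched as follows: a GLM for the marginal mean and a logistic "gating" model for the mixing proportions of two clusters. The log link for the gamma marginal is an assumption (a common choice for claim severity, but not stated above), and the covariates and coefficients are entirely hypothetical.

```python
import math

def linear_predictor(covariates, intercept, coefficients):
    return intercept + sum(b * x for b, x in zip(coefficients, covariates))

def gamma_marginal_mean(covariates, intercept, coefficients):
    """Expected claim size under a gamma GLM with a log link (assumed link)."""
    return math.exp(linear_predictor(covariates, intercept, coefficients))

def mixing_proportions(covariates, intercept, coefficients):
    """Two-cluster gating: logistic regression on the policy covariates."""
    eta = linear_predictor(covariates, intercept, coefficients)
    first = 1.0 / (1.0 + math.exp(-eta))
    return [first, 1.0 - first]

# Hypothetical covariates: (driver age in decades, vehicle value in 10,000s)
policy = (3.5, 2.0)

# Hypothetical coefficients for one marginal and for the gating model
expected_severity = gamma_marginal_mean(policy, 6.0, (-0.05, 0.30))
proportions = mixing_proportions(policy, 0.4, (0.1, -0.2))
print(expected_severity, proportions)
```

With covariate-dependent proportions, two policies with the same claim sizes can receive different cluster probabilities if their characteristics differ.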

Suppose we have a real-world sample of motor insurance claim severity data for accidental damage and third-party property damage risks, together with some characteristics of the policies. Cluster analysis, using a finite mixture of copulas with covariates and univariate gamma distributions as marginals, leads to the clustering result in Figure 1. One cluster (in orange) captures the medium claim sizes, where the dependence is modelled using a Gumbel copula, while the other cluster (in teal) accounts for very small and very large claim sizes with a Frank copula. This clustering reflects the fact that many more policies lead to medium-sized claims (ie the dense scatter cloud in the middle) than to very small or very large claims.

Through such cluster analysis, a better understanding of the risk structure is achieved, in which each cluster shows a more prominent dependence structure. Each cluster is characterised by a copula that explains particular aspects of dependence, which leads to better estimation of dependence measures such as tail dependence. Future claims can be forecast because predictors are incorporated via GLM frameworks, much as in univariate claim modelling with standard GLMs. This model can therefore be regarded as a finite mixture of copula regressions for predictive analysis. Furthermore, once clusters are identified, different models can be fitted based on each cluster’s characteristics for claims forecasting. Cluster analysis with copulas can therefore not only identify different claim behaviours and high- or low-risk policies, but also provide claims forecasts that take the dependence within each cluster into account.
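Tail dependence is one place where the choice of copula matters. The sketch below uses two standard results: the Gumbel copula has upper tail dependence coefficient 2 - 2^(1/theta), while the Frank copula has none. It also checks both numerically on the copula diagonal. The parameter values are hypothetical.

```python
import math

def gumbel_upper_tail_dependence(theta):
    """Closed-form upper tail dependence coefficient of the Gumbel copula."""
    return 2.0 - 2.0 ** (1.0 / theta)

def gumbel_diagonal(u, theta):
    """C(u, u) for the Gumbel copula."""
    return u ** (2.0 ** (1.0 / theta))

def frank_diagonal(u, theta):
    """C(u, u) for the Frank copula."""
    ratio = math.expm1(-theta * u) ** 2 / math.expm1(-theta)
    return -math.log1p(ratio) / theta

def upper_tail_ratio(diagonal, u, theta):
    """Finite-u approximation to the limit of (1 - 2u + C(u, u)) / (1 - u)."""
    return (1.0 - 2.0 * u + diagonal(u, theta)) / (1.0 - u)

gumbel_exact = gumbel_upper_tail_dependence(2.0)
gumbel_numeric = upper_tail_ratio(gumbel_diagonal, 0.9999, 2.0)
frank_numeric = upper_tail_ratio(frank_diagonal, 0.9999, 5.0)
print(gumbel_exact, gumbel_numeric, frank_numeric)
```

A cluster fitted with a Gumbel copula therefore implies that large claims on the two risks tend to occur together, whereas a Frank cluster implies that joint extremes are asymptotically independent.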
 

Dr Sen Hu is a post-doctoral researcher at University College Dublin

Dr Adrian O’Hagan is an assistant professor at University College Dublin

This article appeared in our July 2020 issue of The Actuary.
© 2023 The Actuary. The Actuary is published on behalf of the Institute and Faculty of Actuaries by Redactive Publishing Limited. All rights reserved. Reproduction of any part is not allowed without written permission.
