On the write track: using machine learning to predict underwriting decisions

Wednesday 3rd August 2022
Authors
Yafei Wang

Yafei (Patricia) Wang looks at the use of machine learning to predict underwriting decisions for life and health insurance


Advances in machine learning, and an explosion of unstructured data, have created huge scope for applying machine learning models in life and health insurance, including predictive underwriting. In the July 2022 issue of The Actuary, Reza Hekmat and Balint Bone consider, on a commercial level, how such models could help to automate and enhance the underwriting process (bit.ly/EndUnderwriters).

Real-world data can demonstrate how machine learning models may be used in practice to predict underwriting decisions in cases that cannot be processed by prescriptive rule-based engines. The data used for training the model and testing model performance are cases that are referred to reinsurers. These are cases that have not passed through the rule-based engine or manual underwriting for various reasons – typically complicated medical conditions or family medical histories. How is the data processed and modelled, and how well do the models perform when predicting underwriting decisions?

Modelling process

The first step is to use natural language processing to process the free-text variables in the data, for example: descriptions of medical conditions, lifestyle risk factors, hobbies and occupations (Figure 1). After keywords such as ‘stomach cancer’ are extracted, machine learning techniques are used to sort applications into different groups of medical conditions and occupation classes, so dummy variables and categorical variables can be created in the second step.
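As a rough illustration of the first two steps, the sketch below turns a free-text medical description into 0/1 dummy variables by simple keyword matching. The keyword lists and group names are hypothetical, and plain string matching stands in for the natural language processing and grouping techniques described above, which the article does not specify.

```python
import re

# Hypothetical keyword lists; a real system would rely on a medical
# ontology or a trained NLP model rather than a hand-written dictionary.
CONDITION_KEYWORDS = {
    "cancer": ["stomach cancer", "breast cancer", "melanoma"],
    "cardiovascular": ["heart attack", "hypertension", "angina"],
    "diabetes": ["diabetes", "insulin"],
}


def normalise(text: str) -> str:
    """Lower-case the text and replace punctuation runs with single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def condition_dummies(free_text: str) -> dict[str, int]:
    """Return one 0/1 dummy variable per (hypothetical) condition group."""
    cleaned = f" {normalise(free_text)} "
    return {
        group: int(any(f" {keyword} " in cleaned for keyword in keywords))
        for group, keywords in CONDITION_KEYWORDS.items()
    }


example = "Diagnosed with Stomach Cancer in 2015; treated hypertension."
print(condition_dummies(example))
# {'cancer': 1, 'cardiovascular': 1, 'diabetes': 0}
```

Each group becomes one column in the modelling dataset, so an application mentioning several conditions simply has several dummies set to 1.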


The third step is preliminary data analysis, so we can gain some early insights into the data and decide on the appropriate treatment of missing data. Word clouds are often used to show which medical conditions are most frequently referred to reinsurers. These are likely to be conditions that underwriters are less familiar with, and on which they may need more training.

We then use various feature selection techniques to eliminate irrelevant and redundant variables, which reduces both run time and overfitting. The data typically contains a few thousand features, so run time could be a real issue in practice. More importantly, overfitting is a common issue: the model fits too closely to the training dataset because it has learnt that dataset’s randomness and noise, and so would not perform well on unseen future data. Parameter tuning as part of model training can also be used to reduce overfitting.
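The article does not say which feature selection techniques were used. As one minimal sketch, assuming the features are already numeric dummies, the filter below drops features that never vary (irrelevant) and features that exactly duplicate an earlier one (redundant). The feature names are made up for illustration.

```python
def select_features(rows: list[dict[str, int]]) -> list[str]:
    """Keep features that vary across cases and do not duplicate an earlier feature."""
    kept = []
    seen_columns = set()
    for name in rows[0]:
        column = tuple(row[name] for row in rows)
        if len(set(column)) <= 1:
            continue  # constant column: carries no information
        if column in seen_columns:
            continue  # exact duplicate of a feature already kept
        seen_columns.add(column)
        kept.append(name)
    return kept


# Toy data with hypothetical feature names.
rows = [
    {"smoker": 1, "tobacco_user": 1, "has_form": 1, "diabetes": 0},
    {"smoker": 0, "tobacco_user": 0, "has_form": 1, "diabetes": 1},
    {"smoker": 1, "tobacco_user": 1, "has_form": 1, "diabetes": 1},
]
print(select_features(rows))  # ['smoker', 'diabetes']
```

With a few thousand features, a cheap filter of this kind would normally run before any model-based selection.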

Table 1 shows how the output variable – the underwriting decision – is coded.


The cases that are accepted on standard terms are labelled class 0; those that are declined are labelled class 100. In practice, underwriters give loadings only in multiples of 25%, and the models are designed to mimic this; the loadings are therefore divided by 25% and labelled as shown in Table 1. Underwriters rarely give loadings greater than 400%, so cases with loadings above 400% are grouped into one class to give that class enough data points. In short, we are training machine learning models for a multi-class classification problem. As shown in Table 1, there are 19 classes.
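The coding scheme can be sketched as a small function. Labels 0 and 100 and the division by 25% come from the text; the label 17 for the grouped class of loadings above 400% is an assumption made here so that the count comes to 19 classes. The decision strings are also hypothetical.

```python
STEP_PCT = 25            # loadings are given in multiples of 25%
MAX_LOADING_PCT = 400    # loadings above this are grouped into one class
DECLINE_LABEL = 100
# Assumed label for the grouped "greater than 400%" class.
LARGE_LOADING_LABEL = MAX_LOADING_PCT // STEP_PCT + 1  # 17


def encode_decision(decision: str, loading_pct: int = 0) -> int:
    """Map an underwriting decision to its class label."""
    if decision == "standard":
        return 0
    if decision == "decline":
        return DECLINE_LABEL
    if decision != "loaded" or loading_pct <= 0:
        raise ValueError(f"unexpected decision: {decision!r}, {loading_pct}%")
    if loading_pct > MAX_LOADING_PCT:
        return LARGE_LOADING_LABEL
    if loading_pct % STEP_PCT != 0:
        raise ValueError("loadings are expected in multiples of 25%")
    return loading_pct // STEP_PCT


labels = {
    encode_decision("standard"),
    encode_decision("decline"),
    encode_decision("loaded", 500),
}
labels |= {encode_decision("loaded", pct) for pct in range(25, 401, 25)}
print(len(labels))  # 19
```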

Some insurers may want to implement the model to classify the applications into three broader categories – ‘accept on standard terms’, ‘accept with loading’ and ‘decline’ – and then manually underwrite only certain classes, such as ‘accept with loading’. In the case of Table 1, the model could be adapted to be a three-class classification in which all of the classes with loadings are grouped into one class, and we have only three classes: ‘standard’, ‘loaded’, and ‘decline’.
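Collapsing the detailed labels into the three broader categories is a one-line mapping. This sketch assumes only what the text states: class 0 is standard, class 100 is decline, and every other label is a loading class.

```python
def to_three_classes(label: int) -> str:
    """Collapse a detailed class label into 'standard', 'loaded' or 'decline'."""
    if label == 0:
        return "standard"
    if label == 100:
        return "decline"
    return "loaded"


print([to_three_classes(label) for label in (0, 3, 16, 100)])
# ['standard', 'loaded', 'loaded', 'decline']
```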


Overall model performances

We trained and tested 10 machine-learning algorithms: random forests, decision trees, gradient boosting, extreme gradient boosting (XGB), bagging, AdaBoost, support-vector machine (SVM), stochastic gradient descent (SGD), k-nearest neighbours and ordinal logistic regression. Some are classification models, while others are regression models.


The dataset is randomly split into a training dataset and testing dataset, in the ratio of 80:20. The models are trained using the training dataset and then tested on the unseen data in the testing dataset. In a sense, the performance on the testing dataset gives us an idea of how the models will perform when used on future data, provided that there are no fundamental changes in the underwriting philosophies (for example, how lenient the underwriting is at the entry stage), so the performance on the testing dataset is more important.
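A random 80:20 split can be sketched with the standard library alone. The function name, the seed and the use of integers as stand-ins for cases are illustrative choices, not details from the study.

```python
import random


def train_test_split(cases, test_fraction=0.2, seed=42):
    """Shuffle the cases and hold out a fraction of them for testing."""
    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


train, test = train_test_split(range(100))
print(len(train), len(test))  # 80 20
```

The held-out cases play no part in training, which is what makes performance on them a fair proxy for performance on future applications.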

The accuracy score, defined as the number of correct predictions divided by the number of data points, is approximately 80% on the testing dataset for a 19-class classification and 89% on the testing dataset for a three-class classification. The accuracy is achieved across all product lines – life, critical illness, income protection and so on – and across more than 25 insurers, where underwriting manuals differ from product to product and from insurer to insurer.
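The accuracy score as defined above is straightforward to compute. The labels in this sketch are toy values chosen for illustration, not results from the study.

```python
def accuracy(y_true, y_pred):
    """Number of correct predictions divided by the number of data points."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Toy labels: four of the five predictions match.
y_true = [0, 0, 4, 100, 2]
y_pred = [0, 0, 4, 100, 3]
print(accuracy(y_true, y_pred))  # 0.8
```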

The best performing algorithm is XGB, followed by random forests and bagging, for both 19-class and three-class classifications. All three of these algorithms combine the outputs of weak learners into the final output, and the underlying weak learners are decision trees. This is not surprising, because manual underwriting processes resemble a decision tree.

The XGB algorithm has won multiple Kaggle challenges (bit.ly/3a91NRA). One of the main reasons it often outperforms other machine learning models is that it handles sparse data well, which is important here because some classes have few or no data points. Another major advantage of XGB is that it has an in-built penalty term added to the loss function, so the algorithm prefers less complex models and is thus less prone to overfitting.
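The penalty term can be written out explicitly. The formula below follows the standard XGBoost formulation rather than anything specific to the models in this article. Here $l$ is the loss on each case, $f_k$ is the $k$-th tree in an ensemble of $K$ trees, $T$ is the number of leaves in a tree, $w$ is its vector of leaf weights, and $\gamma$ and $\lambda$ are tuning parameters.

```latex
\mathrm{Obj} = \sum_{i=1}^{n} l\bigl(y_i, \hat{y}_i\bigr) + \sum_{k=1}^{K} \Omega(f_k),
\qquad
\Omega(f) = \gamma T + \tfrac{1}{2}\,\lambda \lVert w \rVert^{2}
```

Larger values of $\gamma$ and $\lambda$ penalise trees with many leaves or large leaf weights, which is how the algorithm comes to prefer simpler models.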

Bagging outperforms the other two boosting algorithms, namely AdaBoost and gradient boosting. The insurance dataset contains noise, so this result echoes Dietterich’s research suggesting that bagging performs better than boosting on datasets with a lot of noise (bit.ly/3y3VnLo).

Not surprisingly, logistic regression, SGD with a log loss function, and SVM with a radial basis function kernel did not perform well. The underwriting decision resembles a decision tree, so the relationships between the outcome variable and input variables are unlikely to be solely logistic-linear. Furthermore, regression models do not typically perform well on datasets with high dimensions, which is the case here.

Practical implementations

The overall accuracy scores of the algorithms can be broken down by class. For example, the XGB model achieved an accuracy of 92% on the testing dataset for the standard class in the 19-class classification, so it is extremely accurate at predicting standard cases. This means that if insurers want to improve their straight-through rates, they can do so by implementing the XGB model. It would not only improve operational efficiency, but also increase sales by increasing the number of cases that can be accepted straight away.
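A per-class breakdown can be sketched as follows. It assumes that accuracy for a class means the share of cases truly in that class that the model labels correctly; the article does not spell out the definition. The labels are toy values.

```python
from collections import defaultdict


def per_class_accuracy(y_true, y_pred):
    """For each true class, the share of its cases that were predicted correctly."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return {label: correct[label] / total[label] for label in total}


# Toy labels: 0 = standard, 4 = a loading class, 100 = decline.
y_true = [0, 0, 0, 0, 4, 4, 100]
y_pred = [0, 0, 0, 4, 4, 0, 100]
print(per_class_accuracy(y_true, y_pred))
# {0: 0.75, 4: 0.5, 100: 1.0}
```

A high figure for class 0 is the one that matters for straight-through processing, because it measures how many genuinely standard cases the model would accept without referral.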

Furthermore, underwriters can now focus on the high-risk, high-cost cases – such as cases with large loadings – so the quality of underwriting decisions on these cases can be improved, too. Cost-benefit analysis is also useful, where the cost saving of using machine-learning models can be analysed against the cost of manual underwriting.

Yafei (Patricia) Wang has more than 10 years of experience in financial reporting, financial modelling, machine learning and data analytics in the South African and London markets

This article appeared in our August 2022 issue of The Actuary .