
Road testing: machine learning and the efficiency of fraud detection

Wednesday 2nd June 2021
Authors
Nefeli Pamballi
Phanis Ioannou
Yiannis Parizas

Nefeli Pamballi, Phanis Ioannou and Yiannis Parizas outline how machine learning could help increase the efficiency of fraud detection in motor insurance


Fraudulent claims are a significant cost to personal motor insurance products, typically increasing the combined ratio of the insurer by 5%-10%. Traditionally, firms would use expert judgment algorithms to decide which claims would be investigated for fraud. We set up a machine learning pipeline to help optimise processes in the Fraud Management Unit (FMU), to reduce the cost of fraudulent claims. Using data science techniques, we focused on reducing costs and increasing the efficiency of fraud detection processes by concentrating fraud investigation efforts on claims that were more likely to be fraudulent. Organisations would benefit from:

  • A reduction in operating expenses, as claims with low probability of being fraudulent will be fast-tracked

  • A better customer experience from fast-tracked customers, leading to higher customer satisfaction and retention levels

  • An increased fraud detection rate that reduces the combined ratio.

In our case study, claims were assigned a fraud-likeliness score, with two thresholds for intervention. The lower threshold was interpreted as the cut-off for fast-tracked claims and anything above the higher threshold was interpreted as requiring anti-fraud action; anything between the two thresholds was sent for assessment by the FMU. The number of claims falling between the two thresholds was driven by the FMU’s monthly capacity for investigating claims.

The training and testing data was from past fraud cases, and we tested various statistical and machine learning models for predicting fraudulent claims. The conclusion was that three particular models, in combination, yielded the best predictions. Future monitoring will be an important element of the process, as it will guarantee the sustainability of the work by ensuring the framework remains up to date and fit for purpose.

Data preparation and exploratory analysis

Before exploratory analysis or modelling was performed, an extract, transform, load (ETL) process was set up for data preparation and cleansing. Building the ETL took up most of the project time, but keeping it flexible and easy to use will pay dividends by speeding up future recalibrations.

The average fraud rate across the entire data-frame was 0.83%. This represents the reported or confirmed fraud rate, rather than the actual fraud rate, which would be expected to be higher than the reported rate since it is unlikely that 100% of actual frauds were detected. Based on exploratory analysis, the most important dimensions to include in the model were:

  • The time taken to report the fraudulent claim from the beginning of the policy: We saw evidence that higher fraud rates occurred closer to the policy start date

  • Policy duration: It seems reasonable that the perpetrator of pre-meditated fraud does not require long cover

  • Customer duration days: The longer the claimant has been a customer, the less likely they are to file a fraudulent claim. This was expected, as fraudulent customers tend to switch insurers frequently

  • Number of previous claims: Another reasonable assumption is that a large number of previous claims could mean that the latest claim is fraudulent

  • Claim cover type: The fraud rate was lowest for third-party liability claims, highest for own damages and theft, and somewhere in the middle for glass. This is not a surprising observation, as the claimant does not benefit from the third-party cover claims.

The above findings were discussed with the FMU and validated for reasonableness. The FMU provided additional possible risk drivers, but the analysis only supported the smaller set above. However, the FMU’s expertise was key, and other drivers that emerge in the future could be incorporated into the model if the data supports this.
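Most of the dimensions listed above are date differences or simple counts. As a minimal sketch of how they might be derived in the ETL step, the following assumes a hypothetical claim record; the field names are illustrative only (the article does not publish its schema), and the accident date is assumed as the reference point for the first dimension, in line with the "start to accident days" label used in Figure 1.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Claim:
    # All field names are hypothetical; the article does not publish its schema.
    policy_start: date
    policy_end: date
    customer_since: date
    accident_date: date
    previous_claims: int
    cover_type: str  # e.g. "third_party", "own_damage", "theft", "glass"


def derive_features(claim: Claim) -> dict:
    """Turn raw policy and claim dates into the model dimensions."""
    return {
        "start_to_accident_days": (claim.accident_date - claim.policy_start).days,
        "policy_duration_days": (claim.policy_end - claim.policy_start).days,
        "customer_duration_days": (claim.accident_date - claim.customer_since).days,
        "previous_claims": claim.previous_claims,
        "cover_type": claim.cover_type,
    }


# Made-up example: a short policy, a new customer and an early claim.
example = Claim(
    policy_start=date(2020, 1, 1),
    policy_end=date(2020, 3, 31),
    customer_since=date(2020, 1, 1),
    accident_date=date(2020, 1, 15),
    previous_claims=0,
    cover_type="theft",
)
features = derive_features(example)
print(features)
```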

Methodology

Only supervised learning algorithms were considered appropriate in this case and, since the response variable was binary, we decided to approach this problem as a binary classification problem, with the model predicting two classifications: Fraud or Non-Fraud.

Taking into consideration the results of our preliminary analysis and our choice of response variable, we opted to test logistic regression, classification tree and gradient boosting (XGBoost) algorithms, as well as an ensemble model that combined all three methods. 

  • Logistic regression: Logistic regression is a probabilistic statistical classification model that can be used when the dependent variable Y is binary. It transforms the output of a linear regression model using a sigmoid function, so that predictions fall between 0 and 1. Logistic regression can be used as a powerful classification algorithm, which assigns observations to a discrete set of classes (in this case binary: Fraud or Non-Fraud).

  • Decision trees: Decision trees work for both categorical and continuous input and output variables. There are two types of decision trees: regression trees and classification trees. Regression trees predict a quantitative response, while classification trees predict a qualitative one. There are many decision tree variants, but they all do the same thing – subdivide the feature space into regions with mostly the same label. Decision trees are easy to understand and implement.

  • Gradient boosting (XGBoost): The algorithm sequentially builds trees so that, in every subsequent tree, it aims to reduce the errors of the previous tree by using predecessors as learning sources. This technique is called ‘boosting’ in the field of data science and it builds small and highly interpretable trees by giving the modeller the option to choose and optimise the hyperparameters during the process.

  • Ensemble model: The ensemble method is a technique that takes the predictions generated by several base models and combines them to reach a single prediction. In this exercise, we have averaged the three models to combine them. This method usually generates more accurate results than a single model. The key to the success of this method is that base models perform better in different parts of the dataset – so by averaging the results of several models, we could improve the model performance as a whole.
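Two of the ideas in the list above can be illustrated in a few lines: the sigmoid transformation behind logistic regression, and the equal-weight averaging used for the ensemble. This is a sketch only. The function names and all scores are made up for illustration, and the three probability lists stand in for the outputs of fitted base models that are not reproduced here.

```python
import math


def logistic_probability(intercept: float, coefficients: list[float], features: list[float]) -> float:
    """Sigmoid of a linear predictor: p = 1 / (1 + exp(-(b0 + b.x)))."""
    linear = intercept + sum(b * x for b, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-linear))


def ensemble_average(*model_probabilities: list[float]) -> list[float]:
    """Average per-claim fraud probabilities across base models (equal weights)."""
    n_models = len(model_probabilities)
    return [sum(claim_probs) / n_models for claim_probs in zip(*model_probabilities)]


# Made-up scores for three claims from three hypothetical base models.
p_logistic = [0.010, 0.200, 0.030]
p_tree = [0.020, 0.100, 0.060]
p_boosting = [0.030, 0.300, 0.030]

p_ensemble = ensemble_average(p_logistic, p_tree, p_boosting)
print(p_ensemble)
```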

 

Figure 1: Start to accident days - model predictions on out-of-sample data.

 

The data available to us for this exercise was split into three sub-samples:

  • 60% for training these models: used to fit the parameters

  • 20% for validation: used in optimising the hyperparameters and keeping track of the performance

  • 20% for testing: used to provide an unbiased evaluation of a final model fit.
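A 60/20/20 split of this kind can be sketched as a seeded shuffle followed by two cuts. The function name and the seed are illustrative assumptions. The article does not say whether the split was stratified by fraud outcome; with a fraud rate below 1%, that may be worth considering in practice.

```python
import random


def split_60_20_20(records: list, seed: int = 42) -> tuple[list, list, list]:
    """Shuffle a copy of the records and cut it into train/validation/test."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    # Integer arithmetic avoids floating-point rounding at the cut points.
    n_train = (len(shuffled) * 60) // 100
    n_valid = (len(shuffled) * 20) // 100
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]
    return train, valid, test


# 1,000 placeholder claim identifiers.
train, valid, test = split_60_20_20(list(range(1000)))
print(len(train), len(valid), len(test))
```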

Model comparison

We fitted the four models and then compared them using different measures to determine which one offered the most predictive power for fraud detection. Figure 1 shows the model predictions compared with the actual fraud rate for the out-of-sample data, and the measure of the time taken to report the fraudulent claim from the beginning of the policy – which turned out to be the most important variable in the models. The bars represent the number of claims in each 50-day period. We can see that all models capture the fact that the earlier reported claims are more likely to be fraudulent. The decision tree method is less smooth than the other methods and does not decrease sufficiently after the first two periods.

To be able to compare the four models and decide on the most appropriate one, we used the ‘area under the ROC curve’ metric, which is standard practice in such cases. A receiver operating characteristic (ROC) curve is a graphical plot that illustrates the diagnostic ability of a binary classifier system as its discrimination threshold changes. Table 1 summarises the performance of the fitted models in the three data categories (training, validation and testing). The most appropriate for decision-making is the final test data category, because this is an out-of-sample test based on data not used in the calibration.
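The area under the ROC curve has a convenient interpretation: it is the probability that a randomly chosen fraudulent claim receives a higher score than a randomly chosen non-fraudulent one. The sketch below computes it directly from that definition on made-up labels and scores. It is a pairwise illustration rather than the authors' implementation, and a library routine would be preferable on large datasets.

```python
def roc_auc(labels: list[int], scores: list[float]) -> float:
    """AUC as the probability that a random fraud outranks a random non-fraud.

    Ties between a fraud score and a non-fraud score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one fraud and one non-fraud case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up data: 1 = fraud, 0 = non-fraud.
labels = [0, 0, 1, 0, 1, 0]
scores = [0.01, 0.03, 0.20, 0.25, 0.40, 0.02]
auc = roc_auc(labels, scores)
print(auc)
```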

We observe that the ensemble model is not ranked first in the training set, but it offers better performance for the validation and test sets. We can thus conclude that it is the best fraud detection model in this scenario.

“We fitted four models and compared them to determine which one offered the most predictive power for fraud detection”

In addition to the ‘area under the curve’ performance metric, we analysed the confusion matrices for the four models, to further evaluate models and extract additional information about their performance on the test set. By confusion matrix, we refer to a table used to describe the performance of a classification model on a set of test data for which the true values are known. A 5% cut-off point was considered for the probability of fraud at this stage.

Table 2 summarises the results of the confusion matrix analysis. We observe that the accuracy (how often the classifier is correct in its prediction), precision (when the classifier predicts a fraudulent case, how often it is correct) and sensitivity (of all the actually fraudulent cases, how often the classifier predicts fraud) levels of the ensemble method show the best performance among the four models. As such, we have decided to proceed with the ensemble model for implementation and deployment.
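The three measures defined above follow directly from the four cells of the confusion matrix. The sketch below uses made-up labels and probabilities and the 5% cut-off; treating the cut-off as inclusive is an assumption, as the article does not specify how boundary cases are handled.

```python
def confusion_stats(labels: list[int], probabilities: list[float], cutoff: float = 0.05) -> dict:
    """Confusion-matrix counts and summary rates at a probability cut-off."""
    if not labels:
        raise ValueError("at least one claim is required")
    tp = fp = tn = fn = 0
    for actual, prob in zip(labels, probabilities):
        predicted = 1 if prob >= cutoff else 0  # inclusive cut-off is an assumption
        if predicted == 1 and actual == 1:
            tp += 1
        elif predicted == 1 and actual == 0:
            fp += 1
        elif predicted == 0 and actual == 0:
            tn += 1
        else:
            fn += 1
    total = tp + fp + tn + fn
    return {
        "tp": tp, "fp": fp, "tn": tn, "fn": fn,
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
    }


# Made-up data: two fraudulent claims among ten.
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
probabilities = [0.30, 0.02, 0.06, 0.01, 0.01, 0.02, 0.03, 0.04, 0.01, 0.00]
stats = confusion_stats(labels, probabilities, cutoff=0.05)
print(stats)
```

With a reported fraud rate of 0.83%, a classifier that never predicts fraud would already be correct for more than 99% of claims, so precision and sensitivity are worth reading alongside accuracy.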

 

Table 1: Model performance (AUC) for each of the base models and ensemble methods. 

Table 2: Confusion matrix statistics - cut-off 5%.

 

From model to decision making

For the practical implementation of the model, operational use by the FMUs and subsequent use for decision-making purposes, two threshold levels will have to be determined, as mentioned above: an upper threshold that initiates anti-fraud actions, and a lower threshold for determining whether to investigate or fast-track.

When determining the upper threshold level, we considered how to minimise the number of claims wrongfully classified as fraud (false positives). False positives could adversely affect the company’s reputation and customers’ satisfaction.

When choosing the level of the lower threshold, we considered FMU operational capacity and the number of claims that the FMU can investigate daily. A higher proportion of claims falling below the lower threshold would also mean lower operational costs and a more pleasant customer journey. The current FMU’s monthly operational capacity allows for 150 claims to be investigated, on average. As shown in Figure 2, the claim investigation capacity corresponds to a cut-off probability of fraud of 2%.

 

Figure 2: Predicted fraudulent cases vs. cut-off point. 
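The link between investigation capacity and the lower cut-off can be sketched as a search over candidate cut-offs: count the claims scoring at or above each one, and pick the lowest cut-off whose count fits within capacity. The function name, the grid and the monthly scores below are all illustrative assumptions; the synthetic scores were constructed so that the answer is easy to check by hand and are not the article's data.

```python
from typing import Optional


def cutoff_for_capacity(scores: list[float], capacity: int, grid: list[float]) -> Optional[float]:
    """Lowest cut-off on the grid whose flagged-claim count fits the capacity."""
    for cutoff in sorted(grid):
        flagged = sum(1 for s in scores if s >= cutoff)
        if flagged <= capacity:
            return cutoff
    return None  # no cut-off on the grid is strict enough


# Synthetic month of 1,000 scored claims (made-up distribution).
monthly_scores = [0.005] * 800 + [0.015] * 100 + [0.03] * 90 + [0.30] * 10
grid = [i / 100 for i in range(1, 51)]  # candidate cut-offs from 1% to 50%

lower_cutoff = cutoff_for_capacity(monthly_scores, capacity=150, grid=grid)
print(lower_cutoff)
```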

 

Furthermore, considering the precision level at several cut-off points, as shown in Figure 3, we observed that precision is maximised at a 23% probability of fraud. This is the cut-off at which we are most confident that the flagged cases are actually fraudulent.

 

Figure 3: Precision vs. cut-off point - maximisation of precision.
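A precision scan of this kind can be sketched as a loop over candidate cut-offs that keeps the one with the highest precision. The `min_flagged` guard is a hypothetical addition, not something the article describes: at very high cut-offs precision rests on a handful of claims and becomes noisy. The labels, probabilities and grid are made up.

```python
from typing import Optional


def best_precision_cutoff(
    labels: list[int],
    probabilities: list[float],
    grid: list[float],
    min_flagged: int = 1,
) -> Optional[tuple[float, float]]:
    """Return (cut-off, precision) for the grid point with the highest precision.

    Cut-offs that flag fewer than min_flagged claims are skipped; on ties the
    lowest cut-off wins because only strict improvements replace the best.
    """
    best = None
    for cutoff in sorted(grid):
        flagged = [y for y, p in zip(labels, probabilities) if p >= cutoff]
        if len(flagged) < min_flagged:
            continue
        precision = sum(flagged) / len(flagged)
        if best is None or precision > best[1]:
            best = (cutoff, precision)
    return best


# Made-up data: 1 = fraud, 0 = non-fraud.
labels = [0, 0, 1, 0, 1, 1]
probabilities = [0.02, 0.10, 0.15, 0.20, 0.25, 0.40]
grid = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30]

best = best_precision_cutoff(labels, probabilities, grid, min_flagged=2)
print(best)
```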

 

Based on the above considerations, we set thresholds and respective actions in Table 3. The FMU received a claim fraud score and, based on that, took appropriate actions. Based on the choices in Table 3, we could achieve accuracy of 93.5% at the 2% cut-off point and accuracy of 99% at 23% cut-off.

 

Table 3: Claims fraud rating - actions. 
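The two-threshold routing can be expressed as a small function using the 2% and 23% cut-offs from the text. The action labels are placeholders, since the contents of Table 3 are not reproduced here, and the treatment of scores falling exactly on a threshold is an assumption.

```python
LOWER_THRESHOLD = 0.02  # below this, the claim is fast-tracked
UPPER_THRESHOLD = 0.23  # at or above this, anti-fraud action is initiated


def triage(fraud_probability: float) -> str:
    """Map a fraud score to an action; the labels are illustrative placeholders."""
    if fraud_probability < LOWER_THRESHOLD:
        return "fast_track"
    if fraud_probability < UPPER_THRESHOLD:
        return "fmu_investigation"
    return "anti_fraud_action"


for score in (0.01, 0.10, 0.50):
    print(score, triage(score))
```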

 

Model monitoring

Once the model is in production, a monitoring framework should be set up to identify performance drops that would trigger model recalibration or redevelopment. For that, we will need to compare actual vs. predicted fraud rates on a regular basis. This process can be automated and presented on dashboards. Once more data is available, we will be able to assess how the different dimensions interact with the time factor, to confirm that the patterns are consistent over time.
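One simple way to automate the actual vs. predicted comparison is to compute both rates per period and flag any period where their ratio drifts too far from 1. The sketch below is an illustration under stated assumptions: the function name, the 50% tolerance and the monthly data are all hypothetical, and a production framework would likely track further metrics as well.

```python
def monitor(periods: list[tuple[str, list[int], list[float]]], tolerance: float = 0.5) -> list[dict]:
    """Compare actual and predicted fraud rates for each period.

    Each period is (name, actual fraud flags, predicted probabilities).
    A period is flagged when actual / predicted deviates from 1 by more
    than the tolerance (a hypothetical trigger level).
    """
    report = []
    for name, actual, predicted in periods:
        actual_rate = sum(actual) / len(actual)
        predicted_rate = sum(predicted) / len(predicted)
        ratio = actual_rate / predicted_rate if predicted_rate > 0 else float("inf")
        report.append({
            "period": name,
            "actual_rate": actual_rate,
            "predicted_rate": predicted_rate,
            "ratio": ratio,
            "recalibrate": abs(ratio - 1.0) > tolerance,
        })
    return report


# Made-up data: the model predicts 1% fraud in both months.
periods = [
    ("month_1", [1] + [0] * 99, [0.01] * 100),  # 1% actual: in line
    ("month_2", [1] * 3 + [0] * 97, [0.01] * 100),  # 3% actual: drifting
]
report = monitor(periods)
for row in report:
    print(row["period"], round(row["ratio"], 2), row["recalibrate"])
```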
 


Conclusion

We have seen that the traditional rule-based approach can be blended with a machine learning pipeline in order to benefit FMU operations. Existing fraud case data can be modelled using different methods, and the predictions used to optimise the FMU operations. Monitoring can then be used to assess the effectiveness of the model and initiate feature recalibrations. The overall benefit of setting up this process is a reduction in the combined ratio through more fraud being identified, and possibly through feedback being passed back to underwriting.
 



Nefeli Pamballi is a senior consultant at EY

Phanis Ioannou is a risk modelling manager at RCB Bank

Yiannis Parizas is head of pricing and actuarial analytics at Hellas Direct

This article appeared in our June 2021 issue of The Actuary.