Road testing: machine learning and the efficiency of fraud detection

Open-access content Wednesday 2nd June 2021
Authors
Nefeli Pamballi
Phanis Ioannou
Yiannis Parizas

Nefeli Pamballi, Phanis Ioannou and Yiannis Parizas outline how machine learning could help increase the efficiency of fraud detection in motor insurance


Fraudulent claims are a significant cost to personal motor insurance products, typically increasing an insurer's combined ratio by 5%-10%. Traditionally, firms have used expert-judgment algorithms to decide which claims to investigate for fraud. We set up a machine learning pipeline to help optimise processes in the Fraud Management Unit (FMU) and reduce the cost of fraudulent claims, concentrating investigation efforts on the claims most likely to be fraudulent. Organisations would benefit from:

  • A reduction in operating expenses, as claims with low probability of being fraudulent will be fast-tracked

  • A better customer experience from fast-tracked customers, leading to higher customer satisfaction and retention levels

  • An increased fraud detection rate that reduces the combined ratio.

In our case study, claims were assigned a fraud-likeliness score, with two thresholds for intervention. The lower threshold was interpreted as the cut-off for fast-tracked claims and anything above the higher threshold was interpreted as requiring anti-fraud action; anything between the two thresholds was sent for assessment by the FMU. The number of claims falling between the two thresholds was driven by the FMU’s monthly capacity for investigating claims.
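
A minimal sketch of this triage logic in Python (the 2% and 23% threshold defaults are the values calibrated later in the case study; the function and parameter names are illustrative):

```python
def triage(fraud_score: float,
           lower: float = 0.02,          # fast-track cut-off (calibrated to 2% later in the article)
           upper: float = 0.23) -> str:  # anti-fraud cut-off (calibrated to 23%)
    """Route a claim based on its predicted probability of fraud."""
    if fraud_score < lower:
        return "fast-track"              # low fraud likelihood: settle quickly
    if fraud_score > upper:
        return "anti-fraud action"       # high fraud likelihood: act immediately
    return "FMU investigation"           # in between: send to the Fraud Management Unit
```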

The training and testing data was from past fraud cases, and we tested various statistical and machine learning models for predicting fraudulent claims. The conclusion was that three particular models, in combination, yielded the best predictions. Future monitoring will be an important element of the process, as it will guarantee the sustainability of the work by ensuring the framework remains up to date and fit for purpose.

Data preparation and exploratory analysis

Before exploratory analysis or modelling was performed, an extract, transform, load (ETL) process was set up for data preparation and cleansing. Building the ETL took up most of the project time, but making it flexible and easy to use will pay dividends by speeding up future recalibrations.

The average fraud rate across the entire data-frame was 0.83%. This represents the reported or confirmed fraud rate, rather than the actual fraud rate, which would be expected to be higher than the reported rate since it is unlikely that 100% of actual frauds were detected. Based on exploratory analysis, the most important dimensions to include in the model were:

  • Time from policy start to claim report: We saw evidence that higher fraud rates occurred closer to the policy start date

  • Policy duration: It seems reasonable that the perpetrator of premeditated fraud does not require long cover

  • Customer duration days: The longer the claimant has been a customer, the less likely they are to file a fraudulent claim. This was expected, as fraudulent customers tend to switch insurers frequently

  • Number of previous claims: Another reasonable assumption is that a large number of previous claims could mean that the latest claim is fraudulent

  • Claim cover type: The fraud rate was lowest for third-party liability claims, highest for own damages and theft, and somewhere in the middle for glass. This is not surprising, as the claimant does not benefit directly from third-party cover claims.

The above findings were discussed with the FMU and validated for reasonableness. The FMU provided additional possible risk drivers, but the analysis only supported the smaller set above. However, the FMU’s expertise was key, and other drivers that emerge in the future could be incorporated into the model if the data supports this.
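
As a minimal sketch of the kind of exploratory check behind these findings, assuming a pandas DataFrame of claims with a binary is_fraud flag and a days_to_report column (both hypothetical names):

```python
import pandas as pd

# One row per claim, with a binary 'is_fraud' flag (hypothetical schema)
claims = pd.read_csv("claims.csv")

print(f"Overall fraud rate: {claims['is_fraud'].mean():.2%}")  # 0.83% in this case study

# Fraud rate and claim counts by 50-day buckets of time from policy start to report
buckets = pd.cut(claims["days_to_report"], bins=range(0, 400, 50))
print(claims.groupby(buckets)["is_fraud"].agg(["mean", "count"]))
```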

Methodology

Only supervised learning algorithms were considered appropriate in this case and, since the response variable was binary, we framed the task as binary classification, with the model predicting one of two classes: Fraud or Non-Fraud.

Taking into consideration the results of our preliminary analysis and our choice of response variable, we opted to test logistic regression, classification tree and gradient boosting (XGBoost) algorithms, as well as an ensemble model that combined all three methods; a fitting sketch follows the list.

  • Logistic regression: Logistic regression is a probabilistic statistical classification model that can be used when the dependent variable Y is binary. It transforms the output of a linear regression model with a sigmoid function, giving a powerful classification algorithm that assigns observations to a discrete set of classes (in this case binary: Fraud or Not Fraud).

  • Decision trees: Decision trees work for both categorical and continuous input and output variables. There are two types of decision trees: regression trees and classification trees. Regression trees predict a quantitative response, while classification trees predict a qualitative one. There are many decision tree variants, but they all do the same thing – subdivide the feature space into regions with mostly the same label. Decision trees are easy to understand and implement.

  • Gradient boosting (XGBoost): The algorithm builds trees sequentially, with each new tree aiming to reduce the errors of its predecessors. This technique is called 'boosting' in the field of data science; it combines many small, interpretable trees and gives the modeller the option to choose and optimise the hyperparameters during the process.

  • Ensemble model: The ensemble method takes the predictions generated by several base models and combines them into a single prediction; in this exercise, we averaged the predictions of the three models. This usually generates more accurate results than a single model, because different base models perform better in different parts of the dataset, so averaging their results can improve performance as a whole.
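
A minimal sketch of fitting the three base models and averaging them into the ensemble, assuming scikit-learn and xgboost; X_train, y_train and X_test are placeholder feature matrices and labels, and the hyperparameter values shown are illustrative, not the calibrated ones:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from xgboost import XGBClassifier

# The three base models (hyperparameter values are illustrative only)
models = [
    LogisticRegression(max_iter=1000),
    DecisionTreeClassifier(max_depth=5),
    XGBClassifier(n_estimators=200, max_depth=3, learning_rate=0.1),
]
for m in models:
    m.fit(X_train, y_train)  # X_train, y_train: placeholder training data

# Ensemble prediction: average the predicted fraud probabilities of the base models
probs = np.mean([m.predict_proba(X_test)[:, 1] for m in models], axis=0)
```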

 

Figure 1: Start-to-accident days (model predictions on out-of-sample data).

 

The data available to us for this exercise was split into three sub-samples:

  • 60% for training these models: used to fit the parameters

  • 20% for validation: used in optimising the hyperparameters and keeping track of the performance

  • 20% for testing: used to provide an unbiased evaluation of a final model fit.
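
A minimal sketch of such a split with scikit-learn (stratifying on the fraud flag, which the article does not state but which is a common choice for a 0.83% event rate; X and y are placeholders):

```python
from sklearn.model_selection import train_test_split

# 60/20/20 split: carve off the 20% test set first, then split the remainder 75/25
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, stratify=y_rest, random_state=42)
```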

Model comparison

We fitted the four models and then compared them using different measures to determine which one offered the most predictive power for fraud detection. Figure 1 shows the model predictions against the actual fraud rate for the out-of-sample data, plotted by time from policy start to claim report, which turned out to be the most important variable in the models. The bars represent the number of claims in each 50-day period. We can see that all models capture the fact that earlier-reported claims are more likely to be fraudulent. The decision tree method is less smooth than the other methods and does not decrease sufficiently after the first two periods.

To be able to compare the four models and decide on the most appropriate one, we used the ‘area under the ROC curve’ metric, which is standard practice in such cases. A receiver operating characteristic (ROC) curve is a graphical plot that illustrates the diagnostic ability of a binary classifier system as its discrimination threshold changes. Table 1 summarises the performance of the fitted models in the three data categories (training, validation and testing). The most appropriate for decision-making is the final test data category, because this is an out-of-sample test based on data not used in the calibration.

We observe that the ensemble model is not ranked first in the training set, but it offers better performance for the validation and test sets. We can thus conclude that it is the best fraud detection model in this scenario.
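
A minimal sketch of this AUC comparison, reusing the models list and data splits from the earlier sketches (all names are assumptions):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# AUC per model per data split, plus the averaged ensemble
for split, (X_s, y_s) in {"train": (X_train, y_train),
                          "validation": (X_val, y_val),
                          "test": (X_test, y_test)}.items():
    base_probs = [m.predict_proba(X_s)[:, 1] for m in models]
    aucs = {type(m).__name__: roc_auc_score(y_s, p)
            for m, p in zip(models, base_probs)}
    aucs["Ensemble"] = roc_auc_score(y_s, np.mean(base_probs, axis=0))
    print(split, aucs)
```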

“We fitted four models and compared them to determine which one offered the most predictive power for fraud detection”

In addition to the 'area under the curve' performance metric, we analysed the confusion matrices of the four models to extract additional information about their performance on the test set. A confusion matrix is a table used to describe the performance of a classification model on a set of test data for which the true values are known. A 5% cut-off point for the probability of fraud was used at this stage.

Table 2 summarises the results of the confusion matrix analysis. We observe that the accuracy (how often the classifier is correct in its predictions), precision (when the classifier predicts a fraudulent case, how often it is correct) and sensitivity (of all the actually fraudulent cases, how often the classifier predicts fraud) of the ensemble method show the best performance among the four models. As such, we decided to proceed with the ensemble model for implementation and deployment.
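
A minimal sketch of these confusion matrix statistics at the 5% cut-off, reusing probs and y_test from the earlier sketches:

```python
from sklearn.metrics import confusion_matrix

# Classify as fraud when the ensemble probability exceeds the 5% cut-off
y_pred = (probs >= 0.05).astype(int)
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)   # how often the classifier is correct
precision = tp / (tp + fp)                   # of predicted frauds, how many are real
sensitivity = tp / (tp + fn)                 # of real frauds, how many are caught
```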

 

Table 1: Model performance (AUC) for each of the base models and the ensemble method.

Table 2: Confusion matrix statistics (5% cut-off).

 

From model to decision making

For the practical implementation of the model, its operational use by the FMU and subsequent use for decision-making purposes, two threshold levels have to be determined, as mentioned above: an upper threshold that initiates anti-fraud actions, and a lower threshold for determining whether to investigate or fast-track.

When determining the upper threshold level, we considered how to minimise the number of claims wrongfully classified as fraud (false positives). False positives could adversely affect the company’s reputation and customers’ satisfaction.

When choosing the level of the lower threshold, we considered the FMU's operational capacity and the number of claims it can investigate daily. A higher proportion of claims falling below the lower threshold would also mean lower operational costs and a more pleasant customer journey. The FMU's current monthly operational capacity allows for 150 claims to be investigated, on average. As shown in Figure 2, this claim investigation capacity corresponds to a cut-off probability of fraud of 2%.
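
A minimal sketch of translating the FMU's capacity into a lower threshold, assuming probs holds predicted fraud probabilities for a representative month of claims (this simplification ignores the small number of claims above the upper threshold, which trigger anti-fraud action rather than investigation):

```python
import numpy as np

monthly_capacity = 150  # claims the FMU can investigate per month (from the case study)

# Choose the lower threshold so roughly 'monthly_capacity' claims score above it
sorted_probs = np.sort(probs)[::-1]
lower_threshold = sorted_probs[min(monthly_capacity, len(sorted_probs)) - 1]
print(f"Capacity-implied lower threshold: {lower_threshold:.2%}")  # ~2% here
```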

 

Figure 2: Predicted fraudulent cases vs. cut-off point.

 

Furthermore, considering the precision level at several cut-off points, as shown in Figure 3, we observed that precision is maximised at a 23% probability of fraud: this is the point at which we can be most confident that the cases detected are genuinely fraudulent.
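
A minimal sketch of finding the precision-maximising cut-off, again reusing probs and y_test from the earlier sketches:

```python
import numpy as np
from sklearn.metrics import precision_score

# Scan candidate cut-offs and keep the one that maximises precision
cutoffs = np.arange(0.01, 0.50, 0.01)
precisions = [precision_score(y_test, (probs >= c).astype(int), zero_division=0)
              for c in cutoffs]
best = cutoffs[int(np.argmax(precisions))]
print(f"Precision-maximising cut-off: {best:.0%}")  # 23% in this case study
```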

 

Figure 3: Precision vs. cut-off point (maximisation of precision).

 

Based on the above considerations, we set the thresholds and corresponding actions shown in Table 3. The FMU receives a claim fraud score and takes the appropriate action. With these choices, we could achieve accuracy of 93.5% at the 2% cut-off point and 99% at the 23% cut-off.

 

Table 3: Claims fraud rating and corresponding actions.

 

Model monitoring

Once the model is in production, a monitoring framework should be set up to identify performance drops that would trigger model recalibration or redevelopment. This requires comparing actual and predicted fraud rates on a regular basis, a process that can be automated and presented on dashboards. Once more data is available, we will be able to assess the interaction of the different dimensions with the time factor, to confirm that the patterns are consistent over time.
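
A minimal sketch of such a monitoring comparison, assuming a production claims log with hypothetical month, is_fraud and fraud_score columns:

```python
import pandas as pd

# Monthly actual vs. predicted fraud rates on the production claims log
claims = pd.read_csv("claims_log.csv")  # hypothetical extract
monitor = (claims.groupby("month")
                 .agg(actual=("is_fraud", "mean"),
                      predicted=("fraud_score", "mean"),
                      volume=("is_fraud", "size")))
monitor["gap"] = monitor["actual"] - monitor["predicted"]
# A persistent gap beyond an agreed tolerance would trigger recalibration
print(monitor)
```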
 


Conclusion

We have seen that the traditional rule-based approach can be blended with a machine learning pipeline to benefit FMU operations. Existing fraud case data can be modelled using different methods, and the predictions used to optimise the FMU's operations. Monitoring can then be used to assess the effectiveness of the model and initiate recalibrations. The overall benefit of setting up this process is a reduction in the combined ratio, through more fraud being identified and, possibly, through feedback pushed back to underwriting.
 



Nefeli Pamballi is a senior consultant at EY

Phanis Ioannou is a risk modelling manager at RCB Bank

Yiannis Parizas is head of pricing and actuarial analytics at Hellas Direct

Image credit | iStock

This article appeared in our June 2021 issue of The Actuary.