General Features

Who do we think you are?

Open-access content Wednesday 2nd December 2020 — updated 10.45am, Tuesday 4th May 2021
Authors
Chantal Bond
Kai Zhu

Chantal Bond and Kai Zhu discuss how machine learning techniques can be used in consumer segmentation analysis  


Marriage and birth rates continue to decline worldwide, and home ownership rates have plummeted in a number of developed economies. A traditional life insurance consumer segmentation approach, which seeks to focus on the socioeconomic and demographic drivers for life events that lead to insurance purchases, will begin to lose its relevance in this context.

At the same time, insurers have access to a rapidly growing pool of data about consumers – but few have managed to really get to grips with it. How can this data be used to enhance understanding of consumer needs and therefore gain insights that lead to better outcomes for insurers and customers?

We will demonstrate a data analytics approach to consumer segmentation that uses machine learning techniques such as k-means clustering and random forest classification, which can be applied to a variety of data sources. In this example, we use data from a consumer needs survey commissioned by the IFoA Life Asia Sub-committee to identify and describe three distinct consumer segments based on their responses to the survey questions.

“Future financial priorities were more important than country, age, income or education level in predicting which segment the consumer belonged to”

A practical three-step approach to consumer segmentation analysis

Consumer survey results are a typical example of an unlabelled data source, and deriving insights from the large resulting datasets can be resource-intensive. Here we use data from an independently commissioned consumer needs survey across three Asian markets (Mainland China, Hong Kong and Singapore), with more than 1,000 participants, to show that, with a quantitative framework and process in place, consumer insights can be obtained quickly and reliably from a non-traditional, high-dimensional dataset such as this.

In this example we outline a practical three-step approach that could be automated to significantly reduce the turnaround time from analysing data to generating actionable insights. 

In some ways, this process is the reverse of the traditional approach to consumer segmentation – rather than first defining some demographic or socioeconomic buckets and then segmenting consumers into them, we first segment the consumers into homogeneous groups based on their survey responses (step 1), then look at what variable connects the consumers in each group (step 2), and finally describe the groups based on this variable (step 3).

Step 1: Segment your data 

When given an unlabelled dataset, the first step of our process is to segment it using k-means clustering, a common unsupervised machine learning method used to understand data structure. K-means clustering groups unlabelled data points into a pre-specified number of segments such that the data points within each segment are as homogeneous as possible.
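As a minimal sketch of this step, the snippet below runs scikit-learn's `KMeans` on synthetic data standing in for encoded survey responses. The data, the five hypothetical priority columns and the choice of three segments are illustrative assumptions, not the IFoA survey data or the authors' own code.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for encoded survey responses (an assumption, not the
# IFoA data): 300 respondents, each scoring five hypothetical priorities.
rng = np.random.default_rng(0)
typical_answers = np.array([
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [5.0, 4.0, 3.0, 2.0, 1.0],
    [3.0, 1.0, 5.0, 2.0, 4.0],
])
X = np.vstack([row + rng.normal(0.0, 0.5, size=(100, 5)) for row in typical_answers])

# Group the respondents into a pre-specified number of segments.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
segments = kmeans.fit_predict(X)  # one segment label (0, 1 or 2) per respondent

# inertia_ is the within-segment sum of squared distances that k-means
# minimises; lower values mean more homogeneous segments.
print("Respondents per segment:", np.bincount(segments))
print("Within-segment sum of squares:", round(kmeans.inertia_, 1))
```

With real survey data, categorical answers would first need to be encoded numerically, and variables on different scales would usually be standardised before clustering.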

Before we apply k-means clustering, we need to determine the number of segments into which the data should be divided. We use the elbow method to do this. The method examines the amount of variance explained by the segmentation as a function of the number of segments used. The number of segments is chosen at the point beyond which each additional segment yields only a small marginal gain in the variance explained.

Based on interviewees’ responses to the IFoA Asia consumer needs survey, we found for each market that k-means clustering yields diminishing marginal gains when more than three segments are used to divide the data.
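The elbow method can be sketched as follows. This is an illustration on synthetic data with three built-in groups, not the survey analysis itself; the helper `variance_explained` is a hypothetical name, and the variance explained is computed here as one minus the ratio of the within-segment sum of squares to the total sum of squares.

```python
import numpy as np
from sklearn.cluster import KMeans


def variance_explained(X, k_values, random_state=0):
    """Return {k: share of total variance explained by a k-segment k-means fit}."""
    total_ss = ((X - X.mean(axis=0)) ** 2).sum()  # total sum of squares
    explained = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit(X)
        explained[k] = 1.0 - model.inertia_ / total_ss
    return explained


# Synthetic data with three built-in groups (an assumption for illustration).
rng = np.random.default_rng(1)
centres = np.array([[0.0, 0.0], [6.0, 0.0], [3.0, 6.0]])
X = np.vstack([c + rng.normal(0.0, 1.0, size=(150, 2)) for c in centres])

explained = variance_explained(X, range(1, 8))
# Marginal gain in variance explained from adding the k-th segment.
gains = {k: explained[k] - explained[k - 1] for k in range(2, 8)}

for k in range(2, 8):
    print(f"k={k}: explained={explained[k]:.3f}, marginal gain={gains[k]:.3f}")
```

On data like this, the marginal gain drops sharply after the third segment, which is the 'elbow' that the method looks for.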

Figure 1

Step 2: Identify the key independent variables that define the segments

After the data is segmented, the unlabelled dataset becomes a ‘labelled’ dataset: each data point is labelled with the segment assigned to it by the k-means clustering in step 1. A random forest classifier is an ensemble classification model made up of a large number of decision trees. Each tree is built top-down, splitting first on the independent variables that are most influential in predicting the outcome, with influence measured by information gain. We train the random forest classifier on the labelled dataset to predict the segment to which each data point belongs. The information gain attributable to each independent variable then identifies the variables that are most influential in predicting a data point’s segment.
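The sketch below shows one way this could look in scikit-learn: the k-means labels become the target of a `RandomForestClassifier`. The synthetic data, the number of trees and the held-out accuracy check are assumptions added for illustration; the article does not state the settings the authors used.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for encoded survey responses (illustrative assumption).
rng = np.random.default_rng(2)
centres = np.array([[0.0, 0.0, 0.0], [5.0, 0.0, 5.0], [0.0, 5.0, 5.0]])
X = np.vstack([c + rng.normal(0.0, 1.0, size=(200, 3)) for c in centres])

# Step 1 output: each respondent is labelled with its k-means segment.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Hold back a quarter of the respondents to check the classifier.
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.25, stratify=labels, random_state=0
)

# criterion="entropy" makes each split maximise information gain;
# scikit-learn's default is the closely related Gini impurity.
forest = RandomForestClassifier(n_estimators=200, criterion="entropy", random_state=0)
forest.fit(X_train, y_train)

accuracy = forest.score(X_test, y_test)
print(f"Held-out accuracy in predicting the segment: {accuracy:.2f}")
```

A high held-out accuracy suggests the forest has learned the segment boundaries well enough for its variable rankings to be meaningful.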

We used Python’s sklearn library to train the random forest classifier on the segmented dataset from step 1. After the model was trained, we used the random forest feature selection method in the sklearn library to rank the variables in order of the information gained from each of them. The most important variable, accounting for more than two-thirds of the total information gain, was how an interviewee ranked their future financial priorities. This was more important than country, age, income or education level in predicting which segment the consumer belonged to – showing some of the limitations of a traditional demographics-based segmentation approach.
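A ranking of this kind can be sketched with the `feature_importances_` attribute of a fitted scikit-learn forest. Whether the authors used this attribute or another selection utility is not stated, so treat the choice as an assumption. The column names and values are synthetic: only `priority_score` carries group structure, so it should come out on top by construction.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier

# Synthetic respondents (illustrative assumption). Only 'priority_score'
# carries group structure; the other columns are pure noise.
rng = np.random.default_rng(3)
n = 600
survey = pd.DataFrame({
    "priority_score": rng.choice([0.0, 5.0, 10.0], size=n) + rng.normal(0.0, 0.5, size=n),
    "age": rng.normal(0.0, 1.0, size=n),
    "income": rng.normal(0.0, 1.0, size=n),
    "education": rng.normal(0.0, 1.0, size=n),
})

# Step 1: segment the respondents; step 2: learn to predict the segment.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(survey)
forest = RandomForestClassifier(n_estimators=200, criterion="entropy", random_state=0)
forest.fit(survey, labels)

# feature_importances_ holds each variable's normalised share of the total
# impurity reduction across all trees; the shares sum to 1.
ranking = pd.Series(forest.feature_importances_, index=survey.columns)
ranking = ranking.sort_values(ascending=False)
print(ranking.round(3))
```

Impurity-based importances can overstate variables with many distinct values, so it may be worth cross-checking against scikit-learn's permutation importance when working with real survey data.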

Figure 2

Step 3: Profile the segments’ characteristics based on key independent variables  

As the last step, we generate characteristic profiles for all the segments based on the most influential variables identified in step 2. As ‘future financial priorities’ was identified as the most important determinant of the segment to which an interviewee belongs, we generated future financial priority profiles for the three consumer segments. Figure 2 shows how we characterised the three consumer segments identified in the Asia consumer markets.
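Profiling can be as simple as summarising the key variable within each segment. The sketch below uses a small hand-made table; the priority names and ranks are hypothetical and are not taken from the survey.

```python
import pandas as pd

# Hypothetical respondents: 'segment' is the label from step 1 and the other
# columns are the rank each respondent gave a priority (1 = most important).
# Column names and values are made up for illustration.
responses = pd.DataFrame({
    "segment":             [0, 0, 1, 1, 2, 2],
    "save_for_retirement": [1, 1, 3, 2, 2, 3],
    "buy_a_home":          [2, 3, 1, 1, 3, 2],
    "childrens_education": [3, 2, 2, 3, 1, 1],
})

# Mean rank of each priority within each segment.
profile = responses.groupby("segment").mean()

# The priority with the lowest mean rank is the segment's top priority.
top_priority = profile.idxmin(axis=1)

# Share of respondents falling in each segment.
segment_share = responses["segment"].value_counts(normalize=True).sort_index()

print(profile)
print(top_priority)
print(segment_share)
```

Each row of `profile` can then be read as a description of a segment, which is the basis for giving the segments meaningful names.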

“While it’s clear that the life stage model is still useful, its relevance is waning”

Differences between markets

Generally, the segmentation results for the three markets (Mainland China, Hong Kong and Singapore) were remarkably similar, demonstrating the wide applicability of a financial priorities-based segmentation. Nonetheless, there were a few key differences, reflecting different economic contexts, for instance:

  • The consumers in Hong Kong tend to move into each of the segments at a later age. This may be linked to Hong Kong’s housing market, which is one of the least affordable globally.
  • Singapore has fewer individuals in the ‘managing competing needs’ segment (39%) than the other two markets (around 50%). This may be because of affordable public housing and accessible high-quality public education, which reduces some of the financial needs for working families.
  • Singapore respondents in all segments ranked buying a car as a low priority (Singapore’s car ownership rate is very low), whereas Mainland China respondents gave greater priority to paying taxes (the top income tax rate is 45% in China, vs 22% in Singapore and 17% in Hong Kong).

What are the uses of this technique for the life insurance industry?

A data analytical approach to segmentation can yield results that are more relevant to today’s consumer landscape, and can make better use of a wider range of data sources. These could include any labelled or unlabelled consumer data already available to insurers, including consumer interactions and feedback on social media, purchasing patterns and web browsing data, call centre transcripts, postcode/location insights, and commercially available data. It could also include emerging sources of consumer data such as connected devices. Setting up an enterprise-level analytical framework and processes to derive consumer insights in real time from the ever-growing pool of data can, for example, improve sales conversion rates and facilitate cross-selling by creating a richer understanding of financial needs.

This, of course, has implications for marketing and sales strategies for both insurers and distributors, as they seek to identify the most relevant markets for different products. There are also opportunities for improved product design. For example, one of the key findings of our survey was a strong desire for more flexibility in insurance products. Hence, products that can grow with consumers or be adapted for different customer segments would likely be well received by policyholders, while also offering persistency benefits for insurers.

While machine learning techniques are already used in predictive underwriting and may also be used for analysing insurers’ claims and persistency experience, they are rarely applied to the more qualitative data sources discussed here – but the real value is in looking at these data sets together. Once we have a richer understanding of, say, the lapse behaviour of a particular consumer segment, we can use these insights in a predictive context, which in turn can create more proactive opportunities for engagement, communication, sales and retention.

Chantal Bond is head of Actuarial, APAC at SCOR Global Life and chair of the IFoA Life Asia Sub-committee

Kai Zhu is a manager at KPMG Advisory (Hong Kong) Limited and a member of the IFoA Life Asia Sub-committee

This article appeared in our December 2020 issue of The Actuary.
