Insurance and social media: Keyboard worriers

Open-access content Wednesday 3rd February 2021
Authors
John Ng
Melanie Zhang

John Ng and Melanie Zhang discuss how they analysed Twitter sentiment relating to COVID-19 and insurance – and how insurers could use such analysis

Twitter offers the public the opportunity to give their real-time thoughts on world events by posting short texts known as ‘tweets’. The COVID-19 pandemic was the defining event of 2020, which makes it a great subject for sentiment analysis – the use of natural language processing to automatically determine the emotion a writer is expressing in a piece of text, such as a tweet. We wanted to see if we could use Twitter data relating to COVID-19 in the UK to uncover insights pertinent to public interest and the insurance industry.

From unstructured data to sentiment prediction

Twitter is a goldmine of big data, involving more than 1bn user accounts that generate around 500m daily tweets – roughly 200bn tweets per year. This data comes with challenges, such as use of unconventional language, potential biases, and lengthy processing time. There are also a number of steps involved in preparing the data, building models and making sentiment predictions.

Step 1: Preparing the data

Twitter data is openly accessible to developers via Twitter’s application programming interface (API). We used the tweet identification numbers (IDs) for 1 January to 22 November 2020, which were published in the study ‘COVID-19 Twitter chatter dataset for scientific use’ by Georgia State University’s Panacea Lab.

DocNow’s Hydrator was used to extract (‘hydrate’) the original tweets and metadata via these IDs. Twitter restricts the hydration rate, so we took a 25% sample of the full dataset and adopted a distributed approach to data mining. This resulted in 25m English tweets, for which hydration would take around 10 days using a single Twitter account.
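The article does not describe how the sampling and distribution were implemented. As a minimal sketch, assuming the tweet IDs are held in a list and the work is split evenly between a handful of workers (the function name, worker count and seed below are illustrative, not from the study), it might look like this:

```python
import random

def sample_and_shard(tweet_ids, sample_fraction=0.25, n_workers=4, seed=0):
    """Draw a random sample of tweet IDs and split it evenly across workers."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    sample_size = int(len(tweet_ids) * sample_fraction)
    sampled = rng.sample(tweet_ids, sample_size)
    # Round-robin split: worker k gets every n_workers-th ID starting at k.
    return [sampled[k::n_workers] for k in range(n_workers)]

# Hypothetical IDs standing in for the published dataset.
ids = list(range(1_000))
shards = sample_and_shard(ids)
```

Each shard could then be written to its own file and hydrated separately, so that no single process has to work through the whole sample.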

User location data is noisy. We used techniques such as direct mapping, semi-structured mapping and Google’s Geocoding API to obtain 1.7m UK tweets.
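To illustrate the first two techniques, here is a rough sketch of how free-text profile locations might be matched to the UK. The lookup tables are hypothetical and far smaller than anything usable in practice, and the rules are an assumption about what ‘direct’ and ‘semi-structured’ mapping could mean rather than a description of the study’s own code. Strings that neither rule resolves would be left for a geocoding service.

```python
import re

# Hypothetical lookup tables; a real mapping would be far larger.
UK_EXACT = {"london", "uk", "united kingdom", "england", "scotland", "wales",
            "northern ireland", "manchester", "edinburgh"}
UK_SUFFIXES = {"uk", "united kingdom", "england", "scotland", "wales",
               "northern ireland"}

def is_uk_location(raw):
    """Return True if a rule places the user in the UK, else None (unresolved)."""
    if not raw:
        return None
    # Normalise case and collapse repeated whitespace.
    text = re.sub(r"\s+", " ", raw.strip().lower())
    # Direct mapping: the whole string is a known UK place name.
    if text in UK_EXACT:
        return True
    # Semi-structured mapping: "City, Country" style strings.
    parts = [p.strip() for p in text.split(",")]
    if len(parts) >= 2 and parts[-1] in UK_SUFFIXES:
        return True
    return None  # unresolved: a candidate for a geocoding lookup
```

For example, `is_uk_location("Leeds, UK")` is resolved by the second rule, while `is_uk_location("Planet Earth")` returns `None`.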

Step 2: Training machine learning models

For supervised learning, a sentiment label needs to be assigned to each tweet. Manual labelling is labour-intensive, so we applied an automated binary classification approach: tweets containing positive emoticons were labelled ‘positive’ and tweets containing negative emoticons were labelled ‘negative’. Only a small subset of tweets contained strong sentiment emoticons. We used 7,000 emoticon-labelled tweets for training and manually labelled 3,000 tweets for testing, taken from 1 January to 26 April 2020.
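The emoticon-based labelling rule can be sketched in a few lines. The emoticon sets below are illustrative assumptions, since the article does not list the ones used, and discarding tweets with mixed signals is likewise a design choice made for this example.

```python
# Hypothetical emoticon sets; the study's actual lists are not given.
POSITIVE = {":)", ":-)", ":D", "😀", "😊"}
NEGATIVE = {":(", ":-(", "😢", "😡"}

def emoticon_label(tweet):
    """Return +1, -1, or None if the tweet has no emoticons or mixed ones."""
    has_pos = any(e in tweet for e in POSITIVE)
    has_neg = any(e in tweet for e in NEGATIVE)
    if has_pos == has_neg:  # neither present, or both present
        return None
    return 1 if has_pos else -1
```

Only tweets that receive a label of +1 or -1 would enter the training set, which is consistent with the observation that a small subset of tweets carries a strong emoticon signal.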

We augmented our training set with the ‘sentiment140’ dataset, a non-COVID-19 labelled Twitter dataset; this brought our enriched training set to 200,000 labelled tweets, with equal numbers of positive and negative labels. This contributed to a larger vocabulary and could improve predictive performance.

Next, we carried out the pre-processing and encoding steps depicted in Figure 1. Encoding is the feature extraction step that converts a set of words (‘tokens’) into numerical vectors (‘features’).
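As an illustration of what pre-processing and encoding involve, the sketch below tokenises tweets and computes TF-IDF weights from scratch. The cleaning rules are assumptions (Figure 1 is not reproduced here), and the weighting shown is the plain textbook form, term frequency multiplied by ln(N / document frequency); libraries often use smoothed variants.

```python
import math
import re
from collections import Counter

def tokenize(tweet):
    """Lowercase, drop URLs and @mentions, keep alphabetic tokens."""
    text = re.sub(r"https?://\S+|@\w+", " ", tweet.lower())
    return re.findall(r"[a-z']+", text)

def tfidf(docs):
    """Encode tokenised documents as {token: weight} dictionaries."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each token.
    df = Counter(tok for doc in docs for tok in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({tok: (c / len(doc)) * math.log(n_docs / df[tok])
                        for tok, c in counts.items()})
    return vectors

# Made-up example tweets.
tweets = ["Lockdown extended again https://example.com",
          "@friend thanks NHS, stay safe",
          "Lockdown walks are keeping me sane"]
vectors = tfidf([tokenize(t) for t in tweets])
```

In this toy example ‘lockdown’ appears in two of the three tweets, so it receives a lower weight than a word such as ‘extended’ that appears in only one.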

For each dataset, we explored a variety of machine learning algorithms and encodings to predict binary classification of positive (+1) or negative (-1) sentiment. The model performance metric is ‘area under the ROC curve’ (AUC), an aggregate measure of performance across all classification thresholds. We compared these against simple baseline models – SentiWordNet and TextBlob (open-source tools ready for ‘out-of-the-box’ use).
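One way to read AUC is as the probability that a randomly chosen positive tweet receives a higher score than a randomly chosen negative one, with ties counting as half. The sketch below computes it directly from that definition; it is for illustration only and compares every pair, so it would be slow on large test sets.

```python
def auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == -1]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    # A correctly ordered pair scores 1, a tie 0.5, a wrong order 0.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up labels (+1 / -1) and model scores.
labels = [1, 1, -1, -1]
scores = [0.9, 0.4, 0.6, 0.1]
print(auc(labels, scores))  # 0.75
```

Here three of the four positive-negative pairs are ordered correctly, giving 0.75. A model that scores at random would be expected to achieve about 0.5, and a perfect ranking gives 1.0.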

Based on the results in Table 1, as well as run-time and simplicity, our final selected model is regularised logistic regression with TF-IDF encoding, trained on the enriched dataset. This achieved an AUC of 0.859. Machine learning models fine-tuned on a COVID-19-specific Twitter dataset can significantly outperform open-source tools.
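For readers unfamiliar with the technique, the following is a minimal sketch of regularised logistic regression on sparse feature vectors such as TF-IDF weights. It assumes an L2 penalty and plain batch gradient descent; the article does not state the penalty type, solver or hyperparameters used, so the values here are purely illustrative.

```python
import math

def train_logreg(vectors, labels, l2=0.01, lr=0.5, epochs=200):
    """Fit L2-regularised logistic regression by batch gradient descent.

    vectors: list of {feature: value} dicts; labels: +1 or -1.
    """
    weights, bias = {}, 0.0
    n = len(vectors)
    for _ in range(epochs):
        grad_w, grad_b = {}, 0.0
        for x, y in zip(vectors, labels):
            z = bias + sum(weights.get(f, 0.0) * v for f, v in x.items())
            # Derivative of the loss log(1 + exp(-y*z)) with respect to z.
            g = -y / (1.0 + math.exp(y * z))
            grad_b += g / n
            for f, v in x.items():
                grad_w[f] = grad_w.get(f, 0.0) + g * v / n
        bias -= lr * grad_b
        for f in set(weights) | set(grad_w):
            w = weights.get(f, 0.0)
            # The l2 * w term shrinks weights towards zero.
            weights[f] = w - lr * (grad_w.get(f, 0.0) + l2 * w)
    return weights, bias

def predict_proba(weights, bias, x):
    """Probability that the feature vector x carries positive sentiment."""
    z = bias + sum(weights.get(f, 0.0) * v for f, v in x.items())
    return 1.0 / (1.0 + math.exp(-z))

# Toy training data with one feature per tweet (hypothetical).
X = [{"great": 1.0}, {"love": 1.0}, {"awful": 1.0}, {"hate": 1.0}]
y = [1, 1, -1, -1]
weights, bias = train_logreg(X, y)
```

After training, `predict_proba(weights, bias, {"great": 1.0})` is above 0.5 and `predict_proba(weights, bias, {"awful": 1.0})` is below it. The regularisation term keeps the weights from growing without bound, which matters when the vocabulary is large relative to the number of labelled tweets.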

Figure 1

Table 1

Step 3: Sentiment analysis

Our selected model was then used to assign individual sentiment scores to all 1.7m UK tweets. The resulting scores enable further analysis of overall trends over time, sentiment relating to specific topics, and underlying drivers.
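Once each tweet has a score, trend and topic analysis largely comes down to grouping and averaging. A simple sketch, assuming each scored tweet is stored as a (date, text, score) tuple and that a topic is identified by a keyword match (both are assumptions made for this example, as is the score scale):

```python
from collections import defaultdict
from statistics import mean

def daily_sentiment(scored_tweets, keyword=None):
    """Average sentiment score per day, optionally for one keyword."""
    by_day = defaultdict(list)
    for date, text, score in scored_tweets:
        if keyword is None or keyword in text.lower():
            by_day[date].append(score)
    return {day: mean(scores) for day, scores in sorted(by_day.items())}

# Hypothetical scored tweets: (ISO date, text, sentiment score).
scored = [
    ("2020-03-20", "Government advice is confusing", -0.6),
    ("2020-03-20", "Clapping for the NHS tonight", 0.8),
    ("2020-03-21", "Will my travel insurance pay out?", -0.2),
]
overall = daily_sentiment(scored)
insurance = daily_sentiment(scored, keyword="insurance")
```

Calling the function without a keyword gives the overall daily trend, while passing a keyword such as ‘insurance’ or ‘government’ restricts the average to tweets that mention it.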

Top concerns relating to coronavirus

Figure 2 compares 20 of the most popular topics during the first wave with those over the whole duration of the pandemic. Top-of-mind topics include ‘lockdown’, ‘government’, ‘deaths’, ‘cases’ and ‘health’.

The UK’s NHS was a frequent topic during the first wave, but not during subsequent months. It was associated with the positive sentiment resulting from the ‘clap for carers’ initiative, but there were dips in sentiment relating to fears over shortages of hospital beds and personal protective equipment. Discussion of ‘vaccines’, ‘school’ and ‘masks’ was relatively uncommon during the first wave but subsequently became mainstream.

Figure 2

Sentiment analysis and overall trend

Tweets in February focused on COVID-19 developments in other countries and carried a more negative sentiment. After the inflection point in mid-March, overall sentiment remained positive for the rest of the year. Sentiment towards the first lockdown was generally positive.

The granularity of text data enables us to perform deeper topical analysis by analysing the sentiment and context around certain words. We will look at two examples here: the words ‘government’ and ‘insurance’.

Figure 5 shows that the sentiment on government was low during the first wave, but improved and hovered around neutral from April to November. These trends are broadly consistent with University College London’s COVID-19 social study, a panel study of more than 70,000 respondents conducted via online weekly surveys. Sentiment analysis of social media could be a cost-effective tool for analysing the evolution of public opinion; traditional surveys can suffer from lower coverage and time lags. However, there are potential biases relating to the demographics of social media users as compared to the wider population.

Figures 3 and 4

Sentiment on insurance and insurers

Figure 6 shows a large peak in February due to an increase in tweets about travel insurance advice. The dip in March before lockdown was mainly due to government advice that asked the public to stay away from pubs and restaurants without enforcing closures, leaving businesses unable to claim on their insurance and at risk of bankruptcy.

Many insurers are perceived negatively due to COVID-19-related claims and losses, business interruption, event cancellation, legal disputes, mismanagement of funds, and dividend cuts. Conversely, NFU Mutual, Admiral, Vitality and Cigna are examples of insurers with favourable sentiment thanks to their customer service, motor policy refunds and financial resilience. It is encouraging to see positivity towards customer service and insurer advice on mental health, exercise and workplace culture – these actions could be emulated by other insurers for the good of society.

Figures 5 and 6

Sentiment analysis in the insurance industry

Sentiment analysis and social media could be leveraged by insurers in their digital transformation journey. Trends can be identified from ‘voice of the customer’ analysis and used to shape insurers’ value propositions. For example, the pandemic could drive demand for protection products, usage-based insurance and bike insurance.

In addition, Twitter sentiment analysis is useful for reputation management, allowing insurers to monitor public opinion of their organisations, products or marketing campaigns.  

Studies such as ‘Psychological language on Twitter predicts county-level heart disease mortality’ (Eichstaedt et al., 2015) and ‘Correlating Twitter language with community-level health outcomes’ (Schneuwly et al., 2019) have found Twitter language to be correlated with mortality and morbidity outcomes such as heart disease, diabetes and cancer. Inevitably, this suggests potential application in underwriting and pricing. However, this application would require rigorous checks around ethical and privacy considerations, and analysis of correlation-versus-causation effects.

Nevertheless, these methods have promising applications across the insurance value chain, including in product development, sales, marketing, competitor analysis, social profiling and, ultimately, providing better services to customers.

John Ng is a senior data scientist at RGA and chair of the IFoA Data Science Research Section

Melanie Zhang is a senior portfolio manager at Ki Insurance
 

This article appeared in our January/February 2021 issue of The Actuary.
© 2023 The Actuary. The Actuary is published on behalf of the Institute and Faculty of Actuaries by Redactive Publishing Limited. All rights reserved. Reproduction of any part is not allowed without written permission.
