Caught on camera: how data can help preserve biodiversity

Open-access content Wednesday 3rd November 2021 — updated 10.51am, Thursday 4th November 2021

Olga Mierzwa-Sulima, Robin Whytock and Jędrzej Świeżewski share their experience of building a machine learning algorithm that helps track biodiversity in Gabon’s tropical forests


Climate change is increasingly affecting the distribution and composition of ecosystems. This has profound implications for global biodiversity and the prosperity of at-risk communities.

Although technology has accelerated climate and environmental degradation, it can also be used to mitigate impacts and correct our current trajectory.

A ‘big data’ revolution is underway in ecology, including the use of satellite imagery, GPS tags and other sensor arrays. The data generated has the potential to support and streamline conservation efforts. However, with big data come big challenges: from collection and storage to validation and interpretation, data handling poses a daunting task for researchers and stakeholders alike.

If we can resolve these issues, we will open the door to ecological ‘forecasting’ and automated pipelines for ecosystem monitoring and response frameworks. We will have the opportunity to streamline biodiversity conservation efforts and tackle large-scale challenges such as wildlife tracking, deforestation and greenhouse gas emissions.

Data scientists can be a part of the solution by using exploratory machine learning (ML) approaches to address climate change and its impacts on ecosystems. Appsilon’s Data for Good initiative aims to help the scientific community, NGOs and non-profits by developing data analytics solutions and ML pipelines to provide actionable and reproducible insights that can help combat climate change and support environmental protection projects.

In one such case, it partnered with researchers at the National Parks Agency of Gabon and the University of Stirling to assist with biodiversity conservation efforts in Gabon. Gabon’s tropical forests in central Africa are home to 80% of the world’s critically endangered forest elephants, among other endangered species. Using computer vision, Data for Good provided artificial intelligence (AI) assisted biodiversity monitoring via an easy-to-use, open-source software tool called Mbaza AI. This automatically detects and classifies wildlife species in images captured by researchers using automated ‘camera traps’.

The challenge

Gabon’s National Parks Agency uses hundreds of camera traps to survey reclusive mammalian and avian species in the central African forests. The camera trap arrays are typically spread over large areas and generate hundreds of thousands of images, which require manual inspection and interpretation. The resulting delay impedes conservation and reaction times to ecological problems. If the agency can identify species quickly and accurately, it can mount appropriate responses to time-sensitive projects, including land and conservation management and anti-poaching efforts.

ML algorithms can improve data processing, but the models are often not accurate enough to be relied upon for full automation. They serve as a ‘first pass’ check that requires an extra validation step, either partially or in full. A new approach was needed to test an ML model for automated labelling using computer vision.

In our case, the model’s precision and accuracy were measured in the context of ecological modelling by comparing species richness, activity patterns and occupancy from ML labels to expert manual labels. By evaluating predictive performance in a domain-specific context such as ecological modelling, we showed that ML labelling can be used in fully automated pipelines (bit.ly/ML_CameraTraps).

The application needed to be standalone and available offline. Add a multi-platform, multi-language user interface that doesn’t require familiarity with programming, and such a tool would open access to projects without geographic or skillset constraints.

An adolescent chimpanzee explores a camera trap set in the Central African forest of Gabon

African forest elephant caught on camera in Gabon

Elusive African golden cat photographed in the Central African tropical forest

Training data

To achieve a highly accurate model for classifying forest animals, we used a sizeable training dataset (n = 347,120) curated from a raw collection of more than 1.5m images. The dataset contained samples from multiple countries, with each source using different camera trap models and field protocols. The resulting variations in resolution, quality and error type produced a challenging, but effective, training dataset for creating an ML model that could generalise to other sites.

The images used to train the model were ‘real-life’ camera trap data. This unprocessed dataset required an iterative approach to handle errors better, from hardware faults to human labelling mistakes. The iterative process consisted of model training, validation, error correction and subsequent model updating, which allowed us to assess the model’s performance accurately. The process proved particularly valuable for under-exposed and seemingly blank images that contained animals undetectable to the human eye but labelled by the model with high confidence.
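The article does not include the team’s code, but the review step in such a cycle can be sketched in a few lines of Python. The sketch below is a minimal illustration, assuming the classifier produces a per-class probability score for each image; the function name, the 0.95 confidence cutoff and the example data are hypothetical.

import numpy as np

def flag_suspect_labels(probs, human_labels, class_names, confidence=0.95):
    """One step of the train/validate/correct cycle: flag images where the
    model disagrees with the human label yet is highly confident, since
    these often turn out to be labelling errors or animals the annotator
    could not see (for example in under-exposed frames)."""
    pred_idx = probs.argmax(axis=1)    # index of the most likely class
    pred_conf = probs.max(axis=1)      # confidence of that prediction
    flags = []
    for i, (idx, conf) in enumerate(zip(pred_idx, pred_conf)):
        predicted = class_names[idx]
        if conf >= confidence and predicted != human_labels[i]:
            flags.append((i, human_labels[i], predicted, float(conf)))
    return flags  # (image index, human label, model label, confidence)

Flagged images are sent back to an expert, labels are corrected where necessary, and the model is retrained on the cleaned dataset.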

An established architecture was selected for the ML model: ResNet50. To speed up training, we used transfer learning, an ML technique that reuses knowledge learned on one task and transfers it to a new setting. Most of the approaches and mechanisms used to augment the training were taken from fast.ai, an easy-to-use and robust open-source Python library. We trained the models on various virtual machines run on the Google Cloud Platform; this was made possible by a Google Cloud Education grant.
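The snippet below shows what such a transfer-learning setup can look like in fast.ai: an ImageNet-pretrained ResNet50 fine-tuned on labelled camera trap images. It is a minimal sketch rather than the project’s actual training script; the directory layout, augmentations and number of epochs are illustrative assumptions.

from fastai.vision.all import *

# Assumed layout: one sub-folder of images per species label (hypothetical path).
path = Path("camera_trap_images")

dls = ImageDataLoaders.from_folder(
    path,
    valid_pct=0.2,                # hold out 20% of images for validation
    item_tfms=Resize(224),        # resize to the input size used by ResNet50
    batch_tfms=aug_transforms(),  # standard augmentations (flips, lighting, etc.)
)

# Transfer learning: start from ImageNet-pretrained ResNet50 weights,
# then fine-tune on the camera trap labels. top_k_accuracy defaults to k=5,
# matching the top-five accuracy reported later in the article.
learn = vision_learner(dls, resnet50, metrics=[accuracy, top_k_accuracy])
learn.fine_tune(5)                # a handful of epochs for a first pass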

Model performance and thresholding

To be applicable for broader use cases, the model needed to perform well when generalising to new ‘out-of-sample’ data. For fully automated pipelines, we needed to ensure it learned the features of the animals in the study, rather than focusing on features of the camera sites (the backgrounds). It should be noted that valid identification derived from ML labelling requires all concerned species to be included in the training dataset.
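One common way to check that a classifier generalises to unseen camera sites is to hold out whole sites during validation rather than random images. The sketch below illustrates the idea; the site identifiers and record format are hypothetical, and this is offered as an illustration of the principle rather than the study’s exact evaluation protocol.

import random
from collections import defaultdict

def split_by_site(records, holdout_fraction=0.2, seed=42):
    """Hold out entire camera sites rather than individual images, so the
    validation set contains only backgrounds the model has never seen.
    `records` is a list of (image_path, site_id, label) tuples."""
    by_site = defaultdict(list)
    for image_path, site_id, label in records:
        by_site[site_id].append((image_path, site_id, label))

    sites = sorted(by_site)
    random.Random(seed).shuffle(sites)
    n_holdout = max(1, int(len(sites) * holdout_fraction))
    holdout_sites = set(sites[:n_holdout])

    train = [r for site in sites if site not in holdout_sites for r in by_site[site]]
    valid = [r for site in sites if site in holdout_sites for r in by_site[site]]
    return train, valid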

Four focal species – the African golden cat, the chimpanzee, the leopard and the African forest elephant – were selected, as they are conservation priority species. Three ecological metrics were used in the study:

  • Species richness, for quantifying the species count both temporally and spatially

  • Activity patterns, for determining activity and life behaviour traits – for example, nocturnal and crepuscular animals

  • Occupancy – a hierarchical model that can account for imperfect detection.

With these metrics in place, the model could be evaluated for accuracy and precision in an ecological context.
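As an illustration, the first two metrics can be computed directly from the classified labels. The sketch below uses a small, hypothetical table of detections; occupancy requires a dedicated hierarchical model and is omitted here.

import pandas as pd

# Hypothetical classified-image records: one row per detection.
detections = pd.DataFrame({
    "site":    ["A", "A", "A", "B", "B"],
    "species": ["forest elephant", "leopard", "forest elephant",
                "chimpanzee", "golden cat"],
    "hour":    [2, 23, 14, 11, 3],   # hour of capture (0-23)
})

# Species richness: number of distinct species detected at each camera site.
richness = detections.groupby("site")["species"].nunique()

# Activity pattern: detections by hour of day for one species, a crude
# proxy for distinguishing nocturnal, crepuscular and diurnal behaviour.
elephant_activity = (detections[detections["species"] == "forest elephant"]
                     .groupby("hour")
                     .size())

print(richness)
print(elephant_activity)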

We found that thresholding improved model performance for the ecological metrics in the study. Thresholding means accepting a classification only when the model’s confidence score exceeds a chosen cutoff value. The model had a top-five accuracy of 95% – that is, in 95% of cases, the ‘actual’ expert labels were among the top five ML-predicted labels. However, no matter which threshold was selected, predictions for out-of-sample data remained at around 95%. Overall, we recommended that users apply a threshold of 70% for general monitoring in central African forests. However, different thresholds can affect inference, and they should be adjusted when targeting specific species. For example, the model’s elephant occupancy estimates improved significantly when the threshold was raised.
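In code, thresholding amounts to a confidence filter on the classifier’s output, with low-confidence images routed to manual review. The sketch below applies the 70% cutoff recommended above; the class names and scores are hypothetical.

import numpy as np

def apply_threshold(probs, class_names, threshold=0.70):
    """Accept a predicted label only when the model's top confidence
    exceeds the cutoff; otherwise flag the image for manual review.
    `probs` is an (n_images, n_classes) array of softmax scores."""
    top_idx = probs.argmax(axis=1)
    top_conf = probs.max(axis=1)
    return [class_names[i] if c >= threshold else "needs_review"
            for i, c in zip(top_idx, top_conf)]

# Example: two images, three classes (hypothetical scores).
probs = np.array([[0.85, 0.10, 0.05],   # confident -> label accepted
                  [0.40, 0.35, 0.25]])  # uncertain -> sent to a human
print(apply_threshold(probs, ["forest elephant", "chimpanzee", "leopard"]))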

Meeting real needs

The outcome of the project was Mbaza AI (github.com/Appsilon/mbaza): an ML algorithm for classifying camera trap images offline and with 90% accuracy for predictions on out-of-sample data. The tool is free to use and can rapidly process data with output accuracy and precision levels that are high enough for ecological analyses. It has decreased the time needed to analyse thousands of images from two-to-three weeks to one day. Depending on the hardware used, the model can classify roughly 4,000 images per hour and operate 24/7.

Mbaza AI is just one example of how data science and ML can be used in environmental mitigation. With expert guidance, data can be leveraged through interactive data visualisations, applications and AI to support climate change solutions. Automated workflows, open-source software and practical problem-solving that keeps stakeholders in mind help make large-scale environmental efforts achievable.

When data scientists and AI developers understand the needs of researchers and practitioners, they can address the limitations of current technologies and offer applicable solutions. The field of data science and ML has made impressive advances in the past decade; we must continue to demonstrate that the available technology can play a meaningful role in the preservation of the planet.

Olga Mierzwa-Sulima is an engineering manager at Appsilon and leads Data for Good.

Dr Robin Whytock is a scientist, ecologist and conservationist interested in forest biodiversity.

Dr Jędrzej Świeżewski is machine learning lead at Appsilon.

Image Credit | Shutterstock | ANPN

This article appeared in our November 2021 issue of The Actuary.