Tim Harris and Jacob Wilcock describe how data science is helping the UK’s Department for International Development tackle global development challenges
The Department for International Development (DFID, soon to be merged with the Foreign and Commonwealth Office to form the new Foreign, Commonwealth and Development Office) leads the UK’s work to end extreme poverty. Globally, the challenges are huge: around 700 million people live in extreme poverty; hunger and malnutrition are the number one health risk worldwide; 650 million people do not have access to safe water. UK Aid supports developing countries in addressing these challenges and many others – including disease, conflict, access to education, economic growth and climate change.
DFID has a budget of around £10bn, although as this is calculated as 0.7% of the UK’s Gross National Income the exact budget figure can vary depending on the size of the economy. Some of this money supports multilateral organisations such as United Nations agencies, the World Bank and (particularly significant at the moment) Gavi, the Vaccine Alliance. Most goes on programmes that directly support developing countries – and some of these programmes support the data infrastructure, because good quality data is a fundamental building block of good decision-making. To this end, a Data Science Hub was created last year: a collaboration between DFID and the UK’s Data Science Campus (part of the Office for National Statistics [ONS]).
The Data Science Hub
Data science is what happens at the interface of statistics and computer programming. It describes the ability to access and organise the huge volumes of data that the world generates, extracting and presenting information that can be used to solve a problem or help make the right decision.
Data science is particularly useful for gaining new insights in areas where traditional statistics are lacking, hard to collect or not very timely. This often applies to the countries in which DFID works, and while DFID continues to support traditional statistics – which remain crucial – the Data Science Hub is an opportunity to demonstrate the added value of new data sources and techniques.
The Hub is a team of a dozen people, bringing together a range of skills. Their expertise includes the use of satellite imagery, mobile phone data, natural language processing and machine learning. The team works closely with statisticians across DFID and also has staff whose key role is engaging with users – because there is no benefit in producing data that isn’t useful to anyone. As part of the wider ONS Data Science Campus team, the Hub applies the tools, methods and practices of the digital and data age to create new understanding and improve decision-making.
One of the team’s main roles is to build analytical tools. One such tool is being used to estimate cattle numbers in South Sudan, as shown in Figure 1. This may sound rather esoteric, but understanding the strength of agriculture is critical to the country’s economy and food security, and in an environment such as South Sudan traditional agricultural censuses are not possible. The tool applies machine learning to satellite imagery to identify cattle enclosures; combined with other features, these detections allow the number of cattle present to be estimated.
Another tool uses natural language processing to analyse and make sense of free text in the International Aid Transparency Initiative (IATI) database. IATI aims to bring together spending and other information on development projects from government and non-government organisations, making aid spending more transparent. The tool aims to deliver more focused and relevant results than traditional keyword searches can.
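How the IATI tool works internally is not described here, so the following is only a hedged illustration of why ranked retrieval can be more useful than plain keyword matching. It scores free-text project descriptions with TF-IDF weights and cosine similarity, a standard information-retrieval technique chosen as an assumption, not necessarily the Hub’s method. All function names and the example descriptions are hypothetical and are not real IATI records.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of letters/digits as tokens
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_search(docs: list[str], query: str) -> list[int]:
    # Baseline: every document containing all query terms, in original order
    terms = set(tokenize(query))
    return [i for i, d in enumerate(docs) if terms <= set(tokenize(d))]


def tfidf_vectors(docs: list[str]) -> tuple[list[dict[str, float]], dict[str, float]]:
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: how many documents contain each term
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed inverse document frequency: rarer terms get more weight
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(toks).items()} for toks in tokenized]
    return vectors, idf


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def ranked_search(docs: list[str], query: str, top_k: int = 3) -> list[int]:
    vectors, idf = tfidf_vectors(docs)
    # Query terms that never occur in the corpus are ignored
    q_vec = {t: tf * idf[t] for t, tf in Counter(tokenize(query)).items() if t in idf}
    scores = sorted(
        ((cosine(q_vec, v), i) for i, v in enumerate(vectors)),
        key=lambda s: (-s[0], s[1]),
    )
    # Keep the best matches, dropping documents with no overlap at all
    return [i for score, i in scores[:top_k] if score > 0]


# Hypothetical project descriptions (not real IATI records)
docs = [
    "Rural water supply and sanitation programme",
    "Safe water access for schools in rural districts",
    "Maternal health and nutrition programme",
    "Water and water infrastructure for urban health clinics",
]
print(keyword_search(docs, "rural water"))  # → [0, 1]
print(ranked_search(docs, "rural water"))   # → [0, 1, 3]
```

Keyword search returns only the two descriptions that contain both words, with no sense of which fits best. The ranked version orders results by relevance and also surfaces the partially matching urban water project, at a lower score.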
Alongside building analytical tools, the Hub focuses on training, both within DFID and in developing countries. For example, a mentoring scheme has been set up with the National Institute of Statistics Rwanda, which is establishing its own data science campus. One outcome has been greater automation in the production of Rwanda’s trade statistics; future objectives include supporting geospatial work, such as using machine learning to extract building footprints from satellite imagery, and making survey sampling designs more efficient by incorporating information on spatial variation. Meanwhile, in Ghana the Hub is supporting the statistics office in its work on measuring inflation, with a focus on automating key processes.
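One textbook way in which information on spatial variation can make a survey design more efficient is Neyman allocation. This is our illustrative choice, not necessarily the approach being developed in Rwanda. If the population is split into geographic strata h, each with N_h units and an estimated within-stratum standard deviation S_h (something spatial data might help estimate), a total sample of n is allocated as n_h = n × N_h × S_h / Σ(N_k × S_k). The sketch below assumes this formula; the function name, stratum names and figures are hypothetical.

```python
import math


def neyman_allocation(total_n: int, strata: dict[str, tuple[int, float]]) -> dict[str, int]:
    """Split a survey sample across strata in proportion to N_h * S_h.

    strata maps a stratum name to (population size N_h, estimated standard
    deviation S_h of the survey variable within that stratum).
    """
    weights = {h: size * sd for h, (size, sd) in strata.items()}
    total = sum(weights.values())
    raw = {h: total_n * w / total for h, w in weights.items()}
    # Round down, then hand the remaining units to the largest fractional parts
    alloc = {h: math.floor(r) for h, r in raw.items()}
    shortfall = total_n - sum(alloc.values())
    for h in sorted(raw, key=lambda h: raw[h] - alloc[h], reverse=True)[:shortfall]:
        alloc[h] += 1
    return alloc


# Hypothetical strata: more variable areas receive a larger share of the sample
strata = {
    "urban": (50_000, 10.0),
    "peri-urban": (30_000, 20.0),
    "rural": (20_000, 40.0),
}
print(neyman_allocation(1000, strata))
# → {'urban': 263, 'peri-urban': 316, 'rural': 421}
```

Allocating in proportion to population alone would give 500, 300 and 200 interviews. Weighting by variability shifts effort towards the rural stratum, where each extra interview reduces the overall variance most. This sketch does not cap n_h at N_h, which a real design would need to do.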
In recent months, most of the Hub’s resources have been redirected to address the coronavirus pandemic. The pandemic is compounding the data challenges faced in developing countries: data is needed now more than ever, but some traditional sources are drying up. Many censuses planned for 2020 have been postponed, and other statistics are affected too, as lockdowns undermine household and business surveys. Can data science help fill the void?
Monitoring the impact of COVID-19 does not just involve understanding the number of cases and the death rate, but also the secondary impacts. The pandemic threatens to undo years of international development gains. What are the knock-on effects in terms of the economy, health, education, governance? The Hub has been finding creative ways to use data science to help address these questions.
The Hub is exploring the use of global shipping data and flight datasets to see whether they can contribute to near-real-time indicators of economic activity (Figure 2).
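Purely as an illustration, and not the Hub’s method, one simple way to turn daily counts (of flights, say) into a near-real-time indicator is to smooth them with a trailing seven-day mean and express the result as an index relative to a baseline period. The function names and the daily figures below are hypothetical.

```python
from statistics import mean


def trailing_mean(values: list[float], window: int = 7) -> list[float]:
    # Mean of each run of `window` consecutive days, ending on that day
    return [mean(values[i - window + 1 : i + 1]) for i in range(window - 1, len(values))]


def activity_index(daily_counts: list[float], baseline_counts: list[float], window: int = 7) -> list[float]:
    # 100 = same level as the baseline period; 50 = half the baseline activity
    baseline = mean(baseline_counts)
    return [100 * m / baseline for m in trailing_mean(daily_counts, window)]


# Hypothetical daily departures: one normal week, then a week at roughly half capacity
baseline_week = [120, 118, 125, 122, 119, 90, 95]
recent = baseline_week + [60, 59, 62, 61, 60, 45, 48]
index = activity_index(recent, baseline_week)
print([round(x, 1) for x in index])
```

The seven-day window smooths out the weekday pattern (the weekend dip in this made-up data), so the index starts at 100 and settles close to 50 once a full week at reduced activity has passed.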
There has been an explosion of mathematical epidemiological modelling work related to COVID-19. The Hub’s data scientists have been reviewing a range of these models and tools to explore and highlight those that are most useful for decision-making in developing countries.
In Côte d’Ivoire, the Hub was already combining survey data with other sources using machine learning to produce more detailed geographical analysis of the prevalence of HIV. This project will now also explore how the location of immunocompromised populations is relevant to the spread of COVID-19.
The Hub’s mentoring role with developing country partners has also pivoted to support the response to COVID-19, for example by training staff at the Kenya National Bureau of Statistics in Python coding skills to support the analysis of population data from its new census, and working with the Namibia Statistics Agency to develop a tool to identify misinformation related to COVID-19 on Twitter.
While it is still early days for the Data Science Hub, the opportunities are certainly there. Success will come through collaboration with data science colleagues working in other development agencies, as well as across academic and research institutions in the UK and abroad. There has never been more scope for using data to help tackle global poverty, and data science will play a key role in delivering this vital work.
Jacob Wilcock is a statistician working at the Department for International Development.
Tim Harris works in the Data Science Hub at the Department for International Development.