Barbara Sinkinson discusses how actuaries can keep informed about the ethical issues surrounding the use of big data
The increasing use of big data and data science techniques opens up a new world of ethical issues. It is important that actuaries using these techniques or making decisions based on their outputs consider their responsibilities.
In May, the Cabinet Office published its Data Science Ethical Framework paper (bit.ly/1sB2lEw) with the objective of giving guidance to government on conducting data science projects, balancing the confidence to innovate with respect for privacy.
While designed for government users, the framework will be useful to actuaries grappling with these issues in other contexts. It might serve as a helpful check against your own firm's guidance, or as a starting point to develop your own principles.
Moreover, the Cabinet Office guidance is not a 'done deal' but a first iteration, a work in progress, with input sought from interested parties. Actuaries' dual perspective as technicians and professionals, grounded in considering the underlying ethics, means that we are well-placed to provide feedback from our own experiences, and help shape this guidance for the better.
Of course, there are also legal restrictions around the use of data, notably the Data Protection and Intellectual Property Acts. The framework seeks to outline both the legal framework and wider ethical considerations to make it easier for those within government to undertake data science projects.
What is data science?
The Cabinet Office's open policymaking toolkit glossary (bit.ly/1k6eX1b) includes the following definition:
"Data science uses advanced software, computer power and artificial intelligence to analyse and visualise big and complex data to provide useful insight that can improve an understanding of a problem and design better policy."
The framework is set around six principles, described below:
1. Start with clear user need and public benefit
A clear understanding of aims, public benefits and risks of a project helps in efficient management and communication. Moreover, it allows you to focus on the decisions to be made as a result of the analysis, and to consider what risks and costs are justified in the pursuit of the public benefit.
The framework states that the public cannot easily distinguish between the ethics of data science and the decision or outcome arising from the analysis. It can be as important to demonstrate the benefits of the analysis as to ensure the process is robust.
2. Use data and tools that have the minimum intrusion necessary
The wealth of data available on individuals means that it is possible to ascertain more about them, and groups they belong to, than most people realise. Principle two concerns using only what is strictly necessary for the purpose at hand. It reminds us of the 'minimisation principle' from data protection legislation - that is, use only the data you need to meet the project aim - and the benefit of using anonymised data where possible.
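The minimisation principle translates naturally into a data-preparation step. The sketch below, in Python, is purely illustrative - the field names and the hashing scheme are my own assumptions, not part of the framework. It keeps only the fields a project needs and replaces a direct identifier with a one-way hash:

```python
import hashlib

def minimise_record(record, needed_fields, id_field="customer_id"):
    """Keep only the fields the project needs, and replace the direct
    identifier with a truncated one-way hash (pseudonymisation)."""
    slim = {k: v for k, v in record.items() if k in needed_fields}
    if id_field in slim:
        # Pseudonymise rather than carry the raw identifier through.
        slim[id_field] = hashlib.sha256(record[id_field].encode()).hexdigest()[:12]
    return slim

# A hypothetical record holding more personal data than the analysis requires.
raw = {
    "customer_id": "C-1042",
    "name": "A. Person",       # not needed for the analysis
    "postcode": "SW1A 1AA",    # not needed for the analysis
    "age_band": "35-44",
    "claim_amount": 1250.0,
}

slim = minimise_record(raw, needed_fields={"customer_id", "age_band", "claim_amount"})
# name and postcode are dropped; the identifier is pseudonymised
```

Note that hashing an identifier is pseudonymisation, not full anonymisation: with effort, hashed values can sometimes be re-identified, so this step reduces intrusion rather than eliminating it.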
We are reminded that, even where data may be legally the same, people's expectations around it are not. Consider, for example, public data gleaned from social media. While it is in the public domain, it is still personal data and needs to be processed fairly. People may be far more relaxed about the use of data they provided in a tweet than in sensitive discussions on a social media site such as Mumsnet. These considerations may be particularly relevant when buying data in, especially if the provider has used web-scraping tools.
3. Create robust data science models
As the complexity of models increases, so does the risk of inaccuracy and inappropriate use. We should not only use the appropriate tool for the job, but also remember the role of human intervention in interpreting the results. Inappropriate conclusions can easily be drawn from bias in the input data or by implying causation from correlation. Even though a tool has produced good results for a particular project, it is not necessarily appropriate for a different purpose.
It is important not only to ensure that we ask the right questions, use appropriate data, incorporate the right model features and regularly test the model, but also to ensure that we revisit these steps to ensure continued robustness. As time and circumstances change, a once-robust tool can easily 'break'. The framework cites the case of an automatic pricing algorithm used by a taxi firm in Sydney. The unanticipated circumstances of a hostage crisis caused up to a fourfold price rise, resulting in the need for human intervention to adjust prices back to normal levels.
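The role of human intervention can be built into a model from the outset rather than bolted on in a crisis. A minimal sketch, assuming a hypothetical dynamic-pricing function (the cap and parameter names are my own, not from the framework): rather than applying any multiplier the algorithm produces, it caps the multiplier and flags anything above the cap for a human decision:

```python
def guarded_price(base_fare, demand_multiplier, cap=2.0):
    """Apply a dynamic demand multiplier, but cap it and flag anything
    above the cap for human review instead of charging it automatically."""
    needs_review = demand_multiplier > cap
    applied = min(demand_multiplier, cap)
    return round(base_fare * applied, 2), needs_review

# A fourfold surge, as in the Sydney example, is capped at 2x and flagged.
price, review = guarded_price(base_fare=20.0, demand_multiplier=4.0)
# price -> 40.0, review -> True
```

The design point is that the guardrail encodes where automation ends and human judgment begins, so unanticipated circumstances trigger a review rather than an extreme output.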
We also should consider widely the factors that may influence the data we have. Are we in danger of drawing erroneous conclusions? The framework cites as an example social media analysis of Hurricane Sandy, which wrongly suggested Manhattan was the centre of the damage, because of the concentration of people with smartphones there.
4. Be alert to public perceptions
This principle touches on similar issues to principle two, reminding us that advances in technology push our understanding of the law to its limits. But using data is not just about what is legal, it is also about what is 'right' - and this is not fixed. Public expectations on what is acceptable are continually shifting - indeed, there is a difference between people's actual and stated positions on this.
Again, we are reminded to be careful with our data source, particularly where data mining or data scraping tools are used.
While it may be possible to buy in data that has been obtained in this way, there may be restrictions on its use - for example, through exclusion protocols within the sites or in the site terms and conditions, even if the data itself is in the public domain.
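The best-known exclusion protocol is a site's robots.txt file, which states which paths automated tools may fetch. Python's standard library includes a parser for it; the robots.txt content below is illustrative, not taken from any real site:

```python
from urllib.robotparser import RobotFileParser

# An illustrative robots.txt: the site disallows scraping of member pages.
robots_txt = """
User-agent: *
Disallow: /members/
Allow: /public/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

print(rp.can_fetch("*", "https://example.com/public/stats"))   # True
print(rp.can_fetch("*", "https://example.com/members/forum"))  # False
```

A robots.txt check is a floor, not a ceiling: a site's terms and conditions may restrict reuse of data that its robots.txt happily allows crawlers to fetch, so both need checking before bought-in scraped data is used.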
5. Be as open and accountable as possible
A central issue around the use of data is consent. People are more nervous about supplying data, or consenting to its use, if they do not know how it will be used, and they will be still more concerned about conclusions drawn where they feel there has been a lack of openness, perhaps feeling that they supplied information under false pretences. We are reminded, where possible, to be clear about the purpose for which data is to be used, to give people access to their own data and to be aware of unintended consequences.
6. Keep data secure
People are rightly concerned about the security of their data and much has been written about this, so in large part this principle serves as a reminder. The Data Protection Act contains detailed rules regarding the retention and deletion of data, supported by guidance on its deletion by the Information Commissioner's Office (ICO). Government has been putting significant effort into building sets of trusted data, some of which will be publicly accessible through the canonical registers on the gov.uk website. As this set of trusted data grows, it will be worth considering whether it may provide a suitable, and possibly superior, data source, avoiding the costs and security headaches that are associated with the need to collect and maintain your own data.
As available data expands exponentially, the legal and ethical issues around its use will also mushroom. While as actuaries we are experienced users of data, some of these new tools pose issues we have not considered before. Being aware of the perspectives of those outside the actuarial community should help us to think through the issues. The Cabinet Office framework is a very good starting place for this. Moreover, our experience, skill set and ethical training mean we are well placed to help shape and improve the framework for the future.
To find out the latest on the IFoA modelling, analytics and insights from data (MAID) working party, go to bit.ly/28IMSc5
Barbara Sinkinson is an actuary at the Government Actuary's Department