Big data benefits organisations, but there are important drawbacks. Dina Gray, Pietro Micheli and Andrey Pavlov explain how relying on too much data can lead to measurement madness
The volume of data being generated has exploded in recent years. A recent Harvard Business Review article stated that 2.5 exabytes of data are created each day, and that more data is sent across the internet every second than was even stored 20 years ago. The likes of Walmart collect 2.5 petabytes (2.5 quadrillion bytes) of data every hour from their customers. To get a feel for what this means, consider that a million seconds is 11.5 days, a billion seconds is 32 years and 2.5 quadrillion seconds is 80 million years!
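These time comparisons are easy to verify; a quick sketch (the constants and rounding below are ours, for illustration) confirms the arithmetic:

```python
# Sanity check of the time comparisons above.
SECONDS_PER_DAY = 24 * 60 * 60                 # 86,400 seconds in a day
SECONDS_PER_YEAR = SECONDS_PER_DAY * 365.25    # average year, including leap years

million_seconds_in_days = 1e6 / SECONDS_PER_DAY
billion_seconds_in_years = 1e9 / SECONDS_PER_YEAR
quadrillions_in_years = 2.5e15 / SECONDS_PER_YEAR

print(round(million_seconds_in_days, 1))       # 11.6 days
print(round(billion_seconds_in_years, 1))      # 31.7 years
print(round(quadrillions_in_years / 1e6, 1))   # 79.2 million years
```

The figures in the text are simply these values rounded for readability.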
'Big data' is a welcome business development, promising to process larger quantities of data, from varied sources, ever more quickly.
It enables an organisation, through the use of mathematical algorithms, to find patterns and relationships that would be otherwise impossible to discover. The promised advantages are many, from reducing costs to improving the quality of products and services, to identifying new customer segments or uncovering hidden societal needs.
Big data is certainly changing the way organisations operate; our capacity for planning, budgeting and forecasting, as well as for managing processes and supply chains, has radically improved. However, the greater availability of data is accompanied by two major challenges. First, many managers are now required to develop data-oriented management systems to make sense of the phenomenal amount of data their firms and partners are producing. Second, while the volume of data we now have access to is certainly seductive and potentially very useful, it can also be overwhelming. We therefore need to ask: are we getting better at gaining insight from big data, or are we simply measuring more?
It is not only the scale of big data that fascinates top executives; indeed, performance indicators of every kind, financial or non-financial, forward-looking or lagging, are gradually taking over much of organisational life. Data has a strange tendency to make managers believe their decisions are based on science, and so measurement becomes addictive. This property of measurement is at the root of many companies' woes and one of the chief culprits of measurement madness. Tim Ambler, recently retired senior fellow at London Business School, writes: "Important as financial metrics are, they distort reality and provide the illusion of control. Cannabis does much the same thing."
Collecting large volumes of data to provide insight is laudable, but the datasets have to be relevant to today's business environment.
One anecdote is of an organisation that was, until recently, one of the largest manufacturing companies in the UK. At one of their production sites was a hut near a river, which housed a piece of machinery used to measure the height of the water in the river. For decades the data had been regularly collected, recorded, and dutifully reported up the chain of command. A newly appointed manager queried why this data was being collected, as it did not appear to have any obvious relevance for the manufacturing operations at the site. The ensuing investigation revealed that the measure had been introduced during World War II, when a bomb, which exploded in the river, had temporarily raised the height of the water to a threatening level. Decades had passed and the water had receded, yet the measure remained present and was still unquestioningly tracked and reported.
So the problem is not only the excessive number of measures collected, but also the fact that measures tend to stick unless they are questioned and revised. Priorities change, new drivers of performance emerge and different operating models are employed.
It therefore makes sense for big datasets to be revised to reflect these changes; otherwise the resulting complexity makes decision making harder rather than easier and reduces the overall relevance of the measurement system itself.
Another question that needs to be asked is whether the data in our big datasets has been collected on exactly the same basis. On first coming to power, the current government made a bold pledge to reduce the number of heart-related deaths in the UK, as a benchmarking study had shown the British were woefully behind other European countries in this area, especially France. At face value this appears to be a fair comparison. After all, countries in Europe have similar demographics, similar standards of living and, more often than not, similar standards of social health care.
Why then, in comparison, are heart-related deaths so high in the UK? After all that wonderful French cheese, you would expect this statistic in France to be comparable. Although many observers have labelled this phenomenon the 'French paradox', on closer inspection it appears it can be attributed to the way the data is recorded. In the UK, when someone dies and the doctor is unsure of the underlying reasons, the physician will officially record heart failure as the cause of death. In France, however, there is a category for unknown cause of death, and only when the doctor is sure will they record the death as heart failure. Making important decisions about the allocation of valuable public funds on the basis of such flawed comparative data is therefore unwise.
Precise definitions of the underlying data are crucial for any form of comparison. However, it is just as important to understand the methodology used to interrogate the datasets. For example, there are numerous league tables, published in national papers, which attempt to compare the performance of universities within individual countries and across the world. However, the results of the published league tables can vary widely, even though they are based on the same data.
The more reputable compilers do attempt to address this problem, carrying out robust comparisons by taking into consideration the different sizes of institutions, the demographics of their intakes, even the quality of the beer in the students' union bar, and adjusting the score so as to make it comparable. More often than not, however, only the final figure is reported, keeping the process of compiling the league table far away from the reader's reach.
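One reason identical data can yield divergent tables is the compiler's choice of normalisation and weights. A minimal sketch (the universities, indicators and weights below are invented for illustration, not drawn from any real league table) shows how the same raw figures can produce opposite rankings:

```python
def rank(universities, indicators, weights):
    """Rank by a weighted sum of min-max normalised indicators."""
    def normalise(values):
        # Rescale each indicator to 0..1 so different units become comparable.
        lo, hi = min(values.values()), max(values.values())
        return {u: (v - lo) / (hi - lo) if hi > lo else 0.0
                for u, v in values.items()}

    normed = {name: normalise(vals) for name, vals in indicators.items()}
    composite = {u: sum(weights[name] * normed[name][u] for name in indicators)
                 for u in universities}
    return sorted(universities, key=lambda u: composite[u], reverse=True)

# Same raw data, two different weighting schemes (all figures hypothetical).
unis = ["Alpha", "Beta"]
indicators = {
    "research": {"Alpha": 90, "Beta": 60},
    "teaching": {"Alpha": 50, "Beta": 95},
}
print(rank(unis, indicators, {"research": 0.7, "teaching": 0.3}))  # ['Alpha', 'Beta']
print(rank(unis, indicators, {"research": 0.3, "teaching": 0.7}))  # ['Beta', 'Alpha']
```

Because only the final ordering is usually published, the reader never sees that a change of weights, rather than any change in the underlying data, flipped the result.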
No amount of number crunching will eliminate the possibility that we are only being presented with a selection of data that delivers a particular message. Over the past decade, governments around the world, along with the scientific and popular press, have paid increasing attention to global warming.
The argument that the average temperature of our planet's atmosphere and oceans is steadily rising has polarised expert opinion, even though the hard data is compelling. Sceptics, however, were fuelled when the Climatic Research Unit (CRU) of the University of East Anglia, which plays a leading role in compiling UN reports and tracks long-term changes in temperature, refused to publish its underlying data. It was no surprise when, in November 2009, we awoke to reports that hackers had broken into the CRU database to show that the data could be interpreted differently. The infiltrators were keen to expose that eminent scientists had discussed, via email, the potential biases of the outcomes. However, it wasn't the climate scientists manipulating the data to support their messages; it was the hackers who were selectively releasing the data owned by the CRU.
By filtering the emails and selecting the ones that supported their viewpoint, they were able to build a credible case against the CRU, on which questions were asked in the highest political offices. This selective disclosure gave the impression of impropriety because the data had not been publicly available. Investigations into these claims examined email exchanges
to determine whether there was evidence of suppression or manipulation of data by the researchers in the CRU. Eventually, the scientists were vindicated, but the hackers were not.
Despite advances in performance measurement, much of the madness we encounter in firms is due to the human behaviour that measurement engenders. Unfortunately, the claim that hard data and sophisticated analytical tools minimise the 'human element' by providing objective insight is largely unfounded. Far from eliminating behavioural issues, big data amplifies them.
No amount of processing can provide a substitute for clarity of thinking and a concern about the impact of measurement on people. Also, analysing larger volumes of data may mean including unreliable external datasets, potentially disconnecting data from the processes and people they are supposed to support, and basing decision-making processes on data that may be scarcely comprehensible, but is nevertheless treated as authoritative.
The ability to analyse and examine so many data points may mask what is important, while at the same time giving us the impression that we can manage resources and processes in a purely mechanical way. When analytics are treated as a way to avoid the hard work of managing people and organisations, bigger data may well lead to ever greater madness!
Dina Gray, Pietro Micheli and Andrey Pavlov are the authors of Measurement Madness: Recognizing and Avoiding the Pitfalls of Performance Measurement
Dr Dina Gray is a strategic business consultant, lecturing on Cranfield University's executive education programmes. Dr Pietro Micheli is associate professor of organisational performance at Warwick Business School. Dr Andrey Pavlov is a lecturer in business performance management and a director at Cranfield School of Management.