
Big Data and analytics lessons from the Pandemic

What have we learned from the plethora of data being used to analyse and plan the response to the Covid-19 pandemic? Richard J Self, Senior Lecturer in Governance of Advanced and Emerging Technologies at the University of Derby, argues that one thing it has taught us is to question whether the model being used to collate information is the right one, in the right context, for its data to be trusted.

By Richard Self - 8 February 2021

We have learned many lessons during the last ten months of Covid-19 about Big Data, analytics and storytelling. We were warned of the catastrophic consequences of the growth of Covid-19 in early March 2020, and yet the lockdown was delayed by two weeks because the models did not indicate urgency.

Over the years, we have seen Formula 1 teams puzzling over the lack of performance of their cars on race day, in spite of all the computational models used to design them.

Some years ago, Netflix and Amazon each launched a new USA White House drama; one (House of Cards) succeeded, while the other (Alpha House) has almost disappeared from public memory.

It turns out that there are common themes that will be explored in this, and future, blogs.

These themes matter to us all, whether in government, business, sport or academia. This short article covers two of the most important:

Trust in Algorithms, Data and Models

Business and management schools have encouraged business leaders to follow the data in their decision making, to the point that, with so much data and analytical capability now available, experience and intuition should supposedly no longer be used.

In fact, leaders are encouraged to replace human judgement with big data analytics, artificial intelligence (AI) and machine learning (ML). As a consequence, it appears that business should now be dependent on data analysis; in effect, "managed by the data".

If we go back to the early 2000s, belief and trust in the computer algorithms behind mortgage approvals, and in the packages of mortgages that were supposed to dilute the risk of default, contributed to the financial crash of 2007 to 2009.

At that time, there were several warnings of the problems that this irrational belief in algorithms would lead to. And yet some of those who combined data analysis with experience and intuition were able to make enormous fortunes.

The House of Cards/Alpha House saga demonstrated very clearly that data analysis can be valuable when used to inform human decision-making, but is not reliable if the machine analysis supplants human intuition and experience.

As data scientist Sebastian Wernicke explained in his 2015 TEDx talk, Netflix used its analysis of viewer data to inform decisions that were ultimately made by people, whereas Amazon relied almost entirely on the data analysis of its pilot viewings.

How many of us remember the mantra during the 1970s and 1980s that "humans make the decisions, computers only advise"?

Covid-19 has demonstrated the limitations of this approach in situations where the data and knowledge (the science) are incomplete, the models do not accurately represent reality, and the decision makers have what could be called tunnel vision, considering only what their models tell them.

We saw this in the very early days in the UK in March 2020, when the models suggested a case doubling time of six days, but the real data from the testing processes showed very clearly that the doubling time was actually three days.

It is not known why the modellers did not take this real-world data into account. The impact was a delay to the first lockdown, which did not begin until late March 2020.
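The difference between those two doubling times is far from academic. A minimal, purely illustrative calculation (the starting figure of 1,000 cases is invented for the example) shows how much a two-week delay costs under each assumption:

# Illustrative arithmetic only: how a two-week delay compounds under the modelled
# (6-day) and observed (3-day) doubling times. The 1,000 starting cases are invented.
initial_cases = 1_000
delay_days = 14

for doubling_time in (6, 3):  # days per doubling
    growth_factor = 2 ** (delay_days / doubling_time)
    print(f"Doubling every {doubling_time} days: about {growth_factor:.0f}x growth, "
          f"roughly {initial_cases * growth_factor:,.0f} cases after {delay_days} days")

With a six-day doubling time, two weeks cost roughly a five-fold increase in cases; with the observed three-day doubling time, roughly a twenty-five-fold increase.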

Part of the Covid-19 challenge was that the model being used was an old one, created over a period of decades for a totally different context. According to a Google software expert, Mike Hearn, who originally wrote under a pseudonym to provide a code review and then a second analysis of the model, it is apparently some of the worst coding (the activity of writing computer programmes) ever seen: a 15,000-line programme written as a single block of code. In addition, it seems that the code has never been put through any form of quality assurance, verification or validation.

Complexity, Context and Consequences

Models of varying degrees of complexity lie at the heart of all that humans do and think.

A model is a simplified representation of reality containing those elements that the model builder considers relevant for the purpose at hand. In this respect, every thought humans have is related to the unique model that we each have in our heads of how things work, whether in terms of science, physics and maths, or in the fields of human behaviour such as sociology, business and marketing.

With the growth in the power of computers and of the codes created to model human and physical systems, many decision makers have developed an irrational degree of trust in the results and decisions from these data-driven systems, often with adverse, and sometimes very costly, consequences.

However, we need to remember that "the map is not the territory" (Count Alfred Korzybski); the map and models are simplifications of the world around us and different people choose different factors to include in their models.

We also need to remember the counterpoint in Occam's Razor that "entities should not be multiplied without necessity"; that simplicity is preferred to complexity and that there is clearly a balance between completeness, complexity and simplicity to be found.

The context of the model or analysis is often critical in understanding the results and recommendations. It is often the case that a model that works in one situation will not work effectively in a different situation. Therefore we need to be able to identify when our models become inappropriate, preferably before they are used in such circumstances.

As an example, a predictive analytics model for insurance policy acceptance, built for a company with a niche product and a 5% market share, is unlikely to remain valid as the basis for growth from that niche towards a 50% share of the general market, because of the changing demographics and financial behaviour of the wider market.
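To make this concrete, the sketch below simulates the situation. It is a hypothetical illustration, not any insurer's real model: the data, the acceptance rule and the age ranges are all invented. A simple classifier fitted only on a narrow niche segment scores well on that segment, but noticeably worse on the wider market it has never seen.

# Hypothetical illustration of a model outgrowing its context. All data is simulated.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)

def simulate_applicants(n, age_low, age_high):
    """Simulate applicants; the (assumed) acceptance rule is non-linear in age."""
    age = rng.uniform(age_low, age_high, n)
    income_k = rng.normal(35, 10, n)              # income in thousands
    score = -((age - 45) / 15) ** 2 + (income_k - 30) / 20
    accept = (score + rng.normal(0, 0.3, n)) > 0  # noisy acceptance decision
    return np.column_stack([age, income_k]), accept.astype(int)

# Niche segment (roughly 5% of the market): applicants aged 35-50 only.
X_niche, y_niche = simulate_applicants(2_000, 35, 50)
# The general market the insurer wants to grow into: ages 18-80.
X_general, y_general = simulate_applicants(2_000, 18, 80)

model = LogisticRegression(max_iter=1_000).fit(X_niche, y_niche)

print(f"Accuracy on the niche it was built for: {model.score(X_niche, y_niche):.2f}")
print(f"Accuracy on the wider market:           {model.score(X_general, y_general):.2f}")

The exact numbers depend on the simulated data, but the pattern is the point: nothing in the niche data warned the model about the applicants it had never encountered.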

One of the most important consequences is that we need to test our models for validity and veracity in the intended context, especially when existing models are being extended into new scenarios.

It must be noted that this also includes checking the data and the models used for completeness, representativeness and bias.
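One simple, practical check along these lines is to compare how the population used to build a model is distributed against the population the model will actually be applied to. The figures below are invented purely for illustration:

# Hypothetical representativeness check: compare the share of each age band in the
# modelling dataset against its share in the target population and flag large gaps.
training_share   = {"18-29": 0.05, "30-44": 0.55, "45-59": 0.30, "60+": 0.10}
population_share = {"18-29": 0.20, "30-44": 0.25, "45-59": 0.25, "60+": 0.30}

for band, pop in population_share.items():
    gap = training_share.get(band, 0.0) - pop
    flag = "  <-- poorly represented" if abs(gap) > 0.10 else ""
    print(f"{band:>6}: model data {training_share.get(band, 0.0):.0%} "
          f"vs population {pop:.0%} (gap {gap:+.0%}){flag}")

A gap of this size does not by itself prove that a model is biased, but it is a prompt to ask whether the under-represented groups behave like the ones the model was built on.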

However, the most important consequence is that humans, not machine algorithms, make the decisions.

For further information contact the press office at pressoffice@derby.ac.uk.

About the author


Richard Self
Senior Lecturer in Analytics and Governance

Richard is a Senior Lecturer in Governance of Advanced and Emerging Technologies.

Email
r.j.self@derby.ac.uk
View full staff profile