The Fallacy of Big Data

Another day, another fad disrupts IT and gets billions thrown at it. This one is #BigData, closely related to the Internet of Things, which we will deconstruct in a later article. In this article we focus on BigData and why it is most popular amongst those with the lowest skills.

[Image: the Einstein field equation]
What you see above is one of the most famous equations ever written, the Einstein Field Equation of General Relativity. It is in fact a set of ten coupled equations, but the single tensor equation shown above is the most common way of expressing them.
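For readers who cannot see the image, here is one common way of writing the field equations. This is a sketch under an assumption: it uses the form that includes the cosmological constant, and the original image may have shown a different but equivalent form.

```latex
% Einstein field equations, form with the cosmological constant (assumed form).
% G_{\mu\nu}: Einstein tensor (spacetime curvature)
% \Lambda:    cosmological constant
% g_{\mu\nu}: metric tensor
% T_{\mu\nu}: stress-energy tensor (matter and energy content)
% G: Newton's gravitational constant, c: speed of light
G_{\mu\nu} + \Lambda g_{\mu\nu} = \frac{8 \pi G}{c^{4}} \, T_{\mu\nu}
```

The indices each run over the four spacetime dimensions, and because the tensors are symmetric, the sixteen components reduce to ten independent equations.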

It accurately describes the large-scale structure of the Universe over forty orders of magnitude, which is just about as big as data could possibly get. The equation was derived from little more than first principles and a great deal of thought. Einstein did not have to measure the positions of particles to arrive at it, yet it accurately tells us about them and allows us to predict their behaviour everywhere outside of black holes and the Big Bang, over the last 13 billion years and far into the future.

With just an equation.

What BigData tries to do is the opposite: “We’ll measure everything and then we’ll know all the answers without having to even think about it.”

We’d like to ask them if they think data is exempt from Heisenberg’s Uncertainty Principle, then assure them that it most certainly is not. We might also introduce them to the subject of data pollution, but we’ll leave that for another day when we write our Privacy Activist’s Guide to Polluting Data.

Thinking Out of Fashion

How is it that mathematicians and scientists, working with little more than slide rules, were able to formulate equations that so accurately describe both human behaviour and the nature of the world we live in? Because they thought about it.

What BigData is trying to tell us is that one no longer has to think. All we need to do is collect the data, query it, and then we will know everything. Such is the philosophy behind the fad, and it explains its great appeal to the low-skilled, for whom thinking is not a forte.

There’s a wee problem with this approach, one that UK author Douglas Adams was fully aware of decades ago: What questions do we ask?

Enter the Non-Existent Data Scientists

Hence, BigData's adopters find themselves in the situation where they have all the data, and now just need to hire some data scientists who know how to query it. If only such data scientists existed.

The firms that want to use BigData could train them themselves, but they are not going to because that would cost them money. And to be fair, they aren’t going to get any return on it.

Nevertheless, they will rely on public funds for the training of the staff that they and they alone require, oblivious to the myriad pressing problems the public already face and an almost certain resistance to spending millions of taxpayers’ money training people to tell us what a reasonably competent statistician could tell us anyway, without even needing to query the data.

But BigData is a fad. It’s aimed at the largely clueless Fachidioten and charlatans who get hired or promoted because of who they know instead of what they know. Or those who are hired because they are cheap.


There are people working with vast datasets, but these are typically in areas where there is no other choice, such as at CERN. Even there, the work is intended to provide us with equations that will answer future questions without the need for experimentation or trillion-tuple databases.

And that is what business needs to do: find accurate methods to predict future events scientifically, tailored to its own needs, rather than blundering through vast sets of data that effectively trample the individual’s right to privacy, when the majority of such data is useless yet still requires huge investment in hardware.

BigData is a brute-force approach to analytics that has become necessary due to the business world’s refusal to cover the costs of training and its replacement of skilled staff with underpaid, underskilled enthusiasts.