Data always tells a story, but it needs to be heard correctly and without bias. Many times, decision makers want the data to corroborate a conclusion they have already reached. Others tend to discount the outcomes: “Ah, this is obvious, it is just common sense. We don’t need Big Data to know this.” Both attitudes keep the data from telling its story effectively.

But the data should also be gathered and analyzed correctly, and the process and algorithms should be validated by testing them against already known outcomes.

Failing to do so can lead to really outlandish conclusions.

I would like to mention some such cases in my post today.

One very old and popular example is the story about beer and diapers in retail stores. The story itself appears to be a myth, so everyone uses it to their own advantage.

The second one, which I find very relatable, is an American study of unemployment that used Twitter posts containing identified keywords like "jobs", "vacancies", etc. The researchers found an alarming rise in unemployment during a particular period and tried to work out what could have caused it. It turned out that, unfortunately, Steve Jobs had passed away in that timeframe, so posts mentioning his name matched the keyword "jobs" and the analysis went for a toss.
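To make the failure concrete, here is a small sketch in Python of how a plain keyword count can be fooled this way. The keyword list, the sample tweets and the function names are all made up by me for illustration; they are not taken from the study, and the "fix" shown is only one simple possibility, not what the researchers actually did.

```python
import re

# Hypothetical keyword list, invented for illustration only.
KEYWORDS = {"jobs", "vacancies", "unemployed"}


def naive_match(tweet: str) -> bool:
    """Flag a tweet if any keyword appears as a whole word, ignoring case."""
    words = re.findall(r"[a-z]+", tweet.lower())
    return any(word in KEYWORDS for word in words)


def context_aware_match(tweet: str) -> bool:
    """Same check, but first remove the proper name 'Steve Jobs'."""
    cleaned = re.sub(r"\bsteve\s+jobs\b", "", tweet, flags=re.IGNORECASE)
    return naive_match(cleaned)


# Made-up sample tweets, not real data.
tweets = [
    "No jobs in my town, been looking for months",
    "RIP Steve Jobs, you changed the world",
    "Steve Jobs was a visionary",
    "Any vacancies for a junior analyst?",
]

naive_count = sum(naive_match(t) for t in tweets)
aware_count = sum(context_aware_match(t) for t in tweets)

print("naive keyword count:", naive_count)
print("count after excluding the name:", aware_count)
```

On this toy sample the naive count is double the context-aware one, purely because of the name. Running the keyword logic against a period with a known outcome first is the kind of check that would expose such a spike before anyone tries to explain it.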

Insurance companies are known to use data about their customers, such as the prescriptions they filled or the clothing sizes they bought, to determine premiums. But this can go wrong if the data is not accurate.

So this is a cautionary tale about not relying on a hypothesis without thoroughly testing it on a large enough set of data.

