Should I even care?!
If you’re reading this article, there’s a high probability that you’re among those who have heard a lot about the trending buzzwords “Data Science” and “Big Data”.
In fact, these areas are expanding at a crazy speed, and according to some studies the demand for Data Scientists will keep growing: the following infographic, from a SAS report on Data Analytics adoption and trends in the UK market between 2012 and 2017, forecasts a 243% increase in demand.
From the SAS report on Data Analytics adoption and trends between 2012 and 2017
And the following chart, from a Wikibon article, shows the revenue trend for Big Data.
In this article I’ll highlight the history of Big Data, from Google’s MapReduce to current trends and tools.
It’s debatable what Big Data means and where its boundaries lie; there is no standard definition, but the following diagram is quite popular:
Other figures use only three dimensions (Volume, Variety, and Velocity); generally, the farther you move from the center, the closer you get to what is now called “Big Data”.
So, today I had an interesting experience with Hadoop’s mapper output compression. My mapper emitted structured data as a custom object (to simplify later calculations), but to my surprise, the shuffled data was way too large — about 3x the original data size — even though I had enabled map output compression. I then tried encoding the mapper’s output value as a Text object instead, and got roughly a 100x improvement in the size of the shuffled data, because my data compressed very well in textual format (the entries were, to a large extent, similar to each other).
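For reference, map output compression is controlled by a pair of job properties. This is a minimal sketch of the relevant configuration under MRv2 property names (older releases used `mapred.compress.map.output`); the Snappy codec here is just one common choice, not necessarily what was used above:

```xml
<!-- Compress intermediate (map) output before it is shuffled. -->
<property>
  <name>mapreduce.map.output.compress</name>
  <value>true</value>
</property>
<!-- Codec choice trades CPU for I/O; Snappy is a common default. -->
<property>
  <name>mapreduce.map.output.compress.codec</name>
  <value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
```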
The reason was that the custom object I had created at first was serialized to a binary format, which hid the similarity between records and therefore didn’t compress well.
So, next time you decide to use a custom object as mapper output and marshal it, think twice about the nature of your data, and experiment with encoding it in textual format instead.
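The effect is easy to measure outside Hadoop with a small stdlib-only sketch. The record layout below (an int, a double, and a short tag) is a made-up stand-in for the custom object above: we gzip the same stream of similar records once in a fixed binary encoding and once as tab-separated text, and compare the compressed sizes.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.zip.GZIPOutputStream;

public class EncodingCompressionCheck {

    // Hypothetical binary layout, standing in for a custom Writable's bytes.
    static byte[] binaryEncode(int id, double score, String tag) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        out.writeInt(id);        // 4 bytes
        out.writeDouble(score);  // 8 bytes
        out.writeUTF(tag);       // 2-byte length prefix + UTF-8 bytes
        out.flush();
        return bos.toByteArray();
    }

    // Size of the input after gzip compression.
    static int gzipSize(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        }
        return bos.size();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream binary = new ByteArrayOutputStream();
        StringBuilder text = new StringBuilder();
        for (int i = 0; i < 10_000; i++) {
            // Records are deliberately similar, like the data in the article.
            binary.write(binaryEncode(i % 100, 0.5, "tag"));
            text.append(i % 100).append('\t').append(0.5).append('\t').append("tag").append('\n');
        }
        System.out.println("gzipped binary encoding: " + gzipSize(binary.toByteArray()) + " bytes");
        System.out.println("gzipped text encoding:   " + gzipSize(text.toString().getBytes("UTF-8")) + " bytes");
    }
}
```

The harness makes no claim about which encoding wins in general — that depends on your records — but it gives a quick way to test your own data’s nature before committing to a custom serialized object.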