Recently, I wrote a report for GigaOm Pro (behind a paywall) commemorating their Structure:Big Data conference, and in it I introduced a term for a problem that is going to hurt many organizations in a big way in the near future. I thought I would write about it here and get the thoughts of practitioners and vendors on the problem. In the world of punditry, where everyone tries to come up with a term and claim credit later, I thought I would throw one out as well so that I can make it to the "I told ya" Hall of Shame, oops, Fame. The term I defined was Data Obesity.
Data Obesity can be defined as the indiscriminate accumulation of data without a proper strategy to shed the unwanted or undesirable data.
As organizations accumulate data from many different sources in the hope of gaining a competitive advantage from big data, there is a risk associated with the process. Though newer analytics tools help organizations get insights from any data (something that was not possible in the past), there will come a point where indiscriminate accumulation of data becomes a headache for these organizations. Yes, data storage is getting cheap, and it gives organizations an opportunity to accumulate data at a scale they could never have imagined in the past. However, they also face the following problems:
- Even though data storage is cheap, processing the data, running smart analytics tools, and moving data from one place to another all cost large amounts of money. Indiscriminate accumulation of data means higher costs for organizations, and it is not clear that the ROI always justifies those costs.
- Data governance will become a headache with such indiscriminate accumulation of data. It will not only create regulatory headaches but also increase the cost of maintaining the data in a big way.
Do we have any solutions to this Data Obesity problem now? None that I know of. In fact, this blog post is an attempt to hear from practitioners and vendors about potential solutions. I am sure the solution is going to be a mixture of tools and policies. One of the reasons I talked about this problem in that report, and again in this blog post, is to get everybody involved to start talking about the topic. Organizations looking at big data for their future should keep this in mind and develop policies accordingly, and vendors wanting to differentiate themselves from the crowd should try to find smart solutions to address this problem. Left unaddressed, Data Obesity could hurt organizations in a big way. I would love to hear the community's feedback.