While not yet having reached the cringe-factor of “cloud”, the term “big data” is certainly becoming much discussed, and, as I’ve done many times with its buzzword stable-mate, I find myself asking, “What exactly is it?”
Thankfully, NIST have yet to offer an insipid and face-numbingly trite definition (hooray), but unfortunately the leading definition of big data is hardly one that will set Enterprise IT minds racing or send many CxOs reaching for their wallets.
In fact, I only have to turn to Wikipedia for my suspicions to be bolstered that we are indeed seeing yet another business-context-less overture that is, unsurprisingly, helping us collectively miss the point.
“In information technology, big data is a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools.”
The Technology Purist in me longs to agree entirely with this, but the Business Practitioner in me wants to challenge the positioning and offer up a rather more palatable opinion. I’ve done just that routinely over the last few years, from cloud to mobility; so let’s take a swing at big data, chunk by chunk.
In organizations like mine, and indeed in many other industries, including manufacturing, automotive, pharmaceutical, aerospace – in fact, pretty much anything with a large supply chain – there are two distinct types of data.
1. Data that is used during the creation of the final product, which at various stages is turned into information, and usually accompanies that final product into its operational lifecycle.
2. Data that is used during the creation of the final product, which at various stages is turned into information, but which, once used, becomes a by-product of the overall creation process.
In organizations with multiple entities or business lines, the data landscape differs significantly based on a number of factors, usually the combination of the business itself plus the people, process and technology that support it. However, it has been my recent observation that the hypothetical 3V’s of big data (volume, variety, velocity) are indeed a valid construct for helping map even the most complex data landscape, especially when suffixed with my additional 2V’s (value, variability), which help round out the north and south ends of the initial big data trail.
The diagram below is intended to help visualize The 5V’s of the Data Landscape.
In the context of The 5V’s of the Data Landscape, the definitions of each V may be subtly different from those the purists would have us believe, so, for the avoidance of doubt, let’s be sure we are all on the same page.
Volume – The size of the overall data created by the line of business in the course of normal business operations.
Variety – The number of individual systems and / or distinct data types utilized by the line of business in the course of normal business operations.
Velocity – The combined rate at which data flows in and out (internal and external vectors) of the line of business during the course of normal business operations.
Variability – The level to which source data can be unintentionally modified throughout its lifecycle, where non-immutable data affects information quality.
Value – The value derived from the overall data created by the line of business, considering both data types (from the earlier definitions).
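To make the framework a little more concrete, here is a minimal sketch of how a line of business might be profiled against the 5V’s. Everything here – the class name, the 0–5 scoring scale, and the idea of flagging the weakest dimension – is my own illustration, not part of any formal big data definition.

```python
# Hypothetical sketch: profiling a line of business against the 5V's.
# The 0-5 scoring scale and all names here are illustrative assumptions.
from dataclasses import dataclass


@dataclass
class FiveVProfile:
    volume: int       # overall size of data created (0 = tiny, 5 = exabyte-class)
    variety: int      # number of distinct systems / data types in use
    velocity: int     # combined rate at which data flows in and out
    variability: int  # resistance to unintentional modification over the lifecycle
    value: int        # business value derived, across both data types

    def weakest_v(self) -> str:
        """Return the dimension with the lowest score, i.e. the one most
        in need of attention when mapping the data landscape."""
        scores = vars(self)
        return min(scores, key=scores.get)


# Example: a manufacturing line of business rich in by-product data
lob = FiveVProfile(volume=5, variety=4, velocity=3, variability=4, value=1)
print(lob.weakest_v())  # -> value
```

In this sketch a large, varied data estate with a low Value score is exactly the “waste data” situation discussed below: plenty of data, little derived value.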
It is the V for Value that I am particularly interested in, especially in the context of applying a real business case for big data, focusing very clearly on the data type that is considered the by-product or waste data. My suspicion is that within the waste, the global exabytes of by-product data, there is a significant amount of rich information waiting to be turned into applied knowledge, helping businesses of all kinds be more efficient and effective in their respective arenas.
Can big data methods and visualizations be applied to bring new levels of understanding, and potentially even new revenue opportunities, by driving up the Value of “crap”? Well, why not? “Where there’s muck, there’s brass,” as the phrase goes.
(Note: You may have to Google the phrase if you’re not from England).
Over the last few months, we’ve begun to see the emergence of what I’ve called “The Face of Big Data”, with companies like Edgespring, Datameer and SensePlatform (among others) that have grasped how to bring big data to the masses, through intuitive UIs, and without the need to understand the inherently complex underbelly of Hadoop-style environments.
My view is that this is where the big data war is going to be won. If the “Average Joe” in a line of business can use tools (as easily as he or she can use a search engine or a spreadsheet application) to effortlessly ingest and correlate data, and ultimately reveal never-before-seen business dimensions, then we are really onto something truly remarkable.
Regular readers will know that no Loose Couple blog entry would be complete without an analogy, so let’s wrap this up with a simple example.
Natural gas that comes from oil wells is typically termed ‘associated gas’. This gas can exist separate from oil in the formation (free gas) or it can be dissolved in the crude oil. For many, many years, oil refineries and other downstream facilities simply used flare stacks to burn off this unwanted by-product of crude oil production.
Back in the 1990s, the clever folks at GE Jenbacher realized there was an opportunity to re-use the flare gas, and they set about creating a flare gas capture solution that was first installed in Italy in 1998.
Today, more than 330 units with a total electrical output of more than 450 MW run on associated petroleum gas worldwide. These plants generate about 3.6 million MWh of electricity a year, enough to supply about 1.2 million European homes. Generating this amount of electrical power from flare gas saves approximately 900 million liters of diesel fuel per year.
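For the sceptical, the quoted figures hang together arithmetically. A quick back-of-the-envelope check (the 8,760 hours-per-year figure and the derived numbers are mine, the inputs are the ones quoted above):

```python
# Sanity-check the quoted flare-gas figures with basic arithmetic.
capacity_mw = 450            # total electrical output quoted
annual_mwh = 3_600_000       # electricity generated per year, quoted
homes = 1_200_000            # European homes supplied, quoted

hours_per_year = 8_760       # 365 days * 24 hours

# Implied capacity factor: fraction of the year the fleet would need
# to run flat out to produce the quoted annual output
capacity_factor = annual_mwh / (capacity_mw * hours_per_year)
print(f"implied capacity factor ~{capacity_factor:.0%}")  # ~91%

# Implied annual consumption per home
mwh_per_home = annual_mwh / homes
print(f"~{mwh_per_home:.1f} MWh per home per year")  # ~3.0 MWh
```

A ~91% capacity factor is plausible for baseload gas engines running on a continuous by-product fuel, and roughly 3 MWh per household per year is in the right range for European domestic electricity use.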
Ain’t that a happy ending?