CloudAve
Software in Business. The Business of Software.

Thinking about Data Gravity

By Paul Miller on July 2, 2012

Dave McCrory introduced his idea of Data Gravity with a blog post back in 2010. The core idea was, and is, interesting, and it got some traction from sites like ReadWriteWeb, ZDNet and GigaOM. More recently, Data Gravity featured in this year’s EMC World keynote. But beyond the observation that large or valuable agglomerations of data exert a pull that tends to see them grow in size or value, what is a recognition of Data Gravity actually good for? Stefano Bertolo perhaps summed the question up best, suggesting:

@PaulMiller @adrianco @gigastacey i also think data gravity is an important idea similar in structure to Merton’s Matthew Principle

— stefano bertolo (@sclopit) June 21, 2012

(And for those who don’t know, that’s erudite Italian for the rich getting richer as the poor get poorer.)

I caught up with Dave over Skype last week, just before he launched DataGravity.org and proposed a formula for getting some numbers into the discussion.

As a concept, Data Gravity seems pretty closely associated with current enthusiasm for Big Data. And, like Big Data, the term’s real-world connotations can be unhelpful almost as often as they are helpful. Big Data is generally accepted to exhibit at least three characteristics, which are Volume, Velocity and Variety. Various other V’s, including Value, also get mentioned from time to time, but with less consistency. And yet, Big Data’s name says it’s all about size. Size (volume) matters. The speed with which data must be ingested, processed or excreted is less important. The complexity and diversity of the data doesn’t matter either.

And that’s nonsense, of course. On its own, the size of a data set is neither here nor there. Coping with lots of data certainly raises some not-insignificant technical challenges, but the community is actually doing a pretty good job of coming up with technically impressive solutions. The interesting aspect of a huge data set isn’t its size, but the very different modes of working that become possible when you begin to unpick the complex interrelationships between data elements.

Sometimes, Big Data is the vehicle by which enough data is gathered together about enough aspects of enough things from enough places for those interrelationships to become observable against the background noise. Other times, Big Data is the background noise, and any hope of insight is drowned beneath the unending stream of petabytes.

To a degree, Data Gravity’s name falls into the same trap. More gravity must be good, right? And more mass leads to more gravity. And mass must be connected to volume, in some vague way that was explained when I was 11, and which involves STP. Therefore, bigger data sets have more gravity. Which means that bigger data sets are better data sets. QED. That assertion is clearly nonsense, but luckily it’s not actually what McCrory is suggesting. His arguments are more nuanced than that, and potentially far more useful.

Instinctively, I like that the equation attempts to move attention away from ‘the application’ toward the pools of data that support many, many applications at once. The data is where the potential lies. Applications are merely the means to unlock that potential in various ways. So maybe notions of Potential Energy from elsewhere in Physics need to figure here?

But I’m wary of the emphasis given to real numbers that are simply the underlying technology’s vital statistics: network latency, bandwidth, request sizes, numbers of requests, and the rest. I realise that these are the measurable things that we have, but feel that more abstract notions of value (and even, perhaps, tangible economics) need to figure just as prominently. As Sam Johnston commented this afternoon,

Overall I think “value” is more relevant to density than compression @mccrory @lmacvittie

— Sam Johnston (@samj) July 2, 2012

It’s much less clear to me, though, how we could set about assigning numbers to ‘value’ in a way that doesn’t simply end up with every single data provider (miraculously, of course) finding their own little pot of pointless trivia to have the biggest gravitational pull of any resource on the web. That really won’t help us. Numbers of requests may give one measure of one aspect of value, but it’s not the whole story either.
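A toy calculation may make the worry concrete. The function below is not McCrory's published formula. It is a hypothetical, Newton-style score, assumed purely for illustration and built only from the kinds of measurable inputs listed above (latency, bandwidth, request size, request rate) plus two made-up "mass" figures. Every name and number in it is an assumption.

```python
def toy_data_gravity(data_mass, app_mass, latency_s,
                     bandwidth_bytes_per_s, avg_request_bytes,
                     requests_per_s):
    """Hypothetical gravity-style score. NOT McCrory's published formula."""
    # Assumption: the time one request spends on the network plays the
    # role of "distance" between the application and the data.
    distance = latency_s + avg_request_bytes / bandwidth_bytes_per_s
    # Assumption: an inverse-square pull, scaled by how often requests occur.
    return requests_per_s * data_mass * app_mass / distance ** 2


# Same data set and application; only the network latency differs.
near = toy_data_gravity(100.0, 10.0, 0.001, 1e9, 1e6, 50.0)  # 1 ms away
far = toy_data_gravity(100.0, 10.0, 0.1, 1e9, 1e6, 50.0)     # 100 ms away
ratio = near / far
print(f"near/far score ratio: {ratio:.2f}")
```

In this sketch, moving the same data from 100 ms away to 1 ms away swings the score by a factor of roughly 2,550, yet none of the inputs says anything about whether the data is worth having. A score like this measures plumbing, not value.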

And so I’m left reaffirming my original impression that Data Gravity is “interesting”. It’s also intriguing, and I keep feeling that it should be insightful. I’m just not — yet — sure exactly how. Is a resource with a Data Gravity of 6 twice as good as a resource with a Data Gravity of 3? Does a data set with a Data Gravity of 15 require three times as much investment/infrastructure/love as a data set scoring a humble 5? It’s unlikely to be that simple, but I do look forward to seeing what happens as Dave begins to work with the parts of our industry that can lend empirical credibility to his initial dabblings in mathematics.

If real numbers show the equations to stand up, all we then need to do is work out what the numbers mean. Should an awareness of Data Gravity change our behaviour, should it validate what gut feel led us to do already, or is it just another ‘interesting’ and ultimately self-evident number that doesn’t take us anywhere?

I don’t know, but I look forward to finding out.



(Cross-posted @ The Cloud of Data)

Posted in Featured Posts, Trends & Concepts | Tagged big data, cloud computing, data gravity, data markets, data physics, datagravity, dave mccrory, Linked Data, open data
