Data Democracy: Ready to Transform Business
The barriers that have stood between business data and the true value it holds are finally coming down. Access to data is being democratized within organizations. Access costs and operational and IT silos are falling, and for the first time any collaborative team, indeed virtually anyone in an organization, can get […]

Chasing Qualitative Signal In Quantitative Big Data Noise
Joey Votto, who plays for the Cincinnati Reds, is one of the best hitters in MLB. Lately he has received a lot of criticism for not swinging at strikes when there are runners on base. FiveThirtyEight decided to analyze this criticism with the help of data. They found this criticism to be true; […]

How to Lie with Data
Back in the early Nineties, I was working on a Ph.D. applying a tool called a Geographic Information System (GIS) to the challenge of modelling archaeological deposits under cities. For those of us worrying about these things, Mark Monmonier’s then-newly published first edition of How to Lie with Maps was required reading. It wasn’t so much a handbook […]

Hacking Into The Indian Education System Reveals Score Tampering
Debarghya Das has a fascinating story on how he managed to bypass a silly web security layer to get access to the results of 150,000 ICSE (10th grade) and 65,000 ISC (12th grade) students in India. While the lack of security and the total failure to safeguard sensitive information is an interesting topic, what is more fascinating […]

Unsupervised Machine Learning, Most Promising Ingredient Of Big Data
Orange (France Telecom), one of the largest mobile operators in the world, issued a challenge, “Data for Development,” by releasing a dataset of their subscribers in Ivory Coast. The dataset contained 2.5 billion records of calls and text messages exchanged between 5 million anonymous users in Ivory Coast, Africa. Various researchers got access to this dataset […]

Getting it right with data attribution
There have always, it seems, been people for whom attribution and citation really matter. Some of them passionately engage in arguments that last months or years, debating the merits of comma placement in written citations for the work of others. Bizarre, right? But, as we all become increasingly dependent upon data sourced from third parties, […]

Seeking Simplicity’s Sweet Spot
Albert Einstein, you may have heard, was a clever man. He scribbled equations on blackboards, thought big thoughts, and all of that. But, allegedly, he also said, “Everything should be made as simple as possible, but not simpler.” These words have resonated with me recently, as I’ve heard pitches from one company after another, all […]

Visualisation – the key that unlocks data’s value?
As the Big Data hype machine continues its relentless attempt to gobble everything in its path, new business units and entire new domains buying into the promise find themselves faced with unanticipated data volume and complexity. They see the potential for data-based decision making, but still face (short-term?) challenges in actually managing, analysing or interpreting […]

Doing the DataBeat
For the past two years, Ben Kepes and I have helped the team at VentureBeat assemble the programme for their annual Cloud Computing event, CloudBeat. It looks as though we may end up doing something similar with them this year, as CloudBeat moves from Redwood City to downtown San Francisco, and from November to September. […]

A Data Scientist’s View On Skills, Tools, And Attitude
I recently came across this interview (thanks Dharini for the link!) with Nick Chamandy, a statistician, a.k.a. a data scientist, at Google. I would encourage you to read it; it does have some great points. I found the following snippets interesting: Recruiting data scientists: When posting job opportunities, we are cognizant that people from different […]

Commoditizing Data Science
My ongoing conversations with several people continue to reaffirm my belief that Data Science is still perceived to be a sacred discipline and data scientists are perceived to be highly skilled statisticians who walk around wearing white lab coats. The best data scientists are not the ones who know the most about data but they […]

Hubris and the Data Scientist
ReadWriteWeb’s Joe Brockmeier captures a recurring issue from last week’s O’Reilly Strata conference, asking “Can Big Data replace domain expertise?” According to Brockmeier, the audience (of data scientists) apparently narrowly agreed that their arsenal of tools and algorithms trumped the knowledge and experience of the meteorologists, financiers, and retailers to whose domains data scientists are increasingly […]

Top Level Domain for data answers the wrong question
British-born computer scientist Stephen Wolfram sees ongoing efforts to extend the Internet’s top-level domains (TLDs) beyond the familiar .com, .org, .uk etc as an opportunity to raise the profile of machine-readable data. In a blog post published yesterday, he argues that a new .data domain would increase “exposure of data on […]