Hacking Into The Indian Education System Reveals Score Tampering
Debarghya Das has a fascinating story on how he managed to bypass a silly web security layer to get access to the results of 150,000 ICSE (10th grade) and 65,000 ISC (12th grade) students in India. While the lack of security and total disregard for safeguarding sensitive information are interesting topics, what is more fascinating [...]
Unsupervised Machine Learning, Most Promising Ingredient Of Big Data
Orange (France Telecom), one of the largest mobile operators in the world, issued a challenge, “Data for Development”, by releasing a dataset of their subscribers in Ivory Coast. The dataset contained 2.5 billion records of calls and text messages exchanged between 5 million anonymous users in Ivory Coast, Africa. Various researchers got access to this dataset [...]
Getting it right with data attribution
There have always, it seems, been people for whom attribution and citation really matter. Some of them passionately engage in arguments that last months or years, debating the merits of comma placement in written citations for the work of others. Bizarre, right? But, as we all become increasingly dependent upon data sourced from third parties, [...]
Seeking Simplicity’s Sweet Spot
Albert Einstein, you may have heard, was a clever man. He scribbled equations on blackboards, thought big thoughts, and all of that. But, allegedly, he also said, “Everything should be made as simple as possible, but not simpler.” These words have resonated with me recently, as I’ve heard pitches from one company after another, all [...]
Visualisation – the key that unlocks data’s value?
As the Big Data hype machine continues its relentless attempt to gobble everything in its path, new business units and entire new domains buying into the promise find themselves faced with unanticipated data volume and complexity. They see the potential for data-based decision making, but still face (short-term?) challenges in actually managing, analysing or interpreting [...]
Doing the DataBeat
For the past two years, Ben Kepes and I have helped the team at VentureBeat assemble the programme for their annual Cloud Computing event, CloudBeat. It looks as though we may end up doing something similar with them this year, as CloudBeat moves from Redwood City to downtown San Francisco, and from November to September. [...]
A Data Scientist’s View On Skills, Tools, And Attitude
I recently came across this interview (thanks Dharini for the link!) with Nick Chamandy, a statistician, a.k.a. a data scientist, at Google. I would encourage you to read it; it does have some great points. I found the following snippets interesting: Recruiting data scientists: When posting job opportunities, we are cognizant that people from different [...]
Commoditizing Data Science
My ongoing conversations with several people continue to reaffirm my belief that Data Science is still perceived to be a sacred discipline, and data scientists are perceived to be highly skilled statisticians who walk around wearing white lab coats. The best data scientists are not the ones who know the most about data but they [...]
Hubris and the Data Scientist
ReadWriteWeb’s Joe Brockmeier captures a recurring issue from last week’s O’Reilly Strata conference, asking “Can Big Data replace domain expertise?” According to Brockmeier, the audience (of data scientists) apparently narrowly agreed that their arsenal of tools and algorithms trumped the knowledge and experience of the meteorologists, financiers, and retailers to whose domains data scientists are increasingly [...]
Top Level Domain for data answers the wrong question
British-born computer scientist Stephen Wolfram sees ongoing efforts to extend the Internet’s top-level domains (TLDs) beyond the familiar .com, .org, .uk, etc. as an opportunity to raise the profile of machine-readable data. In a blog post published yesterday, he argues that a new .data domain would increase “exposure of data on [...]
