Here are a few of the data stories that caught my attention this week.
Big data booming
The call for speakers for Strata New York has closed, but as Edd Dumbill notes, the number of proposals is a solid indication of the booming interest in big data. The first Strata conference, held in California in 2011, elicited 255 proposals. The following event in New York elicited 230. The most recent Strata, held in California again, drew 415 proposals. And the number received for Strata’s fall event in New York? That came in at 635.
“That’s some pretty amazing growth. I can thus expect two things from Strata New York. My job in putting the schedule together is going to be hard. And we’re going to have the very best content around.”
The increased popularity of the Strata conference is just one data point from the week that highlights a big data boom. Here’s another: According to a recent report by IDC, the “worldwide ecosystem for Hadoop-MapReduce software is expected to grow at a compound annual rate of 60.2 percent, from $77 million in revenue in 2011 to $812.8 million in 2016.”
“Hadoop and MapReduce are taking the software world by storm,” says IDC’s Carl Olofson. Or as GigaOm’s Derrick Harris puts it: “All aboard the Hadoop money train.”
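IDC’s figures hang together, by the way. A quick sanity check (a back-of-the-envelope sketch, not part of the IDC report) shows that compounding $77 million at 60.2% annually for the five years from 2011 to 2016 lands almost exactly on the projected $812.8 million:

```python
# Sanity-check IDC's projection: does 60.2% compound annual growth
# take $77 million (2011) to roughly $812.8 million (2016)?
start = 77.0      # 2011 revenue, millions USD
rate = 0.602      # compound annual growth rate
years = 5         # 2011 -> 2016

projected = start * (1 + rate) ** years
print(round(projected, 1))  # ~812.5 -- matches IDC's $812.8M within rounding
```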
A big data gap?
Another report released this week reins in some of the exuberance about big data. This report comes from the government IT network MeriTalk, and it points to a “big data gap” in the government — that is, a gap between the promise of big data and the federal government’s capabilities to make use of it. That gap is notable in light of the Obama administration’s recent $200 million commitment to a federal agency big data initiative.
Among the MeriTalk report’s findings: 60% of government IT professionals say their agency is analyzing the data it collects, but less than half (40%) are using that data to make strategic decisions. Survey respondents estimated it would take, on average, three years before their agencies were ready to take full advantage of big data.
Prismatic and data-mining the news
The largest-ever healthcare fraud scheme was uncovered this past week. Arrests were made in seven cities — some 107 doctors, nurses and social workers were charged, with fraudulent Medicare claims totaling about $452 million. The discoveries about the fraudulent behavior were made thanks in part to data-mining — looking for anomalies in the Medicare filings made by various healthcare providers.
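To give a flavor of what “looking for anomalies” can mean in practice — this is a minimal illustrative sketch, not the investigators’ actual method, and the provider names and claim totals below are made up — one common screen is to flag providers whose billing sits far outside the norm for their peer group, here using the modified z-score based on the median absolute deviation:

```python
import statistics

# Hypothetical monthly Medicare claim totals (USD) for providers in one
# specialty. All names and figures are invented for illustration.
claims = {
    "provider_a": 12_000,
    "provider_b": 15_500,
    "provider_c": 13_800,
    "provider_d": 14_200,
    "provider_e": 98_000,
}

med = statistics.median(claims.values())
mad = statistics.median(abs(v - med) for v in claims.values())

# Modified z-score (Iglewicz & Hoaglin): values above ~3.5 are
# conventionally treated as outliers worth a closer look.
flagged = [p for p, v in claims.items()
           if mad and 0.6745 * abs(v - med) / mad > 3.5]
print(flagged)  # ['provider_e']
```

The median-based score is used here rather than a plain mean/standard-deviation z-score because a single extreme biller inflates the mean and standard deviation enough to hide itself in a small sample.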
Prismatic penned a post in which it makes the case for more open data so that there’s “less friction” in accessing the sort of information that led to this sting operation.
“Both the recent sting and the Prime case show that you need real journalists and investigators working with technology and data to achieve good results. The challenge now is to scale this recipe and force transparency on a larger scale.
“We need to get more technically sophisticated and start analysing the data sets up front to discover the right questions to ask, not just answer the questions we already know to ask based on up-front human investigation. If we have to discover each fraud ring or singleton abuse as a one-off case, we’ll never be able to wipe out fraud on a large enough scale to matter.”
Indeed, despite this being the largest bust ever, it’s really just a fraction of the estimated $20 billion to $100 billion a year in Medicare fraud.
Got data news?
Feel free to email me.