Here are some of the data stories that caught my attention this week:
Data without borders
Data is everywhere. That much we know. But the use of data, and the benefits it brings, are not evenly distributed. This week, New York Times data scientist Jake Porway issued a call to arms to address this. He's asking developers and data scientists to help build a Data Without Borders-type effort that would take data, particularly data held by NGOs and non-profits, and match it with people who know what to do with it.
As Porway observes:
There’s a lot of effort in our discipline put toward what I feel are sort of “bourgeois” applications of data science, such as using complex machine learning algorithms and rich datasets not to enhance communication or improve the government, but instead to let people know that there’s a 5% deal on an iPad within a 1 mile radius of where they are. In my opinion, these applications bring vanishingly small incremental improvements to lives that are arguably already pretty awesome.
Porway proposes building a program to help match data scientists with non-profits and the like who need data services. The idea is still under development, but drop Porway a line if you’re interested.
Big data and the future of journalism
The Knight Foundation announced the winners of its Knight News Challenge this week, a competition to find and support the best new ideas in journalism. The Knight Foundation selected 16 projects to fund from among hundreds of applicants.
In announcing the winners, the Knight Foundation pointed out a couple of important trends, including “the rise of the hacker/data journalist.” Indeed, several of the projects are data-related, including Swiftriver, a project that aims to make sense of crisis data; ScraperWiki, a tool for users to create their own custom scrapers; and Overview, a project that will create visualization tools to help journalists better understand large data sets.
IBM releases its first Netezza appliance
This week, IBM released its first new Netezza appliance since acquiring the company. The IBM Netezza High Capacity Appliance is designed to analyze up to 10 petabytes in just a few minutes. “With the new appliance, IBM is looking to make analysis of so-called big data sets more affordable,” Steve Mills, senior vice president and group executive of software and systems at IBM, told ZDNet.
The new Netezza appliance is part of IBM's broader big data strategy, of which Watson's recent success on Jeopardy! was just one small part.
The superhero social graph
Plenty of attention is paid to the social graph: the ways in which we are connected online through our various social networks. And while there’s still lots of work to be done making sense of that data and of those relationships, a new dataset released this week by the data marketplace Infochimps points to other social (fictional) worlds that can be analyzed.
The world, in this case, is that of the Marvel Comics universe. The Marvel dataset was constructed by Cesc Rosselló, Ricardo Alberich, and Joe Miro from the University of the Balearic Islands. Much like a real social graph, the data shows the relationships between characters, and according to the researchers “is closer to a real social graph than one might expect.”
Got data news?
Feel free to email me.