The last two months have seen some important developments in the way data is made available. First, Infochimps created a web API for publishing data. The number of datasets is relatively limited; five are listed now, of which four have to do with Twitter data, and one maps IP addresses to census data (that one appears not to be available yet). Their site allows you to request new datasets, or to vote on existing requests. Pricing is reasonable. You can do significant experimentation, or even run a useful low-volume application, without running up any charges.
“Data as a service” is not a new term, by any means. There have been any number of data services over the years. But this is something different from the many services that have sold data, or even the more recent services that have sold data via the Internet. Data as a service is another part of the cloud computing alphabet soup, on par with “infrastructure, software, or platform as a service” (IaaS/SaaS/PaaS). Infochimps makes possible applications where data lives in the cloud. Granted, you’re not going to access terabyte datasets over the Internet. But neither do you have to download (or have shipped) a giant dataset for the few kilobytes or megabytes that interest you. Infochimps is also pushing a bit beyond simple data access. Their Twitter APIs don’t return raw data; they implement trust metrics, influence metrics, and more. So perhaps it’s better to call this “algorithm as a service” (AaaS), not unlike the Prediction API (machine learning using Google’s algorithms) that was announced at Google I/O.
The second new data service that has impressed me is Google’s new Public Data Explorer. I assume that everyone reading this article has seen the latest spectacular data visualizations, in the New York Times, Nathan Yau’s Flowingdata blog, and elsewhere. Here’s one example from GE (created by Ben Fry’s Fathom Information Design). Public Data Explorer lets you create your own visualizations, based on Google’s data.
Here’s one of their examples (nicer than anything I came up with on the fly). It’s an animation of per-capita income in California counties that shows how the individual counties have fared from 1969 to 2007. I’ve highlighted a few interesting counties — let’s see how they perform:
Not surprisingly, the difference between the richest and poorest counties has drastically increased. Google provides many datasets, and gives you interesting ways to arrange and animate the data. I’ve displayed a fairly simple bar graph animation, but you can also do bubbles on a map and several kinds of Cartesian plots. You can slice and dice regions in many different ways, frequently down to the county level. They’ve got data from the European community, from Australia, the World Bank, and other sources. None of this is exactly new: the data has been around for years. What Public Data Explorer does is enable you to explore the data yourself and paste the result into your own sites and blogs.
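The third development is WolframAlpha Widgets, which turn a WolframAlpha query into an embeddable component. Here’s an example: a simple widget to compare two stocks. You can select your own stocks or just use my defaults (Apple and Google):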
It took me about five minutes to whip up this widget, starting with the simple Alpha query “AAPL GOOG.” But there are many ways to look up stock prices and histories. What about something more esoteric? Alpha knows an incredible amount. The other day my wife and I couldn’t remember what a half-diminished seventh chord was. Alpha knows, and can show you a piano keyboard, guitar fingerings, and even play the chord. Here’s the result:
You’ll confuse it if you try really odd chords; don’t try fancy jazz ninths and thirteenths, and remember to specify “triad” if you want your basic three-note chord.
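For readers who, like us, can’t remember the definition: a half-diminished seventh chord stacks a minor third, a diminished fifth, and a minor seventh on the root. The sketch below spells that out in Python. It is only an illustration of the interval arithmetic, not how Alpha computes anything; the function name is my own, and the note spelling is simplified to flats, so some chords will come out enharmonically “wrong” on paper.

```python
# Pitch classes spelled with flats; enharmonic spelling is simplified.
NOTE_NAMES = ["C", "Db", "D", "Eb", "E", "F", "Gb", "G", "Ab", "A", "Bb", "B"]

# Semitone offsets from the root:
# root, minor third, diminished fifth, minor seventh.
HALF_DIMINISHED_SEVENTH = (0, 3, 6, 10)


def half_diminished_seventh(root: str) -> list[str]:
    """Return the four notes of the half-diminished seventh chord on `root`.

    `root` must be one of the names in NOTE_NAMES (hypothetical helper,
    not part of any library).
    """
    start = NOTE_NAMES.index(root)
    return [NOTE_NAMES[(start + step) % 12] for step in HALF_DIMINISHED_SEVENTH]


print(half_diminished_seventh("B"))  # ['B', 'D', 'F', 'A']
```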
Alpha’s weak point is that you frequently end up playing “guess what Alpha wants.” I suppose that’s what you trade off for flexibility; but it was surprisingly difficult to build an interest calculator. Building the widget was simple enough, but coming up with the initial query was difficult. Alpha would either assume I was paying off a loan, or doing a present value calculation, or something else, until I juggled the terms into the right order, which happened to be “10% interest $100 initial value 7 years.” Not illogical, but neither were any of the other attempts.
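For comparison, here is the calculation I was after, written as a few lines of Python. This is a sketch that assumes interest compounds once a year; I don’t know which compounding convention Alpha applies to that query, and the function name is mine.

```python
def future_value(principal: float, annual_rate: float, years: int) -> float:
    """Future value of `principal` after `years`, assuming annual compounding.

    Annual compounding is an assumption made for this sketch.
    """
    return principal * (1 + annual_rate) ** years


# The query from the text: 10% interest, $100 initial value, 7 years.
print(round(future_value(100, 0.10, 7), 2))  # 194.87
```

The code is unambiguous in a way the free-form query isn’t: each number has a named role, so there is nothing to guess. That is the flexibility trade-off in miniature.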
That’s a minor problem, though. Widgets make it fun to explore data and computation, and make it trivial to share the results. With “data as a service” APIs like Infochimps, and embeddable data components like Google Public Data Explorer and WolframAlpha Widgets, we’re seeing the democratization of data and data visualization: new ways to access data, new ways to play with data, and new ways to communicate the results to others.