The sale of data is a venerable business, and has existed since the middle of the 19th century, when Paul Reuter began providing telegraphed stock exchange prices between Paris and London, and New York newspapers founded the Associated Press.
The web has facilitated a blossoming of information providers. As the ability to discover and exchange data improves, the need to rely on aggregators such as Bloomberg or Thomson Reuters is declining. This is a good thing: the business models of large aggregators do not readily scale to web startups, or casual use of data in analytics.
Instead, data is increasingly offered through online marketplaces: platforms that host data from publishers and offer it to consumers. This article provides an overview of the most mature data markets, and contrasts their different approaches and facilities.
What do marketplaces do?
Most of the consumers of data from today’s marketplaces are developers. By combining another dataset with your own business data, you can gain new insight. To take an example from web analytics: by mixing an IP address database with the logs from your website, you can understand where your customers are coming from; if you then add demographic data to the mix, you have some idea of their socio-economic bracket and spending ability.
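The web analytics example amounts to two joins on shared keys. First, a log entry’s IP address is matched to a range in an IP database. Second, the resulting location is matched to a demographic record. The sketch below is a minimal Python illustration under stated assumptions: it uses a tiny, hypothetical in-memory IP table rather than a real IP intelligence product (which would be far larger and usually accessed through a vendor’s own tooling). The ranges are RFC 5737 documentation addresses, and the city and demographic labels are invented.

```python
import bisect
import ipaddress
from typing import List, Optional, Tuple

# Hypothetical IP intelligence table: (first address, last address, city),
# sorted by first address. RFC 5737 documentation ranges, invented cities.
IP_RANGES = [
    ("192.0.2.0", "192.0.2.255", "London"),
    ("198.51.100.0", "198.51.100.255", "Paris"),
    ("203.0.113.0", "203.0.113.255", "New York"),
]

# Hypothetical demographic dataset keyed on the same city names.
DEMOGRAPHICS = {
    "London": "high spending",
    "Paris": "medium spending",
    "New York": "high spending",
}

# Range starts as integers so they can be binary-searched.
_STARTS = [int(ipaddress.ip_address(first)) for first, _, _ in IP_RANGES]


def locate(ip: str) -> Optional[str]:
    """Return the city whose range contains ip, or None if no range matches."""
    value = int(ipaddress.ip_address(ip))
    i = bisect.bisect_right(_STARTS, value) - 1  # last range starting at or before ip
    if i >= 0:
        _, last, city = IP_RANGES[i]
        if value <= int(ipaddress.ip_address(last)):
            return city
    return None


def enrich(log_lines: List[str]) -> List[Tuple[str, Optional[str], Optional[str]]]:
    """Join each log line's client IP to a city, then the city to demographics.

    Assumes the client IP is the first whitespace-separated field, as in the
    Common Log Format.
    """
    rows = []
    for line in log_lines:
        ip = line.split()[0]
        city = locate(ip)
        segment = DEMOGRAPHICS.get(city) if city is not None else None
        rows.append((ip, city, segment))
    return rows


if __name__ == "__main__":
    logs = [
        '192.0.2.17 - - [10/Oct/2011:13:55:36 +0000] "GET /pricing HTTP/1.1" 200 512',
        '203.0.113.5 - - [10/Oct/2011:13:56:01 +0000] "GET /signup HTTP/1.1" 200 734',
        '10.1.2.3 - - [10/Oct/2011:13:57:12 +0000] "GET / HTTP/1.1" 200 1024',
    ]
    for row in enrich(logs):
        print(row)
```

Binary search over sorted range starts is one common way to do the IP lookup. Visitors whose address falls outside every known range simply come back with no location, which is why coverage is one of the quality indicators worth comparing between data suppliers.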
Such insight isn’t limited to analytics; you can also use it to provide value back to a customer, for instance by recommending restaurants near the location of a lunchtime appointment in their calendar. While many datasets are useful, few are as potent as location in the context they provide to activity.
Marketplaces are useful in three major ways. First, they provide a point of discoverability and comparison for data, along with indicators of quality and scope. Second, they handle the cleaning and formatting of the data so it is ready for use, a task that is often 80% of the work in any data integration. Finally, marketplaces provide an economic model for broad access to data that would otherwise prove difficult to either publish or consume.
In general, one of the important barriers to the development of the data marketplace economy is the ability of enterprises to store and make use of the data. A principle of big data is that it’s often easier to move your computation to the data than the reverse. Because of this, we’re seeing increasing integration between cloud computing facilities and data markets: Microsoft’s data market is tied to its Azure cloud, and Infochimps offers hosted compute facilities. In the short term, it’s probably easier to export data from your business systems to a cloud platform than to try to expand internal systems to integrate external sources.
While cloud solutions offer a route forward, some marketplaces also make the effort to target end-users. Microsoft’s data marketplace can be accessed directly through Excel, and DataMarket provides online visualization and exploration tools.
The four most established data marketplaces are Infochimps, Factual, Microsoft Windows Azure Data Marketplace, and DataMarket. A table comparing these providers is presented at the end of this article, and a brief discussion of each marketplace follows.
Infochimps
According to founder Flip Kromer, Infochimps was created to give data life in the same way that code hosting projects such as SourceForge or GitHub give life to code. You can improve code and share it: Kromer wanted the same for data. The driving goal behind Infochimps is to connect every public and commercially available database in the world to a common platform.
Infochimps realized that there’s an important network effect of “data with the data”: the best way to build a data commons and a data marketplace is to put them together in the same place. The proximity of other data makes all the data more valuable, because it becomes easier to find and combine.
The biggest challenge in the two years Infochimps has been operating is that of bootstrapping: a data market needs both supply and demand. Infochimps’ approach is to go for a broad horizontal range of data rather than specialize. According to Kromer, this is because they see data’s value as lying in the context it provides: giving users more insight about their own data. Joining data points into a context requires common identities. For example, a web page view can be given a geographical location by matching the IP address of the page request against the same address in an IP intelligence database. Common identities and data integration are where hosting data together really shines: Infochimps only needs to integrate the data once for customers to reap continued benefit, and it sells datasets that are pre-cleaned, integrated mash-ups of those from its providers.
By launching a big data cloud hosting platform alongside its marketplace, Infochimps is seeking to build on the importance of data locality.
Factual
Factual was envisioned by founder and CEO Gil Elbaz as an open data platform, with tools that community contributors could leverage to improve data quality. The vision is very similar to that of Infochimps, but in late 2010 Factual elected to concentrate on one area of the market: geographical and place data. Rather than pursue a broad strategy, the idea is to become a proven and trusted supplier in one vertical, then expand. With customers such as Facebook on board, Factual’s strategy is paying off.
According to Elbaz, Factual will look to expand into verticals other than local information in 2012. It is moving one vertical at a time because of the marketing effort required to build a quality community and relationships around the data.
Unlike the other main data markets, Factual does not offer reselling facilities for data publishers. Elbaz hasn’t found that the cash on offer is attractive enough for many organizations to want to share their data. Instead, he believes that the best way to get the data you want is to trade other data, which can provide business value far beyond the returns of publishing data for cash. Factual offers incentives to its customers to share data back, improving the quality of the data for everybody.
Windows Azure Data Marketplace
Launched in 2010, Microsoft’s Windows Azure Data Marketplace sits alongside the company’s Applications marketplace as part of the Azure cloud platform. Microsoft’s data market is positioned with a very strong integration story, both at the cloud level and with end-user tooling.
Through its use of a standard data protocol, OData, Microsoft offers a well-defined web interface for data access, including queries. As a result, programs such as Excel and PowerPivot can access marketplace data directly, giving Microsoft a strong capability to integrate external data into the enterprise’s existing tooling. In addition, OData support is available for a broad array of programming languages.
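Part of what makes OData a “well-defined web interface” is that queries are expressed as standard URL parameters (system query options) such as `$filter`, `$select`, `$top` and `$format`. The Python sketch below only builds such a URL. It assumes a hypothetical service root and entity set (`https://api.example.com/odata/`, `Indicators`) and leaves out authentication, which a real marketplace would require.

```python
from typing import Optional, Sequence
from urllib.parse import quote, urlencode


def odata_query_url(
    service_root: str,
    entity_set: str,
    filter_expr: Optional[str] = None,
    select: Optional[Sequence[str]] = None,
    top: Optional[int] = None,
    fmt: Optional[str] = "json",
) -> str:
    """Build a read-query URL from standard OData system query options.

    service_root and entity_set are placeholders for whatever a given
    marketplace dataset actually exposes.
    """
    params = {}
    if filter_expr:
        params["$filter"] = filter_expr       # row filter, e.g. "Country eq 'Iceland'"
    if select:
        params["$select"] = ",".join(select)  # return only these properties
    if top is not None:
        params["$top"] = str(top)             # cap the number of rows returned
    if fmt:
        params["$format"] = fmt               # requested response format
    base = f"{service_root.rstrip('/')}/{entity_set}"
    if not params:
        return base
    # Percent-encode spaces and quotes, but keep "$" and "," literal for readability.
    return base + "?" + urlencode(params, safe="$,", quote_via=quote)
```

For example, `odata_query_url("https://api.example.com/odata/", "Indicators", filter_expr="Country eq 'Iceland'", select=["Year", "Value"], top=10)` yields `https://api.example.com/odata/Indicators?$filter=Country%20eq%20%27Iceland%27&$select=Year,Value&$top=10&$format=json`. Because the query lives in the URL, any HTTP-capable tool can issue it. That uniformity is what lets spreadsheet software and client libraries consume the same marketplace data.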
Azure Data Marketplace has a strong emphasis on connecting data consumers to publishers, and most closely approximates the popular concept of an “iTunes for Data.” Big name data suppliers such as Dun & Bradstreet and ESRI can be found among the publishers. The marketplace contains a good range of data across many commercial use cases, and tends to be limited to one provider per dataset — Microsoft has maintained a strong filter on the reliability and reputation of its suppliers.
DataMarket
Where the other three main data marketplaces focus strongly on developer and IT customers, DataMarket caters to the end-user as well. Realizing that bland tables weren’t engaging users, founder Hjalmar Gislason worked to add interactive visualization to his platform.
The result is a data marketplace that is immediately useful for researchers and analysts. The range of DataMarket’s data follows this audience too, with a strong emphasis on country data and economic indicators. Much of the data is available for free, with premium data paid at the point of use.
DataMarket has recently made a significant play for data publishers, with the emphasis on publishing, not just selling, data. Through a variety of plans, customers can use DataMarket’s platform to publish and sell their data, and embed charts in their own pages. At the enterprise end of its packages, DataMarket offers an interactive branded data portal integrated with the publisher’s own web site and user authentication system. Initial customers of this plan include Yankee Group and Lux Research.
Data markets compared
| | Windows Azure Data Marketplace | DataMarket | Factual | Infochimps |
|---|---|---|---|---|
| Data sources | Broad range | Range, with a focus on country and industry stats | Geo-specialized, some other datasets | Range, with a focus on geo, social and web sources |
| Free trials of paid data | Yes | - | Yes, limited free use of APIs | - |
| Delivery | OData API | API, downloads | API, downloads for heavy users | API, downloads |
| Application hosting | Windows Azure | - | - | Infochimps Platform |
| Previewing | Service Explorer | Interactive visualization | Interactive search | - |
| Tool integration | Excel, PowerPivot, Tableau and other OData consumers | - | Developer tool integrations | - |
| Data publishing | Via database connection or web service | Via upload or web/database connection | Via upload or web service | Via upload |
| Data reselling | Yes, 20% commission on non-free datasets | Yes; fees and commissions vary; ability to create branded data market | - | Yes, 30% commission on non-free datasets |
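To make the reselling terms concrete, the Python sketch below computes a publisher’s share of a sale under the two flat commission rates in the table. It assumes the commission is a simple percentage of the sale price, and it works in integer cents, rounding any fractional cent down. That rounding rule is an assumption, not a documented marketplace policy. DataMarket is left out because its fees vary by plan.

```python
# Flat commission rates listed in the comparison table, in percent.
# Assumption: commission is taken as a simple percentage of the sale price.
COMMISSION_PCT = {
    "Windows Azure Data Marketplace": 20,
    "Infochimps": 30,
}


def payout_cents(price_cents: int, commission_pct: int) -> int:
    """Publisher's share of one sale after the marketplace's commission.

    Uses integer cents to avoid floating-point error; rounding a fractional
    cent down is an illustrative choice, not a documented policy.
    """
    return price_cents * (100 - commission_pct) // 100


if __name__ == "__main__":
    price = 10_000  # a hypothetical $100.00 dataset sale
    for market, pct in COMMISSION_PCT.items():
        print(f"{market}: publisher receives ${payout_cents(price, pct) / 100:.2f}")
```

On a hypothetical $100 sale, this prints $80.00 for the 20% rate and $70.00 for the 30% rate. The ten-point difference in commission compounds with sales volume, so it is worth weighing against each marketplace’s reach.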
Other data suppliers
While this article has focused on the more general purpose marketplaces, several other data suppliers are worthy of note.
Wolfram Alpha — Perhaps the most prolific integrator of diverse databases, Wolfram Alpha recently added a Pro subscription level that permits the end user to download the data resulting from a computation.