Here are a few of the data stories that caught my attention this week.
IBM’s cloud-based Hadoop offering looks to make data analytics easier
At its conference in Las Vegas this week, IBM made several major big-data announcements, including the immediate availability of its Hadoop-based product InfoSphere BigInsights via the company's SmartCloud platform. InfoSphere BigInsights was unveiled earlier this year, and it is hardly Big Blue's first offering aimed at helping customers handle big data. Other major players, namely Oracle and Microsoft, have also moved toward Hadoop offerings in recent weeks, but IBM is delivering its service in the cloud, something those companies aren't yet doing. (For its part, Microsoft does say that a Hadoop service will come to Azure by the end of the year.)
IBM joins Amazon Web Services as the only other company currently offering Hadoop in the cloud, notes GigaOm’s Derrick Harris. “Big data — and Hadoop, in particular — has largely been relegated to on-premise deployments because of the sheer amount of data involved,” he writes, “but the cloud will be a more natural home for those workloads as companies begin analyzing more data that originates on the web.”
Harris also points out that IBM’s Hadoop offering is “fairly unique” insofar as it targets businesses rather than programmers. IBM itself contends that “bringing big data analytics to the cloud means clients can capture and analyze any data without the need for Hadoop skills, or having to install, run, or maintain hardware and software.”
Cleaning up location data with Factual Resolve
The data platform Factual launched a new API for developers this week that tackles one of the more frustrating problems with location data: incomplete records. Called Factual Resolve, the new offering is, according to a company blog post, an “entity resolution API that can complete partial records, match one entity against another, and aid in de-duping and normalizing datasets.”
Developers using Resolve tell it what they know about an entity (say, a venue name) and the API can return the rest of the information that Factual knows based on its database of U.S. places — address, category, latitude and longitude, and so on.
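As a rough illustration of the workflow described above, here is a minimal sketch of building a Resolve query. The endpoint path and parameter names (`values`, `KEY`) are assumptions modeled on Factual's v3 REST conventions, not confirmed details of the API; the venue fields are hypothetical.

```python
import json
from urllib.parse import urlencode

# Assumed endpoint, based on Factual's v3 REST conventions (not verified).
API_BASE = "http://api.v3.factual.com/places/resolve"

def build_resolve_url(known_fields, api_key="YOUR_KEY"):
    """Build a Resolve request URL from the partial record we know.

    The partial record is passed as a JSON object; Resolve would return
    the remaining fields (address, category, latitude/longitude, etc.)
    from Factual's U.S. places database.
    """
    params = {"values": json.dumps(known_fields), "KEY": api_key}
    return API_BASE + "?" + urlencode(params)

# We know only a venue name and city; the API fills in the rest.
url = build_resolve_url({"name": "Some Venue", "locality": "Los Angeles"})
print(url)
```

The key idea is that the caller supplies whatever subset of fields it has, and entity resolution happens server-side against Factual's places data.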
Tyler Bell, Factual’s director of product, discussed the intersection of location and big data at this year’s Where 2.0 conference; the full interview is available as a video.
Google and governments’ data requests
As part of its push for greater transparency, Google updated its Government Requests tool this week with information about the number of requests for user data the company has received since the beginning of 2011.
This is the first time that Google is disclosing not just the number of requests, but the number of user accounts specified as well. It’s also made the raw data available so that interested developers and researchers can study and visualize the information.
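For anyone who wants to explore that raw data, a summary pass might look like the sketch below. The schema here is an assumption (Google's actual export format is not specified in this post); the U.S. figure comes from the numbers reported below, while the other rows are placeholder values for illustration only.

```python
import csv
import io
from collections import defaultdict

# Hypothetical CSV layout; column names are assumptions, and every figure
# except the U.S. total (5,950, reported by Google for Jan-Jun 2011) is a
# made-up placeholder.
sample_csv = """country,period,user_data_requests
United States,Jan-Jun 2011,5950
Placeholder Country A,Jan-Jun 2011,1000
Placeholder Country B,Jan-Jun 2011,500
"""

# Tally user data requests by country.
totals = defaultdict(int)
for row in csv.DictReader(io.StringIO(sample_csv)):
    totals[row["country"]] += int(row["user_data_requests"])

# Print countries in descending order of request volume.
for country, count in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{country}: {count}")
```

The same group-and-sort pattern would apply to whatever fields the real export contains (removal requests, accounts affected, compliance rates, and so on).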
According to Google, requests from U.S. government officials for content removal were up 70% in this reporting period (January-June 2011) versus the previous six months, and user data requests were up 29% over the same span. Google also says it received requests from local law enforcement agencies to take down various YouTube videos, one on police brutality and one that was allegedly defamatory, but that it did not comply. Of the 5,950 user data requests (affecting some 11,000 user accounts) submitted between January and June 2011, however, Google says it complied, either fully or partially, with 93%.
The U.S. government was hardly the only one making more requests to Google. Spain, South Korea, and the U.K., for example, also increased their requests, and several countries, including Sri Lanka and the Cook Islands, made their first.
Got data news?
Feel free to email me.