Here are a few of the data stories that caught my attention this week.
Open data from StatsCan
Embassy Magazine broke the news this week that all of Statistics Canada’s online data will be made available to the public for free, released under the Government of Canada’s Open Data License Agreement beginning in February 2012. Statistics Canada is the federal agency charged with producing statistics that help explain the Canadian economy, culture, resources, and population. (It runs the Canadian census every five years.)
The decision to make the data freely and openly available “has been in the works for years,” according to Statistics Canada spokesperson Peter Frayne. The Canadian government launched an open-data initiative earlier this year, and StatsCan’s move dovetails philosophically with it. Frayne said that the decision to make the data free was not a response to last summer’s controversial decision to drop the agency’s mandatory long-form census.
Open government activist David Eaves responds with a long list of “winners” from the decision, including all of the consumers of StatsCan’s data:
Indirectly, this includes all of us, since provincial and local governments are big consumers of StatsCan data and so now — assuming it is structured in such a manner — they will have easier (and cheaper) access to it. This is also true of large companies and non-profits which have used StatsCan data to locate stores, target services and generally allocate resources more efficiently. The opportunity now opens for smaller players to also benefit.
Eaves continues, stressing the importance of these smaller players:
Indeed, this is the real hope. That a whole new category of winners emerges. That the barrier to use for software developers, entrepreneurs, students, academics, smaller companies and non-profits will be lowered in a manner that will enable a larger community to make use of the data and therefore create economic or social goods.
Open data from Whitehall
The British government also announced the availability of new open datasets this week. The Guardian reports that personal health records, transportation data, housing prices, and weather data will be included “in what promises to be the most dramatic release of public data since the 2010 election.”
The government will also form an Open Data Institute (ODI), led by Sir Tim Berners-Lee. The ODI will involve both businesses and academic institutions, and will focus on helping turn the data into commercial benefit for U.K. companies as well as for the government. The ODI will also work on the development of web standards to support the government’s open-data agenda.
The Guardian notes that the health data that’s to be released will be the largest of its kind outside of U.S. veterans’ medical records. The paper cites the move as something recommended by the Wellcome Trust earlier this year: “Integrated databases … would make England unique, globally, for such research.” Both medical researchers and pharmaceutical companies will be able to access the data for free.
Dell open sources its Hadoop deployment tool
Hadoop adoption and investment have been among the big data trends of 2011, with stories about Hadoop appearing in almost every edition of Strata Week. GigaOm’s Derrick Harris contends that Hadoop’s good fortunes will only continue in 2012, listing six reasons why next year may actually go down as “The Year of Hadoop.”
Crowbar is an open-source deployment tool developed by Dell originally as part of its Dell OpenStack Cloud service. It started as a tool for installing Open Stack, but can deploy other software through the use of plug-in modules called ‘barclamps’ … The goal of the Hadoop barclamp is to reduce Hadoop deployment time from weeks to a single day.
Finley notes that Crowbar isn’t competition to Cloudera’s line of Hadoop management tools.
What Muncie read
“People don’t read anymore,” Steve Jobs once told The New York Times. It’s a fairly common complaint, one that certainly predates the computer age: television was to blame, then video games. But our knowledge about reading habits of the past is actually quite slight. That’s what makes the database built from the ledgers of the Muncie, Ind., public library so marvelous.
The ledgers, which were discovered by historian Frank Felsenstein, chronicle every book checked out of the library, along with the name of the patron who checked it out, between November 1891 and December 1902. That information is now available in the What Middletown Read database.
In a New York Times story on the database, Anne Trubek notes that even at the turn of the 20th century, most library patrons were not reading “the classics”:
What do these records tell us Americans were reading? Mostly fluff, it’s true. Women read romances, kids read pulp and white-collar workers read mass-market titles. Horatio Alger was by far the most popular author: 5 percent of all books checked out were by him, despite librarians who frowned when boys and girls sought his rags-to-riches novels (some libraries refused to circulate Alger’s distressingly individualist books). Louisa May Alcott is the only author who remains both popular and literary today (though her popularity is far less). “Little Women” was widely read, but its sequel “Little Men” even more so, perhaps because it was checked out by boys, too.
Got data news?
Feel free to email me.