Here are a few of the data stories that caught my attention this week:
Figshare sees the upside of negative results
Using Figshare, researchers can now upload and publish files in any format, including videos and datasets that are often deemed “supplemental materials” or excluded from current publishing models altogether. This is part of a larger “open science” effort. According to Figshare:
“… by opening up the peer review process, researchers can easily publish null results, avoiding the file drawer effect and helping to make scientific research more efficient. Figshare uses creative commons licensing to allow frictionless sharing of research data whilst allowing users to maintain their ownership.”
As the startup argues: “Unless we as scientists publish all of our data, we will never achieve access to the sum of all scientific knowledge.”
Accel’s $100 million data fund makes its first ($52.5 million) investment
Late last year, the investment firm Accel Partners announced a new $100 million Big Data Fund, with a promise to invest in big data startups. This year, the first investment from that fund was revealed, with a whopping $52.5 million going to Code 42.
Founded in 2001, Code 42 is the creator of the backup software CrashPlan, and the company describes itself as building “high-performance hardware and easy-to-use software solutions that protect the world’s data.”
Describing the investment, GigaOm’s Stacey Higginbotham writes:
“With the growth in mobile devices and the data stored on corporate and consumer networks that is moving not only from device to server, but device to device, [CEO Matthew] Dornquast realized Code 42’s software could become more than just a backup and sharing service, but a way for corporations to understand what data and how data was moving between employees and the devices they use.”
Higginbotham also cites Accel Partners’ Ping Li, who notes that further investments from its Big Data Fund are unlikely to be so sizable.
LinkedIn open sources DataFu
LinkedIn has been a heavy user of Apache Pig for performing analysis with Hadoop on projects such as its People You May Know tool. For more advanced tasks like these, Pig supports User Defined Functions (UDFs), which let users integrate custom code into their scripts.
This week, LinkedIn announced the release of DataFu, the consolidation of its UDFs into a single, general-purpose library. DataFu enables users to “run PageRank on a large number of independent graphs, perform set operations such as intersect and union, compute the haversine distance between two points on the globe,” and more.
LinkedIn is making DataFu available on GitHub under the Apache 2.0 license.
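For readers unfamiliar with the haversine distance mentioned in the announcement, here is a minimal Python sketch of the underlying formula. It is not DataFu's code and does not use Pig; the function name haversine_km and the Earth radius constant are assumptions chosen for illustration.

```python
import math

# Mean Earth radius in kilometers. This is an assumed constant for the
# sketch; other implementations may use a slightly different value.
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (latitude, longitude)
    points given in decimal degrees. Hypothetical helper, not DataFu's API."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)

    # Haversine of the central angle between the two points.
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)

    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


if __name__ == "__main__":
    # A quarter of the way around the equator.
    print(round(haversine_km(0.0, 0.0, 0.0, 90.0), 1))
```

In a Pig workflow, logic like this would be packaged as a UDF and called from a script, which is the kind of reusable function DataFu collects in one library.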
Got data news?
Feel free to email me.