Here are a few of the data stories that caught my attention this week:
Prospecting for data
The data science competition site Kaggle is extending its features with a new service called Prospect. Prospect allows companies to submit a data sample to the site without having a pre-ordained plan for a contest. In turn, the data scientists using Kaggle can suggest how machine learning could best uncover new insights and answer less-obvious questions, as well as what sorts of competitions could be built on the data.
As GigaOm’s Derrick Harris describes it: “It’s part of a natural evolution of Kaggle from a plucky startup to an IT company with legs, but it’s actually more like a prequel to Kaggle’s flagship predictive modeling competitions than it is a sequel.” It’s certainly a good way for companies to get their feet wet with predictive modeling.
HP’s big data plans
Last year, Hewlett-Packard began moving away from the personal computing business and toward enterprise software and information management, a shift marked in part by the $10 billion it paid to acquire Autonomy. Now we know a bit more about HP’s big data plans for its Information Optimization Portfolio, which is built around Autonomy’s Intelligent Data Operating Layer (IDOL).
ReadWriteWeb’s Scott M. Fulton takes a closer look at HP’s big data plans.
The latest from Cloudera
CDH 4.0, the newest version of Cloudera’s Hadoop distribution, includes:
“… high availability for the filesystem, ability to support multiple namespaces, HBase table and column level security, improved performance, HBase replication and greatly improved usability and browser support for the Hue web interface. Cloudera Manager 4 includes multi-cluster and multi-version support, automation for high availability and MapReduce2, multi-namespace support, cluster-wide heatmaps, host monitoring and automated client configurations.”
Social data platform DataSift also announced this week that it was powering its Hadoop clusters with CDH to perform the “Big Data heavy lifting to help deliver DataSift’s Historics, a cloud-computing platform that enables entrepreneurs and enterprises to extract business insights from historical public Tweets.”
Have data news to share?
Feel free to email us.