Strata Week: Real-time Hadoop

Cloudera ventures into real-time queries with Impala, data centers are the new landfill, and Jesper Andersen looks at the relationship between art and data.

Here are a few stories from the data space that caught my attention this week.

Cloudera’s Impala takes Hadoop queries into real-time

Cloudera ventured into real-time Hadoop querying this week, opening up its Impala software platform. As Derrick Harris reports at GigaOm, Impala, an SQL query engine, doesn’t rely on MapReduce, which makes it faster than tools such as Hive. Cloudera estimates that Impala queries run 10 times faster than Hive queries, and Charles Zedlewski, Cloudera’s VP of products, told Harris that “small queries can run in less than a second.”

According to Harris, Zedlewski pointed out that Impala wasn’t designed to replace business intelligence (BI) tools and that “Cloudera isn’t interested in selling BI or other analytic applications.” Rather, Impala serves as the execution engine and still relies on software from Cloudera partners. As Zedlewski told Harris, “We’re sticking to our knitting as a platform vendor.”

Joab Jackson at PC World reports that “[e]ventually, Impala will be the basis of a Cloudera commercial offering, called the Cloudera Enterprise RTQ (Real-Time Query), though the company has not specified a release date.”

Impala has plenty of competition on this playing field, which Harris also covers, and he notes the significance of all the recent Hadoop innovation:

“I can’t underscore enough how critical all of this innovation is for Hadoop, which in order to add substance to its unparalleled hype needed to become far more useful to far more users. But the sudden shift from Hadoop as a batch-processing engine built on MapReduce into an ad hoc SQL querying engine might leave industry analysts and even Hadoop users scratching their heads.”

Harris’ piece at GigaOm and Jackson’s piece at PC World have more details. Wired also has an interesting piece on Impala, covering the Google F1 database upon which it is based and the Googler Cloudera hired away to help build it.

(Cloudera CEO Mike Olson discussed Impala, Hadoop and the importance of real-time at this week’s Strata Conference + Hadoop World.)

Strata Week: Data prospecting with Kaggle

Kaggle now accepting data before a contest, HP's Autonomy purchase comes into focus, Cloudera's new Hadoop distribution.

In this week's data news, Kaggle launches Prospect, HP unveils its big data plans, and Cloudera releases CDH4 (the latest version of its Hadoop distribution).

Strata Week: A .data TLD?

A proposal for a .data TLD, flavors of Hadoop, and a vote for pseudonymous commenters.

In this week’s data news, Stephen Wolfram calls for a .data top-level domain and Cloudera responds to Hadoop version 1.0.

Strata Week: Simplifying MapReduce through Java

MapReduce gets easier, a new search engine for data, and now you can monitor the universe's forces on your phone.

Cloudera's Crunch hopes to make MapReduce easier, Datafiniti launches a search engine for data, and the University of Oxford releases an Android app for monitoring CERN data.

Strata Week: Oracle’s big data play

Oracle unveils its big data appliance, the Hadoop community gauges contributions.

In this week's data news, Oracle unveils its big data strategy, and Cloudera looks at the contributions to the Hadoop core and community.


Strata Gems: Whirr makes Hadoop and Cassandra a snap

Get control over cloud resources

The cloud makes clusters easy, but for rapid prototyping purposes, bringing up clusters still involves quite a bit of effort. The Whirr project makes cloud control simple.
