ENTRIES TAGGED "MapReduce"
MapReduce gets easier, a new search engine for data, and now you can monitor the universe's forces on your phone.
Cloudera's Crunch hopes to make MapReduce easier, Datafiniti launches a search engine for data, and the University of Oxford releases an Android app for monitoring CERN data.
MapReduce crunches a million-song dataset, GPS and accident reconstruction, and WWI crowdsourcing.
This week's data stories include a guide to using MapReduce to process the Million Song Dataset, a story about how GPS data can help reconstruct lost memories (and accidents), and evidence that emergency crowdsourcing goes back further than many realize.
Cloudera CEO Mike Olson on Hadoop's architecture and its data applications.
Hadoop gets a lot of buzz in database circles, but some folks are still hazy about what it is and how it works. In this interview, Cloudera CEO and Strata speaker Mike Olson discusses Hadoop's background and its current utility.
The founder of Drawn to Scale explains how his database platform does simple things quickly.
Bradford Stephens, founder of of Drawn to Scale, discusses big data systems that work in "user time."
Digits of pi, extruding images with iPads, and mapping the past on top of the present
In this edition of Strata Week: The 2,000,000,000,000,000th digit of pi is calculated with an assist from Hadoop and MapReduce; a new technique uses iPads to extrude light paintings across a long exposure shot; Historypin links historical photos to Google Street View shots; and this is the last week for Strata Conference proposal submissions.
Storage, MapReduce and Query are ushering in data-driven products and services.
We're at the beginning of a revolution in data-driven products and services, driven by a software stack that enables big data processing on commodity hardware. Learn about the SMAQ stack, and where today's big data tools fit in.
Blue is the color, getting help with email overload.
In the latest edition of Strata Week: Google's introduction of a new search-indexing system highlights an important limitation of MapReduce and Hadoop. Can MapReduce adapt to real-time needs or will others follow Google in creating new architectures for real-time analytics?
Some organizations create their own real-time analysis tools, while others turn to specialized solutions. In a previous post, I highlighted SQL-based real-time analytic tools that can handle large amounts of data. I noted that other big data management systems such as MPP databases and MapReduce/Hadoop were too batch-oriented to deliver analysis in near real-time. At least for MapReduce/Hadoop systems things may have changed slightly. A group of researchers from UC Berkeley and Yahoo recently modified MapReduce to allow for pipelining between operators.
The growing need to manage and make sense of Big Data, has led to a surge in demand for analytic databases, which many companies are attempting to fill. As an alternative to current shared-nothing analytic databases, HadoopDB is a hybrid that combines parallel databases with scalable and fault-tolerant Hadoop/MapReduce systems.
Our belief that proficiency in managing and analyzing large amounts of data distinguishes market leading companies, led to a recent report designed to help users understand the different large-scale data management techniques. Our report on Big Data Technologies was the result of interviews with over thirty experts, including research scientists, (open-source) hackers, vendors, data analysts, and entrepreneurs. I recently sat down with my co-author, Roger Magoulas (Director of Research at O’Reilly), who agreed talk about our report and Big Data in general.