We’re publishing a new Strata Gem each day all the way through to December 24. Yesterday’s Gem: Use GPUs to speed up calculation. Early-bird pricing on Strata closes today, December 14: don’t forget to register!
Innovation in big data processing architectures is far from over. While Hadoop was there first, a second generation of systems is emerging, typified by Google’s Caffeine. The drive to real-time analysis in 2011 will only accelerate change.
Change isn’t easy for everyone. For administrators running processing clusters, a key problem is scheduling and managing the workload of big data systems. Current solutions aren’t designed for a mix of heterogeneous frameworks that keep evolving.
Enter Mesos, a key piece of cloud infrastructure to watch in 2011. Put simply, Mesos allows a collection of distributed applications to share a compute cluster, in the same way Linux allows multiple applications to share a single computer.
Mesos architectural overview, from a Mesos presentation given to the Bay Area Hadoop Users Group
Deploying processing frameworks on top of Mesos gives you exceptional flexibility: if a new version of Hadoop comes out, you no longer have the expense and worry of running a parallel cluster and switching over. You simply deploy the new version inside the existing cluster and phase out the old one when you want, or easily roll back. If a whole new architecture comes out, you don’t have to invest in a separate cluster to run it: you can use the one you already have.
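To make the sharing idea concrete, here is a toy sketch in Python of two versions of a framework drawing on the same pool of machines. It is not the Mesos API: the `Node` and `Framework` classes, the `share_cluster` function, and the rule of offering each node's free CPUs to the frameworks in rotating order are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    free_cpus: int


@dataclass
class Framework:
    # Hypothetical stand-in for a processing framework (e.g. one Hadoop version).
    name: str
    cpus_per_task: int
    pending_tasks: int
    placements: list = field(default_factory=list)

    def consider(self, node):
        """Take as many pending tasks as fit on the node; return CPUs used."""
        runnable = min(self.pending_tasks, node.free_cpus // self.cpus_per_task)
        if runnable:
            self.placements.append((node.name, runnable))
            self.pending_tasks -= runnable
        return runnable * self.cpus_per_task


def share_cluster(nodes, frameworks):
    """Offer each node's free CPUs to every framework in turn."""
    for i, node in enumerate(nodes):
        # Rotate the starting framework so no single one sees every offer first.
        start = i % len(frameworks)
        for fw in frameworks[start:] + frameworks[:start]:
            node.free_cpus -= fw.consider(node)


nodes = [Node("node1", 8), Node("node2", 8), Node("node3", 8)]
old = Framework("hadoop-old", cpus_per_task=2, pending_tasks=6)
new = Framework("hadoop-new", cpus_per_task=2, pending_tasks=4)
share_cluster(nodes, [old, new])

for fw in (old, new):
    print(fw.name, fw.placements)
```

Both versions end up with tasks placed on the same three machines, and phasing out the old version in this model amounts to no longer submitting tasks for it. A real scheduler also has to handle memory, task completion, and fairness, all of which this sketch ignores.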
Mesos offers other benefits, including the ability to isolate frameworks from one another using Linux containers, and support for data locality in frameworks that require it, such as Hadoop.
Currently at an alpha stage of maturity, Mesos has recently been proposed to the Apache Incubator. It has been developing rapidly for nearly two years now, and is set to become a major part of big data infrastructure in the coming 24 months. In 2011, Mesos may well be the new ‘M’ in the SMAQ stack.