The cloud makes clusters easy to obtain, but bringing one up for rapid prototyping still takes quite a bit of effort. It's getting easier by the day, though, as a variety of tools emerge to simplify the commissioning and management of cloud resources.
Whirr is one such tool: a simple command-line utility and a Java API for running cloud services. It presents a uniform interface to cloud providers, so you don't have to learn each provider's API or work around its peculiarities. Whirr also abstracts away the repetitive parts of setting up services such as Hadoop or Cassandra.
Whirr’s command-line tool can be used to bring up clusters in the cloud. Bringing up a Hadoop cluster is as easy as this one-liner:
whirr launch-cluster --service-name=hadoop --cluster-name=myhadoopcluster --instance-templates='1 jt+nn,1 dn+tt' --provider=ec2 --identity=$AWS_ACCESS_KEY_ID --credential=$AWS_SECRET_ACCESS_KEY --private-key-file=~/.ssh/id_rsa
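The one-liner reads its credentials from two environment variables, and it is easy to start a launch with one of them unset. The following is a minimal sketch of a wrapper that checks them first. It uses only the flags shown in the command above; the function name `launch_hadoop_cluster` and the `WHIRR_DRY_RUN` variable are hypothetical names invented for this sketch and are not part of Whirr.

```shell
#!/usr/bin/env bash
# Sketch: wrap the one-liner so that missing credentials are caught before
# any cloud call is made. WHIRR_DRY_RUN is a hypothetical variable used only
# by this sketch; it is not a Whirr feature.

launch_hadoop_cluster() {
  local name=$1 templates=$2 var

  # Refuse to continue if either credential variable is empty or unset.
  for var in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY; do
    if [ -z "${!var:-}" ]; then
      echo "error: $var is not set" >&2
      return 1
    fi
  done

  if [ -n "${WHIRR_DRY_RUN:-}" ]; then
    # Show what would be launched, without printing the credentials.
    echo "would run: whirr launch-cluster --cluster-name=$name --instance-templates=$templates"
    return 0
  fi

  whirr launch-cluster \
    --service-name=hadoop \
    --cluster-name="$name" \
    --instance-templates="$templates" \
    --provider=ec2 \
    --identity="$AWS_ACCESS_KEY_ID" \
    --credential="$AWS_SECRET_ACCESS_KEY" \
    --private-key-file="$HOME/.ssh/id_rsa"
}

# Example:
#   WHIRR_DRY_RUN=1 launch_hadoop_cluster myhadoopcluster '1 jt+nn,1 dn+tt'
```

Quoting the template string matters here, because it contains a space and would otherwise be split into two arguments.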
When the cluster has launched, a script (~/.whirr/myhadoopcluster/hadoop-proxy.sh) is created, which will set up a secure tunnel to the remote cluster, letting the user execute regular Hadoop commands from their own machine.
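As a rough illustration of that workflow, the sketch below locates the proxy script for a named cluster and runs it. The helper names `proxy_script_for` and `run_proxy` are hypothetical, and the commented `HADOOP_CONF_DIR` step assumes that Whirr also writes Hadoop client configuration into the same directory, which may vary by Whirr version.

```shell
#!/usr/bin/env bash
# Sketch: locate and run the proxy script that Whirr writes for a cluster.

# Prints the path of the proxy script for the given cluster name.
proxy_script_for() {
  echo "$HOME/.whirr/$1/hadoop-proxy.sh"
}

# Runs the proxy in the foreground. It blocks while the tunnel is open,
# so give it a dedicated terminal and stop it with Ctrl-C.
run_proxy() {
  local script
  script=$(proxy_script_for "$1")
  if [ ! -f "$script" ]; then
    echo "error: $script not found; has the cluster finished launching?" >&2
    return 1
  fi
  sh "$script"
}

# Usage, in one terminal:
#   run_proxy myhadoopcluster
# Then, in a second terminal (assumption: Whirr also wrote Hadoop client
# configuration into the same directory):
#   export HADOOP_CONF_DIR="$HOME/.whirr/myhadoopcluster"
#   hadoop fs -ls /
```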
Whirr’s service-name and instance-templates parameters are the key to running different services. The instance templates are a concise notation for specifying the contents of a cluster, and are defined on a per-service basis. The Hadoop example above, 1 jt+nn,1 dn+tt, specifies one node with the roles of “job tracker” and “name node”, and one node with the roles of “data node” and “task tracker”.
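To make the notation concrete, here is a small sketch that unpacks a template string into a readable listing: groups are separated by commas, each group starts with an instance count, and the roles within a group are joined by `+`. This is an illustration only, not how Whirr parses templates internally; `describe_templates` and `role_name` are hypothetical helpers that know just the four Hadoop roles used in the example.

```shell
#!/usr/bin/env bash
# Illustrative only: expands a Whirr-style instance-templates string into a
# readable listing. Not Whirr's own parser.

# Maps the Hadoop role abbreviations from the example to their long names.
role_name() {
  case "$1" in
    nn) echo "name node" ;;
    jt) echo "job tracker" ;;
    dn) echo "data node" ;;
    tt) echo "task tracker" ;;
    *)  echo "unknown role ($1)" ;;
  esac
}

describe_templates() {
  local templates=$1 group count role names
  local -a groups roles

  # Split the whole string on commas: one entry per group of instances.
  IFS=',' read -ra groups <<< "$templates"
  for group in "${groups[@]}"; do
    count=${group%% *}                        # text before the first space
    IFS='+' read -ra roles <<< "${group#* }"  # roles after it, split on '+'
    names=""
    for role in "${roles[@]}"; do
      names="${names:+$names, }$(role_name "$role")"
    done
    echo "$count x [$names]"
  done
}

describe_templates '1 jt+nn,1 dn+tt'
# prints:
#   1 x [job tracker, name node]
#   1 x [data node, task tracker]
```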
Services currently supported by Whirr include:
- Hadoop (both Apache and Cloudera Distribution for Hadoop)
Adding a new service involves providing initialization scripts and implementing a small amount of Java code. Whirr is open source and currently hosted as an Apache Incubator project, with development led by Cloudera engineers.
For in-person instruction on getting started with Hadoop or Cassandra, check out the Strata 2011 Tutorials.