Backtype is an “intelligence platform,” a suite of tools and insights that help companies quantify and understand the impact of their social media efforts. Marz works on the back end, figuring out ways to store and process terabytes of data from Twitter, Facebook, YouTube, and millions of blogs.
The platform runs on Hadoop and makes use of Cascading, a Java API for creating complex data-processing workflows. Marz prefers working with a Java-based tool to abstract away the details of Hadoop rather than a custom language. “I find that when you’re using a custom language you end up having a lot of complexity in your program that you don’t anticipate, especially when you try to do things that are more dynamic,” he said.
Marz has written an abstraction on top of Cascading called Cascalog, a Clojure-based query language for Hadoop inspired by Datalog. “The cool thing about Clojure is that it fully integrates with the Java programming language,” Marz said. “I think one of the problems with Lisps in the past has been a lack of library support. But by being on top of the JVM, that problem is solved with Clojure.” He’s generally optimistic about what the functional and declarative paradigms can offer in the big data space, saying his programs are more concise and written closer to how he thinks.
Marz says he’s happy with the development activity around Cascalog since he released it in April 2010. He is working on a few enhancements: making the language more expressive by adding optimized joins, and making the query planner more intelligent by, for example, applying push-down filtering more aggressively.
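For readers unfamiliar with push-down filtering, the idea is that a query planner applies a filter as early as possible, before an expensive step such as a join, so less data flows through the rest of the job. The sketch below is a plain Python illustration of that general idea, not Cascalog code and not how Cascalog's planner is implemented. The data and the function names `join_then_filter` and `filter_then_join` are hypothetical.

```python
# Hypothetical in-memory data, invented for illustration only.
people = [("alice", 28), ("bob", 33), ("carol", 25), ("dave", 41)]
follows = [("alice", "bob"), ("bob", "carol"), ("carol", "alice"), ("dave", "alice")]


def join_then_filter(people, follows, max_age):
    """Naive plan: join every person to their follows, then filter by age."""
    joined = [(name, age, followee)
              for name, age in people
              for follower, followee in follows
              if name == follower]
    result = [(name, followee) for name, age, followee in joined if age < max_age]
    return result, len(joined)


def filter_then_join(people, follows, max_age):
    """Pushed-down plan: filter people by age first, then join the survivors."""
    young = [(name, age) for name, age in people if age < max_age]
    joined = [(name, age, followee)
              for name, age in young
              for follower, followee in follows
              if name == follower]
    result = [(name, followee) for name, age, followee in joined]
    return result, len(joined)


naive_rows, naive_size = join_then_filter(people, follows, 30)
pushed_rows, pushed_size = filter_then_join(people, follows, 30)

# Both plans return the same answer; only the intermediate size differs.
print(sorted(naive_rows) == sorted(pushed_rows))
print("rows joined without push-down:", naive_size)
print("rows joined with push-down:", pushed_size)
```

Both plans produce the same result, but the pushed-down plan joins fewer rows. On a cluster, that difference would translate into less data shuffled between machines.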
You’ll find the full interview in the following video: