Sneak peek at my upcoming session at the Strata Conference in Santa Clara
Visualizing data and extracting it from its data store are two activities that go hand in hand. Typically, when you use a data visualization toolkit such as Raphael, Protovis or D3 to create a non-trivial visualization, you spend a significant portion of your time writing code to extract the data. The process may involve querying an external database and then transforming the resulting data into the structure your visualization expects.
In his paper introducing plyr, a data manipulation toolkit for R, Hadley Wickham describes a framework, split-apply-combine, for expressing common data operations. The idea is that most data operations can be seen as three steps: splitting the data into a series of buckets, applying an aggregation function to each bucket to reduce it to a single value, and then combining the per-bucket results by sorting and limiting. Wickham argues that most data query languages already rely on an equivalent framework, whether explicitly or implicitly.
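To make the three steps concrete, here is a minimal sketch in plain Python. It is not plyr's API or the code from the session; the sample records, the field names and the `split_apply_combine` helper are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical sample records; the field names are illustrative only.
rows = [
    {"region": "east", "sales": 10},
    {"region": "west", "sales": 7},
    {"region": "east", "sales": 5},
    {"region": "north", "sales": 3},
    {"region": "west", "sales": 9},
]


def split_apply_combine(records, key, aggregate, limit):
    """Hypothetical helper: group records, aggregate each group, keep the top `limit`."""
    # Split: place each record in the bucket named by its key.
    buckets = defaultdict(list)
    for record in records:
        buckets[key(record)].append(record)

    # Apply: reduce every bucket to a single aggregate value.
    aggregated = [(name, aggregate(members)) for name, members in buckets.items()]

    # Combine: sort by aggregate value, largest first, then limit the result.
    aggregated.sort(key=lambda pair: pair[1], reverse=True)
    return aggregated[:limit]


top_regions = split_apply_combine(
    rows,
    key=lambda r: r["region"],
    aggregate=lambda members: sum(r["sales"] for r in members),
    limit=2,
)
print(top_regions)  # [('west', 16), ('east', 15)]
```

In SQL terms this is roughly a `GROUP BY` with an aggregate, followed by `ORDER BY` and `LIMIT`, which is one way to read the claim that query languages already follow the same pattern.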