Facet: The recursive approach to visualization

Sneak peek at my upcoming session at the Strata Conference in Santa Clara

Visualizing data and extracting it from its data store are two activities that go hand in hand. Typically, when you use a data visualization toolkit such as Raphael, Protovis or D3 to create a non-trivial visualization, you spend a significant portion of your time writing code to extract the data. The process may involve querying an external database, then transforming the resulting data into the correct structure for your visualization.

In his paper introducing plyr, a data manipulation toolkit for R, Hadley Wickham describes a framework, split-apply-combine, for expressing common data operations. The idea is that most data operations can be seen as splitting the data into a series of buckets, applying an aggregation to each bucket to get one value per bucket, and then combining the results by sorting and limiting. Wickham argues that most data query languages already rely on an equivalent framework, whether explicitly or implicitly.
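To make the three steps concrete, here is a minimal sketch in plain Python. It is not plyr and not Facet; the sample rows, the field names (`country`, `revenue`) and the helper functions `split`, `apply_sum` and `combine` are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical sample data; the field names are assumptions for this sketch.
rows = [
    {"country": "US", "revenue": 120},
    {"country": "FR", "revenue": 80},
    {"country": "US", "revenue": 60},
    {"country": "DE", "revenue": 90},
    {"country": "FR", "revenue": 30},
]


def split(rows, key):
    """Split: group rows into buckets keyed by the value of `key`."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[row[key]].append(row)
    return buckets


def apply_sum(buckets, field):
    """Apply: reduce each bucket to a single aggregate (here, a sum)."""
    return {name: sum(r[field] for r in bucket) for name, bucket in buckets.items()}


def combine(aggregates, limit):
    """Combine: sort the aggregates in descending order and keep the top `limit`."""
    ranked = sorted(aggregates.items(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]


top = combine(apply_sum(split(rows, "country"), "revenue"), limit=2)
print(top)  # [('US', 180), ('FR', 110)]
```

The same shape appears in a SQL query that uses GROUP BY (split), an aggregate such as SUM (apply), and ORDER BY with LIMIT (combine).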

I will explore how Hadley’s split-apply-combine principle can be extended to create data visualizations. I am calling that extension “Facet”. The advantage of building a visualization description language on top of solid data manipulation principles is that any visualization created in this manner automatically takes care of loading the data itself.

I am also exploring whether this approach would be sufficiently powerful and expressive to query and visualize the answer to any reasonable question. Furthermore, I will explore whether having a unified mental model for querying data and visualizing it will speed up development or just cause confusion. And most importantly, would this allow business users to get more insights from their data?

In my talk at Strata Santa Clara, I will discuss the ideas behind Facet. I will talk about its structure, development, and general business use. Finally, I will also explore the potential application of Facet within the Metamarkets framework (as a query language on top of Druid), and as a tool for big data exploration. I look forward to seeing you at my talk.
