ENTRIES TAGGED "data visualization"
Human judgment is at the center of successful data analysis. This statement might initially seem at odds with the current Big Data frenzy and its focus on data management and machine learning methods. But while these tools provide immense value, it is important to remember that they are just that: tools. A hammer does not a carpenter make — though it certainly helps.
Consider the words of John Tukey, possibly the greatest statistician of the last half-century: “Nothing — not the careful logic of mathematics, not statistical models and theories, not the awesome arithmetic power of modern computers — nothing can substitute here for the flexibility of the informed human mind. Accordingly, both approaches and techniques need to be structured so as to facilitate human involvement and intervention.” Tukey goes on to write: “Some implications for effective data analysis are: (1) that it is essential to have convenience of interaction of people and intermediate results and (2) that at all stages of data analysis the nature and detail of output need to be matched to the capabilities of the people who use it and want it.” Though Tukey and colleagues voiced these sentiments nearly 50 years ago, they ring even more true today. The interested analyst is at the heart of the Big Data question: how well do our tools help users ask better questions, formulate hypotheses, spot anomalies, correct errors and create improved models and visualizations? To “facilitate human involvement” across “all stages of data analysis” is a grand challenge for our age.
A much-needed break from data transparency and privacy issues
I could have focused on the Governments’ Search for Google Data visualization from Chris Canipe and Madeline Farbman of the Wall Street Journal. Or, I could have focused on Neal Ungerleider’s piece covering the Twitter metadata visualizations that Eric Fisher and MapBox created for Gnip. Yet, my curiosity took over once I came across The Economist’s High Spirits graphic. Not only do I make my own bitters, which qualifies me for preliminary booze-nerd status, but I also needed a brief break from the transparency issues currently dominating data-oriented conversations. Following my booze-nerd curiosity led me to this interactive data visualization of common cocktail ingredients:
Notes and links from the data journalism beat
Data journalism is becoming a truly global practice. Data journalists from the UK, China, and the US are sharing data-oriented best practices, insights, and tools. Journalists in Latin America are meeting this week to push for more transparency and access to data in the region. At the same time, recent revelations about NSA domestic surveillance programs have pushed big data stories to the front pages of US papers. Here are a few links from the past week:
Transparency…or Lack Thereof
- OpenData Latinoamérica: Driving the demand side of data and scraping towards transparency (Nieman Journalism Lab)
“There’s a saying here, and I’ll translate, because it’s very much how we work,” Miguel Paz said to me over a Skype call from Chile. “Here, it’s ‘It’s better to ask forgiveness than to ask permission.’ But that doesn’t mean that it’s illegal.” Paz is a veteran of the digital news business. The saying has to do with his approach to scraping public data from governments that may be slow to share it.
- The real story in the NSA scandal is the collapse of journalism (zdnet.com)
On Thursday, June 6, the Washington Post published a bombshell of a story, alleging that nine giants of the tech industry had “knowingly participated” in a widespread program by the United States National Security Agency (NSA). One day later, with no acknowledgment except for a change in the timestamp, the Post revised the story, backing down from sensational claims it made originally. But the damage was already done.
- We are shocked, shocked… (davidsimon.com)
Having labored as a police reporter in the days before the Patriot Act, I can assure all there has always been a stage before the wiretap, a preliminary process involving the capture, retention and analysis of raw data. It has been so for decades now in this country. The only thing new here, from a legal standpoint, is the scale on which the FBI and NSA are apparently attempting to cull anti-terrorism leads from that data. But the legal and moral principles? Same old stuff.
- Big Data Has Big Stage at Personal Democracy Forum (pbs.org)
Engaging News Project’s Talia Stroud tackled the issue of public engagement in news organizations. Polls on websites don’t yield scientifically accurate results, nor do they get people to address difficult issues, she said. “These data are junk. We know they’re junk,” Stroud said. “City council representatives know they’re junk. Even news organizations know that the results of these data are junk. The only reason that this poll is being included on the news organization’s site is to increase interactivity and increase your time on page.”
An interview with Scott Murray, author of Interactive Data Visualization for the Web
Scott Murray, a code artist, has written Interactive Data Visualization for the Web for nonprogrammers. In this interview, Scott offers some insight into what inspired him to write an introduction to D3 for artists, graphic designers, journalists, researchers, or anyone who wants to begin programming data visualizations.
What inspired you to become a code artist?
Scott Murray: I had designed websites for a long time, but several years ago I was frustrated by web browsers’ limitations. I went back to school for an MFA to force myself to explore interactive options beyond the browser. At MassArt, I was introduced to Processing, the free programming environment for artists. It opened up a whole new world of programmatic means of manipulating and interacting with data — and not just traditional data sets, but also live “data,” such as from input devices or dynamic APIs, which can then be used to manipulate the output. Processing let me start prototyping ideas immediately; it is so enjoyable to be able to build something that really works, rather than designing static mockups first and then, hopefully, one day investing the time to program them. Something about that shift in process is both empowering and liberating — being able to express your ideas quickly in code, and to watch the system carry out your instructions, ultimately creating images and experiences beyond what you had originally envisioned.
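For readers who have never seen D3, here is a minimal sketch of the kind of code Murray’s book builds up to: binding an array of numbers to SVG rectangles to produce a bar chart. The data values, sizes, and page setup are invented for illustration.

    <script src="https://d3js.org/d3.v3.min.js"></script>
    <script>
    // invented sample values
    var data = [4, 8, 15, 16, 23, 42];

    var svg = d3.select("body").append("svg")
        .attr("width", 300)
        .attr("height", 120);

    // one rect per datum: D3's enter selection creates the missing elements
    svg.selectAll("rect")
        .data(data)
      .enter().append("rect")
        .attr("x", function (d, i) { return i * 45; })
        .attr("y", function (d) { return 120 - d * 2; })
        .attr("width", 40)
        .attr("height", function (d) { return d * 2; });
    </script>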
The Wikipedia Recent Changes Map visualizes Wikipedia edits around the world in real time.
Stephen LaPorte and Mahmoud Hashemi have put together an addictive visualization of real-time edits on Wikipedia, mapped across the world. Every time an edit is made, the user’s location and the entry they edited are listed along with a corresponding dot on the map.
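As a rough sketch of how a browser can consume a live stream of edits like this, the snippet below listens to Wikimedia’s public EventStreams feed. That service is an assumption here, not necessarily what LaPorte and Hashemi built on, and the geolocation and map-drawing steps are left out.

    // Sketch: consume a public feed of Wikipedia edits in the browser.
    // Wikimedia's EventStreams service is assumed here; it is not
    // necessarily the feed the Recent Changes Map itself uses.
    var source = new EventSource("https://stream.wikimedia.org/v2/stream/recentchange");
    source.onmessage = function (event) {
      var change = JSON.parse(event.data);
      if (change.server_name === "en.wikipedia.org") {
        // Geolocating anonymous editors' IP addresses and drawing the
        // corresponding dot on the map are left out of this sketch.
        console.log(change.user + " edited " + change.title);
      }
    };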
Visual analysis tools are adding advanced analytics for big data
After recently playing with SAS Visual Analytics, I’ve been thinking about tools for visual analysis. By visual analysis I mean the type of analysis most recently popularized by Tableau, QlikView, and Spotfire: you encounter a data set for the first time and conduct exploratory data analysis, with the goal of discovering interesting patterns and associations. Having used a few visualization tools myself, I have a quick wish list of features (culled from tools I’ve used or have seen in action).
Requires little (to no) coding
The viz tools I currently use require programming skills. Coding means switching back and forth between a visual (chart) and text (code). It’s nice to be able to customize charts via code, but when you’re in the exploratory phase, not having to think about code syntax is ideal. Plus, GUI-based tools let you collaborate with many more users.
Sneak peek at my upcoming session at the Strata Conference in Santa Clara
Visualizing data and extracting it from its data store are two activities that go hand in hand. Typically, when you try to use a data visualization toolkit such as Raphael, Protovis, or D3 to create a non-trivial visualization, you spend a significant portion of your time writing code to extract the data. The process may involve querying an external database and then transforming the resulting data into the correct structure for your visualization.
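As a small illustration of that extraction step, here is a sketch that pulls rows from a hypothetical JSON endpoint and reshapes them into per-series arrays a chart could bind to. The endpoint and column names are invented for illustration.

    // Sketch: fetch rows from a hypothetical endpoint and reshape them into
    // the per-series structure a chart expects. /api/pageviews and the
    // column names (page, timestamp, count) are invented for illustration.
    d3.json("/api/pageviews", function (error, rows) {
      if (error) throw error;
      var seriesByPage = {};
      rows.forEach(function (row) {
        (seriesByPage[row.page] = seriesByPage[row.page] || []).push({
          date: new Date(row.timestamp),
          count: +row.count
        });
      });
      // seriesByPage is now keyed by page, ready to bind to a D3 selection
    });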
In his paper introducing plyr, a data manipulation toolkit for R, Hadley Wickham describes a framework, split-apply-combine, for expressing common data operations. The idea is that most data operations can be seen as splitting the data into a series of buckets, applying an aggregation to each bucket, and then combining the results by sorting and limiting. Wickham argues that most data query languages already rely on an equivalent framework, whether explicitly or implicitly.
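To make the framework concrete, here is a minimal sketch of split-apply-combine in plain JavaScript. plyr itself is an R package; the function, field names, and sample rows below are invented for illustration.

    // Split-apply-combine: split rows into buckets by key, apply an
    // aggregate to each bucket, then combine by sorting and limiting.
    function splitApplyCombine(rows, keyFn, aggFn, limit) {
      var buckets = {};
      rows.forEach(function (row) {                  // split
        var key = keyFn(row);
        (buckets[key] = buckets[key] || []).push(row);
      });
      var results = Object.keys(buckets).map(function (key) {
        return { key: key, value: aggFn(buckets[key]) };   // apply
      });
      return results
        .sort(function (a, b) { return b.value - a.value; }) // combine: sort...
        .slice(0, limit);                                    // ...and limit
    }

    // e.g., top 3 categories by total amount (invented sample data)
    var top3 = splitApplyCombine(
      [{cat: "a", amt: 2}, {cat: "b", amt: 5}, {cat: "a", amt: 4}],
      function (r) { return r.cat; },
      function (rs) { return rs.reduce(function (s, r) { return s + r.amt; }, 0); },
      3
    );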
A sneak peek at an upcoming visualization session from the 2013 Strata Conference in Santa Clara, Calif.
Strata Editor’s Note: Over the next few weeks, the Strata Community Site will be providing sneak peeks of upcoming sessions at the Strata Conference in Santa Clara. Nicolas’ sneak peek is the first in this series.
Last year was a great year for data visualization at Twitter. Our Analytics team expanded and created a dedicated data visualization team, and some of our projects were released publicly with great feedback.
Our first public interactive of 2012 was a fun way to show how the Eurocup was experienced on Twitter. You can see in this organic visualization how people cheered for their teams during each match, and how the tension and volume of tweets increased toward the finals.
The Washington Post developed an interactive map using data on area homicides from 2000 through 2011.
Residents of Washington, D.C., and anyone considering a move to the city, have a new tool to assess the city’s homicide rate. As part of a 15-month investigative study, The Washington Post has created an interactive map of homicides in D.C. from 2000 through 2011. The interactive tool lets users drill down into the information by demographic, motive, and manner of murder, for instance — all of which can also be isolated by neighborhood or by individual homicide.