ENTRIES TAGGED "Twitter"
In some key use cases a random sample of tweets can capture important patterns and trends
Researchers and companies who need social media data frequently turn to Twitter’s API to access a random sample of tweets. Those who can afford to pay (or have been granted access) use the more comprehensive feed (the firehose) available through a group of certified data resellers. Does the random sample of tweets allow you to capture important patterns and trends? I recently came across two papers that shed light on this question.
Systematic comparison of the Streaming API and the Firehose
A recent paper from ASU and CMU compared data from the streaming API and the firehose, and found mixed results. Let me highlight two cases addressed in the paper: identifying popular hashtags and influential users.
Of interest to many users is the list of top hashtags. Can one identify the “top n” hashtags using data made available through the streaming API? The graph below is a comparison of the streaming API to the firehose: n (as in “top n” hashtags) vs. correlation (Kendall’s Tau). The researchers found that the streaming API provides a good list of hashtags when n is large, but is misleading for small n.
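To make the comparison concrete, here is a minimal sketch of how a rank correlation between two "top n" hashtag lists might be computed. It uses the simple tau-a form of Kendall's Tau over the union of both top-n lists; the paper's exact variant and tie handling may differ. The function names and the hashtag counts are hypothetical, made up purely for illustration.

```python
from collections import Counter
from itertools import combinations


def top_n(counts, n):
    """Return the n most frequent hashtags in a {hashtag: count} mapping."""
    return [tag for tag, _ in Counter(counts).most_common(n)]


def kendall_tau(counts_a, counts_b, tags):
    """Kendall's tau-a over `tags`, comparing their counts in two sources.

    A pair of hashtags is concordant when both sources order it the same
    way and discordant when they disagree. Tied pairs count as neither.
    """
    tags = list(tags)
    if len(tags) < 2:
        raise ValueError("need at least two hashtags to compare rankings")
    concordant = discordant = 0
    for x, y in combinations(tags, 2):
        diff_a = counts_a.get(x, 0) - counts_a.get(y, 0)
        diff_b = counts_b.get(x, 0) - counts_b.get(y, 0)
        if diff_a * diff_b > 0:
            concordant += 1
        elif diff_a * diff_b < 0:
            discordant += 1
    total_pairs = len(tags) * (len(tags) - 1) // 2
    return (concordant - discordant) / total_pairs


def tau_at_n(firehose, sample, n):
    """Correlation between the two sources over the union of their top-n lists."""
    tags = sorted(set(top_n(firehose, n)) | set(top_n(sample, n)))
    return kendall_tau(firehose, sample, tags)


if __name__ == "__main__":
    # Hypothetical counts, not taken from the paper.
    firehose = {"#news": 900, "#music": 850, "#sports": 400, "#tech": 390, "#food": 100}
    sample = {"#news": 80, "#music": 95, "#sports": 41, "#tech": 38, "#food": 9}
    for n in (2, 3, 5):
        print(n, round(tau_at_n(firehose, sample, n), 3))
```

A value near 1 means the sample ranks hashtags in nearly the same order as the full feed; values near 0 or below mean the sampled ranking is unreliable at that n.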
Twitter has hired Guardian Data editor Simon Rogers as its first data editor.
Twitter has hired its first data editor. Simon Rogers, one of the leading practitioners of data journalism in the world, will join Twitter in May. He will be moving his family from London to San Francisco and applying his skills to telling data-driven stories using tweets. James Ball will replace him as the Guardian’s new data editor.
As a data editor, will Rogers keep editing and producing something that we’ll recognize as journalism? Will his work at Twitter be different than what Google Think or Facebook Stories delivers? Different in terms of how he tells stories with data? Or is the difference that Twitter has a lot more revenue coming in or sees data-driven storytelling as core to driving more business? (Rogers wouldn’t comment on those counts.)
A sneak peek at an upcoming visualization session from the 2013 Strata Conference in Santa Clara, Calif.
Strata Editor’s Note: Over the next few weeks, the Strata Community Site will be providing sneak peeks of upcoming sessions at the Strata Conference in Santa Clara. Nicolas’ sneak peek is the first in this series.
Last year was a great year for data visualization at Twitter. Our Analytics team expanded and created a dedicated data visualization team, and some of our projects were released publicly with great feedback.
Our first public interactive of 2012 was a fun way to expose how the Eurocup was experienced on Twitter. You can see in this organic visualization how people cheered for their teams during each match, and how the tension and volume of tweets increased towards the finals.
Data analysis shows the structure of a network can separate true influencers from fake accounts.
There has been a lot of discussion recently about the effect fake Twitter accounts have on brands trying to keep track of social media engagement. A recent tweet spam attack offers an instructive example.
On the morning of October 1, the delegates attending the Strata Conference in London started to notice that a considerable number of spam tweets were being sent using the #strataconf hashtag. An analysis using a tool developed by Bloom Agency, with data from DataSift, sheds light on the spam attack directed at the conference.
The following diagram shows a snapshot of the Twitter conversation after a few tweets had been received containing the #strataconf hashtag. Each red or blue line represents a connection between two Twitter accounts and shows how information flowed as a result of the tweet being sent. By 11 a.m., individual communities had started to emerge that were talking to each other about the conference, and these can clearly be seen in the diagram.
DataSift offers more access to the Twitter archive, and a proposal for a data school.
In this week's data news, DataSift will offer deeper access to old tweets, and P2PU and the Open Knowledge Foundation announce a School of Data.
A new look at Yahoo's traffic, the challenge of scaling Tumblr, and a host of visualization guidelines.
In this week's data news: Yahoo visualizes its front page traffic and demographics, why Tumblr is tougher to scale than Twitter, and a look at what you need to consider as you build visualizations.
A huge visualization captures tweets from the SOPA protest.
This week's visualization comes from Fred Benenson, who ranked and mapped tweets related to the SOPA protest.
EMC study looks at the state of data science, Carrier IQ and big data, and the welcome return of old tweets.
In this week's data news: EMC's new data science study predicts a data scientist shortage, why Carrier IQ is part of a "bizarre big-data triangle," and DataSift will soon offer access to an archive of old tweets.