ENTRIES TAGGED "Twitter"

Pattern-detection and Twitter’s Streaming API

In some key use cases a random sample of tweets can capture important patterns and trends

Researchers and companies who need social media data frequently turn to Twitter’s API to access a random sample of tweets. Those who can afford to pay (or have been granted access) use the more comprehensive feed (the firehose) available through a group of certified data resellers. Does the random sample of tweets allow you to capture important patterns and trends? I recently came across two papers that shed light on this question.

Systematic comparison of the Streaming API and the Firehose
A recent paper from ASU and CMU compared data from the streaming API and the firehose, and found mixed results. Let me highlight two cases addressed in the paper: identifying popular hashtags and influential users.

Of interest to many users is the list of top hashtags. Can one identify the “top n” hashtags using data made available through the streaming API? The graph below is a comparison of the streaming API to the firehose: n (as in “top n” hashtags) vs. correlation (Kendall’s Tau). The researchers found that the streaming API provides a good list of hashtags when n is large, but is misleading for small n.

streaming api vs firehose
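
To make the comparison above concrete, here is a minimal Python sketch of how one might score the agreement between a sampled feed and the full feed for the "top n" hashtags using Kendall's Tau (the tau-b variant, which tolerates ties). It is not the paper's exact procedure: the function names `kendall_tau_b` and `top_n_agreement` are hypothetical, and the hashtag counts are made-up toy numbers chosen only to show the mechanics.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b for two equal-length score sequences (handles ties)."""
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both lists: excluded from both factors
        elif dx == 0:
            ties_x += 1  # tied only in x
        elif dy == 0:
            ties_y += 1  # tied only in y
        elif dx * dy > 0:
            concordant += 1  # both lists order this pair the same way
        else:
            discordant += 1  # the lists disagree on this pair
    untied = concordant + discordant
    denom = sqrt((untied + ties_x) * (untied + ties_y))
    return (concordant - discordant) / denom if denom else float("nan")


def top_n_agreement(full_counts, sample_counts, n):
    """Tau-b between full and sampled counts over the full feed's top-n hashtags."""
    top = [tag for tag, _ in full_counts.most_common(n)]
    full_scores = [full_counts[tag] for tag in top]
    sample_scores = [sample_counts[tag] for tag in top]  # Counter gives 0 if absent
    return kendall_tau_b(full_scores, sample_scores)


# Toy counts (assumed for illustration only, not real Twitter data).
firehose = Counter({"a": 100, "b": 90, "c": 80, "d": 10, "e": 5})
sample = Counter({"a": 9, "b": 10, "c": 8, "d": 1})

for n in (2, 5):
    print(n, top_n_agreement(firehose, sample, n))
```

In this toy data the sample swaps the top two hashtags, so the score for n = 2 is -1.0, while over all five hashtags it is 0.8 because most pairs are still ordered the same way. A value near 1 means the sampled ranking closely tracks the full ranking; a value near 0 or below means it does not.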

Strata Week: Movers and shakers on the data journalism front

Reuters' Connected China, accessing Pew's datasets, Simon Rogers' move to Twitter, data privacy solutions, and Intel's shift away from chips.

Reuters launches Connected China, Pew instructs on downloading its data, and Twitter gets a data editor

Yue Qiu and Wenxiong Zhang took a look this week at a data journalism effort by Reuters, the Connected China visualization application. Qiu and Zhang report that “[o]ver the course of about 18 months, a dozen bilingual reporters based in Hong Kong dug into government websites, government reports, policy papers, Mainland major publications, English news reporting, academic texts, and think-tank reports to build up the database.”

Finding and telling data-driven stories in billions of tweets

Twitter has hired Guardian data editor Simon Rogers as its first data editor.

Simon Rogers

Twitter has hired its first data editor. Simon Rogers, one of the leading practitioners of data journalism in the world, will join Twitter in May. He will be moving his family from London to San Francisco and applying his skills to telling data-driven stories using tweets. James Ball will replace him as the Guardian’s new data editor.

As a data editor, will Rogers keep editing and producing something that we’ll recognize as journalism? Will his work at Twitter be different than what Google Think or Facebook Stories delivers? Different in terms of how he tells stories with data? Or is the difference that Twitter has a lot more revenue coming in or sees data-driven storytelling as core to driving more business? (Rogers wouldn’t comment on those counts.)

Exploring web standards for high data density visualizations

A sneak peek at an upcoming visualization session from the 2013 Strata Conference in Santa Clara, Calif.

Strata Editor’s Note: Over the next few weeks, the Strata Community Site will be providing sneak peeks of upcoming sessions at the Strata Conference in Santa Clara. Nicolas’ sneak peek is the first in this series. 

Last year was a great year for data visualization at Twitter. Our Analytics team expanded and created a dedicated data visualization team, and some of our projects were released publicly with great feedback.

Our first public interactive of 2012 was a fun way to expose how the Eurocup was experienced at Twitter. You can see in this organic visualization how people cheered for their teams during each match, and how the tension and volume of tweets increased towards the finals.

Deconstructing a Twitter spam attack

Data analysis shows the structure of a network can separate true influencers from fake accounts.

There has been a lot of discussion recently about the effect fake Twitter accounts have on brands trying to keep track of social media engagement. A recent tweet spam attack offers an instructive example.

On the morning of October 1, the delegates attending the Strata Conference in London started to notice that a considerable number of spam tweets were being sent using the #strataconf hashtag. An analysis using a tool developed by Bloom Agency, with data from DataSift, sheds light on the spam attack directed at the conference.

The following diagram shows a snapshot of the Twitter conversation after a few tweets had been received containing the #strataconf hashtag. Each red or blue line represents a connection between two Twitter accounts and shows how information flowed as a result of the tweet being sent. By 11 a.m., individual communities had started to emerge that were talking to each other about the conference, and these can clearly be seen in the diagram.

Strataconf tweeting communities

Strata Week: Datasift lets you mine two years of Twitter data

DataSift offers more access to the Twitter archive, and a proposal for a data school.

In this week's data news, DataSift will offer deeper access to old tweets, and P2PU and the Open Knowledge Foundation announce a School of Data.

Strata Week: The data behind Yahoo’s front page

A new look at Yahoo's traffic, the challenge of scaling Tumblr, and a host of visualization guidelines.

In this week's data news: Yahoo visualizes its front page traffic and demographics, why Tumblr is tougher to scale than Twitter, and a look at what you need to consider as you build visualizations.

Visualization of the Week: Visualizing SOPA tweets

A huge visualization captures tweets from the SOPA protest.

This week's visualization comes from Fred Benenson, who ranked and mapped tweets related to the SOPA protest.

Strata Week: The looming data science talent shortage

EMC study looks at the state of data science, Carrier IQ and big data, and the welcome return of old tweets.

In this week's data news: EMC's new data science study predicts a data scientist shortage, why Carrier IQ is part of a "bizarre big-data triangle," and DataSift will soon offer access to an archive of old tweets.

Strata Week: Why ThinkUp matters

ThinkUp and data ownership, DataSift turns on its Twitter firehose, and Google cracks open the door to BigQuery.

Data democratization gets an important new tool with the release of ThinkUp 1.0. Also, DataSift offers another way to get the Twitter firehose, and Google offers a little more access to its BigQuery data analytics service.
