We’re publishing a new Strata Gem each day all the way through to December 24. Yesterday’s Gem: Explore and visualize graphs with Gephi.
Where better to start analyzing social networks than with your own? Using the graphing tool Gephi and a little bit of Python script, you can analyze your own Twitter network, revealing the inherent structure among those you follow. It’s also a fun way to learn more about network analysis.
Inspired by the LinkedIn Gephi graphs, I analyzed my Twitter friend network. I took everybody that I followed on Twitter, and found out who among them followed each other. I’ve shared the Python code I used to do this on gist.github.com.
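The core step, working out which of the people I follow also follow each other, can be sketched in a few lines. This is an illustration rather than the script from the gist: it assumes the follow lists have already been fetched into a dictionary, and it leaves out the Twitter API calls and OAuth entirely. The names `friend_edges` and `following` are hypothetical.

```python
def friend_edges(my_friends, following):
    """Return (follower, followed) pairs restricted to my_friends.

    my_friends: iterable of account names I follow.
    following:  dict mapping each account to the set of accounts it follows
                (assumed to be fetched already; hypothetical structure).
    """
    friends = set(my_friends)
    edges = []
    for person in sorted(friends):
        # Keep only the follows that point at someone else in my network.
        for target in sorted(following.get(person, set()) & friends):
            if target != person:
                edges.append((person, target))
    return edges


if __name__ == "__main__":
    following = {
        "alice": {"bob", "carol", "zed"},  # zed is outside my network
        "bob": {"alice"},
        "carol": set(),
    }
    for follower, followed in friend_edges(["alice", "bob", "carol"], following):
        print(follower, "->", followed)
```

Follows that point outside the network (here, "zed") are dropped, so the resulting graph contains only the people I follow.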
To use the script, you need to create a Twitter application and use command-line OAuth authentication to get the tokens to plug into the script. Walking through that is a bit gnarly for this post, but the easiest way I’ve found to authenticate a script with OAuth is the oauth command-line tool that ships with the Ruby OAuth gem.
The output of my Twitter-reading tool is a graph, in GraphML, suitable for import into Gephi. The graph has a node for each person, and an edge for each “follows” relationship. On initial load into Gephi, the graph looks a bit like a pile of spider webs, not showing much information.
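To make the file format concrete, here is a minimal sketch of writing such a graph as GraphML using only Python's standard library. It is not the code from the gist; the function name `to_graphml` and the graph id "friends" are assumptions made for illustration, and a real export might also attach labels or other attributes to each node.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def to_graphml(nodes, edges):
    """Build a GraphML document as a string.

    nodes: iterable of node ids (one per person).
    edges: iterable of (follower, followed) pairs.
    """
    # Emit GraphML as the default namespace rather than with a prefix.
    ET.register_namespace("", GRAPHML_NS)
    root = ET.Element("{%s}graphml" % GRAPHML_NS)
    graph = ET.SubElement(
        root,
        "{%s}graph" % GRAPHML_NS,
        {"id": "friends", "edgedefault": "directed"},
    )
    for name in nodes:
        ET.SubElement(graph, "{%s}node" % GRAPHML_NS, {"id": name})
    for follower, followed in edges:
        # A "follows" relationship is directed: follower -> followed.
        ET.SubElement(
            graph,
            "{%s}edge" % GRAPHML_NS,
            {"source": follower, "target": followed},
        )
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(to_graphml(["alice", "bob"], [("alice", "bob")]))
```

The edges are declared as directed because following on Twitter is not necessarily mutual.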
I wanted to show a couple of things in the graph: clusters of closely related people, and which people are well connected. To find related groups of people, you can use Gephi to analyze the modularity of the network, and then color nodes according to the discovered communities. To find the well-connected people, run the “Degree Power Law” statistic in Gephi, which will calculate the betweenness centrality for each person. Betweenness centrality measures how often a person sits on the shortest path between two other people, which essentially tells you how much of a hub they are.
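Gephi does this calculation for you, but it can help to see what betweenness centrality actually counts. The sketch below uses Brandes' algorithm, a standard method for directed, unweighted graphs; it is offered as an illustration, not as a description of Gephi's internals, and the function name `betweenness` is hypothetical. Scores are unnormalized, so they grow with the size of the network.

```python
from collections import deque


def betweenness(nodes, edges):
    """Unnormalized betweenness centrality for a directed, unweighted graph.

    nodes: list of node ids.
    edges: iterable of (source, target) pairs.
    """
    adj = {n: [] for n in nodes}
    for src, dst in edges:
        adj[src].append(dst)

    score = {n: 0.0 for n in nodes}
    for s in nodes:
        # Breadth-first search from s, counting shortest paths (sigma).
        order = []
        preds = {n: [] for n in nodes}
        sigma = {n: 0 for n in nodes}
        dist = {n: -1 for n in nodes}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:  # first time we reach w
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating dependencies.
        delta = {n: 0.0 for n in nodes}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score


if __name__ == "__main__":
    people = ["hub", "a", "b", "c"]
    follows = [("hub", p) for p in "abc"] + [(p, "hub") for p in "abc"]
    print(betweenness(people, follows))
```

In the small example, every shortest path between "a", "b" and "c" passes through "hub", so it receives the entire score while the others get none.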
These steps are neatly laid out in a great slide deck from Sociomantic Labs on analyzing Facebook social networks. Follow the tips there and you’ll end up with a beautiful graph of your network that you can export to PDF from Gephi.
Overview of my social graph: click to view the full PDF version
The final result for my network is shown above. If you download the full PDF, you’ll notice there are several communities, which I’ll explain for interest. The mass of pink is predominantly my O’Reilly contacts, dark green shows the Strata and data community, lime green the Mono and GNOME worlds, and mustard the XML and open source communities. The remaining purple is assorted technologist friends.
Finally, my sporting interests are revealed: the light blue nodes are cricket fans and commentators, the red Formula 1 motor racing. Unsurprisingly, Tim O’Reilly, Stephen Fry and Miguel de Icaza are big hubs in my network. Your own graphs will reveal similar clusters of people and interests.
If this has whetted your appetite, you can discover more about mining social networks at Matthew Russell’s Strata session, Unleashing Twitter Data For Fun And Insight.