Twitter has hired Guardian Data editor Simon Rogers as its first data editor.
Simon Rogers, one of the world’s leading practitioners of data journalism, will join Twitter in May as its first data editor. He will be moving his family from London to San Francisco and applying his skills to telling data-driven stories using tweets. James Ball will replace him as the Guardian’s new data editor.
As a data editor, will Rogers keep editing and producing something that we’ll recognize as journalism? Will his work at Twitter be different from what Google Think or Facebook Stories delivers? Different in terms of how he tells stories with data? Or is the difference that Twitter has more revenue coming in, or that it sees data-driven storytelling as core to driving more business? (Rogers wouldn’t comment on those counts.)
Kate Crawford argues for caution and care in data-driven decision making.
Microsoft principal researcher Kate Crawford (@katecrawford) gave a strong talk at last week’s Strata Conference in Santa Clara, Calif., about the limits of big data. She pointed out potential biases in data collection, questioned who may be excluded from it, and hammered home the constant need for context when drawing conclusions. Video of her talk is embedded below:
Crawford explored many of these same topics in our interview, which follows.
Stephen Goldsmith on the potential of urban predictive data analytics in municipal government.
The last time I spoke with Stephen Goldsmith, he was the Deputy Mayor of New York City, advocating for increased use of “citizensourcing,” where government uses technology tools to tap into the distributed intelligence of residents to understand – and fix – issues around its streets, on its services and even within institutions. In the years since, as a professor at the Ash Center for Democratic Governance and Innovation
at the John F. Kennedy School of Government at Harvard University, the former mayor of Indianapolis has advanced the notion of “preemptive government.”
Opening data in Congress is a marathon, not a sprint. The 113th Congress is making notable, incremental progress on open government.
It was a good week for open government data in the United States Congress. On Tuesday, the Clerk of the House made House floor summaries available in bulk XML format. Yesterday, the House of Representatives announced that it will make all of its legislation available for bulk download in a machine-readable format, XML, in cooperation with the U.S. Government Printing Office. As Nick Judd observes at TechPresident, such data is catnip for developers. While full bulk data from THOMAS.gov is still not available, this incremental progress deserves mention.
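Bulk XML matters because it can be consumed programmatically rather than scraped page by page. As a rough illustration, a few lines of standard-library Python can turn a bulk dump into a summary; the element names below are invented for the sketch, not the House’s or GPO’s actual schema:

```python
import xml.etree.ElementTree as ET

# A simplified, hypothetical bulk-data record. The real House/GPO
# schemas differ; this only illustrates why machine-readable bulk
# XML is useful to developers.
sample = """
<bills>
  <bill number="HR-1234">
    <title>An Example Act</title>
    <status>Referred to committee</status>
  </bill>
  <bill number="HR-5678">
    <title>Another Example Act</title>
    <status>Passed House</status>
  </bill>
</bills>
"""

def summarize(xml_text):
    """Return (number, title, status) tuples from a bulk XML dump."""
    root = ET.fromstring(xml_text)
    return [(b.get("number"), b.findtext("title"), b.findtext("status"))
            for b in root.findall("bill")]

for number, title, status in summarize(sample):
    print(number, title, status, sep=" | ")
```

Once legislation is available in a form like this, building trackers, alerts and analyses on top of it becomes a weekend project rather than a scraping chore.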
Successful startups look to solve a problem first, then look for the datasets they need.
“If you go back to how we got started,” mused Josh Green, “government data really is at the heart of that story.” Green, who co-founded Panjiva with Jim Psota in 2006, was demonstrating the newest version of Panjiva.com to me over the web, thinking back to the startup’s origins in Cambridge, Mass.
At first blush, the search engine for products, suppliers and shipping services didn’t have a clear connection to the open data movement I’d been chronicling over the past several years. Green’s account of the startup’s backstory, however, is a case study that aspiring civic entrepreneurs, Congress and the White House should take to heart.
“I think there are a lot of entrepreneurs who start with datasets,” said Green, “but it’s hard to start with datasets and build business. You’re better off starting with a problem that needs to be solved and then going hunting for the data that will solve it. That’s the experience I had.”
The problem that the founders of Panjiva wanted to help address was one that many other entrepreneurs face: how do you connect with companies in far away places? Green came to the realization that a better solution was needed in the same way that many people who come up with an innovative idea do: he had a frustrating experience and wanted to scratch his own itch. When he was working at an electronics company earlier in his career, his boss asked him to find a supplier they could do business with in China.
“I thought I could do that, but I was stunned by the lack of reliable information,” said Green. “At that moment, I realized we were talking about a problem that should be solvable. At a time when people are interested in doing business globally, there should be reliable sources of information. So, let’s build that.”
Today, Panjiva offers a higher-tech way to find overseas suppliers. The way the company built it, however, deserves more attention.
Predictive analytics, code sharing and distributed intelligence could improve criminal justice, cities and response to pandemics.
If you’re going to try to apply the lessons of “Moneyball” to New York City, you’ll need to get good data, earn the support of political leaders and build a team of data scientists. That’s precisely what Mike Flowers has done in the Big Apple, and his team has helped to save lives and taxpayer dollars. At the Strata + Hadoop World conference held in New York in October, Flowers, the director of analytics for the Office of Policy and Strategic Planning in the Office of the Mayor of New York City, gave a keynote talk about how predictive data analytics have made city government more efficient and productive.
While the story that Flowers told is a compelling one, the role of big data in the public sector was in evidence in several other sessions at the conference. Here are three more ways that big data is relevant to the public sector that stood out from my trip to New York City.
Data science played a decisive role in the 2012 election, from the campaigns to the coverage
On Tuesday night, President Barack Obama was elected to a second term in office. In the worlds of technology and political punditry, the big winner is Nate Silver, the New York Times blogger behind FiveThirtyEight. (Break out your dictionaries: a psephologist is now a national figure.)
After he correctly called all 50 states, Silver is being celebrated as the “king of the quants” by CNET and the “nerdy Chuck Norris” by Wired. The combined success of statistical models from Silver, TPM PollTracker, HuffPost Pollster, RealClearPolitics Average, and the Princeton Election Consortium all make traditional “horse race journalism” that uses insider information from the campaign trail to explain what’s really going on look a bit, well, antiquated. With the rise of political data science, the Guardian even went so far as to say that big data may sound the death knell for punditry.
This election season should serve, in general, as a wake-up call for data-illiterate journalists. It was certainly a triumph of logic over punditry. At this point, it’s fair to “predict” that Silver’s reputation and the role of data analysis will endure long after 2012.
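The aggregators’ core idea, combining many noisy polls into a single estimate, can be sketched as a sample-size-weighted average. The polls below are invented for illustration, and real models such as Silver’s go much further, adjusting for pollster house effects, recency and state fundamentals:

```python
def weighted_poll_average(polls):
    """Combine poll results (candidate share, sample size) into one
    estimate, weighting each poll by its sample size. This is a crude
    stand-in for the richer statistical models aggregators use."""
    total = sum(n for _, n in polls)
    return sum(share * n for share, n in polls) / total

# Invented example polls: (candidate's share of respondents, sample size)
polls = [(0.51, 800), (0.49, 1200), (0.52, 500)]
print(round(weighted_poll_average(polls), 3))
```

Even this toy version shows why averaging beats cherry-picking: any single poll above misses the blended estimate that all three together support.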
The data campaign
The other big tech story to emerge from the electoral fray, however, is how the campaigns themselves used technology. What social media was to 2008, data-driven campaigning was to 2012. In the wake of this election, people who understand math, programming and data science will be in even higher demand as a strategic advantage in campaigns, from getting out the vote to targeting and persuading voters.
For political scientists and campaign staff, the story of the quants and data crunchers who helped President Obama win will be pored over and analyzed for years to come. For those wondering how the first big data election played out, Sarah Lai Stirland’s analysis of how Obama’s digital infrastructure helped him win re-election is a must-read, as is Nick Judd’s breakdown of former Massachusetts governor Mitt Romney’s digital campaign. The Obama campaign found voters in battleground states that their opponents apparently didn’t know existed. The exit polls suggest that finding and turning out the winning coalition of young people, minorities and women was critical — and data-driven campaigning clearly played a role.
For added insight on the role of data in this campaign, watch O’Reilly Media’s special online conference on big data and elections, from earlier this year. (It’s still quite relevant.) The archive is embedded below:
For more resources and analysis of the growing role of big data in elections and politics, read on.
When natural disasters loom, public open government data feeds become critical infrastructure.
Just over fourteen months ago, social, mapping and mobile data told the story of Hurricane Irene. As a larger, more unusual late-October storm churns its way up the East Coast, the people in its path are once again acting as sensors and media, creating crisis data as this “Frankenstorm” moves over them. As citizens look for hurricane information online, government websites are under high demand. In late 2012, media, government, the private sector and citizens will all play an important role in sharing information about what’s happening and providing help to one another.
In that context, it’s key to understand that it’s government weather data, gathered and shared from satellites high above the Earth, that a huge number of infomediaries use to forecast, predict and instruct people about what to expect and what to do. In perhaps the most impressive mashup of social and government data now online, an interactive Google Crisis Map for Hurricane Sandy, pictured below, tracks the projected path of the “Frankenstorm” in real time; there’s a NYC-specific version as well.
With a new mobile app and API, Captricity wants to build a better bridge between analog and digital.
Unlocking data from paper forms is the problem that optical character recognition (OCR) software is supposed to solve. Two issues persist, however. First, the hardware and software involved are expensive, creating challenges for cash-strapped nonprofits and government. Second, all of the information on a given document is scanned into a system, including sensitive details like Social Security numbers and other personally identifiable information. This is a particularly difficult issue with respect to health care or bringing open government to courts: privacy by obscurity will no longer apply.
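One way to mitigate that second issue is to redact sensitive patterns before digitized text is ever stored. The sketch below is purely illustrative, not how Captricity or any particular OCR product handles sensitive fields:

```python
import re

# Hypothetical redaction step: mask U.S. Social Security numbers in
# OCR output before storage. Illustrative only; real systems would
# handle many more patterns and formats.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def redact_ssns(text):
    """Replace anything shaped like an SSN with a fixed mask."""
    return SSN_PATTERN.sub("XXX-XX-XXXX", text)

print(redact_ssns("Applicant SSN: 123-45-6789, income: $40,000"))
```
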
The process of converting paper forms into structured data still hasn’t been significantly disrupted by the rapid growth of the Internet, distributed computing and mobile devices. Fields that range from research science to medicine to law to education to consumer finance to government all need better, cheaper bridges from the analog to the digital sphere.
“I was looking at the information systems that were available to these low-resource organizations,” Captricity founder Kuang Chen said in a recent phone interview. “I saw that they’re very much bound in paper. There’s actually a lot of efforts to modernize the infrastructure and put in mobile phones. Now that there’s mobile connectivity, you can run a health clinic on solar panels and long distance Wi-Fi. At the end of the day, however, business processes are still on paper because they had to be essentially fail-proof. Technology fails all the time. From that perspective, paper is going to stick around for a very long time. If we’re really going to tackle the challenge of the availability of data, we shouldn’t necessarily be trying to change the technology infrastructure first — bringing mobile phones and iPads to where there’s paper — but really to start with solving the paper problem.”
When Chen saw that data entry was a chokepoint for digitizing health indicators, he started working on developing a better, cheaper way to ingest data on forms. Read more…
Want to build a business on open data? Add value by solving a problem for your users.
Hjalmar Gislason commented earlier this year that open data has been all about apps. In the future, it should be about much more than consumer-facing tools. “Think also about the less sexy cases that can help a few people save us millions of dollars in aggregate, generate new insights and improve decision making on various levels,” he suggested.
Today, the founder and CEO of DataMarket told the audience at the first White House Energy Datapalooza that his company would make energy data more discoverable and usable. In doing so, DataMarket will be tapping into an emerging data economy of businesses using open government data.
“We are honored to have been invited to take part in this fantastic initiative,” said Gislason in a prepared statement. “At DataMarket we focus on doing one thing well: aggregating vast amounts of heterogeneous data to help business users with their planning and decision-making. Our new energy portal applies this know-how to the US government’s energy data, for the first time enabling these valuable resources to be searched, visualized and shared through one gateway and in combination with other domestic and worldwide open data sources.”
Energy.datamarket.com, which won’t go live officially until mid-October, will offer search across 10,000 datasets, 2 million time series and 50 million energy facts. The portal draws on data from thirteen providers, including the U.S. Department of Energy’s Energy Information Administration (EIA), Oak Ridge National Laboratory, the Energy Efficiency and Renewable Energy program, the National Renewable Energy Laboratory, the Environmental Protection Agency (EPA), the Bureau of Transportation Statistics, the World Bank and the United Nations.
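Aggregating that many heterogeneous providers typically means normalizing each source’s records into a common schema before they can be searched and visualized through one gateway. Here is a toy sketch of that step, with invented field names and units rather than DataMarket’s actual schema:

```python
# Toy normalization step: two hypothetical providers report the same
# quantity under different field names and units; mapping both into a
# common schema makes them searchable side by side.
def normalize(record, provider):
    if provider == "eia":          # invented fields; values in megawatt-hours
        return {"series": record["series_name"], "year": record["yr"],
                "value_mwh": record["mwh"]}
    if provider == "worldbank":    # invented fields; values in gigawatt-hours
        return {"series": record["indicator"], "year": record["year"],
                "value_mwh": record["gwh"] * 1000}
    raise ValueError("unknown provider: " + provider)

rows = [
    normalize({"series_name": "Net generation", "yr": 2011, "mwh": 500.0}, "eia"),
    normalize({"indicator": "Net generation", "year": 2011, "gwh": 0.5}, "worldbank"),
]
print(rows[0]["value_mwh"], rows[1]["value_mwh"])
```

The unglamorous work of writing and maintaining mappings like these for each provider is much of what a data-aggregation business actually sells.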
Last week, I interviewed Gislason about his company and why they’re focusing on energy data.