This week we follow the path of data through cities, streets, and border conflicts. We conclude our journey with a little brain work, as a programming challenge is announced to automatically identify topics and trends in Twitter and Facebook updates. And don’t forget registration is now open for Strata 2011, our conference about making data work.
Mining urban life
If you’re interested in massive sources of data, you need look no further than daily life in a major urban center.
November’s Wired magazine features an in-depth look from Steven Johnson at what can be gleaned from 100 million calls to New York’s 311 service. With up to 200 representatives manning the phones, the 311 call center fields around 50,000 calls every day. Calls range from inquiries about services to complaints about street lights, road conditions, and nuisance noise.
Every call to the 311 service is logged, and the data used to help city officials plan. While some trends are obvious, such as an uptick in call volume on holidays, Johnson reports that other more subtle and helpful patterns have emerged:
For example, officials now know that the first warm day of spring will bring a surge in use of the city’s chlorofluorocarbon recycling programs. The connection is logical once you think about it: The hot weather inspires people to upgrade their air conditioners, and they don’t want to just leave the old, Freon-filled units out on the street.
It’s not just the presence of patterns in the data that provides insight, but also their disruption: clusters of complaints about unsanitary restaurant conditions, for example, have enabled city health officials to take rapid action.
Though highly successful, the New York City effort is just the start, as Johnson observes:
But even a city government like [Michael] Bloomberg’s, which prides itself on entrepreneurial flair, needs to recognize the limits of its capacity to innovate. For every promising Scout map, there are hundreds of ideas for interesting civic apps lurking in the minds of citizens …
In order for others to innovate, the 311 data should be openly available to build on, perhaps more open than the city authorities can manage. As Johnson writes, one candidate solution to this is Open311, a project from OpenPlans that aims to be a national 311 system by coordinating a standardized, open-access, read/write model for citizens to report non-emergency issues.
Uncovering the social effect of traffic
Another project from OpenPlans is Streetfilms, whose mission is to encourage “livable streets” by producing films supporting community advocacy to help make roads work better for pedestrians and cyclists.
Streetfilms’ phrase “livable streets” refers to the work of Donald Appleyard, whose research into how people experience streets with different traffic volumes was published in 1981. Appleyard’s work advanced thought on traffic, and showed that heavy traffic has a strongly negative effect on social cohesion.
Revisiting Appleyard’s 1981 work, Streetfilms has created animated 3-D visualizations of the data collected from the 30-year-old urban planning study.
When errors go big
Whether streets or countries, one of the first uses of data in civilized history was in mapping: a prototypical story of data, its visualization, distribution and use in matters of state and the empowerment of citizens. A recent incident reminds us that errors in such valued data can have great consequences.
One of the most influential sources of mapping data at the moment is the ubiquitous Google Maps. As reported by Search Engine Land, an error in Google Maps recently exacerbated a territorial fracas between Nicaragua and Costa Rica.
A Nicaraguan commander, Eden Pastora, camped in an area on the border between the two countries, removed the Costa Rican flag and planted that of his own country. The commander cited Google Maps as the justification for his actions, and the mapping service was found to be erroneous in its placement of the border.
Addressing the error, Google revealed that the problem arose in data provided by the U.S. Department of State. The Google response helpfully provides more than 100 years of historical explanation of the territorial dispute between Nicaragua and Costa Rica.
It’s likely that the troops moved in before the error was used to justify the action, but this is a grim reminder that there is the potential for significant human cost from data errors.
Registration for Strata 2011 is now open. Save 20% with the code “STR11RAD”
Natural language processing bake-off
Joseph Turian of MetaOptimize has announced a competition in the field of natural language processing (NLP). The challenge is to construct a method for finding the top semantically related terms over a vocabulary of several million words. The idea is to identify a topic in a corpus even when it is referred to in many different ways.
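The challenge doesn’t prescribe a technique, but one common baseline for “semantically related” is distributional similarity: words that appear in similar contexts tend to have similar meanings. The toy sketch below (a simplification, not Turian’s or any entrant’s method, and far from the scale of a multi-million-word vocabulary) builds co-occurrence vectors from a tiny made-up corpus and ranks words by cosine similarity:

```python
from collections import defaultdict
from math import sqrt

def cooccurrence_vectors(sentences, window=2):
    """For each word, count the words appearing within `window`
    positions of it across the corpus (a sparse context vector)."""
    vecs = defaultdict(lambda: defaultdict(int))
    for sent in sentences:
        tokens = sent.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vecs[word][tokens[j]] += 1
    return vecs

def cosine(a, b):
    """Cosine similarity between two sparse count vectors (dicts)."""
    dot = sum(count * b.get(term, 0) for term, count in a.items())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def top_related(word, vecs, n=3):
    """Rank every other word by similarity of its context vector."""
    target = vecs[word]
    scored = [(cosine(target, v), w) for w, v in vecs.items() if w != word]
    return [w for _, w in sorted(scored, reverse=True)[:n]]

# Hypothetical mini-corpus of social media-style updates.
corpus = [
    "the new phone launch trended on twitter",
    "the new tablet launch trended on facebook",
    "users posted updates about the phone launch",
    "users posted updates about the tablet launch",
]
vecs = cooccurrence_vectors(corpus)
print(top_related("phone", vecs))  # "tablet" ranks first: identical contexts
```

At the scale the challenge targets, naive pairwise comparison is infeasible; real entries would need dimensionality reduction, approximate nearest-neighbor search, or similar tricks to stay fast.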
What are the possible applications of such a method? Turian explains:
Increased insight into emerging topics, trends, and new products. Run this on social media updates (Facebook posts, Tweets) after collecting sufficient mentions of a topic, trend or product, and have more insight into what is being discussed.
The challenge is scheduled for the next two weeks, and those interested should head over to Turian’s blog post for more details. Those who just want to learn the results of the challenge can sign up for the mailing list.
What’s the prize for the fastest solution, aside from peer acclamation? Paying work for the winner, according to Turian.
Send us news
Email us news, tips and interesting tidbits at email@example.com.