ENTRIES TAGGED "open data"
Report from 2013 Health Privacy Summit
The timing was superb for last week’s Health Privacy Summit, held on June 5 and 6 in Washington, DC. First, it immediately followed the 2,000-strong Health Data Forum (Health Datapalooza), where concern for patients’ rights came up repeatedly. Second, scandals about US government spying were breaking, providing a fitting backdrop for talking about protecting our most sensitive personal information: our health data.
The health privacy summit, now in its third year, provides a crucial spotlight on the worries patients and their doctors have about their data. Did you know that two out of three doctors (and probably more; this statistic counts just the ones who admit to it on a survey) have left data out of a patient’s record upon the patient’s request? I have found that the summit reveals the most sophisticated and realistic assessment of data protection in health care available, which is why I look forward to it each year. (I’m also on the planning committee for the summit.) For instance, it took a harder look than most observers at how health care would be affected by patient access to data, and at the practice of sharing selected subsets of data, called segmentation.
What effect would patient access have?
An odd perceptual discontinuity exists around patient access to health records. If you go to your doctor and ask to see your records, chances are you will be turned down outright or forced to go through expensive and frustrating magical passes. One wouldn’t know that HIPAA explicitly required doctors long ago to give patients their data, or that the most recent meaningful use rules from the Department of Health and Human Services require doctors to let patients view, download, and transmit their information within four business days of its addition to the record.
U.S. opens data, Wong tapped for U.S. chief privacy officer, FBI might read your email sans warrant, and big data spells trouble for anonymity.
U.S. government data to be machine-readable, Nicole Wong may fill new White House chief privacy officer role
The U.S. government took major steps this week to open up government data to the public. U.S. President Obama signed an executive order requiring government data to be made available in machine-readable formats, and the Office of Management and Budget and the Office of Science and Technology Policy released an Open Data Policy memo (PDF) to address the order’s implementation.
The press release announcing the actions notes the benefit the U.S. economy historically has experienced with the release of government data — GPS data, for instance, sparked a flurry of innovation that ultimately contributed “tens of billions of dollars in annual value to the American economy,” according to the release. President Obama noted in a statement that he hopes a similar result will come from this open data order: “Starting today, we’re making even more government data available online, which will help launch even more new startups. And we’re making it easier for people to find the data and use it, so that entrepreneurs can build products and services we haven’t even imagined yet.”
Anonymized phone data isn't as anonymous as we thought, a CFPB API, and NYC's "geek squad of civic-minded number-crunchers."
Mobile phone mobility traces ID users with only four data points
A study published this week by Scientific Reports, Unique in the Crowd: The privacy bounds of human mobility, shows that the location data in mobile phones poses an anonymity risk. Jason Palmer reported at the BBC that researchers at MIT and the Catholic University of Louvain reviewed 15 months’ worth of phone records for 1.5 million people and were able to identify “mobility traces,” or “evident paths of each mobile phone,” using only four locations and times to positively identify a particular user. Yves-Alexandre de Montjoye, the study’s lead author, told Palmer that “[t]he way we move and the behaviour is so unique that four points are enough to identify 95% of people.”
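The core idea behind the study — that a handful of (place, time) observations acts like a fingerprint — is easy to demonstrate. The sketch below uses an entirely synthetic toy dataset (the user names and tower IDs are made up; the real study used anonymized carrier records) to show how a few outside observations can single out one person from a crowd of traces:

```python
# Toy mobility dataset: user -> set of (cell_tower, hour) observations.
# Entirely synthetic; it illustrates the re-identification idea, not the study's data.
traces = {
    "alice": {("tower_1", 8), ("tower_2", 9), ("tower_3", 18), ("tower_1", 22)},
    "bob":   {("tower_1", 8), ("tower_4", 9), ("tower_3", 18), ("tower_5", 22)},
    "carol": {("tower_2", 8), ("tower_2", 9), ("tower_6", 18), ("tower_7", 22)},
}

def users_matching(points, traces):
    """Return the users whose trace contains every given (place, time) point."""
    return [user for user, trace in traces.items() if points <= trace]

# Even two observed points can narrow the "anonymous" dataset to one person:
side_info = {("tower_2", 9), ("tower_1", 22)}
print(users_matching(side_info, traces))  # ['alice']
```

With realistic traces, each added point shrinks the candidate set roughly geometrically, which is why only four points sufficed for 95% of the study’s 1.5 million users.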
Opening data in Congress is a marathon, not a sprint. The 113th Congress is making notable, incremental progress on open government.
It was a good week for open government data in the United States Congress. On Tuesday, the Clerk of the House made House floor summaries available in bulk XML format. Yesterday, the House of Representatives announced that it will make all of its legislation available for bulk download in a machine-readable format, XML, in cooperation with the U.S. Government Printing Office. As Nick Judd observes at TechPresident, such data is catnip for developers. While full bulk data from THOMAS.gov is still not available, this incremental progress deserves mention.
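Why is bulk XML “catnip for developers”? Because standard parsers can consume it directly, with no scraping. The snippet below parses a toy fragment loosely shaped like a legislative-data record; the actual schema published by the House and the Government Printing Office differs, so treat the element names here as illustrative assumptions only:

```python
import xml.etree.ElementTree as ET

# A toy fragment, loosely shaped like a bulk legislative-data record.
# The real House/GPO schema differs; this is illustrative only.
xml_doc = """
<bills>
  <bill number="H.R. 1234">
    <title>Example Data Transparency Act</title>
    <status>Referred to committee</status>
  </bill>
  <bill number="H.R. 5678">
    <title>Sample Appropriations Act</title>
    <status>Passed House</status>
  </bill>
</bills>
"""

root = ET.fromstring(xml_doc)
for bill in root.findall("bill"):
    # Machine-readable structure means no brittle HTML scraping.
    print(bill.get("number"), "-", bill.findtext("title"), "-", bill.findtext("status"))
```

A few lines like these are all it takes to feed legislative data into a tracker, a visualization, or a searchable database, which is exactly the kind of reuse bulk releases enable.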
Successful startups look to solve a problem first, then look for the datasets they need.
“If you go back to how we got started,” mused Josh Green, “government data really is at the heart of that story.” Green, who co-founded Panjiva with Jim Psota in 2006, was demonstrating the newest version of Panjiva.com to me over the web, thinking back to the startup’s origins in Cambridge, Mass.
At first blush, the search engine for products, suppliers and shipping services didn’t have a clear connection to the open data movement I’d been chronicling over the past several years. His account of the back story of the startup is a case study that aspiring civic entrepreneurs, Congress and the White House should take to heart.
“I think there are a lot of entrepreneurs who start with datasets,” said Green, “but it’s hard to start with datasets and build a business. You’re better off starting with a problem that needs to be solved and then going hunting for the data that will solve it. That’s the experience I had.”
The problem that the founders of Panjiva wanted to help address was one that many other entrepreneurs face: how do you connect with companies in far away places? Green came to the realization that a better solution was needed in the same way that many people who come up with an innovative idea do: he had a frustrating experience and wanted to scratch his own itch. When he was working at an electronics company earlier in his career, his boss asked him to find a supplier they could do business with in China.
“I thought I could do that, but I was stunned by the lack of reliable information,” said Green. “At that moment, I realized we were talking about a problem that should be solvable. At a time when people are interested in doing business globally, there should be reliable sources of information. So, let’s build that.”
Today, Panjiva has created a higher tech way to find overseas suppliers. The way they built it, however, deserves more attention.
When natural disasters loom, public open government data feeds become critical infrastructure.
Just over fourteen months ago, social, mapping and mobile data told the story of Hurricane Irene. As a larger, more unusual late October storm churns its way up the East Coast, the people in its path are once again acting as sensors and media, creating crisis data as this “Frankenstorm” moves over them. As citizens look for hurricane information online, government websites are under high demand. In late 2012, media, government, the private sector and citizens all play an important role in sharing information about what’s happening and providing help to one another.
In that context, it’s key to understand that it’s government weather data, gathered and shared from satellites high above the Earth, that’s being used by a huge number of infomediaries to forecast, predict and instruct people about what to expect and what to do. In perhaps the most impressive mashup of social and government data now online, an interactive Google Crisis Map for Hurricane Sandy predicts the path of the “Frankenstorm” in real time, including a NYC-specific version.
Want to build a business on open data? Add value by solving a problem for your users.
Hjalmar Gislason commented earlier this year that open data has been all about apps. In the future, it should be about much more than consumer-facing tools. “Think also about the less sexy cases that can help a few people save us millions of dollars in aggregate, generate new insights and improve decision making on various levels,” he suggested.
Today, the founder and CEO of DataMarket told the audience of the first White House Energy Datapalooza that his company would make energy data more discoverable and usable. In doing so, DataMarket will be tapping into an emerging data economy of businesses using open government data.
“We are honored to have been invited to take part in this fantastic initiative,” said Gislason in a prepared statement. “At DataMarket we focus on doing one thing well: aggregating vast amounts of heterogeneous data to help business users with their planning and decision-making. Our new energy portal applies this know-how to the US government’s energy data, for the first time enabling these valuable resources to be searched, visualized and shared through one gateway and in combination with other domestic and worldwide open data sources.”
Energy.datamarket.com, which won’t go live officially until mid-October, will offer search across 10,000 datasets, 2 million time series and 50 million energy facts. The energy portal is based upon data from thirteen different data providers, including the U.S. Department of Energy’s Energy Information Agency (EIA), Oak Ridge National Laboratory, the Energy Efficiency and Renewable Energy program, the National Renewable Energy Laboratory, the Environmental Protection Agency (EPA), the Bureau of Transportation Statistics, the World Bank and the United Nations.
Last week, I interviewed Gislason about his company and why they’re focusing on energy data.
The common thread among the Knight Foundation's latest grants: practical application of open data.
Data, on its own, locked up or muddled with errors, does little good. Cleaned up, structured, analyzed and layered into stories, data can enhance our understanding of the most basic questions about our world, helping journalists to explain who, what, where, how and why changes are happening.
Last week, the Knight Foundation announced the winners of its first news challenge on data. These projects are each excellent examples of working on stuff that matters: they’re collective investments in our digital civic infrastructure. In the 20th century, civil society and media published the first websites. In the 21st century, civil society is creating, cleaning and publishing open data.
The grants not only support open data but validate its place in the media ecosystem of 2012. The Knight Foundation is funding data science, accelerating innovation in the journalism and media space to help inform and engage communities, a project that they consider “vital to democracy.”
Why? Consider the projects. Safecast creates networked accountability using sensors, citizen science and open source hardware. LocalData is a mobile method for communities to collect information about themselves and make sense of it. Open Elections will create a free, standardized database stream of election results. Development Seed will develop better tools to contribute to and use OpenStreetMap, the “Wikipedia of maps.” Pop Up Archive will develop an easier way to publish and archive multimedia data to the Internet. And Census.IRE.org will improve the ability of a connected nation and its data editors to access and use the work of the U.S. Census Bureau.
The projects hint at a future of digital open government, journalism and society founded upon the principles that built the Internet and World Wide Web and strengthened by peer networks between data journalists and civil society. A river of open data flows through them all. The elements and code in them — small pieces, loosely joined by APIs, feeds and the social web — will extend the plumbing of digital democracy in the 21st century.
The United States National Institutes of Health (NIH) wants to tie development of mobile health apps to evidence-based research, and it hopes to do that with a new grant program. The imperative to align developers with research is urgent, given the strong interest in health IT, mobile health and health data. There are significant challenges for the space, from consumer concerns over privacy and mobile applications to the broader question of balancing health data innovation with patient rights.
To learn more about what’s happening with mobile health apps, health data, behavioral change and cancer research, I recently interviewed Dr. Abdul Sheikh. Our interview, lightly edited for content and clarity, follows.
What led you to your current work at NIH?
Dr. Abdul Sheikh: I’ve always had a strong grounding in public health and population health, but I also have a real passion for technology and informatics. What’s beautiful is, in my current position here as a program director at the National Cancer Institute (NCI), I have a chance to meld these worlds of public health, behavior and communication science with my passion for technology and informatics. Some of the work I did before coming to the NIH was related to the early telemedicine and web-based health promotion efforts that the government of Canada was involved in.
At NCI, I direct a portfolio of research on technology-mediated communication. I’ve also had the chance to get involved and provide leadership on two very cool efforts. One of them is leadership for our division’s Small Business Innovation Research Program (SBIR). I’ve led the first NIH developer challenge competitions as well.
Big data and big problems, open data monetization, Hortonworks' first year, and a new Hadoop Partner Ecosystem launches
Here are a few stories that caught my attention in the data space this week.
Big data, Big Brother, big problems
Adam Frank took a look at some of the big problems with big data this week over at NPR. Frank addresses the difficulty of analyzing the sheer volume of complex information inherent in big data. Learning to sort through and mine vast amounts of data to extrapolate meaning will be a “trick,” he writes, but it turns out the big problems with big data go deeper than volume.
Creating computer models to simulate complex systems with big data, Frank notes, ultimately creates something a bit different from reality: “the very act of bringing the equations over to digital form means you have changed them in subtle ways and that means you are solving a slightly different problem than the real-world version.” Analysis, therefore, “requires trained skepticism, sophistication and, remarkably, some level of intuition about the systems we study,” he writes.
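That point — digitizing an equation means solving a slightly different problem — can be seen even in the simplest case. The sketch below (a minimal illustration, not drawn from Frank’s piece) discretizes the continuous growth equation dy/dt = y with Euler’s method and compares it against the exact solution e at t = 1:

```python
import math

# Discretizing even a simple ODE, dy/dt = y, changes it in subtle ways:
# Euler's method solves a slightly different problem than the continuous one.
def euler(y0, t_end, steps):
    y, dt = y0, t_end / steps
    for _ in range(steps):
        y += y * dt  # digital approximation of continuous growth
    return y

exact = math.exp(1.0)          # true continuous solution at t = 1
approx = euler(1.0, 1.0, 100)  # 100-step Euler approximation
print(exact, approx)           # the two diverge: ~2.71828 vs ~2.70481
```

The gap shrinks as the step count grows but never fully vanishes in finite arithmetic, which is one concrete reason model outputs demand the “trained skepticism” Frank describes.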
Frank also raises the problem of big data becoming a threat to individuals within society:
“Every day we are scattering ‘digital breadcrumbs’ into the data-verse. Credit card purchases, cell phone calls, Internet searches: Big Data means memory storage has become so cheap that all data about all those aspects of our lives can be harvested and put to use. And it’s exactly the use of all that harvested data that can pose a threat to society.”
The threat comes from the Big Brother aspect of being constantly monitored in ways we’ve never before imagined, and Frank writes, “It may also allow levels of manipulation that are new and truly unimaginable.” You can read more of Frank’s thoughts on what it means to live in the age of big data here. (We’ve covered related ethics issues with big data here on Strata.)