ENTRIES TAGGED "open data"
Dr. Stephen Friend on open science and the need for a "GitHub for scientists."
Unlocking the potential of health data for the public good means balancing health privacy with innovation, and that balance will rely on improving informed consent. If the power of big data is to be applied to scientific inquiry in health care, whether that means unlocking genetic secrets, finding a cure for breast cancer or delivering “preemptive health care,” both scientific culture and technology will need to change.
One element of that change could include a health data commons. Another is open access in the research community. Dr. Stephen Friend, the founder of Sage Bionetworks, is one of the foremost advocates of what I think of as “open science.” Earlier in his career, Dr. Friend was a senior vice president at Merck & Co., Inc., where he led the pharmaceutical company’s basic cancer research program.
In a recent interview, Dr. Friend explained what open science means to him and what he’s working on today. For more on the synthesis of open source with genetics, watch Andy Oram’s interview with Dr. Friend and read Oram’s series on recombinant research and Sage Congress.
Matching the missing to the dead involves reconciling two national databases.
Javier Reveron went missing from Ohio in 2004. His wallet turned up in New York City, but he was nowhere to be found. By the time his parents arrived to search for him and hand out fliers, his remains had already been buried in an unmarked indigent grave. In New York, where coroner’s resources are precious, remains wait a few months to be claimed before they’re buried by convicts in a potter’s field on uninhabited Hart Island, just off the Bronx in Long Island Sound.
The story, reported by the New York Times last week, has as happy an ending as it could given that beginning. In 2010 Reveron’s parents added him to a national database of missing persons. A month later police in New York matched him to an unidentified body and his remains were disinterred, cremated and given burial ceremonies in Ohio.
Reveron’s ordeal suggests an intriguing, and impactful, machine-learning problem. The Department of Justice maintains separate national, public databases for missing people, unidentified people and unclaimed people. Many records are full of rich data that is almost never a perfect match to data in other databases — hair color entered by a police department might differ from how it’s remembered by a missing person’s family; weights fluctuate; scars appear. Photos are provided for many missing people and some unidentified people, and matching them is difficult. Free-text fields in many entries describe the circumstances under which missing people lived and died; a predilection for hitchhiking could be linked to a death by the side of a road.
I’ve called the Department of Justice (DOJ) to ask about the extent to which they’ve worked with computer scientists to match missing and unidentified people, and will update when I hear back. One thing that’s not immediately apparent is the public availability of the necessary training set — cases that have been successfully matched and removed from the lists. The DOJ apparently doesn’t comment on resolved cases, which could make getting this data difficult. But perhaps there’s room for a coalition to request the anonymized data and manage it to the DOJ’s satisfaction while distributing it to capable data scientists.
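To make the matching problem concrete, here is a minimal sketch of how a missing-person record might be scored against unidentified-person records while tolerating the mismatches described above (a differently remembered hair color, a weight that has changed). Everything in it is an assumption for illustration: the Record fields, the tolerances, and the hand-picked weights are hypothetical and do not reflect the DOJ's actual schemas or any method it uses. With a training set of resolved cases, the weights would be learned rather than guessed.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Record:
    """Hypothetical, simplified record; not the real database schema."""
    hair_color: str
    weight_lb: float
    height_in: float
    notes: str  # free-text description of circumstances


# Hand-picked, illustrative weights. A real system would learn these
# from cases that have already been matched.
WEIGHTS = {"hair": 0.2, "weight": 0.2, "height": 0.4, "notes": 0.2}


def closeness(a: float, b: float, tolerance: float) -> float:
    """Return 1.0 for equal values, falling linearly to 0.0 at `tolerance` apart."""
    return max(0.0, 1.0 - abs(a - b) / tolerance)


def text_similarity(a: str, b: str) -> float:
    """Rough character-level similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_score(missing: Record, unidentified: Record) -> float:
    """Weighted sum of per-field similarities; higher means more alike."""
    parts = {
        "hair": text_similarity(missing.hair_color, unidentified.hair_color),
        # Weight fluctuates, so allow a wide tolerance (assumed: 40 lb).
        "weight": closeness(missing.weight_lb, unidentified.weight_lb, 40.0),
        # Height is more stable, so use a tight tolerance (assumed: 4 in).
        "height": closeness(missing.height_in, unidentified.height_in, 4.0),
        "notes": text_similarity(missing.notes, unidentified.notes),
    }
    return sum(WEIGHTS[name] * value for name, value in parts.items())


def rank_candidates(missing: Record, candidates: list) -> list:
    """Sort unidentified records from most to least similar."""
    return sorted(candidates, key=lambda r: match_score(missing, r), reverse=True)
```

A score like this only produces a ranked list for a human investigator to review. Character-level similarity will not link "hitchhiking" to "side of a road"; that kind of inference is where a trained model on real resolved cases would be needed.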
John Wilbanks on health data donation, contextual privacy, and open networks.
As I wrote earlier this year in an ebook on data for the public good, while the idea of data as a currency is still in its infancy, it’s important to think about where the future is taking us and our personal data.
If the Obama administration’s smart disclosure initiatives gather steam, more citizens will be able to do more than think about personal data: they’ll be able to access their financial, health, education, or energy data. In the U.S. federal government, the Blue Button initiative, which initially enabled veterans to download personal health data, is now spreading to all federal employees, and it also earned adoption at private institutions like Aetna and Kaiser Permanente. Putting health data to work stands to benefit hundreds of millions of people. The Locker Project, which provides people with the ability to move and store personal data, is another approach to watch.
The promise of more access to personal data, however, is balanced by accompanying risks. Smartphones, tablets, and flash drives, after all, are lost or stolen every day. Given the potential of mHealth, big data, and health care information technology, researchers and policy makers alike are moving forward with their applications. As they do so, conversations and rulemaking about health care privacy will need to take into account not just data collection or retention but context and use.
Put simply, businesses must confront the ethical issues tied to massive aggregation and data analysis. Given that context, Fred Trotter’s post on who owns health data is a crucial read. As Fred highlights, the real issue is not ownership, per se, but “What rights do patients have regarding health care data that refers to them?”
Would, for instance, those rights include the ability to donate personal data to a data commons, much in the same way organs are donated now for research? That question isn’t exactly hypothetical, as the following interview with John Wilbanks highlights.
Wilbanks, a senior fellow at the Kauffman Foundation and director of the Consent to Research Project, has been an advocate for open data and open access for years, including a stint at Creative Commons; a fellowship at the World Wide Web Consortium; and experience in the academic, business, and legislative worlds. Wilbanks will be speaking at the Strata Rx Conference in October.
Our interview, lightly edited for content and clarity, follows.
Dyson says it's time to focus on maintaining good health, as opposed to healthcare.
If we look ahead to the next decade, it’s worth wondering whether the way we think about health and health care will have shifted. Will health care technology be a panacea? Will it drive even higher costs, creating a broader divide between digital haves and have-nots? Will opening health data empower patients or empower companies?
As ever, there will be good outcomes and bad outcomes, and not just in the medical sense. There’s a great deal of thought around the potential for mobile applications right now, from the FDA’s potential decision to regulate them to a reported high abandonment rate. There are also significant questions about privacy, patient empowerment and meaningful use of electronic health care records.
When I’ve talked to U.S. CTO Todd Park or Dr. Farzad Mostashari, they’ve been excited about the prospect of health data fueling better dashboards and algorithms that give frontline caregivers access to critical information about the people they’re looking after, providing insight at the point of contact.
Kathleen Sebelius, the U.S. Secretary for Health and Human Services, said at this year’s Health Datapalooza that venture capital investment in the health care IT area is up 60% since 2009.
Rep. Issa expressed support for reforming FOIA to include personal data held by companies.
The Freedom of Information Act (FOIA), which gives the people and press the right to access information from government, is one of the pillars of open government in the modern age. In the United States, FOIA is relatively new — it was originally enacted on July 4, 1966. As other countries around the world enshrine the principle into their legal systems, new questions about FOIA are arising, particularly when private industry takes on services that previously were delivered by government.
In that context, one of the federal open government initiatives worth watching in 2012 is ‘smart disclosure,’ the targeted release of information about citizens or about services they consume by government and by private industry. Smart disclosure is notable because there’s some “there there.” It’s not just a matter of it being one of the “flagship open government initiatives” under the U.S. National Plan for open government or that a White House Smart Disclosure Summit in March featured a standing room only audience at the National Archives. When compared to other initiatives, there has been relatively strong uptake of data from government and the private sector and its use in the consumer finance sector. Citizens can download their bank records and use them to make different decisions.
Earlier this summer, I interviewed Representative Darrell Issa (R-CA) about a number of issues related to open government, including what he thought of “smart disclosure” initiatives.
If legislative efforts to standardize federal government spending data founder in the U.S. Senate, it's a missed opportunity.
The old adage that “you can’t manage what you can’t measure” is often applied to organizations in today’s data-drenched world. Given the enormity of the United States federal government, breaking down the estimated $3.7 trillion in the 2012 budget into its individual allocations, much less drilling down to individual outlays to specific programs and subsequent performance, is no easy task. There are several sources for policy wonks to turn to when applying open data to journalism, but the flagship database of federal government spending at USASpending.gov simply isn’t anywhere near as accurate as it needs to be to source stories. The issues with USASpending.gov have been extensively chronicled by the Sunlight Foundation in its ClearSpending project, which found that nearly $1.3 trillion of federal spending as reported on the open data website was inaccurate.
If the people are to gain more insight into how their taxes are being spent, Congress will need to send President Obama a bill to sign to improve the quality of federal spending data. In the spring of 2012, the U.S. House passed by unanimous voice vote the DATA Act, a signature piece of legislation from Representative Darrell Issa (R-CA). H.R. 2146 requires every United States federal government agency to report its spending data in a standardized way and establish uniform reporting standards for recipients of federal funds.
In a world of big, open data, "privacy by design" will become even more important.
A few weeks ago, Tom Slee published “Seeing Like a Geek,” a thoughtful article on the dark side of open data. He starts with the story of a Dalit community in India, whose land was transferred to a group of higher-caste Mudaliars through bureaucratic manipulation under the guise of standardizing and digitizing property records. While standardization sounds like a good idea, it gave a wealthier, more powerful group a chance to erase older, traditional records that hadn’t been properly codified. One effect of passing laws requiring standardized, digital data is to marginalize all data that can’t be standardized or digitized, and to marginalize the people who don’t control the process of standardization.
That’s a serious problem. It’s sad to see oppression and property theft riding in under the guise of transparency and openness. The issue, though, isn’t open data itself but how data is used.
The British government further embraces open data as a means to transparency and "prosperity."
The Cabinet Office of the United Kingdom released a notable new white paper on open data and relaunched its flagship open data platform, Data.gov.uk. This post features interviews on open data with Cabinet Minister Francis Maude, Tim Berners-Lee and Rufus Pollock.
Michael Flowers explains why applying data science to regulatory data is necessary to use city resources better.
A predictive data analytics team in the Mayor's Office of New York City has been quietly using data science to find patterns in regulatory data that can then be applied to law enforcement, public safety, public health and better allocation of taxpayer resources.
Rockstars from music, government and industry convened around healthcare at the 2012 Health Datapalooza.
Two years ago, the potential of government making health information as useful as weather data may well have felt like an abstraction to many observers. In June 2012, real health apps and services are here, holding the potential to massively disrupt healthcare for the better.