ENTRIES TAGGED "health data"
By Julie Yoo, Chief Product Officer at Kyruus
Once upon a time, a world-renowned surgeon, Dr. Michael DeBakey, was summoned by the President when the Shah of Iran, a figure of political and strategic importance, fell ill with an enlarged spleen due to cancer. Dr. DeBakey was whisked away to Egypt to meet the Shah, made a swift diagnosis, and recommended an immediate operation to remove the spleen. The surgery lasted 80 minutes; the spleen, which had grown to 10 times its normal size, was removed, and the Shah made a positive recovery in the days following the surgery – that is, until he took a turn for the worse, and ultimately died from surgical complications a few weeks later. 
Sounds like a routine surgery gone awry, yes? But consider this: Dr. DeBakey was a cardiovascular surgeon – in other words, a surgeon who specialized in operating on the heart and blood vessels, not the spleen. He was best known for his open-heart bypass surgery techniques, and the vast majority of his peer-reviewed articles cover cardiology-related operating techniques. High-profile or not, why was a cardiovascular surgeon selected to perform an abdominal surgery?
Exploring an upcoming Strata Rx 2013 session on big data and privacy
Databases of health data are widely shared among researchers and for commercial purposes, and they are even put online in order to promote health research and data-driven health app development, so preserving the privacy of patients is critical. But are these data sets de-identified properly? If not, they could be re-identified. Just look at the two high-profile re-identification attacks that have been publicized in recent months.
The first attack involved individuals who voluntarily published their genomic data online as a way to support open data for research. Besides their genomic data, they posted their basic demographics such as date of birth and zip code. The demographic data, not their genomic data, was used to re-identify a subset of the individuals.
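The excerpt does not say how the demographic data was matched to individuals, so the following is only a minimal sketch of one common approach, a linkage on quasi-identifiers. All records, names, and field names here are hypothetical; the point is that a date of birth plus a zip code can single out a person once a second, named data set shares those fields.

```python
from collections import Counter

# Hypothetical "de-identified" release: names removed, demographics kept.
released = [
    {"record_id": "r1", "dob": "1970-03-14", "zip": "02139"},
    {"record_id": "r2", "dob": "1985-11-02", "zip": "94110"},
    {"record_id": "r3", "dob": "1985-11-02", "zip": "94110"},
]

# Hypothetical public list that carries names alongside the same fields.
public = [
    {"name": "A. Example", "dob": "1970-03-14", "zip": "02139"},
    {"name": "B. Example", "dob": "1985-11-02", "zip": "94110"},
]

# Assumed quasi-identifiers, taken from the fields the excerpt mentions.
QUASI_IDENTIFIERS = ("dob", "zip")


def key(row):
    """Build the quasi-identifier tuple used to link the two data sets."""
    return tuple(row[field] for field in QUASI_IDENTIFIERS)


def unique_matches(released_rows, public_rows):
    """Map record_id -> name where the key is unique in both data sets."""
    released_counts = Counter(key(r) for r in released_rows)
    public_counts = Counter(key(p) for p in public_rows)
    names = {
        key(p): p["name"] for p in public_rows if public_counts[key(p)] == 1
    }
    return {
        r["record_id"]: names[key(r)]
        for r in released_rows
        if released_counts[key(r)] == 1 and key(r) in names
    }


reidentified = unique_matches(released, public)
print(reidentified)
```

Records r2 and r3 share the same date of birth and zip code, so neither can be pinned to one person in this sketch; r1 is unique and links to a name. Coarsening the fields (year of birth instead of full date, a shorter zip prefix) is one way to shrink the set of unique combinations.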
Researchers begin to scale up pattern recognition, machine-learning, and data management tools.
My first job after leaving academia was as a quant for a hedge fund, where I performed (what are now referred to as) data science tasks on financial time-series. I primarily used techniques from probability & statistics, econometrics, and optimization, with occasional forays into machine-learning (clustering, classification, anomalies). More recently, I’ve been closely following the emergence of tools that target large time series and decided to highlight a few interesting bits.
Time-series and big data:
Over the last six months I’ve been encountering more data scientists (outside of finance) who work with massive amounts of time-series data. The rise of unstructured data has been widely reported; the growing importance of time-series, much less so. Sources include data from consumer devices (gesture recognition & user interface design), sensors (apps for “self-tracking”), machines (systems in data centers), and health care. In fact, some research hospitals have troves of EEG and ECG readings that translate to time-series data collections with billions (even trillions) of points.
Five ways we can improve the information we collect to help us solve hard problems in health care.
I was honored to chair O’Reilly’s inaugural edition of Strata Rx, our conference on data science in health care, this past October along with Colin Hill. As we’re beginning to plan this year’s event, I find myself thinking a lot about a theme that emerged from some of the keynotes last fall: in order to solve the problems we’re facing in health care — to lower costs and provide more personal, targeted treatments to patients — we don’t just need more data; we need better data.
Much has been made about the era of big data we find ourselves in. But though the data we collect is straining the limits of our tools and models, we’re still not making the kind of headway we hoped for in areas like health care. So big data isn’t enough. We need better data.
What does it mean to have better data in health care? Here are some things on my list; perhaps you can think of others.
Which data formats should the DocGraph project support?
The DocGraph project has an interesting issue that I think will become a common one as the open data movement continues. For those who have not been keeping up, DocGraph was announced at Strata Rx, described carefully on this blog, and will be featured again at Strata 2013. For those who do not care to click links, DocGraph is a crowdfunded open data set that merges open data sources on doctors and hospitals.
As I recently described on the DocGraph mailing list, work is underway to acquire the data sets that we set out to merge. The issue deals with file formats.
The core identifier for doctors, hospitals and other healthcare entities is the National Provider Identifier (NPI). This is something like a Social Security number for doctors and hospitals. In fact it was created in part so that doctors would not need to use their Social Security numbers or other identifiers in order to participate in healthcare financial transactions (i.e., getting paid by insurance companies for their services). The NPI is the “one number to rule them” in healthcare and we want to map data from other sources accurately to that ID.
Each state releases zero, one, or several data files that can be purchased and that also contain doctor data. But these file downloads are in “random file format X.” Of course, we are not yet done with our full survey of the files and their formats, but I can assure you that they are mostly CSV files, with a troubling number of PDF files. It is our job to take these files and merge them against the NPI, in order to provide a cohesive picture for data scientists.
But the data available from each state varies greatly. Sometimes they will have addresses, sometimes not. Sometimes they will have fax numbers, sometimes not. Sometimes they will include medical school information, sometimes not. Sometimes they will simply include the name of the medical school, sometimes they will use a code. Sometimes when they use codes they will make up their own …
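To make the merging problem concrete, here is a minimal sketch of folding differently shaped state CSV files into one record per NPI. It is not the DocGraph project's actual pipeline: the state labels, column names, common schema, and sample rows are all hypothetical, and the NPI shown is only a placeholder value.

```python
import csv
import io

# Hypothetical per-state column names mapped onto one common schema.
COLUMN_MAPS = {
    "state_a": {"NPI": "npi", "Address": "address", "Fax": "fax"},
    "state_b": {"npi_number": "npi", "med_school": "medical_school"},
}

# Assumed common fields; a missing value stays None rather than being guessed.
COMMON_FIELDS = ("address", "fax", "medical_school")


def normalize_rows(state, csv_text):
    """Yield rows from one state's CSV, renamed to the common schema."""
    mapping = COLUMN_MAPS[state]
    for row in csv.DictReader(io.StringIO(csv_text)):
        record = {}
        for source_column, common_field in mapping.items():
            value = (row.get(source_column) or "").strip()
            if value:
                record[common_field] = value
        if "npi" in record:  # rows without an NPI cannot be merged
            yield record


def merge_by_npi(sources):
    """Combine all states into {npi: fields}, keeping the first value seen."""
    merged = {}
    for state, csv_text in sources.items():
        for record in normalize_rows(state, csv_text):
            entry = merged.setdefault(
                record["npi"], {field: None for field in COMMON_FIELDS}
            )
            for field in COMMON_FIELDS:
                if record.get(field) and entry[field] is None:
                    entry[field] = record[field]
    return merged


# Hypothetical sample files: one state has an address, the other a school.
sources = {
    "state_a": "NPI,Address,Fax\n1234567893,1 Main St,\n",
    "state_b": "npi_number,med_school\n1234567893,Example Medical School\n",
}
merged = merge_by_npi(sources)
print(merged)
```

Keeping absent fields as `None` makes the gaps between states visible to whoever consumes the merged file. State-specific codes, such as home-grown medical school codes, would need their own lookup tables before a step like this.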
I am not complaining here. We knew what we were getting ourselves into when we took on the DocGraph project. The community at large has paid us well to do this work! But now we have a question: what data formats should we support?
At its best, 3D printing can make us more human by making us whole.
Tim O’Reilly recently asked me and some other colleagues which technology seems most like magic to us. There was a thoughtful pause as we each considered the amazing innovations we read about and interact with every day.
For me, it is 3D printing, though my reasons are different than you might think. Yes, it’s amazing that, with very little skill, we can manufacture complex objects in our homes and workshops that are made from things like plastic or wood or chocolate or even titanium. This seems an amazing act of conjuring that, just a short time ago, would have been difficult to imagine outside of the “Star Trek” set.
But the thing that makes 3D printing really special is the magic it allows us to perform: the technology is capable of making us more human.
An inside look at DocGraph, a data project that shows how the U.S. health care system delivers care.
At Strata Rx in October I announced the availability of DocGraph. This is the first project of NotOnly Development, which is a Not Only For Profit Health IT micro-incubator.
The DocGraph dataset shows how doctors, hospitals, laboratories and other health care providers team together to treat Medicare patients. This data details how the health care system in the U.S. delivers care.
You can read about the basics of this data release, and you can read about my motivations for making the release. Most importantly, you can still participate in our efforts to crowdfund improvements to this dataset. We have already far surpassed our original $15,000 goal, but you can still get early and exclusive access to the data for a few more days. Once the crowdfunding has ended, the price will go up substantially.
This article will focus on this data from a technical perspective.
In a few days, the crowdfunding (hosted by Medstartr) will be over, and I will be delivering this social graph to all of the participants. We are offering a ransom license that we are calling “Open Source Eventually,” so participants in the crowdfunding will get exclusive access to the data for a full six months before the license to this dataset automatically converts to a Creative Commons license. The same data is available under a proprietary-friendly license for more money. For all of these “releases,” this article will be the go-to source for technical details about the specific contents of the file.
Voice your support for a proposed federal rule that expands patients' access to test results.
I’m convinced that there’s a wave of innovation coming in healthcare, driven by new kinds of data, new ways of extracting meaning from that data, and new business models that data can enable. That’s one of the reasons why we launched our Strata Rx conference, which focuses on the importance of data science to the future of health care.
Unfortunately, much of the data that will enable an entrepreneurial explosion is still locked up — in paper records, in proprietary data formats, and by well-intentioned but conflicting privacy regulations.
We’re making progress towards open data in healthcare, but there are still so many obstacles! Ann Waldo recently introduced me to one of these.
A 2009 law modernized patient access rights by allowing individuals to get copies of their medical records in electronic format. Surprisingly, however, these access rights do not include lab test results – one of the types of medical records that people are most likely to find urgent and useful. Due to the interaction of HIPAA (the Federal medical privacy law), CLIA (a Federal laboratory regulatory law), and state laws, patients can only get direct access to their test results from labs in a handful of states.
A recent New York Times story highlighted just how much pain and suffering can be caused by this inability to get access to your own lab results.
In 2011, the Department of Health and Human Services put forward a proposed Rule that would give patients the right to get their test results directly from laboratories. This Rule is still waiting to be finalized. In hopes of breaking the logjam, O’Reilly Media and a variety of other players have written a consensus letter that voices our whole-hearted support for that proposed Rule and encourages the Federal government to finalize it promptly.
We’d love to invite you to join us in signing this letter.
Patients’ rights should include direct access to their lab results, just like all their other medical records!
The United States National Institutes of Health (NIH) wants to tie development of mobile health apps to evidence-based research, and it hopes to do that with a new grant program. The imperative to align developers with research is urgent, given the strong interest in health IT, mobile health and health data. There are significant challenges for the space, from consumer concerns over privacy and mobile applications to the broader question of balancing health data innovation with patient rights.
To learn more about what’s happening with mobile health apps, health data, behavioral change and cancer research, I recently interviewed Dr. Abdul Sheikh. Our interview, lightly edited for content and clarity, follows.
What led you to your current work at NIH?
Dr. Abdul Sheikh: I’ve always had a strong grounding in public health and population health, but I also have a real passion for technology and informatics. What’s beautiful is, in my current position here as a program director at the National Cancer Institute (NCI), I have a chance to meld these worlds of public health, behavior and communication science with my passion for technology and informatics. Some of the work I did before coming to the NIH was related to the early telemedicine and web-based health promotion efforts that the government of Canada was involved in.
At NCI, I direct a portfolio of research on technology-mediated communication. I’ve also had the chance to get involved and provide leadership on two very cool efforts. One of them is leadership for our division’s Small Business Innovation Research Program (SBIR). I’ve led the first NIH developer challenge competitions as well.
The convergence of data, privacy and cost has created a unique opportunity to reshape health care.
Health care appears immune to disruption. It’s a space where the stakes are high, the incumbents are entrenched, and lessons from other industries don’t always apply.
Yet, in a recent conversation between Tim O’Reilly and Roger Magoulas, it became evident that we’re approaching an unparalleled opportunity for health care change. O’Reilly and Magoulas explained how the convergence of data access, changing perspectives on privacy, and the enormous expense of care are pushing the health space toward disruption.
As always, the primary catalyst is money. The United States is facing what Magoulas called an “existential crisis in health care costs” [discussed at the 3:43 mark]. Everyone can see that the current model is unsustainable. It simply doesn’t scale. And that means we’ve arrived at a place where party lines are irrelevant and tough solutions are the only options.
“Who is it that said change happens when the pain of not changing is greater than the pain of changing?” O’Reilly asked. “We’re now reaching that point.” [3:55]
(Note: The source of that quote is hard to pin down, but the sentiment certainly applies.)
This willingness to change is shifting perspectives on health data. Some patients are making their personal data available so they and others can benefit. Magoulas noted that even health companies, which have long guarded their data, are warming to collaboration.
At the same time there’s a growing understanding that health data must be contextualized. Simply having genomic information and patient histories isn’t good enough. True insight — the kind that can improve quality of life — is only possible when datasets are combined.