ENTRIES TAGGED "health"
By Julie Yoo, Chief Product Officer at Kyruus
Once upon a time, a world-renowned surgeon, Dr. Michael DeBakey, was summoned by the President when the Shah of Iran, a figure of political and strategic importance, fell ill with an enlarged spleen due to cancer. Dr. DeBakey was whisked away to Egypt to meet the Shah, made a swift diagnosis, and recommended an immediate operation to remove the spleen. The surgery lasted 80 minutes; the spleen, which had grown to 10 times its normal size, was removed, and the Shah made a positive recovery in the days following the surgery – that is, until he took a turn for the worse, and ultimately died from surgical complications a few weeks later. 
Sounds like a routine surgery gone awry, yes? But consider this: Dr. DeBakey was a cardiovascular surgeon – in other words, a surgeon who specialized in operating on the heart and blood vessels, not the spleen. He was best known for his open heart bypass surgery techniques, and the vast majority of his peer-reviewed articles relate to cardiology-related operating techniques. High-profile or not, why was a cardiovascular surgeon selected to perform an abdominal surgery?
An Interview with Julie Steele
A week or two ago, I got to correspond with Danielle Brooks of Disruptive Women in Health Care about the work I do here at O’Reilly. The following interview is reprinted here with their kind permission.
Tell us about your work. What drew you to the area?
I have mostly worked as a book editor, until just a year or two ago. I was working on books about databases, machine learning, visualization, and other relevant topics when O’Reilly launched its Strata conference on data science, and so I became involved in that conference. But as Strata took off, it became apparent to us that certain communities — and certain types of data — were special. Health care is one of those areas: the insights that data analysis can give us about ourselves and the things that ail us are enormous, but the risks of over-sharing and the resulting constraints such as HIPAA also present very real challenges.
In 2012, O’Reilly decided to launch a new edition of its data science conference to focus on health care, and that’s how Strata Rx was born. I was asked to become its Program Chair, along with Colin Hill, CEO of GNS Healthcare, and so I have spent the last 18 months learning everything I can about the (very complicated!) health care industry. Colin and I are great partners because of the complementary backgrounds we bring together: Colin from the health care industry side and myself from the technology side. Ultimately, that’s what Strata Rx aims to do, too: we hope that by bringing together professionals from all parts of the industry (payers, providers, researchers, analysts, advocates, developers, investors, and caregivers, just to name a few) we can begin to solve some of the large and complex problems facing us in this area.
Exploring an upcoming Strata Rx 2013 session on big data and privacy
Databases of health data are widely shared among researchers and for commercial purposes, and they are even put online to promote health research and data-driven health app development, so preserving the privacy of patients is critical. But are these data sets de-identified properly? If not, they could be re-identified. Just look at the two high-profile re-identification attacks that have been publicized in recent months.
The first attack involved individuals who voluntarily published their genomic data online as a way to support open data for research. Besides their genomic data, they posted their basic demographics such as date of birth and zip code. The demographic data, not their genomic data, was used to re-identify a subset of the individuals.
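As a rough illustration of why demographics alone can be enough, the sketch below counts how many records are unique on a combination of quasi-identifiers. The records, field names, and the `unique_on` helper are all hypothetical and are not taken from the attack described above; this is only a minimal uniqueness check, not the method the attackers used.

```python
from collections import Counter

# Hypothetical, made-up records for illustration only.
records = [
    {"id": "p1", "dob": "1970-01-15", "zip": "02138"},
    {"id": "p2", "dob": "1970-01-15", "zip": "02138"},
    {"id": "p3", "dob": "1982-06-30", "zip": "02139"},
    {"id": "p4", "dob": "1991-11-02", "zip": "94110"},
]

def unique_on(rows, keys):
    """Return the rows whose combination of `keys` appears exactly once."""
    counts = Counter(tuple(r[k] for k in keys) for r in rows)
    return [r for r in rows if counts[tuple(r[k] for k in keys)] == 1]

# Anyone who is the only person with a given birth date and zip code
# can potentially be matched against another list that carries names.
at_risk = unique_on(records, ["dob", "zip"])
at_risk_ids = [r["id"] for r in at_risk]
```

Here p1 and p2 share the same birth date and zip code, so they hide each other; p3 and p4 do not, which is what makes them candidates for re-identification.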
How the field of genetics is using data within research and to evaluate researchers
Editor’s note: Earlier this week, Part 1 of this article described Sage Bionetworks, a recent Congress they held, and their way of promoting data sharing through a challenge.
Data sharing is not an unfamiliar practice in genetics. Plenty of cell lines and other data stores are publicly available from such places as the TCGA data set from the National Cancer Institute, Gene Expression Omnibus (GEO), and ArrayExpress (all of which can be accessed through Synapse). So to some extent the current revolution in sharing lies not in the data itself but in critical related areas.
First, many of the data sets are weakened by metadata problems. A Sage programmer told me that the famous TCGA set is enormous but poorly curated. For instance, different data sets in TCGA may refer to the same drug by different names, generic versus brand name. Provenance–a clear description of how the data was collected and prepared for use–is also weak in TCGA.
In contrast, GEO records tend to contain good provenance information (see an example), but only as free-form text, which presents the same barriers to searching and aggregation as free-form text in medical records. Synapse is developing a structured format for presenting provenance based on the W3C’s PROV standard. One researcher told me this was the most promising contribution of Synapse toward the shared use of genetic information.
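To make the contrast with free-form text concrete, here is a minimal sketch of what structured provenance makes possible. The field names loosely echo three PROV relations (wasGeneratedBy, used, wasAttributedTo), but the `ProvenanceRecord` class, the `derived_from` helper, and the sample values are hypothetical; this is not Synapse's actual format.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ProvenanceRecord:
    """Simplified, hypothetical provenance record for one data set."""
    entity: str                                     # the data set described
    was_generated_by: str                           # activity that produced it
    used: List[str] = field(default_factory=list)   # inputs to that activity
    was_attributed_to: str = ""                     # responsible agent

# Made-up records for illustration only.
records = [
    ProvenanceRecord("expression_matrix_v2", "normalization",
                     ["raw_arrays_batch1"], "lab_a"),
    ProvenanceRecord("survival_model", "model_fitting",
                     ["expression_matrix_v2", "clinical_table"], "lab_b"),
    ProvenanceRecord("qc_report", "quality_control",
                     ["raw_arrays_batch1"], "lab_a"),
]

def derived_from(recs, source):
    """List entities whose generating activity used `source` directly."""
    return [r.entity for r in recs if source in r.used]

direct = derived_from(records, "raw_arrays_batch1")
```

Because the inputs are stored as fields and not buried in a paragraph, a question like "which data sets were built from this batch?" becomes a simple filter.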
Observations from Sage Congress and collaboration through its challenge
The glowing reports we read of biotech advances almost cause one’s brain to ache. They leave us thinking that medical researchers must command the latest in all technological tools. But the engines of genetic and pharmaceutical innovation are stuttering for lack of one key fuel: data. Here they are left with the equivalent of trying to build skyscrapers with lathes and screwdrivers.
Sage Congress, held this past week in San Francisco, investigated the multiple facets of data in these fields: gene sequences, models for finding pathways, patient behavior and symptoms (known as phenotypic data), and code to process all these inputs. A survey of efforts by the organizers, Sage Bionetworks, and other innovations in genetic data handling can show how genetics resembles and differs from other disciplines.
An intense lesson in code sharing
At last year’s Congress, Sage announced a challenge, together with the DREAM project, intended to galvanize researchers in genetics while showing off the growing capabilities of Sage’s Synapse platform. Synapse ties together a number of data sets in genetics and provides tools for researchers to upload new data, while searching other researchers’ data sets. Its challenge highlighted the industry’s need for better data sharing, and some ways to get there.
In which the question of whether research subjects have any rights to their data is pondered.
The GET (Genomes, Environments and Traits) conference is a confluence of parties interested in the advances being made in human genomes, the measurement of how the environment impacts individuals, and how the two come together to produce traits. Sponsored by the organizers of the Personal Genome Project (PGP) at Harvard, it is a two-day event whose topics range from the appropriate amount of access that patients should have to their genetics data to the ways that Hollywood can be convinced to portray genomics more accurately.
It is also a yearly meeting place for the participants in the Personal Genome Project (one of whom is your humble narrator), people who have agreed to participate in an “open consent” research model. Among other things, this means that PGP participants agree to let their cell lines be used for any purposes (research or commercial). They also acknowledge ahead of time that because their genomes and phenotypic traits are being released publicly, there is a high likelihood that interested parties may be able to identify them from their data. The long-term goal of the PGP is to enroll 100,000 participants and perform whole genome sequencing of their DNA; it currently has nearly 2,300 enrolled participants and has sequenced around 165 genomes.
Big data is shaping diverse fields, showing that past predictions from data-driven natural sciences are now coming to pass.
I find myself having conversations recently with people from increasingly diverse fields, both at Columbia and in local startups, about how their work is becoming “data-informed” or “data-driven,” and about the challenges posed by applied computational statistics or big data.
A view from health and biology in the 1990s
In discussions with, as examples, New York City journalists, physicists, or even former students now working in advertising or social media analytics, I’ve been struck by how many of the technical challenges and lessons learned are reminiscent of those faced in the health and biology communities over the last 15 years, when these fields experienced their own data-driven revolutions and wrestled with many of the problems now faced by people in other fields of research or industry.
It was around then, as I was working on my PhD thesis, that sequencing technologies became sufficient to reveal the entire genomes of simple organisms and, not long thereafter, the first draft of the human genome. This advance in sequencing technologies made possible the “high throughput” quantification of, for example,
- the dynamic activity of all the genes in an organism; or
- the set of all protein-protein interactions in an organism; or even
- statistical comparative genomics revealing how small differences in genotype correlate with disease or other phenotypes.
These advances required formation of multidisciplinary collaborations, multi-departmental initiatives, advances in technologies for dealing with massive datasets, and advances in statistical and mathematical methods for making sense of copious natural data.
A call for data scientists, technologists, health professionals, and business leaders to convene.
We are launching a conference at the intersection of health, health care, and data. Why?
Our health care system is in crisis. We are experiencing epidemic levels of obesity, diabetes, and other preventable conditions while at the same time our health care system costs are spiraling higher. Most of us have experienced increasing health care costs in our businesses or have seen our personal share of insurance premiums rise rapidly. Worse, we may be living with a chronic or life-threatening disease while struggling to obtain effective therapies and interventions — finding ourselves lumped in with “average patients” instead of receiving effective care designed to work for our specific situation.
In short, particularly in the United States, we are paying too much for too much care of the wrong kind and getting poor results. All the while our diet and lifestyle failures are demanding even more from the system. In the past few decades we’ve dropped from the world’s best health care system to the 37th, and we seem likely to drop further if things don’t change.
The very public fight over the Affordable Care Act (ACA) has brought this to the fore of our attention, but this is a situation that has been brewing for a long time. With the ACA’s arrival, increasing costs and poor outcomes, at least in part, are going to be the responsibility of the federal government. The fiscal outlook for that responsibility doesn’t look good and solving this crisis is no longer optional; it’s urgent.
There are many reasons for the crisis, and there’s no silver bullet. Health and health care live at the confluence of diet and exercise norms, destructive business incentives, antiquated care models, and a system that has severe learning disabilities. We aren’t preventing the preventable, and once we’re sick we’re paying for procedures and tests instead of results. Those interventions were designed for some non-existent average patient, so many of them are wasted. Later we mostly ignore the data that could help the system learn and adapt.
It’s all too easy to be gloomy about the outlook for health and health care, but this is also a moment of great opportunity. We face this crisis armed with vast new data sources, the emerging tools and techniques to analyze them, an ACA policy framework that emphasizes outcomes over procedures, and a growing recognition that these are problems worth solving.
Michael Italia on making use of data collected in health care settings.
Michael Italia from Children's Hospital of Philadelphia discusses the tools and methods his team uses to manage health care data.
Quantifying your changes + motivational hacks = programmable self.
Taking a cue from the Quantified Self movement, the programmable self is the combination of a digital motivation hack with a digital system that tracks behavior. Here's a look at companies and projects relevant to the programmable self space.