Genomics and the Role of Big Data in Personalizing the Healthcare Experience

Increasingly available data spurs organizations to make analysis easier

Genomics is making headlines in both academia and the celebrity world. With intense media coverage of Angelina Jolie’s recent double mastectomy after genetic tests revealed that she was predisposed to breast cancer, genetic testing and genomics have been propelled to the front of many more minds.

In this new data field, companies are taking a variety of approaches to collecting data, analyzing it, and turning it into usable information.

A very serious game that can cure the orphan diseases

Fit2Cure taps the public's visual skills to match compounds to targets

In the inspiring tradition of Foldit, the game for determining protein shapes, Fit2Cure crowdsources the problem of finding drugs that can cure the many under-researched diseases of developing countries. Fit2Cure appeals to the player’s visual–even physical–sense of the world, and requires much less background knowledge than Foldit.

There are about 7,000 rare diseases, fewer than 5% of which have cures. The number of people currently engaged in making drug discoveries is by no means adequate to study all these diseases. A recent gift to Harvard shows the importance that medical researchers attach to filling the gap. As an alternative approach, abstracting the drug discovery process into a game could empower thousands, if not millions, of people to contribute to this process and make discoveries in diseases that get little attention from scientists or pharmaceutical companies.

The biological concept behind Fit2Cure is that medicines have specific shapes that fit into the proteins of the victim’s biological structures like jig-saw puzzle pieces (but more rounded). Many cures require finding a drug that has the same jig-saw shape and can fit into the target protein molecule, thus preventing it from functioning normally.

Data sharing drives diagnoses and cures, if we can get there (part 2)

How the field of genetics is using data within research and to evaluate researchers

Editor’s note: Earlier this week, Part 1 of this article described Sage Bionetworks, a recent Congress they held, and their way of promoting data sharing through a challenge.

Data sharing is not an unfamiliar practice in genetics. Plenty of cell lines and other data stores are publicly available from such places as the TCGA data set from the National Cancer Institute, Gene Expression Omnibus (GEO), and ArrayExpress (all of which can be accessed through Synapse). So to some extent the current revolution in sharing lies not in the data itself but in critical related areas.

First, many of the data sets are weakened by metadata problems. A Sage programmer told me that the famous TCGA set is enormous but poorly curated. For instance, different data sets in TCGA may refer to the same drug by different names, generic versus brand name. Provenance–a clear description of how the data was collected and prepared for use–is also weak in TCGA.

In contrast, GEO records tend to contain good provenance information (see an example), but only as free-form text, which presents the same barriers to searching and aggregation as free-form text in medical records. Synapse is developing a structured format for presenting provenance based on the W3C’s PROV standard. One researcher told me this was the most promising contribution of Synapse toward the shared use of genetic information.

Data sharing drives diagnoses and cures, if we can get there (part 1)

Observations from Sage Congress and collaboration through its challenge

The glowing reports we read of biotech advances almost cause one’s brain to ache. They leave us thinking that medical researchers must command the latest in all technological tools. But the engines of genetic and pharmaceutical innovation are stuttering for lack of one key fuel: data. Here they are left with the equivalent of trying to build skyscrapers with lathes and screwdrivers.

Sage Congress, held this past week in San Francisco, investigated the multiple facets of data in these fields: gene sequences, models for finding pathways, patient behavior and symptoms (known as phenotypic data), and code to process all these inputs. A survey of efforts by the organizers, Sage Bionetworks, and other innovations in genetic data handling can show how genetics resembles and differs from other disciplines.

An intense lesson in code sharing

At last year’s Congress, Sage announced a challenge, together with the DREAM project, intended to galvanize researchers in genetics while showing off the growing capabilities of Sage’s Synapse platform. Synapse ties together a number of data sets in genetics and provides tools for researchers to upload new data, while searching other researchers’ data sets. Its challenge highlighted the industry’s need for better data sharing, and some ways to get there.

Discovering genetic associations using large data

David Heckerman's research uses big datasets to tackle essential health questions.

David Heckerman from Microsoft Research presents a summary of his work in the session “Discovering Genetic Associations on Large Data.” This was part of the Strata Rx Online Conference: Personalized Medicine, a preview of O’Reilly’s conference Strata Rx, highlighting the use of data in medical research and delivery.

Heckerman’s research attempts to answer essential questions such as “What is your propensity for getting a particular disease?” and “How are you likely to react to a particular drug?”

Health records support genetics research at Children’s Hospital of Philadelphia

Michael Italia on making use of data collected in health care settings.

Michael Italia from Children's Hospital of Philadelphia discusses the tools and methods his team uses to manage health care data.

Recombinant Research: Breaking open rewards and incentives

Can open data dominate biological science as open source has in software?

To move from a hothouse environment of experimentation to the mainstream of one of the world's most lucrative and tradition-bound industries, Sage Bionetworks must aim for its nucleus: rewards and incentives. Comparisons to open source software and a summary of tasks for Sage Congress.

Recombinant Research: Sage Congress plans for patient engagement

The Vioxx problem is just one instance of the wider malaise afflicting the drug industry. Managers from major pharma companies expressed confidence that they could expand public or "pre-competitive" research in the direction Sage Congress proposed. The sector left to engage is the one that's central to all this work–the public.

Recombinant Research: Sage Congress promotes data sharing in genetics

Report from a movement that believes in open source and open data in science

Through two days of demos, keynotes, panels, and breakout sessions, Sage Congress brought its vision to a high-level cohort of 230 attendees from universities, pharmaceutical companies, government health agencies, and others who can make change in the field.

Collaborative genetics, part 5: Next steps for genetic commons

The final installment of this series, about a Sage Commons Congress on the open-source sharing of genetic research, looks at what Sage Bionetworks and its friends need to do.

