How the field of genetics is using data within research and to evaluate researchers
Editor’s note: Earlier this week, Part 1 of this article described Sage Bionetworks, a recent Congress they held, and their way of promoting data sharing through a challenge.
Data sharing is a familiar practice in genetics. Plenty of cell lines and other data stores are publicly available from such sources as the National Cancer Institute's TCGA data set, the Gene Expression Omnibus (GEO), and ArrayExpress (all of which can be accessed through Synapse). So to some extent the current revolution in sharing lies not in the data itself but in critical related areas.
First, many of the data sets are weakened by metadata problems. A Sage programmer told me that the famous TCGA set is enormous but poorly curated. For instance, different data sets in TCGA may refer to the same drug by different names, using the generic name in one and the brand name in another. Provenance (a clear description of how the data was collected and prepared for use) is also weak in TCGA.
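To make the naming problem concrete, here is a minimal sketch of the kind of cleanup such inconsistent metadata forces on anyone combining data sets: every drug name is normalized and mapped to one canonical generic name before records are grouped. The synonym table, function names, and sample records are hypothetical illustrations, not part of TCGA or Synapse; a real pipeline would draw synonyms from a curated drug vocabulary rather than a hand-written dictionary.

```python
# Hypothetical synonym table mapping brand or alternate names (lowercase)
# to one canonical generic name. Entries are illustrative, not from TCGA.
DRUG_SYNONYMS = {
    "gleevec": "imatinib",
    "glivec": "imatinib",
    "nolvadex": "tamoxifen",
}


def canonical_drug_name(name: str) -> str:
    """Normalize case and whitespace, then map known synonyms to a generic name."""
    key = name.strip().lower()
    return DRUG_SYNONYMS.get(key, key)


def group_by_drug(*datasets):
    """Group records from several data sets under their canonical drug name."""
    grouped = {}
    for records in datasets:
        for record in records:
            drug = canonical_drug_name(record["drug"])
            grouped.setdefault(drug, []).append(record)
    return grouped


# Two hypothetical data sets that name the same drug differently.
set_a = [{"drug": "Gleevec", "sample": "A1"}]
set_b = [{"drug": "imatinib", "sample": "B7"}]
merged = group_by_drug(set_a, set_b)
print(sorted(merged))           # ['imatinib']
print(len(merged["imatinib"]))  # 2
```

Without the mapping step, the two records would land under separate keys and an analysis of the drug would silently miss half its samples.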
GEO records, unlike those in TCGA, tend to contain good provenance information, but only as free-form text, which presents the same barriers to searching and aggregation as free-form text in medical records. Synapse is developing a structured format for presenting provenance based on the W3C’s PROV standard. One researcher told me this was the most promising contribution of Synapse toward the shared use of genetic information.
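The W3C PROV model describes provenance in terms of entities (such as data files), activities (processes that use and generate entities), and agents, linked by relations such as wasGeneratedBy, used, and wasAssociatedWith. The sketch below is a hypothetical Python rendering of a few of those terms, not Synapse's actual format; the file names, activity names, and agent ids are invented. It shows why structure matters: once provenance is recorded as links rather than free-form text, a program can trace everything a result was derived from.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical sketch using a few W3C PROV core terms; not Synapse's format.


@dataclass
class Entity:
    """A PROV-style entity, e.g. a data file."""
    id: str
    was_generated_by: Optional[str] = None  # id of the activity that produced it


@dataclass
class Activity:
    """A PROV-style activity, e.g. a normalization or analysis step."""
    id: str
    used: List[str] = field(default_factory=list)  # ids of input entities
    was_associated_with: Optional[str] = None      # id of the responsible agent


def upstream_entities(entity_id: str,
                      entities: Dict[str, Entity],
                      activities: Dict[str, Activity]) -> List[str]:
    """Follow generation and usage links to list every entity this one depends on."""
    found: List[str] = []
    stack = [entity_id]
    while stack:
        generator = entities[stack.pop()].was_generated_by
        if generator is None:
            continue  # raw input: nothing further upstream
        for source in activities[generator].used:
            if source not in found:
                found.append(source)
                stack.append(source)
    return found


# Hypothetical pipeline: raw data -> normalization -> analysis -> results.
activities = {
    "normalize": Activity("normalize", used=["raw.csv"],
                          was_associated_with="lab_pipeline"),
    "analyze": Activity("analyze", used=["normalized.csv", "drug_table.csv"],
                        was_associated_with="analyst_1"),
}
entities = {
    "raw.csv": Entity("raw.csv"),
    "drug_table.csv": Entity("drug_table.csv"),
    "normalized.csv": Entity("normalized.csv", was_generated_by="normalize"),
    "results.csv": Entity("results.csv", was_generated_by="analyze"),
}
print(upstream_entities("results.csv", entities, activities))
# ['normalized.csv', 'drug_table.csv', 'raw.csv']
```

The same question asked of free-form provenance text would require a human to read and interpret each record.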
Observations from Sage Congress and collaboration through its challenge
The glowing reports we read of biotech advances almost make one’s brain ache. They leave us thinking that medical researchers must command the latest in every technological tool. But the engines of genetic and pharmaceutical innovation are stuttering for lack of one key fuel: data. Researchers are left with the equivalent of trying to build skyscrapers with lathes and screwdrivers.
Sage Congress, held this past week in San Francisco, investigated the multiple facets of data in these fields: gene sequences, models for finding pathways, patient behavior and symptoms (known as phenotypic data), and code to process all these inputs. A survey of efforts by the organizers, Sage Bionetworks, and of other innovations in genetic data handling can show how genetics resembles and differs from other disciplines.
An intense lesson in code sharing
At last year’s Congress, Sage announced a challenge, together with the DREAM project, intended to galvanize researchers in genetics while showing off the growing capabilities of Sage’s Synapse platform. Synapse ties together a number of data sets in genetics and provides tools that let researchers upload new data and search other researchers’ data sets. Its challenge highlighted the industry’s need for better data sharing, and some ways to get there.