David Heckerman from Microsoft Research presents a summary of his work in the session “Discovering Genetic Associations on Large Data.” This was part of the Strata Rx Online Conference: Personalized Medicine, a preview of O’Reilly’s conference Strata Rx, highlighting the use of data in medical research and delivery.
Heckerman’s research attempts to answer essential questions such as “What is your propensity for getting a particular disease?” and “How are you likely to react to a particular drug?”
Key points from Heckerman’s presentation include:
- Genome-wide association studies, where you compare the genes of people suffering from a condition with those of people who don't have it (the controls), require about a million participants because the associations (the signal) are so weak. [Segment begins 44 seconds in.]
- Datasets are getting larger, thanks to deCODE, 23andMe, and Kaiser, but results can be misleading because of confounding factors such as multiple ethnicities or other differences among populations. [5:22]
- Linear mixed models can, in theory, disentangle populations, but are computationally prohibitive. Heckerman's group has found a shortcut that is both feasible and more accurate. [7:37]
- For instance, a sample of 15,000 individuals revealed correlations between markers and common diseases in one day of massive computing. [9:30]
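
To make the confounding problem in the list above concrete, here is a minimal sketch on simulated data. It is not Heckerman's method: it uses ordinary least squares with a known population label as a covariate, which is a much simpler stand-in for a linear mixed model. The two populations, the allele frequencies, the effect size, and the helper `last_coef_t_stat` are all assumptions made up for illustration.

```python
import numpy as np

def last_coef_t_stat(X, y):
    """t-statistic of the last predictor's coefficient in an OLS fit with intercept."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - X.shape[1]
    sigma2 = resid @ resid / dof              # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)     # covariance of the coefficients
    return beta[-1] / np.sqrt(cov[-1, -1])

rng = np.random.default_rng(0)
n = 2000                                      # individuals per population (made up)

# Two populations that differ in allele frequency at one marker.
pop = np.repeat([0, 1], n)
allele_freq = np.where(pop == 0, 0.1, 0.6)
geno = rng.binomial(2, allele_freq)           # 0, 1 or 2 copies of the allele

# The trait depends on population only; the marker has NO causal effect.
pheno = 1.0 * pop + rng.normal(size=2 * n)

# Naive test: regress the trait on the marker alone.
t_naive = last_coef_t_stat(geno, pheno)

# Adjusted test: include the population label as a covariate.
t_adjusted = last_coef_t_stat(np.column_stack([pop, geno]), pheno)

print(f"naive t-statistic:    {t_naive:.2f}")
print(f"adjusted t-statistic: {t_adjusted:.2f}")
```

In this toy setup the naive test reports a strong association for a marker that does nothing, simply because the marker is more common in the population with the higher trait mean. Once population is accounted for, the apparent signal disappears. Real studies rarely have a clean population label, which is part of why models that infer relatedness from the genotypes themselves are attractive.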
The full presentation follows: