ENTRIES TAGGED "wisdom of crowds"
More than algorithms, companies gain access to models that incorporate ideas generated by teams of data scientists
Data scientists were among the earliest and most enthusiastic users of crowdsourcing services. Lukas Biewald noted in a recent talk that one of the reasons he started CrowdFlower was that, as a data scientist, he got frustrated with having to create training sets for many of the problems he faced. More recently, companies have been experimenting with active learning (humans take care of uncertain cases, models handle the routine ones). Along those lines, Adam Marcus described in detail how Locu uses crowdsourcing services to perform structured extraction (converting semi-structured or unstructured data into structured data).
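The split between humans and models described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of how CrowdFlower or Locu actually implement it: the function name `route_by_confidence`, the 0.8 threshold, and the stand-in classifier `toy_model` are all hypothetical. It shows only the routing step; in a fuller active learning loop, the human-supplied labels would also be fed back into training.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def route_by_confidence(
    items: Sequence[str],
    predict_proba: Callable[[str], Dict[str, float]],
    threshold: float = 0.8,  # assumed cutoff; tune per task
) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Split items into model-labeled cases and cases sent to human workers."""
    auto_labeled: List[Tuple[str, str]] = []
    needs_human: List[str] = []
    for item in items:
        probs = predict_proba(item)
        # Pick the most probable label and treat its probability as confidence.
        label = max(probs, key=probs.get)
        if probs[label] >= threshold:
            auto_labeled.append((item, label))  # routine case: model handles it
        else:
            needs_human.append(item)  # uncertain case: queue for the crowd
    return auto_labeled, needs_human


def toy_model(text: str) -> Dict[str, float]:
    """Hypothetical stand-in for a trained classifier (illustration only)."""
    lowered = text.lower()
    if "pizza" in lowered:
        return {"menu_item": 0.95, "other": 0.05}
    if "open" in lowered:
        return {"menu_item": 0.10, "other": 0.90}
    return {"menu_item": 0.55, "other": 0.45}


items = ["Margherita pizza 12.00", "Open daily 9-5", "Ask about our specials"]
auto_labeled, needs_human = route_by_confidence(items, toy_model)
```

Here the first two strings clear the threshold and are labeled by the model, while the ambiguous third string is queued for a human. Raising the threshold sends more work to the crowd in exchange for fewer model errors.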
Another area where crowdsourcing is popping up is feature engineering and feature discovery. Experienced data scientists will attest that generating features is as important as (if not more important than) the choice of algorithm. Startup CrowdAnalytix uses public/open data sets to help companies enhance their analytic models. The company has access to several thousand data scientists spread across 50 countries and counts a major social network among its customers. Its current focus is on providing “enterprise risk quantification services to Fortune 1000 companies”.
CrowdAnalytix breaks up projects into two phases: feature engineering and modeling. During the feature engineering phase, data scientists are presented with a problem (dependent variable(s)) and are asked to propose features (predictors) and brief explanations for why they might prove useful. A panel of judges evaluates features based on the accompanying evidence and explanations. Typically 100+ teams enter this phase of the project, and 30+ teams propose reasonable features.
A doctor looks to software communities as inspiration for her own research
(The following article sprang from a collaboration between Andy Oram and Brigitte Piniewski to cover open source concepts in an upcoming book on health care. This book, titled “Wireless Health: Remaking of Medicine by Pervasive Technologies,” is edited by Professor Mehran Mehregany of Case Western Reserve University and has an expected release date of February 2013. It is designed to provide the reader with the fundamental and practical knowledge necessary for an overall grasp of the field of wireless health. The approach is an integrated, multidisciplinary treatment of the subject by a team of leading topic experts. The selection here is part of a larger chapter by Brigitte Piniewski about personalized medicine and public health.)
Medical research and open source software have much to learn from each other. As software transforms the practice and delivery of medicine, the communities and development methods that have grown up around software–particularly free and open source software–also provide models that doctors and researchers can apply to their own work. Some of the principles that software communities can offer for spreading health throughout the population include these:
Like a living species, software evolves as code is updated and functionality is improved.
Software of low utility is dropped as users select better tools and drive forward functionality to meet new use cases.
Open source culture demonstrates how a transparent approach to sharing software practices enables problem areas to be identified and corrected accurately, cost-effectively, and at the pace of change.
Can open data dominate biological science as open source has in software?
To move from a hothouse environment of experimentation to the mainstream of one of the world's most lucrative and tradition-bound industries, Sage Bionetworks must aim for its nucleus: rewards and incentives. This installment offers comparisons to open source software and a summary of tasks for Sage Congress.
The Vioxx problem is just one instance of the wider malaise afflicting the drug industry. Managers from major pharma companies expressed confidence that they could expand public or "pre-competitive" research in the direction Sage Congress proposed. The sector left to engage is the one that's central to all this work–the public.
Report from a movement that believes in open source and open data in science
Through two days of demos, keynotes, panels, and breakout sessions, Sage Congress brought its vision to a high-level cohort of 230 attendees from universities, pharmaceutical companies, government health agencies, and others who can make change in the field.