Focusing attention on the present lets organizations pursue existing opportunities as opposed to projected ones
Slow and Unaware
It was 2005. The war in Iraq was raging. Many of us in the national security R&D community were developing responses to the deadliest threat facing U.S. soldiers: the improvised explosive device (IED). From the perspective of the U.S. military, the unthinkable was happening each and every day. The world’s most technologically advanced military was being dealt significant blows by insurgents making crude weapons from limited resources. How was this even possible?
The war exposed the limits of our unwavering faith in technology. We depended heavily on technology to provide us the advantage in an environment we did not understand. When that failed, we were slow to learn. Meanwhile, the losses continued. We were being disrupted by a patient, persistent organization that rapidly experimented and adapted to conditions on the ground.
To regain the advantage, we needed to start by asking different questions. We needed to shift our focus from the devices that were destroying U.S. armored vehicles to the people responsible for building and deploying the weapons. This motivated new approaches to collect data that could expose elements of the insurgent network.
New organizations and modes of operation were also required to act swiftly when discoveries were made. By integrating intelligence and special operations capabilities into a single organization with crisp objectives and responsive leadership, the U.S. dramatically accelerated its ability to disrupt insurgent operations. Rapid orientation and action were key in this dynamic environment where opportunities persisted for an often unknown and very limited period of time.
This story holds important and underappreciated lessons that apply to the challenges numerous organizations face today. The ability to collect, store, and process large volumes of data doesn’t confer advantage by default. It’s still common to fixate on the wrong questions and fail to recover quickly when mistakes are made. To accelerate organizational learning with data, we need to think carefully about our objectives and have realistic expectations about what insights we can derive from measurement and analysis.
O'Reilly report covers major trends and tries to connect the neurons
If visualization is key to comprehending data, the field of health IT calls for better visualization. I am not talking here of pretty charts and animations. I am talking, rather, of a holistic, unified understanding of the bustle taking place in different corners of health: the collection and analysis of genetic data, the design of slim medical devices that replace refrigerator-sized pieces of equipment, the data crunching at hospitals delving into demographic data to identify at-risk patients.
There is no dearth of health reformers offering their visions for patient engagement, information exchange, better public health, and disruptive change to health industries. But they often accept too freely the promise of technology, without grasping how difficult the technical implementations of their reforms would be. Furthermore, no document I have found pulls together the various trends in technology and explores their interrelationships.
I have tried to fill this gap with a recently released report: The Information Technology Fix for Health: Barriers and Pathways to the Use of Information Technology for Better Health Care. This posting describes some of the issues it covers.
A Call for Proposals for Strata Conference + Hadoop World 2014
When we launched Strata a few years ago, our original focus was on how big data, ubiquitous computing, and new interfaces change the way we live, love, work and play. In fact, here’s a diagram we mocked up back then to describe the issues we wanted the new conference to tackle:
Insights from a business executive and law professor
If you develop software or manage databases, you’re probably at the point now where the phrase “Big Data” makes you roll your eyes. Yes, it’s hyped quite a lot these days. But, overexposed or not, the Big Data revolution raises a bunch of ethical issues related to privacy, confidentiality, transparency and identity. Who owns all that data that you’re analyzing? Are there limits to what kinds of inferences you can make, or what decisions can be made about people based on those inferences? Perhaps you’ve wondered about this yourself.
We’re obsessed by these questions. We’re a business executive and a law professor who’ve written about them a lot, usually for an audience of lawyers. Because engineers are the ones who confront these questions on a daily basis, we think it’s essential to talk about these issues in the context of software development.
While there’s nothing particularly new about the analytics conducted in big data, the scale and ease with which it can all be done today changes the ethical framework of data analysis. Developers today can tap into remarkably varied and far-flung data sources. Just a few years ago, this kind of access would have been hard to imagine. The problem is that our ability to reveal patterns and new knowledge from previously unexamined troves of data is moving faster than our current legal and ethical guidelines can manage. We can now do things that were impossible a few years ago, and we’ve driven off the existing ethical and legal maps. If we fail to preserve the values we care about in our new digital society, then our big data capabilities risk abandoning these values for the sake of innovation and expediency.
Collecting actionable data is a challenge for today's data tools
One of the problems dragging down the US health care system is that no one trusts anyone else. Most of us, as individuals, place faith in our personal health care providers, which may or may not be warranted. But on a larger scale we’re all suspicious of each other:
- Doctors don’t trust patients, who aren’t forthcoming with all the bad habits they indulge in and often fail to follow the most basic instructions, such as to take their medications.
- The payers (which include insurers, many government agencies, and increasingly the whole patient population as our deductibles and other out-of-pocket expenses rise) don’t trust the doctors, who waste an estimated 20% or more of all health expenditures, including $30 billion or more in fraud each year.
- The public distrusts the pharmaceutical companies (although we still follow their advice on advertisements and ask our doctors for the latest pill) and is starting to distrust clinical researchers as we hear about conflicts of interest and difficulties replicating results.
- Nobody trusts the federal government, which pursues two (contradictory) goals of lowering health care costs and stimulating employment.
Yet everyone has beneficent goals and good ideas for improving health care. Doctors want to feel effective, patients want to stay well (even if that desire doesn’t always translate into action), the Department of Health and Human Services champions very lofty goals for data exchange and quality improvement, clinical researchers put their work above family and comfort, and even private insurance companies are trying to move to “fee for value” programs that ensure coordinated patient care.
More than algorithms, companies gain access to models that incorporate ideas generated by teams of data scientists
Data scientists were among the earliest and most enthusiastic users of crowdsourcing services. Lukas Biewald noted in a recent talk that one of the reasons he started CrowdFlower was that, as a data scientist, he got frustrated with having to create training sets for many of the problems he faced. More recently, companies have been experimenting with active learning (humans take care of uncertain cases, models handle the routine ones). Along those lines, Adam Marcus described in detail how Locu uses crowdsourcing services to perform structured extraction (converting semi-structured or unstructured data into structured data).
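The division of labor described above can be sketched as a confidence-based routing rule: predictions the model is sure about are accepted automatically, and uncertain ones are queued for human workers, whose labels can then be added back to the training set. This is a minimal sketch, assuming a binary classifier that outputs the probability of the positive class; the function name `route_predictions` and the 0.8 threshold are hypothetical choices for illustration, not how CrowdFlower or Locu actually implement it.

```python
from typing import List, Tuple


def route_predictions(
    probs: List[float], threshold: float = 0.8
) -> Tuple[List[int], List[int]]:
    """Split item indices into (model-handled, human-handled) lists.

    probs[i] is the model's estimated probability that item i is positive.
    Confidence is max(p, 1 - p); items below `threshold` go to humans.
    The 0.8 default is a hypothetical value chosen for illustration.
    """
    auto, human = [], []
    for i, p in enumerate(probs):
        confidence = max(p, 1.0 - p)
        if confidence >= threshold:
            auto.append(i)  # routine case: accept the model's label
        else:
            human.append(i)  # uncertain case: send to a crowd worker
    return auto, human


if __name__ == "__main__":
    auto, human = route_predictions([0.97, 0.55, 0.10, 0.70])
    print("model handles:", auto)  # [0, 2]
    print("humans handle:", human)  # [1, 3]
```

Raising the threshold sends more work to people and less to the model; in practice that dial would be tuned against labeling cost and the error rate the application can tolerate.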
Another area where crowdsourcing is popping up is feature engineering and feature discovery. Experienced data scientists will attest that generating features is as important as (if not more important than) the choice of algorithm. Startup CrowdAnalytix uses public/open data sets to help companies enhance their analytic models. The company has access to several thousand data scientists spread across 50 countries and counts a major social network among its customers. Its current focus is on providing “enterprise risk quantification services to Fortune 1000 companies”.
CrowdAnalytix breaks projects into two phases: feature engineering and modeling. During the feature engineering phase, data scientists are presented with a problem (the dependent variable(s) to be predicted) and are asked to propose features (predictors) along with brief explanations of why they might prove useful. A panel of judges evaluates the features based on the accompanying evidence and explanations. Typically 100+ teams enter this phase of the project, and 30+ teams propose reasonable features.
Built-in audit trails can be useful for reproducing and debugging complex data analysis projects
As I noted in a previous post, model building is just one component of the analytic lifecycle. Many analytic projects result in models that get deployed in production environments. Moreover, companies are beginning to treat analytics as mission-critical software and have real-time dashboards to track model performance.
Once a model is deemed to be underperforming or misbehaving, diagnostic tools are needed to help determine appropriate fixes. It could well be that the model needs to be revisited and updated, but there are instances when the underlying data sources and data pipelines are what need to be fixed. Beyond the formal systems put in place specifically for monitoring analytic products, tools for reproducing data science workflows could come in handy.
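A lightweight audit trail can be sketched by recording, for every pipeline step, its name plus fingerprints of its input and output. Comparing the trails of two runs then points to the first step where something changed, which helps separate a data-source problem from a modeling problem. This is a minimal sketch using only the standard library (`hashlib`, `json`) and assuming each step's data is JSON-serializable; `AuditTrail`, `fingerprint`, `first_divergence`, and `example_pipeline` are hypothetical names, not the API of any particular tool.

```python
import hashlib
import json
from typing import Any, Callable, Dict, List, Optional


def fingerprint(obj: Any) -> str:
    """Short, stable SHA-256 digest of a JSON-serializable object."""
    payload = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


class AuditTrail:
    """Records each pipeline step with fingerprints of its input and output."""

    def __init__(self) -> None:
        self.entries: List[Dict[str, str]] = []

    def run(self, name: str, func: Callable[[Any], Any], data: Any) -> Any:
        result = func(data)
        self.entries.append(
            {"step": name, "input": fingerprint(data), "output": fingerprint(result)}
        )
        return result


def first_divergence(
    a: List[Dict[str, str]], b: List[Dict[str, str]]
) -> Optional[str]:
    """Name of the first step whose recorded fingerprints differ, else None.

    Only the steps both runs share are compared (zip stops at the shorter trail).
    """
    for entry_a, entry_b in zip(a, b):
        if entry_a != entry_b:
            return entry_a["step"]
    return None


def example_pipeline(raw: List[Optional[int]]) -> AuditTrail:
    """Hypothetical two-step pipeline: drop missing values, then sum."""
    trail = AuditTrail()
    cleaned = trail.run("clean", lambda rows: [r for r in rows if r is not None], raw)
    trail.run("aggregate", sum, cleaned)
    return trail


if __name__ == "__main__":
    baseline = example_pipeline([1, 2, None, 4])
    rerun = example_pipeline([1, 2, None, 5])  # an upstream value changed
    print(first_divergence(baseline.entries, rerun.entries))  # clean
```

Because the divergence shows up at the `clean` step's input, the change came from upstream data rather than from anything the pipeline itself did.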
MIT workshop kicks off Obama campaign on privacy
Thrust into controversy by Edward Snowden’s first revelations last year, President Obama belatedly welcomed a “conversation” about privacy. As cynical as you may feel about US spying, that conversation with the federal government has now begun. In particular, the first of three public workshops took place Monday at MIT.
Given the locale, a focus on the technical aspects of privacy was appropriate for this discussion. Speakers extolled the value of data (invoking the “big data” buzzword often), delineated the trade-offs between accumulating useful data and preserving privacy, and introduced technologies that could analyze encrypted data without revealing facts about individuals. Two more workshops will be held in other cities, one focusing on ethics and the other on law.
By David Andrzejewski of SumoLogic
A few weeks ago I had the pleasure of hosting the machine data track of talks at Strata Santa Clara. Like “big data”, the phrase “machine data” is associated with multiple (sometimes conflicting) definitions; two prominent ones come from Curt Monash and Daniel Abadi. The focus of the machine data track was on data that is generated and/or collected automatically by machines. This includes software logs and sensor measurements from systems as varied as mobile phones, airplane engines, and data centers. The concept is closely related to the “internet of things”, which refers to the trend of increasing connectivity and instrumentation in existing devices, like home thermostats.
More data, more problems
This data can be useful for the early detection of operational problems or the discovery of opportunities for improved efficiency. However, the decoupling of data generation and collection from human action means that the volume of machine data can grow at machine scales (i.e., Moore’s Law), an issue raised by both Monash and Abadi. This explosive growth rate amplifies existing challenges associated with “big data”. In particular, two common motifs among the talks at Strata were the difficulties around:
- mechanics: the technical details of data collection, storage, and analysis
- semantics: extracting understandable and actionable information from the data deluge
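The two motifs can be illustrated at toy scale with log data. Mechanics is getting raw lines parsed into fields at all; semantics is turning those fields into something a person can act on, such as which hosts are producing errors. This is a minimal sketch, assuming a hypothetical `key=value` log format invented for illustration; at real machine-data volumes, this parsing and aggregation would run in a distributed log-processing system rather than a single Python loop.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical log format, assumed for illustration only:
#   2014-02-12T10:15:03Z host=web-3 level=ERROR msg=disk full
LOG_PATTERN = re.compile(
    r"^(?P<ts>\S+) host=(?P<host>\S+) level=(?P<level>[A-Z]+) msg=(?P<msg>.*)$"
)


def error_counts_by_host(lines: Iterable[str]) -> Tuple[Counter, int]:
    """Count ERROR lines per host; also return how many lines failed to parse."""
    errors: Counter = Counter()
    unparsed = 0
    for line in lines:
        match = LOG_PATTERN.match(line.strip())
        if match is None:
            unparsed += 1  # mechanics: malformed input is tracked, not silently lost
            continue
        if match.group("level") == "ERROR":
            errors[match.group("host")] += 1  # semantics: which hosts need attention
    return errors, unparsed


if __name__ == "__main__":
    sample = [
        "2014-02-12T10:15:03Z host=web-3 level=ERROR msg=disk full",
        "2014-02-12T10:15:04Z host=web-3 level=INFO msg=request served",
        "2014-02-12T10:15:05Z host=db-1 level=ERROR msg=connection reset",
        "truncated line with no fields",
        "2014-02-12T10:15:07Z host=web-3 level=ERROR msg=disk full",
    ]
    errors, unparsed = error_counts_by_host(sample)
    print(errors.most_common(), unparsed)  # [('web-3', 2), ('db-1', 1)] 1
```

Counting unparsed lines rather than dropping them quietly matters more as volume grows: a format change upstream shows up as a jump in that number instead of as a silent drop in reported errors.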
How do we motivate sustained behavior change when the external motivation disappears—like it's supposed to?
If you’ve ever tried to count calories, go on a diet, start a new exercise program, change your sleep patterns, spend less time sitting, or make any other type of positive health change, then you know how difficult it is to form new habits. New habits usually require a bit of willpower to get going, and we all know that that’s a scarce resource. (Or at least, a limited one.)
Change is hard. But the real challenge comes after you’ve got a new routine going—because now you’ve got to keep it going, even though your original motivations to change may no longer apply. Why keep dieting when you no longer need to lose weight? We’ve all had the idea at some point that we really should reward ourselves for that five-pound weight loss with a cupcake, right?