Data tools are less important than the way you frame your questions.
Max Shron and Jake Porway spoke with me at Strata a few weeks ago about frameworks for making reasoned arguments with data. Max’s recent O’Reilly book, Thinking with Data, outlines the crucial process of developing good questions and creating a plan to answer them. Jake’s nonprofit, DataKind, connects data scientists with worthy causes where they can apply their skills.
A few of the things we talked about:
- The importance of publishing negative scientific results
- Give Directly, an organization that facilitates donations directly to households in Kenya and Uganda. Give Directly was able to model income using satellite data to distinguish thatched roofs from metal roofs.
- Moritz Stefaner calling for a “macroscope”
- Project Cybersyn, Salvador Allende’s plan for encompassing the entire Chilean economy in a single real-time computer system
- Seeing Like a State: How Certain Schemes to Improve the Human Condition Have Failed by James C. Scott
After we recorded this podcast episode at Strata Santa Clara, Max presided over a webcast on his book that’s archived here.
Editor’s note: this post originally appeared on O’Reilly Radar.
Software and hardware are moving together, and the combined result is a new medium.
The result is an entirely new medium that’s just beginning to emerge. We can see it in Ars Electronica Futurelab’s Spaxels, which use drones to render a three-dimensional pixel field; in Baxter, which layers emotive software onto an industrial robot so that anyone can operate it safely and efficiently; in OpenXC, which gives even hobbyist-level programmers access to the software in their cars; in SmartThings, which ties Web services to light switches.
The new medium is something broader than terms like “Internet of Things,” “Industrial Internet,” or “connected devices” suggest. It’s an entirely new discipline that’s being built by software developers, roboticists, manufacturers, hardware engineers, artists, and designers.
Networked sensors and machine learning make it easy to see when things are out of the ordinary.
Much of health care — particularly for the elderly — is about detecting change, and, as the mobile health movement would have it, computers are very good at that. Given enough sensors, software can model an individual’s behavior patterns and then figure out when things are out of the ordinary — when gait slows, posture stoops, or bedtime moves earlier.
Technology already exists that lets users set parameters for households they’re monitoring. Systems are available that send an alert if someone leaves the house in the middle of the night or sleeps past a preset time. Those systems involve context-specific hardware (i.e., a bed-pressure sensor) and conscientious modeling (you have to know what time your grandmother usually wakes up).
The next step would be a generic system: one that, after a simple setup, would learn the habits of the people it monitors and then detect the sorts of problems that beset elderly people living alone — falls, disorientation, and so forth — as well as more subtle changes in behavior that could signal other health problems.
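The change detection such a system relies on can be sketched in a few lines: learn a baseline from routine days, then flag observations that deviate sharply from it. The data, the threshold, and the choice of wake-up time as the tracked habit are all hypothetical — a real system would model many behaviors at once.

```python
from statistics import mean, stdev

def fit_baseline(wake_times):
    """Learn a typical wake-up time (hours past midnight) from routine days."""
    return mean(wake_times), stdev(wake_times)

def is_anomalous(observation, baseline, threshold=3.0):
    """Flag an observation more than `threshold` standard deviations from the mean."""
    mu, sigma = baseline
    return abs(observation - mu) > threshold * sigma

# Hypothetical training data: wake-up times over ten ordinary days.
history = [6.9, 7.1, 7.0, 6.8, 7.2, 7.0, 6.9, 7.1, 7.0, 7.1]
baseline = fit_baseline(history)

print(is_anomalous(7.05, baseline))  # a normal morning: False
print(is_anomalous(10.5, baseline))  # far past the usual time: True
```

The appeal of this framing is that nothing is hand-configured: the threshold is relative to each person's own variability, so the same code serves an early riser and a night owl alike.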
A group of researchers from Austria and Turkey has developed just such a system, which they presented at the IEEE’s Industrial Electronics Society meeting in Montreal in October.*
Activity as surmised in different rooms by the researchers’ machine-learning algorithms. Source: “Activity Recognition Using a Hierarchical Model.”
In their approach, the researchers train a machine-learning algorithm with several days of routine household activity using door and motion sensors distributed through the living space. The sensors aren’t associated with any particular room at the outset: their software algorithmically determines the relative positions of the sensors, then classifies the rooms that they’re in based on activity patterns over the course of the day.
Matching the missing to the dead involves reconciling two national databases.
Javier Reveron went missing from Ohio in 2004. His wallet turned up in New York City, but he was nowhere to be found. By the time his parents arrived to search for him and hand out fliers, his remains had already been buried in an unmarked indigent grave. In New York, where coroner’s resources are precious, remains wait a few months to be claimed before they’re buried by convicts in a potter’s field on uninhabited Hart Island, just off the Bronx in Long Island Sound.
The story, reported by the New York Times last week, has as happy an ending as it could given that beginning. In 2010 Reveron’s parents added him to a national database of missing persons. A month later police in New York matched him to an unidentified body and his remains were disinterred, cremated and given burial ceremonies in Ohio.
Reveron’s ordeal suggests an intriguing and potentially high-impact machine-learning problem. The Department of Justice maintains separate national, public databases for missing people, unidentified people and unclaimed people. Many records are full of rich data that is almost never a perfect match to data in other databases — hair color entered by a police department might differ from how it’s remembered by a missing person’s family; weights fluctuate; scars appear. Photos are provided for many missing people and some unidentified people, and matching them is difficult. Free-text fields in many entries describe the circumstances under which missing people lived and died; a predilection for hitchhiking could be linked to a death by the side of a road.
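This is, at heart, a record-linkage problem, and a minimal version of it can be sketched as a weighted field-by-field similarity score. The field names, weights, and records below are hypothetical; real database entries are far richer and noisier, which is exactly why exact matches can’t be required.

```python
from difflib import SequenceMatcher

# Hypothetical field weights: free-text circumstances carry extra weight.
WEIGHTS = {"hair_color": 1.0, "weight_lbs": 1.0, "circumstances": 2.0}

def field_score(field, a, b):
    """Similarity in [0, 1] for one field of two records."""
    if field == "weight_lbs":
        # Weights fluctuate: score decays with the difference in pounds.
        return max(0.0, 1.0 - abs(a - b) / 30.0)
    # Categorical and free-text fields: fuzzy string similarity.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def match_score(missing, unidentified):
    """Weighted average of per-field similarities, in [0, 1]."""
    total = sum(WEIGHTS[f] * field_score(f, missing[f], unidentified[f])
                for f in WEIGHTS)
    return total / sum(WEIGHTS.values())

missing = {"hair_color": "brown", "weight_lbs": 165,
           "circumstances": "known to hitchhike along interstate highways"}
remains = {"hair_color": "brn", "weight_lbs": 172,
           "circumstances": "found by the side of an interstate highway"}

print(round(match_score(missing, remains), 2))
```

A real system would rank every unidentified record against each missing-person record and surface only the top candidates for human review — the score is a triage tool, not a verdict.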
I’ve called the Department of Justice (DOJ) to ask about the extent to which they’ve worked with computer scientists to match missing and unidentified people, and will update when I hear back. One thing that’s not immediately apparent is the public availability of the necessary training set — cases that have been successfully matched and removed from the lists. The DOJ apparently doesn’t comment on resolved cases, which could make getting this data difficult. But perhaps there’s room for a coalition to request the anonymized data and manage it to the DOJ’s satisfaction while distributing it to capable data scientists.