ENTRIES TAGGED "privacy"
Would you let people know about your dandruff problem if it might mean a cure for Lupus?
Two weeks ago, I had the privilege of attending the 2013 Genomes, Environments and Traits conference in Boston as a participant in Harvard Medical School’s Personal Genome Project. Several hundred of us attended the conference, eager to learn what new breakthroughs might be in the works using the data and samples we have contributed, and to network with the researchers and each other.
The Personal Genome Project (PGP) is a very different type of beast from the traditional research study model, in several ways. To begin with, it is an Open Consent study, which means that all the data participants donate is available for research by anyone, without further consent from the subject. In other words, having initially consented to participate in the PGP, anyone can download my genome sequence, look at my phenotypic traits (my physical characteristics and medical history), or even order some of my blood from a cell line that has been established at the Coriell biobank, and they do not need to gain specific consent from me to do so. By contrast, in most research studies, data and samples can be collected only for one specific study, and used for no other purpose. This is all in an effort to protect the privacy of the participants, which was famously violated in the establishment of the HeLa cell line.
The other big difference is that in most studies, the participants rarely receive any information back from the researchers. For example, if the researcher does a brain MRI to gather data about the structure of a part of your brain, and sees a huge tumor, they are under no obligation to inform you about it, or even to give you a copy of the scan. This is because researchers are not certified as clinical laboratories, and thus are not authorized to report medical findings. This makes sense, to a certain extent, with traditional medical tests, as the research version may not be calibrated to detect the same things, and the researcher is not qualified to interpret the results for medical purposes.
Preview of upcoming session at Strata Santa Clara
At the end of 2012, the Federal Trade Commission (“FTC”) hosted the public workshop, “The Big Picture – Comprehensive Online Data Collection,” which focused on privacy concerns relating to the comprehensive collection of consumer online data by Internet service providers (“ISPs”), operating systems, browsers, search engines, and social media. During the workshop, panelists debated the impact of service providers’ ability to collect data about computer and device users across unaffiliated websites, including when some entities have no direct relationship with such users.
As one example of the issues raised by the panelists, Professor Neil Richards, from the Washington University in St. Louis School of Law, stated that, despite its benefits, comprehensive data collection infringes on the concept of “intellectual privacy,” which is predicated on consumers’ ability to freely search, interact, and express themselves online. Professor Richards also stated that comprehensive data collection is creating a transformational power shift in which businesses can effectively persuade consumers based on their knowledge of consumer preferences. Yet, according to Professor Richards, few consumers actually understand “the basis of the bargain,” or the extent to which their information is being collected.
The biggest threat that a data-driven world presents is an ethical one.
Since the first of our ancestors chipped stone into weapons, technology has divided us. Seldom more so than today, however: a connected, always-on society promises health, wisdom, and efficiency even as it threatens an end to privacy and the rise of prejudice masked as science.
On its surface, a data-driven society is more transparent, and makes better use of its resources. By connecting human knowledge, and mining it for insights, we can pinpoint problems before they become disasters, warding off disease and shining the harsh light of data on injustice and corruption. Data is making cities smarter, watering the grass roots, and improving the way we teach.
But for every accolade, there’s a cautionary tale. It’s easy to forget that data is merely a tool, and in the wrong hands, that tool can do powerful wrong. Data erodes our privacy. It predicts us, often with unerring accuracy — and treating those predictions as fact is a new, insidious form of prejudice. And it can collect the chaff of our digital lives, harvesting a picture of us we may not want others to know.
The big data movement isn’t just about knowing more things. It’s about a fundamental shift from scarcity to abundance. Most markets are defined by scarcity — the price of diamonds, or oil, or music. But when things become so cheap they’re nearly free, a funny thing happens.
Consider the advent of steam power. The economist William Stanley Jevons, in what’s known as Jevons’ Paradox, observed that as the efficiency of steam engines increased, coal consumption went up. That’s not what was supposed to happen. Jevons realized that abundance creates new ways of using something. As steam became cheap, we found new ways of using it, which created demand.
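Jevons' observation can be sketched with a toy demand model. The numbers and the elasticity value below are illustrative assumptions, not figures from the text: the point is only that when demand for the underlying service is price-elastic, an efficiency gain lowers the effective price enough that total resource use rises rather than falls.

```python
def resource_use(efficiency, elasticity, base_demand=100.0, base_price=1.0):
    """Total fuel consumed for a service whose cost depends on engine efficiency.

    Doubling efficiency halves the effective price of the service; if demand
    is price-elastic (elasticity > 1), demand for the service more than
    doubles, so total fuel use goes UP -- the rebound Jevons described.
    """
    price = base_price / efficiency                 # the service gets cheaper
    demand = base_demand * (price / base_price) ** (-elasticity)
    return demand / efficiency                      # fuel needed to serve demand

before = resource_use(1.0, 1.5)   # baseline consumption
after = resource_use(2.0, 1.5)    # engines twice as efficient
print(before, after)              # consumption rises, not falls
```

With an elasticity of 1.5, doubling efficiency pushes fuel use from 100 to roughly 141 units; the "saved" fuel is more than eaten by new uses.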
The same thing is happening with data. A report that took a month to run is now just a few taps on a tablet. An unthinkably complex analysis of competitors is now a Google search. And the global distribution of multimedia content that once required a broadcast license is now an upload. Read more…
Ann Waldo examines obstacles to patient data and offers specific reforms that can help.
Ann Waldo, a partner in Wittie, Letsche & Waldo, LLP in Washington, DC, presents a summary of her work in the webcast “Overview of Privacy Concerns and Regulatory Challenges Concerning Personalized Medicine — and Some Modest Suggestions for Change.” This was part of the Strata Rx Online Conference: Personalized Medicine, a preview of O’Reilly’s conference Strata Rx, highlighting the use of data in medical research and delivery.
Waldo highlighted how HIPAA regulations and other laws passed by federal and state governments contain restrictions that make research with patient data unnecessarily difficult, and she offered several suggestions for reform.
Key points from her presentation include: Read more…
Bitsy Bentley on the work behind a good visualization and why she hopes users will take data interactions for granted.
Because of the size, complexity and density of big data, it’s not always easy to find the important insights hiding in all that information. That’s where data visualization comes into play. A great visualization creates meaning where none existed.
Bitsy Bentley (@bitsybot) is the director of data visualization at GfK Custom Research, where she works with information designers to craft meaningful data experiences for a variety of business audiences. In the following interview, she discusses the space between a “wow” response and an “aha” moment, how her team addresses privacy concerns, and why practice is vital for both visualization creators and viewers.
Bentley will explore related visualization topics during her presentation at Strata Conference + Hadoop World in New York City later this month.
Why are data visualizations an effective way to understand the underlying data?
Bitsy Bentley: There is so much beauty and richness in big datasets, and now that we have enough processing power to harness that richness, it’s little wonder that interest in data visualization is exploding. To quote John Tukey: “The greatest value of a picture is when it forces us to notice what we never expected to see.” My clients find that, whether they’re more concerned with numbers or more concerned with stories, an appropriate visual is integral to their understanding of the data.
Visualization unlocks the serendipity of data analysis. It provides a language that is less intimidating than an overwhelming array of digits. Something as simple as a set of histograms breaking down the distribution of a data store makes it easy to find irregularities and outliers in the data. Read more…
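That histogram point can be made concrete with a minimal sketch (the data here is made up): even a crude text histogram surfaces irregularities, because the stray bars far from the main mass are immediately visible.

```python
import random

# Made-up data store column: mostly well-behaved values plus a few bad entries.
random.seed(0)
values = [random.gauss(50, 5) for _ in range(1000)] + [250.0, -90.0, 400.0]

# A crude text histogram: bin the values and draw a bar per bin.
lo, hi, bins = min(values), max(values), 20
width = (hi - lo) / bins
counts = [0] * bins
for v in values:
    i = min(int((v - lo) / width), bins - 1)  # clamp the max into the last bin
    counts[i] += 1

for i, c in enumerate(counts):
    print(f"{lo + i * width:8.1f} | {'#' * (c // 10)}")

# The long empty stretch between the central mass and the nearly empty bins
# at the extremes is what flags the outliers at a glance.
```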
Controversy over a famous privacy research project
Daniel Barth-Jones, an epidemiologist and expert on health data privacy, has published an examination of the sensitive issue of re-identifying patients. This is worthwhile reading for anyone interested in the use of patient data for improving health care. He has blogged about his key findings, but I suggest reading his full paper for the recommendations he makes.
Kickstarter data gets mined, the UDID breach source is identified, and worldwide big data to be measured one smartphone at a time.
Here are a few stories from the data space that caught my attention this week.
Data mining Kickstarter
ThingsWeStart, an interactive visualization map designed to aid in Kickstarter project discovery, launched this week. The map tracks Kickstarter projects in real time and mines Kickstarter data to allow users to drill down to very specific project search results using filter combinations, something you can’t do within Kickstarter itself. For instance, here’s a screenshot of all the projects in San Francisco:
Notice on the right, there’s a list of the top 25 projects in the designated area, buttons to sort by funding amounts reached, and a field to request an email notification about new projects that come to the specified area. Here’s that same map, drilled down into three project categories:
The geography can be narrowed down to the zip code, so you can isolate projects in Brooklyn as opposed to New York City as a whole, for instance. (Click here for the live map.)
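A drill-down of this kind is, at heart, just filter combinations applied to project records. Here is a minimal sketch; the field names and sample data are hypothetical, not ThingsWeStart's or Kickstarter's actual schema.

```python
# Hypothetical project records; field names are assumed for illustration.
projects = [
    {"name": "Solar lamp", "city": "San Francisco", "category": "Design", "pledged": 12000},
    {"name": "Indie film", "city": "Brooklyn", "category": "Film", "pledged": 8000},
    {"name": "Board game", "city": "San Francisco", "category": "Games", "pledged": 30000},
]

def filter_projects(records, city=None, categories=None):
    """Combine optional filters, then rank by funding, like the map's drill-down."""
    hits = [r for r in records
            if (city is None or r["city"] == city)
            and (categories is None or r["category"] in categories)]
    return sorted(hits, key=lambda r: r["pledged"], reverse=True)

sf = filter_projects(projects, city="San Francisco")  # two hits, biggest first
```

Each extra filter simply narrows the candidate list, which is why combinations the source site doesn't expose fall out almost for free once the data has been mined.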
Further reading and discussion on the civil rights implications of big data.
A few weeks ago, I wrote a post about big data and civil rights, which seems to have hit a nerve. It was posted on Solve for Interesting and here on Radar, and then folks like Boing Boing picked it up.
I haven’t had this kind of response to a post before (well, I’ve had responses, such as the comments to this piece for GigaOm five years ago, but they haven’t been nearly as thoughtful).
Some of the best posts have really added to the conversation. Here’s a list of those I suggest for further reading and discussion:
Nobody notices offers they don’t get
On Oxford’s Practical Ethics blog, Anders Sandberg argues that transparency and reciprocal knowledge about how data is being used will be essential. Anders captured the core of my concerns in a single paragraph, saying what I wanted to say far better than I could:
… nobody notices offers they do not get. And if these absent opportunities start following certain social patterns (for example not offering them to certain races, genders or sexual preferences) they can have a deep civil rights effect
To me, this is a key issue, and it responds eloquently to some of the comments on the original post. Harry Chamberlain commented:
However, what would you say to the criticism that you are seeing lions in the darkness? In other words, the risk of abuse certainly exists, but until we see a clear case of big data enabling and fueling discrimination, how do we know there is a real threat worth fighting?
In the age of big data, Deven McGraw emphasizes trust, education and transparency in assuring health privacy.
Society is now faced with how to balance the privacy of the individual patient with the immense social good that could come through great health data sharing. Making health data more open and fluid holds both the potential to be hugely beneficial for patients and enormously harmful. As my colleague Alistair Croll put it this summer, big data may well be a civil rights issue that much of the world doesn’t know about yet.
This will likely be a tension that persists throughout my lifetime as technology spreads around the world. While big data breaches are likely to make headlines, more subtle uses of health data have the potential to enable employers, insurers or governments to discriminate — or worse. Figuring out shopping habits can also allow a company to determine that a teenager was pregnant before her father did. People simply don’t realize how much about their lives can be intuited through analysis of their data exhaust.
To unlock the potential of health data for the public good, informed consent must mean something. Patients must be given the information and context for how and why their health data will be used in clear, transparent ways. To do otherwise is to duck the responsibility that comes with the immense power of big data.
In search of an informed opinion on all of these issues, I called up Deven McGraw (@HealthPrivacy), the director of the Health Privacy Project at the Center for Democracy and Technology (CDT). Our interview, lightly edited for content and clarity, follows. Read more…
In a world of big, open data, "privacy by design" will become even more important.
A few weeks ago, Tom Slee published “Seeing Like a Geek,” a thoughtful article on the dark side of open data. He starts with the story of a Dalit community in India, whose land was transferred to a group of higher-caste Mudaliars through bureaucratic manipulation under the guise of standardizing and digitizing property records. While this sounds like a good idea, it gave a wealthier, more powerful group a chance to erase older, traditional records that hadn’t been properly codified. One effect of passing laws requiring standardized, digital data is to marginalize all data that can’t be standardized or digitized, and to marginalize the people who don’t control the process of standardization.
That’s a serious problem. It’s sad to see oppression and property theft riding in under the guise of transparency and openness. But the problem isn’t open data itself; it’s how the data is used.