The amount of data being produced is increasing exponentially, which raises big questions about security and ownership. Do we need to be more concerned about the information many of us readily give out to join popular social networks, sign up for website community memberships, or subscribe to free online email? And what happens to that data once it’s out there?
In a recent interview, Jeff Jonas (@JeffJonas), IBM distinguished engineer and a speaker at the O’Reilly Strata Online Conference, said consumers’ willingness to give away their data is a concern, but it’s perhaps secondary to the sheer number of data copies produced.
Our interview follows.
What is the current state of data security?
Jeff Jonas: A lot of data has been created, and a boatload more is on its way — we have seen nothing yet. Organizations now wonder how they are going to protect all this data — especially how to protect it from unintended disclosure. Healthcare providers, for example, are just as determined to prevent a “wicked leak” as anyone else. Just imagine the conversation between the CIO and the board trying to explain the risk of the enemy within — the “insider threat” — and the endless and ever-changing attack vectors.
I’m thinking a lot these days about data protection, ranging from reducing the number of copies of data to data anonymization to perpetual insider threat detection.
How are advancements in data gathering, analysis, and application affecting privacy, and should we be concerned?
Jeff Jonas: When organizations only collect what they need in order to conduct business, tell the consumer what they are collecting, why and how they are going to use it, and then use it this way, most would say “fair game.” This is all in line with Fair Information Practices (FIPs).
There continues to be some progress in the area of privacy-enhancing technology. One example is tamper-resistant audit logs, which record how a system was used in a way that even the database administrator cannot alter. On the other hand, the trend that I see involves the willingness of consumers to give up all kinds of personal data in return for some benefit, such as free email or a fantastic social network site.
While it is hard not to be concerned about what is happening to our privacy, I have to admit that for the most part technology advances are really delivering a lot of benefit to mankind.
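To make the idea of a tamper-resistant audit log more concrete, here is a minimal sketch in Python of one common approach, a hash chain, in which each entry stores a hash of the entry before it. This is an illustration only and not a description of any system Jonas refers to; the names `append_entry`, `verify`, and `GENESIS` are hypothetical.

```python
import hashlib
import json

# Hypothetical placeholder "previous hash" for the very first entry.
GENESIS = "0" * 64


def _digest(prev_hash, record):
    """Hash the record together with the previous entry's hash."""
    payload = json.dumps({"prev": prev_hash, "record": record}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_entry(log, record):
    """Append a record, chaining it to the hash of the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    entry = {"record": record, "prev": prev_hash, "hash": _digest(prev_hash, record)}
    log.append(entry)
    return entry


def verify(log):
    """Return True only if every entry still matches its chained hash."""
    prev_hash = GENESIS
    for entry in log:
        expected = _digest(prev_hash, entry["record"])
        if entry["prev"] != prev_hash or entry["hash"] != expected:
            return False
        prev_hash = entry["hash"]
    return True


log = []
append_entry(log, {"user": "dba", "action": "read", "table": "patients"})
append_entry(log, {"user": "dba", "action": "export", "table": "patients"})
print(verify(log))   # chain is intact

log[0]["record"]["action"] = "none"   # someone edits an old entry
print(verify(log))   # the edit is now detectable
```

A chain like this only makes tampering evident. Someone able to rewrite the entire log could recompute every hash, so a real deployment would likely also need the latest hash stored somewhere the administrator cannot reach.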
What are the major issues surrounding data ownership?
Jeff Jonas: If users continue to give their data away because the benefits are irresistible, then there will be fewer battles, I suppose. The truth about data is that once it is out there, it’s hard to control.
I did a back-of-the-envelope calculation a few years ago to estimate how many copies of a single piece of data may come to exist. It turns out the number is roughly the same as the number of licks it takes to get to the center of a Tootsie Pop, a play on an old TV commercial that basically translates to more than you can easily count.
A well-thought-out data backup strategy alone may create more than 100 copies. Then what about the operational data stores, data warehouses, data marts, secondary systems and their backups? Thousands of copies would not be uncommon. Even if a consumer thought they could own their data — which they can’t in many settings — how could they ever do anything to affect it?
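As a rough illustration of how the copy count above can climb, the sketch below works through the arithmetic in Python. Every number in it is an assumption chosen for the example (a hypothetical retention schedule and a hypothetical count of systems), not a figure from Jonas's own estimate.

```python
def backup_copies(daily, weekly, monthly):
    """Backup copies retained under a simple (hypothetical) retention schedule."""
    return daily + weekly + monthly


def total_copies(systems, backups_per_system):
    """Each system holds one live copy plus its retained backups."""
    return systems * (1 + backups_per_system)


# Assumed schedule: 30 daily, 52 weekly, and 24 monthly backups are retained.
backups = backup_copies(daily=30, weekly=52, monthly=24)

# Assumed landscape: 10 systems (operational stores, warehouses, marts,
# secondary systems), each backed up on the same schedule.
total = total_copies(systems=10, backups_per_system=backups)

print(backups)  # 106 copies from the backup schedule alone
print(total)    # 1070 copies across all systems
```

Under these assumed inputs, a single record is held in more than a hundred backup copies for one system and in over a thousand once downstream systems are counted, which is the order of magnitude the answer describes.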