Arnab Gupta is the CEO of Opera Solutions, an international company offering big data analytics services. I had the chance to chat with him recently about the massive task of managing big data and how humans and machines intersect. Our interview follows.
Tell me a bit about your approach to big data analytics.
Arnab Gupta: Our company is a science-oriented company, and the core belief is that behavior — human or otherwise — can be mathematically expressed. Yes, people make irrational value judgments, but they are driven by common motivation factors, and the math expresses that.
I look at the so-called “big data phenomenon” as the instantiation of human experience. Previously, we could not quantitatively measure human experience, because the data wasn’t being captured. But Twitter recently announced that they now serve 350 billion tweets a day. What we say and what we do has a physical manifestation now. Once there is a physical manifestation of a phenomenon, then it can be mathematically expressed. And if you can express it, then you can shape business ideas around it, whether that’s in government or health care or business.
How do you handle rapidly increasing amounts of data?
Arnab Gupta: It’s an impossible battle when you think about it. The amount of data is going to grow exponentially every day, every week, every year, so capturing it all can’t be done. In the economic ecosystem there is extraordinary waste. Companies spend vast amounts of money, and the ratio of investment to insight is growing, with much more investment for similar levels of insight. This method just mathematically cannot work.
So, we don’t look for data, we look for signal. What we’ve said is that the shortcut is a priori identifying the signals to know where the fish are swimming, instead of trying to dam the water to find out which fish are in it. We focus on the flow, not a static data capture.
What role does visualization play in the search for signal?
Arnab Gupta: Visualization is essential. People dumb it down sometimes by calling it “UI” and “dashboards,” and they don’t apply science to the question of how people perceive. We need understanding that feeds into the left brain through the right brain via visual metaphor. At Opera Solutions, we are increasingly trying to figure out the ways in which the mind understands and transforms the visualization of algorithms and data into insights.
If understanding is a priority, then which do you prefer: a black-box model with better predictability, or a transparent model that may be less accurate?
Arnab Gupta: People bifurcate, and think in terms of black-box machines vs. the human mind. But the question is whether you can use machine learning to feed human insight. The power lies in expressing the black box and making it transparent. You do this by stress testing it. For example, if you were looking at a model for mortgage defaults, you would say, “What happens if home prices go down by X percent, or interest rates go up by X percent?” You make your own heuristics, so that when you make a bet you understand exactly how the machine is informing your bet.
Humans can do analysis very well, but the machine does it consistently well; it doesn’t make mistakes. What the machine lacks is the ability to consider orthogonal factors, and the creativity to consider what could be. The human mind fills in those gaps and enhances the power of the machine’s solution.
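To make the stress-testing idea concrete, here is a minimal sketch of how one might probe a black-box default model with "what if" scenarios. It is an illustration only, not Opera Solutions' method: the toy logistic model, its coefficients, and the names `default_probability` and `stress_test` are all assumptions invented for this example.

```python
import math

# Hypothetical coefficients for a toy logistic default model (illustrative only).
INTERCEPT = -3.0
COEF_HOME_PRICE_CHANGE = -4.0  # falling prices (negative change) raise default risk
COEF_RATE_CHANGE = 0.5         # effect per percentage point of rate increase


def default_probability(home_price_change, rate_change_pts):
    """Toy stand-in for a black-box model.

    home_price_change: fractional change in home prices (-0.10 means a 10% drop).
    rate_change_pts: change in interest rates, in percentage points.
    """
    z = (INTERCEPT
         + COEF_HOME_PRICE_CHANGE * home_price_change
         + COEF_RATE_CHANGE * rate_change_pts)
    return 1.0 / (1.0 + math.exp(-z))


def stress_test(model, scenarios):
    """Run the model on each named scenario and return the change from baseline."""
    baseline = model(0.0, 0.0)
    return {name: model(*shock) - baseline for name, shock in scenarios.items()}


# Each scenario is (home price change, interest rate change in points).
scenarios = {
    "home prices -10%": (-0.10, 0.0),
    "home prices -20%": (-0.20, 0.0),
    "rates +2 pts": (0.0, 2.0),
    "both shocks": (-0.20, 2.0),
}

deltas = stress_test(default_probability, scenarios)
for name, delta in deltas.items():
    print(f"{name}: default probability changes by {delta:+.4f}")
```

Reading the shifts side by side is what lets a person form heuristics about the model: which shock moves the prediction most, and whether combined shocks compound. In this sketch the model is treated purely as a callable, so the same loop would work on any model exposing the same two inputs.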
So you advocate a partnership between the model and the data scientist?
Arnab Gupta: We often create false dichotomies for ourselves, but the truth is it’s never been man vs. machine; it has always been man plus machine. Increasingly, I think it’s an article of faith that the machine beats the human in most large-scale problems, even chess. But though the predictive power of machines may be better on a large-scale basis, if the human mind is trained to use it powerfully, the possibilities are limitless. In the recent Jeopardy showdown with IBM’s Watson, I would have had a three-way competition with Watson, a Jeopardy champion, and a combination of the two. Then you would have seen where the future lies.
Does this mean we need to change our approach to education, and train people to use machines differently?
Arnab Gupta: Absolutely. If you look back in time between now and the 1850s, everything in the world has changed except the classroom. But I think we are dealing with a phase shift. Like most things, the inertia of power is very hard to shift. Change can take a long time and there will be a lot of debris in the process.
One major hurdle is that the language of machine-plus-human interaction has not yet begun to be developed. It’s partly a silent language, with data visualization as a significant key. The trouble is that language is so powerful that the left brain easily starts dominating, but really almost all of our critical inputs come from non-verbal signals. We have no way of creating a new form of language to describe these things yet. We are at the beginning of trying to develop this.
Another open question is: What are the skill set and the capabilities necessary for this? At Opera we have focused on the ability to teach machines how to learn. We have 150 to 160 people working in that area, which is probably the largest private concentration of its kind outside IBM and Google. One of the reasons we are hiring all these scientists is to try to innovate at the level of core competencies and the science of comprehension.
The business outcome of that is simply practical. At the end of the day, much of what we do is prosaic; it makes money or it doesn’t make money. It’s a business. But the philosophical fountain from which we drink needs to be a deep one.
Associated photo on home and category pages: prd brain scan by Patrick Denker, on Flickr