In-depth Strata community profile on Kira Radinsky
Kira Radinsky started coding at the age of four, when her mother and aunt encouraged her to excel at one of her favorite computer games by writing a few simple lines of code. Since then, she’s been a firecracker in the field of predictive analytics, building algorithms to improve business interactions and create a data-driven economy and, in the past, building systems to detect outbreaks of disease and social unrest around the world. She also gave a predictive analytics talk at the last Strata.
I had a conversation with Kira last month about her entry into the field and her most exciting moments thus far.
When did you first become interested in science?
Kira Radinsky: When I was four or five, my mom bought me a computer game. In order to go to the next level, you had to solve simple math problems, which became increasingly harder with time. At one point I couldn’t solve one of the problems. Then I asked my aunt for help because she was a software engineer. She showed me how to write some very simple code in order to proceed to the next level in the game. This was the first time I actually coded something.
In the army, I was a software engineer. I built big systems. I felt that I was contributing to my country and it was amazing for me. When I finished my service, I was accepted to the excellence program at the Technion [Israel Institute of Technology] because I had already started studying there when I was 15. I just continued on to a graduate degree.
I knew I wanted to do something in the field of artificial intelligence, because I really wanted to pursue the idea of using computers to make a global impact. I was really into that. I realized that the vast amounts of data that we produce could be used to solve important problems.
In 2011, thousands of birds fell out of the sky on New Year’s Eve. People were writing, “We don’t know what’s going on.” It was a conundrum. A few days later, a hundred thousand fish washed up dead on the shore. Many people were saying that it was the end of the world because it was the end of the Mayan calendar!
Apps reflect the public's pressing health concerns
Health care is migrating from the bricks-and-mortar doctor’s office or care clinic to the person him or herself at home and on-the-go–where people live, work, play, and pray. As people take on more do-it-yourself (DIY) approaches to everyday life–investing money on financial services websites, booking airline tickets and hotel rooms online, and securing dinner reservations via OpenTable–many also ask why they can’t have more convenient access to health care, like emailing doctors and looking into lab test results in digital personal health records.
The public clamor for digital outreach by health providers
85% of U.S. health consumers say that email, text messages, and voicemail are at least as helpful as in-person or phone conversations with health providers, according to the Healthy World study, Technology Beyond the Exam Room by TeleVox. Furthermore, one in three consumers admits to being more honest when talking about medical needs via automated voice response systems, emails, or texts than face-to-face with a health provider.
And three in ten consumers believe that receiving digital health care communications from providers—such as texts, voicemail, or email—would build trust with their providers. Half of people also say they’d feel more valued as a patient via digital health communications. When people look to engage in health with an organization, the most important enabling factors are trust and authenticity.
"Do you want to become a farmer?!” In a sense, yes.
Two years ago an informal group met for drinks in downtown Palo Alto: a mix of grad students, investors, and data science experts in Silicon Valley. In the back and forth of our conversation, we took turns describing planned projects. At the time, prominent VC firms were racing headlong into health care ventures. Much of our group seemed pointed in that direction.
In my turn, I mentioned one word: Agriculture.
That drew laughter, “You want to become a farmer?!”
In a sense, yes.
Impact of data science beyond Silicon Valley
Practices involving large-scale data, machine learning, cluster computing, etc., toppled entire sectors over the past decade. Retail (Amazon) went first, followed closely by Advertising (Google). Automotive (Tesla) may be next. Clearly, the impact of data science has moved beyond Silicon Valley, with mainstream industries leveraging data that matters… not simply to improve marketing funnels, but rather to overhaul their supply chains, manufacturing, global deployments, etc. Advances in remote sensing and the “Industrial Internet” accelerate that process, with IoT data rates growing orders of magnitude beyond what social networks have experienced, compelling new technologies.
Sometimes when a group of insiders starts guffawing, there is perhaps a subtle point being missed. Consider that Silicon Valley has spent the past decade extracting billions from e-commerce, ad-tech, social networks, anti-fraud, etc. Extracting is the quintessential word there. I wondered: among the industries outside of Silicon Valley undergoing disruptions due to large-scale data, where did Agriculture fit? Why did it seem laughable to experts as a data science opportunity?
General-purpose platforms can come across as hammers in search of nails
As much as I love talking about general-purpose big data platforms and data science frameworks, I’m the first to admit that many of the interesting startups I talk to are focused on specific verticals. At their core, big data applications merge large amounts of real-time and static data to improve decision-making.
This simple idea can be hard to execute in practice (think volume, variety, velocity). Unlocking value from disparate data sources entails some familiarity with domain-specific (1) data sources, requirements, and business problems.
It’s difficult enough to solve a specific problem, let alone a generic one. Consider the case of Guavus – a successful startup that builds big data solutions for the telecom industry (“communication service providers”). Its founder was very familiar with the data sources in telecom, and knew the types of applications that would resonate within that industry. Once they solve one set of problems for a telecom company (network optimization), they quickly leverage the same systems to solve others (marketing analytics).
This ability to address a variety of problems stems from Guavus’ deep familiarity with data and problems in telecom. In contrast, a typical general-purpose platform can come across as a hammer in search of a nail. So while I remain a fan (and user) of general-purpose platforms, the less well-known verticalized solutions are definitely on my radar.
Better tools can’t overcome poor analysis
I’m not suggesting that the criticisms raised against big data don’t apply to verticalized solutions. But many problems are due to poor analysis and not the underlying tools. A few of the more common criticisms arise from analyzing correlations: correlation is not causation, correlations are dynamic and can sometimes change drastically (2), and data dredging (3).
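To make the “correlations are dynamic” point concrete, here is a minimal sketch in Python. It is purely illustrative: the two return series are simulated, and the `simulate_returns` helper and its `common_weight` parameter are hypothetical names invented for this example, not anything drawn from a real trading system. The assumption is that each series is the sum of a shared market factor and its own noise, and that a panic is a period when the shared factor dominates.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate_returns(n_days, common_weight, rng):
    """Simulate two return series that share one market factor.

    common_weight is a hypothetical knob: the larger it is, the more
    the shared factor drowns out each series' own noise.
    """
    a, b = [], []
    for _ in range(n_days):
        market = rng.gauss(0, 1)
        a.append(common_weight * market + rng.gauss(0, 1))
        b.append(common_weight * market + rng.gauss(0, 1))
    return a, b


rng = random.Random(42)
calm_a, calm_b = simulate_returns(500, 0.3, rng)    # quiet market
panic_a, panic_b = simulate_returns(500, 5.0, rng)  # shared factor dominates

calm_r = pearson(calm_a, calm_b)
panic_r = pearson(panic_a, panic_b)
print(f"calm regime correlation:  {calm_r:.2f}")
print(f"panic regime correlation: {panic_r:.2f}")
```

Under these assumptions the expected correlation is w²/(w² + 1) for a common weight w, so a model fitted on the calm period would badly understate how tightly the two series move together once the regime changes.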
(0) This post grew out of a recent conversation with Guavus founder, Anukool Lakhina.
(1) General-purpose platforms and components are helpful, but they usually need to be “tweaked” or “optimized” to solve problems in a variety of domains.
(2) When I started working as a quant at a hedge fund, traders always warned me that correlations jump to 1 during market panics.
(3) The best example comes from finance and involves the S&P 500 and butter production in Bangladesh.
Strata SC 2014 Session Postmortem
In February, GraphLab took a road trip to Strata, a Big Data conference organized by O’Reilly. It was a gathering of close to 3100 people–engineers, business folks, industry evangelists, and data scientists. We had a lot of fun meeting and socializing with our peers and customers. Amidst all the conference excitement, we presented two talks. Carlos Guestrin, our intrepid CEO, held a tutorial on large-scale machine learning. I gave a talk in the Hardcore Data Science track.
Yawn. Yet another article trashing “big data,” this time an op-ed in the Times. This one is better than most, and ends with the truism that data isn’t a silver bullet. It certainly isn’t.
I’ll spare you all the links (most of which are much less insightful than the Times piece), but the backlash against “big data” is clearly in full swing. I wrote about this more than a year ago, in my piece on data skepticism: data is heading into the trough of a hype curve, driven by overly aggressive marketing, promises that can’t be kept, and spurious claims that, if you have enough data, correlation is as good as causation. It isn’t; it never was; it never will be. The paradox of data is that the more data you have, the more spurious correlations will show up. Good data scientists understand that. Poor ones don’t.
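A small simulation shows why more data breeds more spurious correlations. The sketch below is illustrative only and rests on assumptions of my own: every series is independent random noise, “more data” means more variables measured over the same 30 observations, and the 0.4 cutoff for a “notable” correlation is arbitrary. The helper name `count_spurious_pairs` is made up for this example.

```python
import itertools
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def count_spurious_pairs(n_series, n_points=30, threshold=0.4, seed=0):
    """Count pairs of independent noise series with |r| above threshold.

    Every series is pure Gaussian noise, so any pair that clears the
    threshold is spurious by construction.
    """
    rng = random.Random(seed)
    series = [
        [rng.gauss(0, 1) for _ in range(n_points)] for _ in range(n_series)
    ]
    return sum(
        1
        for a, b in itertools.combinations(series, 2)
        if abs(pearson(a, b)) > threshold
    )


for n in (10, 50, 200):
    total_pairs = n * (n - 1) // 2
    hits = count_spurious_pairs(n)
    print(f"{n} series, {total_pairs} pairs, {hits} above threshold")
```

The number of pairs grows roughly with the square of the number of variables, so even a small per-pair false-positive rate yields a growing pile of “discoveries” that mean nothing.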
It’s very easy to say that “big data is dead” while you’re using Google Maps to navigate downtown Boston. It’s easy to say that “big data is dead” while Google Now or Siri is telling you that you need to leave 20 minutes early for an appointment because of traffic. And it’s easy to say that “big data is dead” while you’re using Google, or Bing, or DuckDuckGo to find material to help you write an article claiming that big data is dead.
Data tools are less important than the way you frame your questions.
Max Shron and Jake Porway spoke with me at Strata a few weeks ago about frameworks for making reasoned arguments with data. Max’s recent O’Reilly book, Thinking with Data, outlines the crucial process of developing good questions and creating a plan to answer them. Jake’s nonprofit, DataKind, connects data scientists with worthy causes where they can apply their skills.
A few of the things we talked about:
- The importance of publishing negative scientific results
- Give Directly, an organization that facilitates donations directly to households in Kenya and Uganda. Give Directly was able to model income using satellite data to distinguish thatched roofs from metal roofs.
- Moritz Stefaner calling for a “macroscope”
- Project Cybersyn, Salvador Allende’s plan for encompassing the entire Chilean economy in a single real-time computer system
- Seeing Like a State: How Certain Schemes to Improve the Human Condition Have Failed by James C. Scott
After we recorded this podcast episode at Strata Santa Clara, Max presided over a webcast on his book that’s archived here.
Editor’s note: this post originally appeared on O’Reilly Radar.
Yet again, I reveal the base instincts driving my interest in big data. It’s not the science – it’s the cash. And yes, on some level, I find the idea of all that cash sexy. Yes, I know it’s a failing, but I can’t help it. Maybe in my next life I’ll develop a better appreciation of the finer things, and I will begin to understand the real purpose of the universe…
Until then, however, I’m happy to write about the odd and interesting intersection of big data and big business. As noted in my newest paper, big data is driving a renaissance in IT infrastructure spending. IDC, for example, estimates that worldwide spending for infrastructure hardware alone (servers, storage, PCs, tablets, and peripherals) will rise from $461 billion in 2013 to $468 billion in 2014. Gartner predicts that total IT spending will grow 3.1% in 2014, reaching $3.8 trillion, and forecasts “consistent four to five percent annual growth through 2017.” For a lot of people, including me, the mere thought of all that additional cash makes IT infrastructure seem sexy again.
Of course, there’s more to the story than networks, servers, and storage devices. But when people ask me, “Is this big data thing real? I mean, is it real???” the easy answer is yes, it must be real because lots of companies are spending real money on it. I don’t know if that’s enough to make IT infrastructure sexy, but it sure makes it a lot more fascinating and – dare I say it, intriguing – than it seemed last year.
In life, sex is the key to survival. In business, cash is king. Is there a connection? Read my paper, and please let me know.
Get your free digital copy of Will Big Data Make IT Infrastructure Sexy Again? — compliments of Syncsort.
Insight from a Strata Santa Clara 2014 session
When you think about what goes into winning a Nobel Prize in a field like economics, it’s a lot like machine learning. In order to make a breakthrough, you need to identify an interesting theory for explaining the world, test your theory in practice to see if it holds up, and if it does, you’ve got a potential winner. The bigger and more significant the issue addressed by your theory, the more likely you are to win the prize.
In the world of business, there’s no bigger issue than helping a company be more successful, and that usually hinges on helping it deliver its products to those that need them. This is why I like to describe my company SalesPredict as helping our customers win the Nobel Prize in business, if such a thing existed.
HBase has made inroads in companies across many industries and countries
With HBaseCon right around the corner, I wanted to take stock of one of the more popular (1) components in the Hadoop ecosystem. Over the last few years, many more companies have come to rely on HBase to run key products and services. The conference will showcase a wide variety of such examples, and highlight some of the new features that HBase developers have added over the past year. In the meantime, here are some things (2) you may not have known about HBase:
Many companies have had HBase in production for 3+ years: Large technology companies including Trend Micro, eBay, Yahoo! and Facebook, and analytics companies RocketFuel and Flurry depend on HBase for many mission-critical services.
There are many use cases beyond advertising: Examples include communications (Facebook messages, Xiaomi), security (Trend Micro), measurement (Nielsen), enterprise collaboration (Jive Software), digital media (OCLC), DNA matching (Ancestry.com), and machine data analysis (Box.com). In particular, Nielsen uses HBase to track media consumption patterns and trends, mobile handset company Xiaomi uses HBase for messaging and other consumer mobile services, and OCLC runs the world’s largest online database of library resources on HBase.
Flurry has the largest contiguous HBase cluster: Mobile analytics company Flurry has an HBase cluster with 1,200 nodes (replicating into another 1,200-node cluster). Flurry is planning to significantly expand its large HBase cluster in the near future.