Real-time data needs to power the business side, not just tech

Theo Schlossnagle on the state of real-time data analysis and where it needs to go.

In 2005, real-time data analysis was being pioneered and predicted to “transform society.” A few short years later, the technology is a reality and indeed is changing the way people do business. But Theo Schlossnagle (@postwait), principal and CEO of OmniTI, says we’re not quite there yet.

In a recent interview, Schlossnagle said that not only does the current technology allow less-qualified people to analyze data, but that most of the analysis being done is strictly for technical benefit. The real benefit will be realized when the technology is capable of powering real-time business decisions.

Our interview follows.


How has data analysis evolved over the last few years?

Theo Schlossnagle: The general field of data analysis has actually devolved over the last few years because the barrier to entry is dramatically lower. You now have a lot of people attempting to analyze data with no sound mathematics background. I personally see a lot of “analysis” happening that is less mature than your run-of-the-mill graduate-level statistics course or even undergraduate-level signal analysis course.

But where does it need to evolve? Storage is cheaper and more readily available than ever before. This leads organizations to store data like it’s going out of style. This isn’t a bad thing, but it causes a significantly lower signal-to-noise ratio. Data analysis techniques going forward will need to evolve much better noise reduction capabilities.
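As a toy illustration of the kind of noise reduction Schlossnagle is pointing at (this sketch, its data, and its smoothing factor are illustrative assumptions, not anything from the interview), an exponentially weighted moving average can damp the spikes in a noisy metric stream:

```python
# Sketch: exponentially weighted moving average (EWMA) as a simple
# noise-reduction filter over a stream of metric samples.
# The smoothing factor `alpha` is an illustrative choice, not a recommendation.

def ewma(samples, alpha=0.2):
    """Yield a smoothed value for each raw sample in the stream."""
    smoothed = None
    for x in samples:
        # First sample seeds the filter; afterwards, blend new sample with history.
        smoothed = x if smoothed is None else alpha * x + (1 - alpha) * smoothed
        yield smoothed

noisy = [10, 50, 12, 11, 55, 13, 12]   # spiky raw measurements
print([round(s, 1) for s in ewma(noisy)])
# → [10, 18.0, 16.8, 15.6, 23.5, 21.4, 19.5]
```

The isolated spikes (50, 55) barely move the smoothed series, which is the point: the signal survives, the noise is attenuated.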

What does real-time data allow that wasn’t available before?

Theo Schlossnagle: Real-time data has been around for a long time, so in a lot of ways, it isn’t offering anything new. But the tools to process data in real-time have evolved quite a bit. CEP systems now provide a much more accessible approach to dealing with data in real time and building millisecond-granularity real-time systems. In a web application, imagine being able to observe something about a user and make an intelligent decision on that data combined with a larger aggregate data stream — all before you’ve delivered the headers back to the user.
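A minimal in-process sketch of that idea (the class, threshold, and numbers here are my own invention, not Esper’s API or any particular CEP product): keep a sliding time window of recent events, and combine the current request’s observation with the live aggregate before responding:

```python
import time
from collections import deque

class SlidingWindowAggregate:
    """Toy CEP-style aggregate: mean of event values seen in the last
    `window_seconds`. Real CEP engines express this sort of thing declaratively."""

    def __init__(self, window_seconds=30.0):
        self.window_seconds = window_seconds
        self.events = deque()          # (timestamp, value) pairs, oldest first
        self.total = 0.0

    def observe(self, value, now=None):
        now = time.time() if now is None else now
        self.events.append((now, value))
        self.total += value
        # Evict events that have aged out of the window.
        while self.events and now - self.events[0][0] > self.window_seconds:
            _, old = self.events.popleft()
            self.total -= old

    def average(self):
        return self.total / len(self.events) if self.events else 0.0

# Decide something about one user's event relative to the aggregate stream,
# before the response goes out.
window = SlidingWindowAggregate(window_seconds=30.0)
for v in [100, 110, 90, 105]:           # recent traffic from other users
    window.observe(v, now=0.0)

user_value = 300                        # the current request's observation
window.observe(user_value, now=1.0)
flag_as_outlier = user_value > 2 * window.average()   # True here
```

The decision takes microseconds because everything lives in memory; that is what makes the “before you’ve delivered the headers” scenario feasible.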

What’s required to harness real-time analysis?

Theo Schlossnagle: Low-latency messaging infrastructure and a good CEP system. In my work we use either RabbitMQ or ZeroMQ and a whole lot of Esper.
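The shape of that architecture — a message queue feeding a stream processor — can be sketched with nothing but the standard library (this stands in for the pattern only; the APIs below are plain Python, not those of RabbitMQ, ZeroMQ, or Esper):

```python
import queue
import threading

# Producer publishes events onto a queue (standing in for RabbitMQ/ZeroMQ);
# a consumer thread feeds them to a processing step (standing in for a CEP
# engine like Esper). The transform is a placeholder.

def producer(q, events):
    for e in events:
        q.put(e)
    q.put(None)  # sentinel: end of stream

def consumer(q, results):
    while True:
        e = q.get()
        if e is None:
            break
        results.append(e * 2)   # placeholder "CEP" step: transform each event

q = queue.Queue()
results = []
t = threading.Thread(target=consumer, args=(q, results))
t.start()
producer(q, [1, 2, 3])
t.join()
# results now holds the processed stream: [2, 4, 6]
```

The real systems add what this sketch lacks — durability, fan-out, backpressure, and a declarative query language over the stream — but the producer/consumer decoupling is the same.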


Does there need to be a single person at a company who collects data, analyzes it, and makes recommendations, or is that something that can be done algorithmically?

Theo Schlossnagle: You need to have analysts, and I think it is critically important to have them report into the business side — marketing, product, CFO, COO — instead of into the engineering side. We should be doing data analysis to make better business decisions. It is vital to make sure we are always supplied with intelligent and rewarding business questions.

A lot of data analysis done today is technical analysis for technical benefit. The real value is when we can take this technology and expertise and start powering better real-time business decisions. Some of the areas doing real-time analysis well in this regard include finance, stock trading, and high-frequency trading.

  • http://drcoddwasright.blogspot.com Robert Young

    While I agree that the number of knucklehead “data analysts” has mushroomed in recent times, this really isn’t new. “Data analyst” became au courant when 1-2-3 had macros. A spreadsheet was a “database”. Nonsense issued forth. It’s easy, and what’s this Central Limit Theorem?

    As to whether either under-grad or grad stats majors are the only ones who should be doing “data analysis”, well, no. An understanding of Snedecor & Cochran for sampling (totally irrelevant when dealing with population datasets), and Hoel (or similar) for applied stats suffices. You don’t need to understand Fisher or Feller.

    Mathematical statisticians are interested in deriving new theorems, distributions, and other arcana. They’re not particularly interested in field work. For that, look to econometricians (humble self), psychometricians, and biostatisticians. For integration, R has drivers for Python, Ruby, C, and most everything else, and can even talk to RDBMSs via RODBC and RJDBC; the PostgreSQL-specific one is called PL/R.

    Here’s a short piece on using RJDBC with DB2 (my personal preference), as there is no RDB2 (but there is an R2D2).
    http://www.r-bloggers.com/connecting-to-a-db2-database-from-r/

  • http://www.yourantelopevalleyrealty.com Dave

    This represents a huge opportunity for companies to fill the void: there will be lots of money to be made by businesses that can really crunch the numbers and apply real-time data analysis to relevant business problems.