ENTRIES TAGGED "operations"
Compelling large-scale data platforms originate from the world of IT Operations
I’ve been noticing that many interesting big data systems are coming out of IT operations. These are systems that go beyond the standard “capture/measure, display charts, and send alerts”. IT operations has long been a source of many interesting big data problems, and I love that it’s beginning to attract the attention of many more data scientists and data engineers.
It’s not surprising that many of the interesting large-scale systems that target time-series and event data have come from ops teams: in an earlier post on time-series, several of the tools I highlighted came out of IT operations. IT operations involves monitoring many different hardware and software systems, a task that requires a variety of tools and quickly leads to “metrics overload”. A partial list of what gets monitored includes data captured from a wide range of application log files, network traffic, and energy and power sources.
The volume of IT ops data has led to new tools like OpenTSDB and KairosDB – time series databases that leverage HBase and Cassandra. But storage, simple charts, and lookups are just the foundation of what’s needed. IT ops teams track many interdependent systems, some of which might be correlated. Not only are IT ops teams faced with highlighting “unknown unknowns” in their massive data sets, they often need to do so in near realtime.
Companies that employ data feedback loops are poised to dominate their industries.
We're moving beyond an information economy. The efficiencies and optimizations that come from constant and iterative feedback will soon become the norm for businesses and governments.
The growing popularity of Big Data management tools (Hadoop; MPP, real-time SQL, NoSQL databases; and others) means many more companies can handle large amounts of data. But how do companies analyze and mine their vast amounts of data? For companies that already have large amounts of data in Hadoop, there's room for even simpler tools that would allow business users to directly interact with Big Data.
On March 11 Boston will join several other cities that have hosted conferences on the movement broadly known as NoSQL. Cassandra, CouchDB, HBase, HypergraphDB, Hypertable, Memcached, MongoDB, Neo4j, Riak, SimpleDB, Voldemort, and probably other projects as well will be represented at the one-day affair. The interviews I had with various project leaders for this article turned up a recurring usage pattern for NoSQL. What connects the users is that they carry out web-related data crunching, searching, and other Web 2.0 related work. I think these companies use NoSQL tools because they’re the companies that understand leading-edge technologies and are willing to take risks in those areas. As the field gets better known, usage will spread.