Making sense of the hype-cycle scuffle.
The big data world is a confusing place. We’re no longer in a market dominated mostly by relational databases, and the alternatives have multiplied in a baby boom of diversity.
These child prodigies of the data scene show great promise but spend a lot of time knocking each other around in the schoolyard. Their egos can sometimes be too big to accept that everybody has their place, and eyeball-seeking media certainly doesn’t help.
POPULAR KID: Look at me! Big data is the hotness!
HADOOP: My data’s bigger than yours!
SCIPY: Size isn’t everything, Hadoop! The bigger they come, the harder they fall. And aren’t you named after a toy elephant?
R: Backward sentences mine be, but great power contains large brain.
SQL: Oh, so you all want to be friends again now, eh?!
POPULAR KID: Yeah, what SQL said! Nobody really needs big data; it’s all about small data, dummy.
It's not about IT buying, but about making data work for you. Learn more in the Big Data in Enterprise IT program at Strata California.
In a world where technology and business are ever more intertwined, IT leaders aspire to key roles in their organizations. Sadly, industry conferences can lag behind, assuming IT is all about making the right buying decisions.
Not so at Strata.
Our approach is to take a view of data for business that centers on the problems you need to solve. The excitement around big data isn’t really about large volumes of data; it’s about smart use of data. It’s about using data to make your products better, help you be significantly more efficient, and create new products and businesses.
Getting the most from big data and data science takes a lot more than a software choice. The business aims come first, along with a good understanding of the problems you want to solve. Then you need to understand the capabilities of the technology and where data science can best be applied. Finally, you need to know how to run successful data projects, and how to hire and manage data teams.
Working with analytics and BI expert Mark Madsen, I’ve compiled a day-long program at Strata called Big Data in Enterprise IT that will take you through big data strategy, the issues of managing data, and how data science can be used effectively in your organization.
Diversity and manageability are big data watchwords for the next 12 months.
Here are some of the key big data themes I expect to dominate 2013, and which we will of course be covering at Strata.
Emergence of a big data architecture
The coming year will mark the graduation of many big data pilot projects as they are put into production. With that comes an understanding of the practical architectures that work. These architectures will identify:
- best-of-breed tools for different purposes, for instance, Storm for streaming data acquisition
- appropriate roles for relational databases, Hadoop, NoSQL stores and in-memory databases
- how to combine existing data warehouses and analytical databases with Hadoop
Of course, these architectures will be in constant evolution as big data tooling matures and experience is gained.
In parallel, I expect to see increasing understanding of where big data responsibility sits within a company’s org chart. Big data is fundamentally a business problem, and some of the biggest challenges in taking advantage of it lie in the changes required to cross organizational silos and reform decision making.
One to watch: it’s hard to move data, so look for a starring architectural role for HDFS for the foreseeable future.
Why we all need to understand and use big data.
Where does all the data in “big data” come from? And why isn’t big data just a concern for companies such as Facebook and Google? The answer is that the web companies are the forerunners. Driven by social, mobile, and cloud technology, there is an important transition taking place, leading us all to the data-enabled world that those companies inhabit today.
From exoskeleton to nervous system
Until a few years ago, the main function of computer systems in society, and business in particular, was as a digital support system. Applications digitized existing real-world processes, such as word-processing, payroll and inventory. These systems had interfaces back out to the real world through stores, people, telephone, shipping and so on. The now-quaint phrase “paperless office” alludes to this transfer of pre-existing paper processes into the computer. These computer systems formed a digital exoskeleton, supporting a business in the real world.
The arrival of the Internet and web has added a new dimension, bringing in an era of entirely digital business. Customer interaction, payments and often product delivery can exist entirely within computer systems. Data doesn’t just stay inside the exoskeleton any more, but is a key element in the operation. We’re in an era where business and society are acquiring a digital nervous system.
A free handbook for anybody wanting to understand and use big data.
"Planning for Big Data" is a new book that helps you understand what big data is, why it matters, and where to get started.
A look at data market offerings from four providers.
Strata chair Edd Dumbill provides an overview of the most mature data markets (Infochimps, Factual, Windows Azure Data Marketplace, DataMarket), and contrasts their different approaches and facilities.
How do the cloud offerings from Amazon, Google and Microsoft compare?
Big data and cloud technology go hand in hand, but it's comparatively early days. Strata conference chair Edd Dumbill explains the cloud landscape and compares the offerings of Amazon, Google and Microsoft.
A look at the components and functions of the Hadoop ecosystem.
Apache Hadoop has been the driving force behind the growth of the big data industry. But what does it do, and why do you need all its strangely-named friends, such as Oozie, Zookeeper and Flume?
Hadoop is a central part of Microsoft's data strategy.
Strata conference chair Edd Dumbill takes a look at Microsoft's plans for big data. By embracing Hadoop, the company aims to keep Windows and Azure as a standards-friendly option for data developers.
A survey of the Hadoop big data marketplace.
In this survey, Edd Dumbill explores the Hadoop-based big data solutions available on the market, contrasts the approaches of EMC Greenplum, IBM, Microsoft and Oracle, and provides an overview of Hadoop distributions.