The McKinsey Global Institute forecasts a shortage of over 140,000 data scientists in the U.S. by 2018. I forecast a shortage of 140,000 people to explain to their respective hiring managers that “make it Hadoop” is not an appropriate articulation of what these people can or should do. If big data is the new bubble, then here’s to the prolonged correct data recession that hopefully follows.
Correct data? Such skills used to be called unsexy names like statistics or scientific experiments, but we now prefer to spice up the job titles (and salaries!) a bit and brand ourselves as data scientists, data storytellers, data prophets, or—if my next promotion comes through—Lord High Chancellor of Data, appointed by the Sovereign on the advice of the Prime Minister to oversee Her Majesty’s Terabytes. Modesty, it sometimes feels, is low on the burgeoning list of big data skills.
If you’ve read Nate Silver’s latest bestseller, or seen the movie Moneyball, or ever bought a lottery ticket, you’ve witnessed that modesty (the kind related to not overstepping one’s confidence in a forecast) is inseparable from predictive science. In running predictive competitions for Kaggle, we’ve witnessed that solving the data problems to which McKinsey refers often doesn’t require huge clusters, long job titles, or intimidating scientific prefixes. Instead, it’s the nuanced, quiet skills that matter. It’s getting the small questions right on well-posed problems—not “deriving actionable insights” from meandering terabytes of digital exhaust—that most often creates value from data. Our upcoming session at Strata, Just the Basics: Core Data Science Skills, is a celebration of these small and accessible concepts.
We’ll start with the easy thing that always gets skipped: identifying that a problem exists and deciding whether data has the right to solve it. From here, we’ll walk (and sometimes jog, but not run) through the steps of loading the data, analyzing it, and packaging the results into a tell-able story. It’s the basics, as told by two data scientists through a little R, a little Python, a little MATLAB, and a lot of jokes. It’s the chance to whet your numerical appetite before you dig into Strata’s banquet of delectable detail.
We’re planning on an interactive and ego-free tutorial. If anyone speaks of functional monads, NoSQL databases, or even so much as hints at making a matrix that won’t fit in memory, we’ll politely ask them to exit the room. The final hour of the tutorial will comprise a live predictive modeling competition, in which attendees will confront a data problem and witness their performance in real time. It has a reasonable chance of being fun, and our regression models tell us you have a 16% chance of learning something. We hope you’ll take those odds, and maybe even be able to calculate them yourself.