As the Program Development Director for Strata Santa Clara 2014, I am pleased to announce that the tutorial session descriptions are now live. We’re offering several day-long immersions, including the popular Data Driven Business Day and Hardcore Data Science tracks. We curated these topics to appeal to a broad range of attendees, including business users and managers, designers, data analysts and scientists, and data engineers. In the coming months we’ll have a series of guest posts from many of the instructors and communities behind the tutorials.
Analytics for Business Users
We’re offering a series of data-intensive tutorials for non-programmers. John Foreman will use spreadsheets to demonstrate, step by step, how data science techniques work – a topic that should appeal to those tasked with advanced business analysis. Grammar of Graphics author, SYSTAT creator, and noted statistician Leland Wilkinson will teach an introductory course on analytics using an innovative expert system he helped build.
Data Science essentials
Scalding – a Scala API for Cascading – is one of the most popular open source projects in the Hadoop ecosystem. Vitaly Gordon will lead a hands-on tutorial on how to use Scalding to put together effective data processing workflows. Data analysts have long lamented the amount of time they spend on data wrangling. But what if you had access to tools and best practices that would make data wrangling less tedious? That’s exactly the tutorial that distinguished professors and Trifacta co-founders Joe Hellerstein and Jeff Heer are offering.
The co-founders of Datascope Analytics are offering a glimpse into how they use design thinking to help clients identify the appropriate problem or opportunity to focus on (see the recent Datascope/IDEO post on Design Thinking and Data Science). We’re also happy to reprise Scott Murray’s popular d3.js tutorial from Strata Santa Clara 2013.
Popular among data scientists, the IPython notebook is an emerging “standard” for delivering data workflows (unifying code, visuals and text). Embraced widely by Python programmers, it has made inroads in other communities (Julia, R, and GraphLab). IPython creator Fernando Perez and core developer Brian Granger are offering an in-depth overview of this popular platform.
If you want to learn about machine learning, we have back-to-back tutorials from instructors who represent popular open source solutions. I’ve written about scikit-learn – an accessible, well-documented Python machine-learning library that’s very easy to use (thanks to IPython). Olivier Grisel (a noted instructor and one of the core contributors to scikit-learn) is offering a hands-on tutorial that will introduce users to machine learning via scikit-learn. Carlos Guestrin, distinguished professor and founder of the GraphLab project, will lead a hands-on tutorial in machine learning and graph analytics using GraphLab – an extremely fast, open source package for graph analytics. As I noted in an earlier post, a new startup is focused on IPython integration, a Python API and other features designed to make GraphLab accessible to a broader user base.
Big Data infrastructure
Companies that want to build a data platform should consider attending the tutorial designed by a team from Silicon Valley Data Science. We’re also bringing back AMP Camp: a popular series of talks and hands-on sessions on the Berkeley Data Analytics Stack (BDAS). Inspired by technology developed inside Google, Mesos was the original BDAS component. Mesosphere co-founder Florian Leibert and O’Reilly author Paco Nathan will offer a tutorial on building data workflows with Mesos and related components (Chronos and Marathon).
Finally, we have two tutorials on important developments and components in the Hadoop ecosystem. Rich Raposa of Hortonworks will give an in-depth overview of Hadoop 2.0 (“… a generational shift in the architecture of Apache Hadoop”). Ronan Stokes of Cloudera will help attendees learn how to build real-time applications using Apache HBase.