Behind the scenes with Datascope Analytics
During a trip to Chicago for a conference on R, I had a chance to cowork at the Datascope Analytics (DsA) office. While I had worked with co-founders Mike and Dean before, this was my first time coworking at their office. It was an eye-opening experience. Why? The culture. I saw how this team of data scientists with different backgrounds connected with each other as they worked, collaborated, and joked around. I also observed how intensely present everyone was…whether they were joking or working. I completely understand how much work and commitment it takes to facilitate such a creative and collaborative environment.
Over the next few months, this initial coworking experience led to many conversations with Dean and Mike about building data science teams, Strata, design, and data both in Chicago and the SF Bay Area. I also got to know a few of the other team members such as Aaron, Bo, Gabe, and Irmak. Admittedly, the more I got to know the team, the more intensely curious I became about the human-centered design “ideation” workshops that they hold for clients. According to Aaron, the workshops “combine elements from human-centered design to diverge and converge on valuable and viable ideas, solutions, strategies for our clients. We start by creating an environment that spurs creativity and encourages wild ideas. After developing many different ideas, we cull them down and focus on the ones that are viable to add life and meaning.”
An interview with Allen Downey, the author of Think Bayes
When Mike first discussed Allen Downey’s Think Bayes book project with me, I remember nodding a lot. As the data editor, I spend a lot of time thinking about the different people within our Strata audience and how we can provide what I refer to as “bridge resources”. We need to know and understand the environments that our users are the most comfortable in and provide them with the appropriate bridges in order to learn a new technique, language, tool, or …even math. I’ve also been very clear that almost everyone will need to improve their math skills should they decide to pursue a career in data science. So when Mike mentioned that Allen’s approach was to teach math not using math…but using Python, I immediately indicated my support for the project. Once the book was written, I contacted Allen about an interview, and he graciously took some time away from the start of the semester to answer a few questions about his approach, teaching, and writing.
How did the “Think” series come about? What led you to start the series?
Allen Downey: A lot of it comes from my experience teaching at Olin College. All of our students take a basic programming class in the first semester, and I discovered that I could use their programming skills as a pedagogic wedge. What I mean is if you know how to program, you can use that skill to learn everything else.
I started with Think Stats because statistics is an area that has really suffered from the mathematical approach. At a lot of colleges, students take a mathematical statistics class that really doesn’t prepare them to work with real data. By taking a computational approach I was able to explain things more clearly (at least I think so). And more importantly, the computational approach lets students dive in and work with real data right away.
At this point there are four books in the series and I’m working on the fifth. Think Python covers Python programming; it’s the prerequisite for all the other books. But once you’ve got basic Python skills, you can read the others in any order.
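To make the computational approach Downey describes concrete, here is a small sketch of my own (not an example from Think Bayes): estimating a coin’s bias with a discrete Bayesian update written in plain Python dictionaries, rather than with closed-form math.

```python
# Toy illustration of "teaching math with code": a discrete Bayesian
# update over candidate coin biases, using only plain Python.
# (This example is the editor's sketch, not code from Think Bayes.)

def update(prior, likelihood_fn, data):
    """Multiply each hypothesis's prior by its likelihood, then normalize."""
    posterior = {h: p * likelihood_fn(data, h) for h, p in prior.items()}
    total = sum(posterior.values())
    return {h: p / total for h, p in posterior.items()}

def coin_likelihood(outcome, bias):
    """Probability of one flip ('H' or 'T') given P(heads) = bias."""
    return bias if outcome == 'H' else 1 - bias

# Uniform prior over 101 candidate biases: 0.00, 0.01, ..., 1.00
hypotheses = [i / 100 for i in range(101)]
belief = {h: 1 / len(hypotheses) for h in hypotheses}

# Observe 7 heads and 3 tails, updating one flip at a time
for flip in 'HHTHHHTHTH':
    belief = update(belief, coin_likelihood, flip)

# The posterior mean is our estimate of the coin's bias
estimate = sum(h * p for h, p in belief.items())
print(round(estimate, 3))
```

A student who can read a loop and a dictionary comprehension can follow every step of this inference, which is exactly the kind of bridge the book aims to build.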
An interview with Robert Stackowiak, co-author of Oracle Essentials 5th Edition
Robert Stackowiak and Rick Greenwald are super busy. They both handle a lot of responsibilities in their “day” jobs at Oracle. They also managed to squeeze in enough time to complete the latest update to their book, Oracle Essentials, which is currently available as an early release. Both Robert and Rick took some time out to answer a few questions about the book, how they started working with Oracle, and writing. I’ve focused on Robert for this post, but stay tuned for a later post that provides some behind-the-scenes insight from Rick Greenwald.
How did you get started working with Oracle technologies? How long have you been working with them?
Robert Stackowiak: I started working with the Oracle Database while at the U.S. Army Corps of Engineers in St. Paul in about 1984. The St. Paul District was one of the first Corps Districts to use the Oracle Database to write applications.
What does your day-to-day job with data entail?
Robert Stackowiak: I help guide Oracle’s sales and architecture teams and Oracle’s customers in defining and building information architectures. This covers all aspects: databases, Big Data solutions, business intelligence and data discovery tools, and integration tools and strategies.
An interview with Rick Copeland, the author of MongoDB Applied Design Patterns
At a recent MongoDB SF event, I had a chance to meet Rick Copeland. He was in town and stopped by the event to sign copies of his book, MongoDB Applied Design Patterns. While I am not Rick’s editor, I approached him to see if he would be okay with me filming the book signing as well as participating in a follow-up written interview. He agreed. It was great to catch a bit of footage of the event as well as have a chance to ask Rick about how he started working with MongoDB, why he wrote the book, and how he balances a busy schedule filled with working, writing, and speaking.
How did you get started working with MongoDB?
Rick Copeland: I started using MongoDB at SourceForge in 2009. Just before I came on board, the decision had been made to base the next generation of SourceForge on MongoDB instead of relational databases. The driving factors behind this decision were some internally conducted benchmarks and the developers’ love of the document-oriented model.
Recommended resources from a former analyst
I was pretty cranky before I spoke with Q Ethan McCallum on the phone today.
I was cranky from absorbing the NSA news dominating many data conversations. There is a lot of yammering going on. Some good. Some super bad. My crankiness dissolved a bit after speaking with Q and other Chicago-based people who are working on positive-impact data science projects. You’ll be seeing more from Q and these other data science people within the Strata blog very soon. Utilizing data for positive change makes me happy.
My crankiness also dissolved when I decided to not provide summary points on a few articles covering the latest NSA leaks for the Strata Week element. Instead, I decided to pretend that I was an analyst again and think about the resources that I would have wanted to visit in order to form my own insights and analysis.
Recommended Resources for Analysts
- The Guardian. Interested in reviewing the leaked documents and forming your own insights? The Guardian’s “Read the Documents” section will be very useful.
- U.S. House of Representatives Permanent Select Committee on Intelligence. There is always more than one side to a story. The latest committee updates are available as well as videos of recent hearings.
- Office of the Director of National Intelligence. See in particular the Federal Agency Data Mining Report, which shows very quickly how the U.S. government defines “data mining”. We should all be aware of this.
- Accumulo. Are you technically oriented and want to understand more about the database that grew up within the NSA? Then you should look at the Wired coverage on Accumulo for background and then take a look around at the open source project.
- ProPublica. I visit this investigative journalism site often; in full disclosure, I have also donated personal money to ProPublica.
- Techmeme. While there are a lot of aggregators available, this is my go-to aggregator.
A much-needed break from data transparency and privacy issues
I could have focused on the Government’s Search for Google Data visualization from Chris Canipe and Madeline Farbman of the Wall Street Journal. Or, I could have focused on Neal Ungerleider’s piece that covers Eric Fisher and MapBox for Gnip’s Twitter metadata visualizations. Yet, my curiosity took over once I came across The Economist’s High Spirits graphic. Not only do I make my own bitters, which qualifies me for preliminary booze nerd status, I also needed a brief break from the transparency issues currently dominating the data-oriented conversations. Following my booze nerd curiosity led me to this interactive data visualization of common cocktail ingredients:
An interview with Kristina Chodorow, author of MongoDB: The Definitive Guide, Second Edition
We launched the second edition of Kristina Chodorow’s book, MongoDB: The Definitive Guide, at a recent MongoDB conference in San Francisco. Everyone worked hard to make this happen. I filmed a little behind-the-scenes video with my phone in order to share it with everyone who worked on the book. After I filmed it, I decided to post the video as well as an interview with Kristina. Both the video and the interview provide snippets of what it is like to work on the second edition of MongoDB: The Definitive Guide.
What inspired you to become a software engineer?
Kristina Chodorow: In college, I took a computer science class because it would count towards my math major. I was programming a tic-tac-toe game and thought, “Why can’t I just program it to try to win?” and then I realized I could figure out the actual logic of “trying to win.” I thought that was the coolest thing ever. I took a couple more programming classes, joined the programming team, and started doing CS research. By the time I graduated, I knew I was going to be a programmer.
How did you land at 10gen?
An interview with Scott Murray, author of Interactive Data Visualization for the Web
Scott Murray, a code artist, has written Interactive Data Visualization for the Web for nonprogrammers. In this interview, Scott provides some insights on what inspired him to write an introduction to D3 for artists, graphic designers, journalists, researchers, or anyone who is looking to begin programming data visualizations.
What inspired you to become a code artist?
Scott Murray: I had designed websites for a long time, but several years ago was frustrated by web browsers’ limitations. I went back to school for an MFA to force myself to explore interactive options beyond the browser. At MassArt, I was introduced to Processing, the free programming environment for artists. It opened up a whole new world of programmatic means of manipulating and interacting with data — and not just traditional data sets, but also live “data” such as from input devices or dynamic APIs, which can then be used to manipulate the output. Processing let me start prototyping ideas immediately; it is so enjoyable to be able to build something that really works, rather than designing static mockups first and then, hopefully, one day, investing the time to program it. Something about that shift in process is both empowering and liberating — being able to express your ideas quickly in code, and watch the system carry out your instructions, ultimately creating images and experiences that are beyond what you had originally envisioned.
An interview with Alistair Croll and Benjamin Yoskovitz on using lean analytics in a startup
Alistair Croll and Benjamin Yoskovitz wrote the upcoming book Lean Analytics: Use Data to Build a Better Startup Faster. In the following interview, they discuss the inspiration behind their book, the unique aspects of using analytics in a startup environment, and more.
What inspired both of you to write your book?
A big part of the inspiration came from our work with Year One Labs, an early stage accelerator that we co-founded with two other partners in 2010. We implemented a Lean Startup program that we put the startups through, providing them with up to 12 months of hands-on mentorship. We saw that these companies, as well as others we’ve worked on ourselves, advised, and invested in, struggled with what to measure, how to measure it, and why to measure certain things.
The core principle of Lean Startup is build, measure, and learn. While most entrepreneurs understand the “build” part, since they’re often technical founders who are excellent at building stuff, they tend to have a hard time with the “measure” and “learn” parts of the cycle. Lean Analytics is a way of codifying that further, without being overly prescriptive. We hope it provides a practical and deeper guide to implementing Lean Startup principles successfully and using analytics to genuinely affect your business.
What are some of the unique aspects to using analytics in a startup environment?
One of the biggest challenges with using analytics in a startup environment is the vast number of unknowns that a startup faces. Startups don’t really know what they are at the beginning. In fact, they shouldn’t even be building a product to solve a problem. In many ways they’re building products to learn what to build. Learning in an environment of risk and uncertainty is hard. So tracking things is also hard. Startups are also heavily influenced by what they see around them. They see companies that seem to be growing really quickly, the latest hottest trend, competition, and so on. Those influences can negatively affect a startup’s focus and the rigorous approach needed to find true insight and grow a real business. Lean Analytics is meant to poke a hole in an entrepreneur’s reality distortion field, and encourage (or force!) a level of focus and attention that can cut out the noise and help founders move as quickly as possible without doing so blindly.