Ann Spencer
Making things happen: from being a software engineer to writing a book
An interview with Kristina Chodorow, author of MongoDB: The Definitive Guide, Second Edition
We launched the second edition of Kristina Chodorow’s book, MongoDB: The Definitive Guide, at a recent MongoDB conference in San Francisco. Everyone worked hard to make this happen. I filmed a little behind-the-scenes video with my phone in order to share it with everyone who worked on the book. After I filmed it, I decided to post the video as well as an interview with Kristina. Both the video and interview provide snippets of what it is like to work on the second edition of MongoDB: The Definitive Guide.
What inspired you to become a software engineer?
Kristina Chodorow: In college, I took a computer science class because it would count towards my math major. I was programming a tic-tac-toe game and thought, “Why can’t I just program it to try to win?” and then I realized I could figure out the actual logic of “trying to win.” I thought that was the coolest thing ever. I took a couple more programming classes, joined the programming team, and started doing CS research. By the time I graduated, I knew I was going to be a programmer.
How did you land at 10gen?
Kristina Chodorow: After college I started a Ph.D. at Columbia and, although it was a great program, I really didn’t want to go to graduate school and left after a semester. I moved to Seattle to be with a guy and unsurprisingly that didn’t work out. After a plane ride of shame back to the East Coast, I put my resume up on Dice.com. A really excellent recruiter, Craig Collins, set me up with a bunch of interviews and I accepted an offer from 10gen. When I joined, 10gen was working on a full cloud stack (similar to Google App Engine). I worked on a JavaScript compiler for about a year before we decided to focus on the scalable storage layer: MongoDB.
On becoming a code artist
An interview with Scott Murray, author of Interactive Data Visualization for the Web
Scott Murray, a code artist, has written Interactive Data Visualization for the Web for nonprogrammers. In this interview, Scott provides some insights on what inspired him to write an introduction to D3 for artists, graphic designers, journalists, researchers, or anyone who is looking to begin programming data visualizations.
What inspired you to become a code artist?
Scott Murray: I had designed websites for a long time, but several years ago was frustrated by web browsers’ limitations. I went back to school for an MFA to force myself to explore interactive options beyond the browser. At MassArt, I was introduced to Processing, the free programming environment for artists. It opened up a whole new world of programmatic means of manipulating and interacting with data, and not just traditional data sets, but also live “data” such as from input devices or dynamic APIs, which can then be used to manipulate the output. Processing let me start prototyping ideas immediately; it is so enjoyable to be able to build something that really works, rather than designing static mockups first and then hopefully, one day, investing the time to program them. Something about that shift in process is both empowering and liberating: being able to express your ideas quickly in code, and watch the system carry out your instructions, ultimately creating images and experiences that are beyond what you had originally envisioned.
“Startups don’t really know what they are at the beginning”
An interview with Alistair Croll and Benjamin Yoskovitz on using lean analytics in a startup
Alistair Croll and Benjamin Yoskovitz wrote the upcoming book Lean Analytics: Use Data to Build a Better Startup Faster. In the following interview, they discuss the inspiration behind their book, the unique aspects of using analytics in a startup environment, and more.
What inspired both of you to write your book?
A big part of the inspiration came from our work with Year One Labs, an early stage accelerator that we co-founded with two other partners in 2010. We implemented a Lean Startup program that we put the startups through and provided them with up to 12 months of hands-on mentorship. We saw with these companies as well as others that we’ve worked on ourselves, advised and invested in, that they struggled with what to measure, how to measure it, and why to measure certain things.
The core principle of Lean Startup is build, measure, and learn. While most entrepreneurs understand the “build” part since they’re often technical founders who are excellent at building stuff, they have a hard time with the measure and learn parts of the cycle. Lean Analytics is a way of codifying that further, without being overly prescriptive. We hope it provides a practical and deeper guide to implementing Lean Startup principles successfully and using analytics to genuinely affect your business.
What are some of the unique aspects to using analytics in a startup environment?
One of the biggest challenges with using analytics in a startup environment is the vast number of unknowns that a startup faces. Startups don’t really know what they are at the beginning. In fact, they shouldn’t even be building a product to solve a problem. In many ways they’re building products to learn what to build. Learning in an environment of risk and uncertainty is hard. So tracking things is also hard. Startups are also heavily influenced by what they see around them. They see companies that seem to be growing really quickly, the latest hottest trend, competition and so on. Those influences can negatively affect a startup’s focus and the rigorous approach needed to find true insight and grow a real business. Lean Analytics is meant to poke a hole in an entrepreneur’s reality distortion field, and encourage (or force!) a level of focus and attention that can cut out the noise and help founders move as quickly as possible without doing so blindly.
On reading Mike Barlow’s “Real-Time Big Data Analytics: Emerging Architecture”
Barlow's distilled insights regarding the ever-evolving definition of real-time big data analytics
During a break in between offsite meetings that Edd and I were attending the other day, he asked me, “Did you read the Barlow piece?”
“Umm, no,” I replied sheepishly. Insert a sidelong glance from Edd that said much without saying anything aloud. He’s really good at that.
In my utterly meager defense, Mike Loukides is the editor on Mike Barlow’s Real-Time Big Data Analytics: Emerging Architecture. As Loukides is one of the core drivers behind O’Reilly’s book publishing program and someone who I perceive to be an unofficial boss of my own choosing, I am not really inclined to worry about things that I really don’t need to worry about. Then I started getting not-so-subtle inquiries from additional people asking if I would consider reviewing the manuscript for the Strata community site. This resulted in me emailing Loukides for a copy and sitting in a local cafe on a Sunday afternoon to read through the manuscript.
Join me for the Strata Online Conference on data warfare on January 22nd
Learn more about potential attack vectors and how to defend against them
“Jeez, the days are flying by,” I muttered to myself the other day. The next Strata Online Conference on data warfare is just around the corner. I’ve been excited about this event for some time. How could I not be excited? There will be discussions on using data for evil, hacking cybersecurity, crowdsourcing identity theft, black hat data science, and more.
As I have mentioned before, I just love thought-provoking and candid discussions.
I first heard about the event when Kathy Yu, Alistair Croll, and I met at the SF Ferry Building to talk about Strata over breakfast. I’m not a morning person. It takes a few moments for the caffeine to take effect. Alistair is the opposite. I don’t know if Alistair had his dose of caffeine earlier that day or if he just generates his own energy. Whatever it is, it enables him to chair Strata, run his own business, keep up with his precocious two-year-old daughter, and co-author the forthcoming Lean Analytics. Yet, that morning, I was half-tuning Alistair out while I was sipping on my coffee and taking a picture of my crispy caramelized waffle. Yes, I’m that person. But when Alistair started talking about data warfare, he had my full attention. As we rely more upon data, we become more vulnerable to various attacks. It is important for us to learn more about what the potential attack vectors could be and how to defend against them. The speakers at the upcoming Strata Online Conference on data warfare will get us all thinking about this.
The speakers and the topics of their sessions include: Read more…
Improve your math skills
Practical advice for those considering a career in data science
When I was a youngster in college I found myself dissatisfied after I took a stats class from the math department. So I decided to take another stats class. Classmates thought I was crazy. Let’s be real, what precocious over-achieving teenager majoring in English lit seeks to retake a math class? And not because of a grade but because they were dissatisfied with what they didn’t get out of it? After a bit of research, I decided to take the stats class offered by the psych department.
It made a significant difference.
Thinking about math from the perspectives of research design methodology and how data can be used to manipulate people made quite an impact on my teenage worldview. This experience also reinforced my belief that education is what you decide it will be. There is always more than one way to learn and education doesn’t necessarily have to happen in a physical classroom. Growing up in the San Francisco Bay Area where friends and loved ones decided to forgo traditional higher ed completely to start their own companies or immediately work in jobs in technology also contributed to this belief.
While full-time students who are looking at a career in data science may have the time to do seemingly nutty things like take overlapping math classes, this is not something that most people with full-time jobs are able to do. When people with full-time jobs ask me about what they need to do to move into data science, I probe them about the kind of job in data science they want and about their analytical and empathy skills. Then, I immediately follow up with “So, how are your math skills?” Interestingly enough, I get a lot of people saying how they don’t have time to physically go into a classroom or that it has been, like, forever since they’ve used statistics and/or linear algebra for data analysis. Even more interesting is how often people don’t realize just how many resources are available to learn math outside of the physical-attendance-in-a-classroom model.
Huh.
How do you become a data scientist? Well, it depends
My obsession with data and user needs is now focused on the many paths toward data science.
Over Thanksgiving, Richie and Violet asked me if I preferred the iPhone or the Galaxy SIII. I have both. It is a long story. My response was, “It depends.” Richie, who would probably bleed Apple if you cut him, was very unsatisfied with my answer. Violet was more diplomatic. Yet, it does depend. It depends on what the user wants to use the device for.
I say, “It depends” a lot in my life.
Both in the personal life and the work life … well, because it really is all one life, isn’t it? With my work over the past decade or so, I have been obsessive about being user-focused. I spend a lot of time thinking about whom a product, feature, or service is for and how they will use it. Not how I want them to use it, but how they want to use it and what problem they are trying to solve with it.
Before I joined O’Reilly, I was obsessively focused on the audience for my data analysis. C-level execs look for different kinds of insights than a director of engineering. A field sales rep looks for different insights than a software developer. Understanding more about who the user or audience was for a data project enabled me to map the insights to the user’s role, their priorities, and how they wanted to use the data. Because, you know what isn’t too great? When you spend a significant amount of time working on something that does not get used or is not what someone needed to help them in their job.
Approaching ethics and big data
What to do when facing the stoic expressions that pop up during ethics discussions.
The other day I clicked on a message posted to the O’Reilly editors’ email list and the message text filled up almost the entire monitor screen. I must admit that I thought, “Am I going to require another caffeine hit to read through this?”
I decided to take a chance, not take another break just then, and read the lengthy note. I didn’t need that caffeine hit after all. Apparently, neither did half a dozen other editors.
The note was about ethics.
In a previous life, I worked in the competitive intelligence field. I remember participating in a friendly confab at an industry event and then someone mentioned the word “e-t-h-i-c-s.” It was rather fascinating to see how that word elicited stoic faces. No one wanted to be the first person to say anything on that topic. Now that I work at ORM, mention the word “ethics!” and folks are not shy about saying exactly what they think. Not. At. All.
During the discussion, Ethics of Big Data by Kord Davis came up. While I was not the editor on this book, I did read it when I was in New York. It made my list of recommended books for people looking to jump into the world of big data. Why? Because I remembered the stoic poker faces from my previous life in competitive intelligence.
A change is gonna come
Join us in the data revolution.

When I told some of my friends and family that I was joining O’Reilly Media as an editor focusing on ORM’s Strata practice area, their responses reflected the diversity of my loved ones.
I’ve paraphrased some of the best ones here:
- “That is great! I have a bunch of their books. Everyone I know has the animal books.”
- “Bill O’Reilly owns a media company?”
- “I don’t get you techie people. Didn’t you already do a bunch of weird ninja-y data type stuff?”
- “Congrats! I have a lot of respect for ORM.”
- “… wait a sec, didn’t you STOP being a Java editor years ago to go work at an assessment data startup?”
Sigh.
The people in my life have a few things in common. They are smart, articulate, really truly not afraid to say what they think, and seek to be the change they wish to see in the world. We don’t always agree [massive understatement]. Yet, our motivations are the same.
Why am I telling you this?
I believe that at our core, no matter how different we may seem, we do not actively seek to harm. Yet, everyone who works with data has faced or will face choices about what to do with that data. Choices that are obviously for good or for evil. Choices that are neither completely for good nor completely for evil. Choices that we are reluctant to discuss because we do not want to implicate ourselves or the companies we work for. Yet, just because we are reluctant to discuss them does not mean we are not facing these challenges.
If you have the courage to speak out regarding the real everyday challenges that you experience while working with data, then I want to listen. If you have discovered solutions to these everyday challenges, then I want to publish your insight. If you engage with anything I publish, whether you agree or disagree, or have suggestions for how things could be different or better, then please say something.
You can reach me at pitchstrata@oreilly.com.

Ann Spencer is an editor for O'Reilly Media, Inc. focusing primarily on products and services related to Strata and big data. She has over seven years of experience at established and startup technology companies focusing on market data analysis and competitive intelligence. She has a keen understanding of how to identify useful insights from large amounts of data. She also has over five years of editorial experience where she focused on acquiring programming language content at other media companies. She loves getting deliberately lost wandering around cities and if you need to find her when she is not in front of her multiple machine monitors, you should check the local bookstore. You'll likely find her sitting on the floor, looking through science fiction, neuroscience, or cookbooks. 




