ENTRIES TAGGED "strata"

Why is building custom recommender systems hard? Does it have to be?


Photo Courtesy of Carlos Guestrin

By Carlos Guestrin

Today, it’s shocking (and honestly exciting) how much of my daily experience is determined by a recommender system.  These systems drive amazing experiences everywhere, telling me where to eat, what to listen to, what to watch, what to read, and even who I should be friends with.  Furthermore, information overload is making recommender systems indispensable, since I can’t find what I want on the web simply using keyword search tools.  Recommenders are behind the success of industry leaders like Netflix, Google, Pandora, eHarmony, Facebook, and Amazon.  It’s no surprise companies want to integrate recommender systems with their own online experiences.  However, as I talk to team after team of smart industry engineers, it has become clear that building and managing these systems is usually a bit out of reach, especially given all the other demands on the team’s time.

Read more…


Bigger Data Leaner Pharma

Open source communities to help find the next blockbuster drug

Big drug companies are not what they used to be. It is harder to find new drug candidates, to test them, and to get them approved than ever before. Drugs that are “mere chemicals” are becoming more and more complex. Frequently, new drugs require DNA interaction, which requires them to be manufactured through a mostly automated cellular process rather than just mixing the right components in the right order. Just the changes to the refrigeration requirements for these new drugs represent a challenge to drug manufacturers, pharmacies and hospitals.

Combined, these difficulties create a combustible business environment that can be ignited by the pressure of expiring patents. Experts estimate that the approval process ensures that a drug company actually gets only about 12 years of exclusivity before a 20-year patent expires. So in pharma-land, the march of popular medications to generic status forces the original developers into the famous Innovator’s Dilemma. Most companies face competition from the generic versions of their own previous work.

Read more…


Make us think: a call for Strata keynote videos

Submit your suggestions for videos that make us think about how data, visualizations, and technology are changing us

Each year at Strata, we warm up the crowd in the main keynote sessions with short videos that will make people think. These videos demonstrate the ways that data, technology, and visualization are changing us. Some are funny; some are clever; some are downright disturbing.

For Strata New York + Hadoop World in October, we’re hoping you’ll join in and suggest some videos for us. If you’ve got something you feel captures the zeitgeist of technology at the fringes, then complete this form, and we’ll check it out. We’ll choose some of them as we kick off the event this fall.

Read more…


Making things happen: from being a software engineer to writing a book

An interview with Kristina Chodorow, author of MongoDB: The Definitive Guide, Second Edition

We launched the second edition of Kristina Chodorow’s book, MongoDB: The Definitive Guide, at a recent MongoDB conference in San Francisco. Everyone worked hard to make this happen. I filmed a little behind-the-scenes video with my phone in order to share it with everyone who worked on the book. After I filmed it, I decided to post the video as well as an interview with Kristina. Both the video and the interview provide snippets of what it is like to work on the second edition of MongoDB: The Definitive Guide.

What inspired you to become a software engineer?

Kristina Chodorow: In college, I took a computer science class because it would count towards my math major. I was programming a tic-tac-toe game and thought, “Why can’t I just program it to try to win?” and then I realized I could figure out the actual logic of “trying to win.”  I thought that was the coolest thing ever. I took a couple more programming classes, joined the programming team, and started doing CS research. By the time I graduated, I knew I was going to be a programmer.

How did you land at 10gen?

Kristina Chodorow


Kristina Chodorow: After college I started a Ph.D. at Columbia and, although it was a great program, I really didn’t want to go to graduate school and left after a semester.  I moved to Seattle to be with a guy and unsurprisingly that didn’t work out. After a plane ride of shame back to the East Coast, I put my resume up on Dice.com.  A really excellent recruiter, Craig Collins, set me up with a bunch of interviews and I accepted an offer from 10gen. When I joined, 10gen was working on a full cloud stack (similar to Google App Engine).  I worked on a JavaScript compiler for about a year before we decided to focus on the scalable storage layer: MongoDB.

Read more…


Big data, cool kids

Making sense of the hype-cycle scuffle.

The big data world is a confusing place. We’re no longer in a market dominated mostly by relational databases, and the alternatives have multiplied in a baby boom of diversity.

My data is bigger than yours.


These child prodigies of the data scene show great promise but spend a lot of time knocking each other around in the schoolyard. Their egos can sometimes be too big to accept that everybody has their place, and eyeball-seeking media certainly doesn’t help.

POPULAR KID: Look at me! Big data is the hotness!
HADOOP: My data’s bigger than yours!
SCIPY: Size isn’t everything, Hadoop! The bigger they come, the harder they fall. And aren’t you named after a toy elephant?
R: Backward sentences mine be, but great power contains large brain.
EVERYONE: Huh?
SQL: Oh, so you all want to be friends again now, eh?!
POPULAR KID: Yeah, what SQL said! Nobody really needs big data; it’s all about small data, dummy.

Read more…


On the importance of imagination in data science

Strata Community Profile on Amy Heineike, Director of Mathematics


Amy Heineike

According to Amy Heineike, the Director of Mathematics at Quid, there’s nothing like having a fresh dataset in R and knowing how to use it. “You can add a few lines of code and discover all kinds of interesting information,” Heineike says. “One question leads to another, you get into a flow, and you can have an amazing exploration.”

Heineike started working with data several years ago at a consultancy in London, where “playing around” with data shed light on the impact of social networks on government policies. Part of her job was figuring out what types of data to use in order to find solutions to crucial problems, from public transportation to obesity. Her day-to-day work at Quid entails working with new data sets, prototyping analytics, and collaborating with an engineering team to improve data analysis and bring products into production.

Read more…


why? why? why!

a lesson for data science teams

By Dean Malmgren and Mike Stringer

The other day we had a conversation with a bespectacled senior data scientist at another organization (named X to protect the innocent). The conversation went something like this:

[Image: facepalm]

Many of us have had similar conversations with people like X, and many of us have even been X before. Data scientists, being curious individuals, enjoy working on problems because they are interesting, fun, or technically challenging, or because their boss heard about “big data” in the Wall Street Journal. These reasons are all distinctly different from trying to solve an important problem.

Read more…


Pursuing data science as a second profession

Featured Strata Community Profile on Yogi Saxena

Yogi Saxena is not one to back down from a challenge. The distance runner ran his first marathon just two years ago in order to win a bet. Next month, he competes in another grueling marathon, his third. And if that were not enough, a friend’s Facebook post inspired him to train for a sprint triathlon. “I taught myself to swim when I was young,” Saxena says, revealing that his drive to learn new skills started early. “And if it wasn’t for the swim part, I’d have done an Olympic-distance triathlon instead.”

Saxena’s love of mastering new challenges is likely responsible for his decision to pursue data science as a second profession, after having a successful career as an electrical engineer. Currently at Boeing, he is responsible for developing a tool that would help visualize feeds from various classified and non-classified sources.

He is profiled here as part of the Strata community profiles.

Read more…


On reading Mike Barlow’s “Real-Time Big Data Analytics: Emerging Architecture”

Barlow's distilled insights regarding the ever-evolving definition of real-time big data analytics

Reading Barlow on a Sunday Afternoon


During a break between offsite meetings that Edd and I were attending the other day, he asked me, “Did you read the Barlow piece?”

“Umm, no,” I replied sheepishly. Insert a sidelong glance from Edd that said much without saying anything aloud. He’s really good at that.

In my utterly meager defense, Mike Loukides is the editor on Mike Barlow’s Real-Time Big Data Analytics: Emerging Architecture. As Loukides is one of the core drivers behind O’Reilly’s book publishing program and someone who I perceive to be an unofficial boss of my own choosing, I am not really inclined to worry about things that I really don’t need to worry about. Then I started getting not-so-subtle inquiries from additional people asking if I would consider reviewing the manuscript for the Strata community site. This resulted in me emailing Loukides for a copy and sitting in a local cafe on a Sunday afternoon to read through the manuscript.

Read more…


Big data is dead, long live big data: Thoughts heading to Strata

The biggest problems will almost always be those for which the size of the data is part of the problem.

A recent VentureBeat article argues that “Big Data” is dead. It’s been killed by marketers. That’s an understandable frustration (and a little ironic to read about it in that particular venue). As I said sarcastically the other day, “Put your Big Data in the Cloud with a Hadoop.”

You don’t have to read much industry news to get the sense that “big data” is sliding into the trough of Gartner’s hype curve. That’s natural. Regardless of the technology, the trough of the hype cycle is driven by a familiar set of causes: it’s fed by over-aggressive marketing, the longing for a silver bullet that doesn’t exist, and the desire to spout the newest buzzwords. All of these phenomena breed cynicism. Perhaps the most dangerous is the technologist who never understands the limitations of data, never understands what data isn’t telling you, or never understands that if you ask the wrong questions, you’ll certainly get the wrong answers.

Big data is not a term I’m particularly fond of. It’s just data, regardless of the size. But I do like Roger Magoulas’ definition of “big data”: big data is when the size of the data becomes part of the problem. I like that definition because it scales. It was meaningful in 1960, when “big data” was a couple of megabytes. It will be meaningful in 2030, when we all have petabyte laptops, or eyeglasses connected directly to Google’s yottabyte cloud. It’s not convenient for marketing, I admit; today’s “Big Data!!! With Hadoop And Other Essential Nutrients Added” is tomorrow’s “not so big data, small data actually.” Marketing, for better or for worse, will deal.

Read more…
