ENTRIES TAGGED "strata"
Open source communities to help find the next blockbuster drug
Big drug companies are not what they used to be. It is harder to find new drug candidates, to test them, and to get them approved than ever before. Drugs that are “mere chemicals” are becoming more and more complex. Frequently, new drugs require DNA interaction, which requires them to be manufactured through a mostly automated cellular process rather than just mixing the right components in the right order. Just the changes to the refrigeration requirements for these new drugs represent a challenge to drug manufacturers, pharmacies and hospitals.
Combined, these difficulties create a combustible business environment that can be ignited by the pressure of expiring patents. Experts estimate that the approval process ensures that a drug company actually gets only about 12 years of exclusivity before a 20-year patent wears off. So in pharma-land, the march of popular medications to generic status forces the original developers into the famous Innovator’s Dilemma. Most companies face competition from the generic versions of their own previous work.
Submit your suggestions for videos that make us think about how data, visualizations, and technology are changing us
Each year at Strata, we warm up the crowd in the main keynote sessions with short videos that will make people think. These videos demonstrate the ways that data, technology, and visualization are changing us. Some are funny; some are clever; some are downright disturbing.
For Strata New York + Hadoop World in October, we’re hoping you’ll join in and suggest some videos for us. If you’ve got something you feel captures the zeitgeist of technology at the fringes, then complete this form, and we’ll check it out. We’ll choose some of them as we kick off the event this fall.
An interview with Kristina Chodorow, author of MongoDB: The Definitive Guide, Second Edition
We launched the second edition of Kristina Chodorow’s book, MongoDB: The Definitive Guide, at a recent MongoDB conference in San Francisco. Everyone worked hard to make this happen. I filmed a little behind-the-scenes video with my phone in order to share it with everyone who worked on the book. After I filmed it, I decided to post the video as well as an interview with Kristina. Both the video and interview provide snippets of what it is like to work on the second edition of MongoDB: The Definitive Guide.
What inspired you to become a software engineer?
Kristina Chodorow: In college, I took a computer science class because it would count towards my math major. I was programming a tic-tac-toe game and thought, “Why can’t I just program it to try to win?” and then I realized I could figure out the actual logic of “trying to win.” I thought that was the coolest thing ever. I took a couple more programming classes, joined the programming team, and started doing CS research. By the time I graduated, I knew I was going to be a programmer.
How did you land at 10gen?
Making sense of the hype-cycle scuffle.
The big data world is a confusing place. We’re no longer in a market dominated mostly by relational databases, and the alternatives have multiplied in a baby boom of diversity.
These child prodigies of the data scene show great promise but spend a lot of time knocking each other around in the schoolyard. Their egos can sometimes be too big to accept that everybody has their place, and eyeball-seeking media certainly doesn’t help.
POPULAR KID: Look at me! Big data is the hotness!
HADOOP: My data’s bigger than yours!
SCIPY: Size isn’t everything, Hadoop! The bigger they come, the harder they fall. And aren’t you named after a toy elephant?
R: Backward sentences mine be, but great power contains large brain.
SQL: Oh, so you all want to be friends again now, eh?!
POPULAR KID: Yeah, what SQL said! Nobody really needs big data; it’s all about small data, dummy.
Strata Community Profile on Amy Heineike, Director of Mathematics
According to Amy Heineike, the Director of Mathematics at Quid, there’s nothing like having a fresh dataset in R and knowing how to use it. “You can add a few lines of code and discover all kinds of interesting information,” Heineike says. “One question leads to another, you get into a flow, and you can have an amazing exploration.”
Heineike started working with data several years ago at a consultancy in London, where “playing around” with data shed light on the impact of social networks on government policies. Part of her job was figuring out what types of data to use in order to find solutions to crucial problems, from public transportation to obesity. Her day-to-day work at Quid entails working with new data sets, prototyping analytics, and collaborating with an engineering team to improve data analysis and bring products into production.
a lesson for data science teams
The other day we had a conversation with a bespectacled senior data scientist at another organization (named X to protect the innocent). The conversation went something like this:
Many of us have had similar conversations with people like X, and many of us have even been X before. Data scientists, being curious individuals, enjoy working on problems for the sake of doing something interesting, fun, technically challenging, or because their boss heard about “big data” in the Wall Street Journal. These reasons are all distinctly different from trying to solve an important problem.
Featured Strata Community Profile on Yogi Saxena
Yogi Saxena is not one to back down from a challenge. The distance runner ran in his first marathon just two years ago in order to win a bet. Next month, he competes in another grueling marathon, his third. And if that were not enough, a friend’s Facebook post inspired him to train for a sprint triathlon. “I taught myself to swim when I was young,” Saxena says, revealing that his drive to learn new skills started early. “And if it wasn’t for the swim part, I’d have done an Olympic-distance triathlon instead.”
Saxena’s love of mastering new challenges is likely responsible for his decision to pursue data science as a second profession, after having a successful career as an electrical engineer. Currently at Boeing, he is responsible for developing a tool that would help visualize feeds from various classified and non-classified sources.
He is profiled here as part of the Strata community profiles.
Barlow's distilled insights regarding the ever-evolving definition of real-time big data analytics
During a break in between offsite meetings that Edd and I were attending the other day, he asked me, “Did you read the Barlow piece?”
“Umm, no,” I replied sheepishly. Insert a sidelong glance from Edd that said much without saying anything aloud. He’s really good at that.
In my utterly meager defense, Mike Loukides is the editor on Mike Barlow’s Real-Time Big Data Analytics: Emerging Architecture. As Loukides is one of the core drivers behind O’Reilly’s book publishing program and someone who I perceive to be an unofficial boss of my own choosing, I am not really inclined to worry about things that I really don’t need to worry about. Then I started getting not-so-subtle inquiries from additional people asking if I would consider reviewing the manuscript for the Strata community site. This resulted in me emailing Loukides for a copy and sitting in a local cafe on a Sunday afternoon to read through the manuscript.
The biggest problems will almost always be those for which the size of the data is part of the problem.
A recent VentureBeat article argues that “Big Data” is dead. It’s been killed by marketers. That’s an understandable frustration (and a little ironic to read about it in that particular venue). As I said sarcastically the other day, “Put your Big Data in the Cloud with a Hadoop.”
You don’t have to read much industry news to get the sense that “big data” is sliding into the trough of Gartner’s hype curve. That’s natural. Regardless of the technology, the trough of the hype cycle is driven by a familiar set of causes: it’s fed by over-aggressive marketing, the longing for a silver bullet that doesn’t exist, and the desire to spout the newest buzzwords. All of these phenomena breed cynicism. Perhaps the most dangerous is the technologist who never understands the limitations of data, never understands what data isn’t telling you, or never understands that if you ask the wrong questions, you’ll certainly get the wrong answers.
Big data is not a term I’m particularly fond of. It’s just data, regardless of the size. But I do like Roger Magoulas’ definition of “big data”: big data is when the size of the data becomes part of the problem. I like that definition because it scales. It was meaningful in 1960, when “big data” was a couple of megabytes. It will be meaningful in 2030, when we all have petabyte laptops, or eyeglasses connected directly to Google’s yottabyte cloud. It’s not convenient for marketing, I admit; today’s “Big Data!!! With Hadoop And Other Essential Nutrients Added” is tomorrow’s “not so big data, small data actually.” Marketing, for better or for worse, will deal.
Preview of upcoming session at the Strata Conference
As a preview, let’s talk about two pretty pictures.
I’m running some typical distributed systems (HDFS, MapReduce, Impala, HBase, Zookeeper) on a small, seven-node cluster. The diagram above has individual processes and the TCP connections they’ve established to each other. Some processes are “masters” and they end up talking to many other processes.