ENTRIES TAGGED "data scientists"

Data Science for Business

What business leaders need to know about data and data analysis to drive their businesses forward.

A couple of years ago, Claudia Perlich introduced me to Foster Provost, her PhD adviser. Foster showed me the book he was writing with Tom Fawcett and using in his teaching at NYU.

Foster and Tom have a long history of applying data to practical business problems. Their book, which evolved into Data Science for Business, was different from all the other data science books I’ve seen. It wasn’t about tools: Hadoop and R are scarcely mentioned, if at all. It wasn’t about coding: business students don’t need to learn how to implement machine learning algorithms in Python. It is about business: specifically, it’s about the data analytic thinking that business people need to work with data effectively.

Data analytic thinking means knowing what questions to ask, how to ask those questions, and whether the answers you get make sense. Business leaders don’t (and shouldn’t) do the data analysis themselves. But in this data-driven age, it’s critically important for business leaders to understand how to work with the data scientists on their teams. In today’s business world, it’s essential to understand which algorithms are used for different applications, how statistics are used to create models of human and economic behavior, overfitting and its symptoms, and much more. You might not need to know how to implement a machine learning algorithm, but you do need to understand the ideas the data scientists on your team are using.

The goal of data science is putting data to work. That’s what Data Science for Business is all about, and the reason I’m excited to see us publishing it. There are many books about data science, and an increasing number of undergraduate and graduate programs in data science. But I haven’t seen anything that teaches data science for the leaders who will be using data to drive their businesses forward.

Data Science for Social Good: A Fellowship

Training Aspiring Data Scientists in Chicago

By Juan-Pablo Velez

The Fellowship

As technology penetrates further into everyday life, we’re creating lots of data. Businesses are scrambling to find data scientists to make sense of all this data and turn it into better decisions.

Businesses aren’t alone. Data science could transform how governments and nonprofits tackle society’s problems. The problem is, most governments and nonprofits simply don’t know what’s possible yet. There are too few data scientists out there, and too many of them spend their days optimizing ads instead of bettering lives. To make a real impact with data, we need to work on high-impact projects that show these organizations the power of analytics. And we need to expose data scientists to the problems that really matter.

That’s exactly why we’re doing the Eric and Wendy Schmidt Data Science for Social Good summer fellowship at the University of Chicago. The program is led by Rayid Ghani, former chief data scientist for the 2012 Obama campaign, and is funded by Google Chairman Eric Schmidt.

We’ve brought three dozen aspiring data scientists from all over the world to Chicago to spend a summer working on data science projects with social impact. The fellows are working closely with governments and nonprofits (including the City of Chicago, the Chicago Transit Authority, and the Nurse-Family Partnership) to take on real-world problems in education, health, energy, transportation, and more. (To read up on our projects, check out dssg.io/projects; to get involved, go to github.com/dssg.)

Lots of folks have been asking about how we’re training data scientists.

Data scientists are a hybrid group with computer science, statistics, machine learning, data mining, and database skills. These skills take years to learn, and there’s no way to teach all of them in a few weeks. Instead of starting from scratch, we decided to start with students in computational and quantitative fields – folks who already have some of these skills and use them daily in an academic setting. And we gave them the opportunity to apply their abilities to solve real-world problems and to pick up the skills they’re missing.

Leading Indicators

In a conversation with Q Ethan McCallum (who should be credited as co-author), we wondered how to evaluate data science groups. If you’re looking at an organization’s data science group from the outside, possibly as a potential employee, what can you use to evaluate it? It’s not a simple problem under the best of conditions: you’re not an insider, so you don’t know the full story of how many projects it has tried, whether they have succeeded or failed, relations between the data group, management, and other departments, and all the other stuff you’d like to know but will never be told.

Our starting point was far afield. Q told me about Tyler Brulé’s travel writing for the Financial Times (behind a paywall, unfortunately), in which he says that a club sandwich is a good proxy for hotel quality: you go into the restaurant and order a club sandwich. A club sandwich isn’t hard to make: there’s no secret recipe or technique that’s going to make Hotel A’s sandwich significantly better than B’s. But it’s easy to cut corners on ingredients and preparation. And if a hotel is cutting corners on its club sandwiches, it’s probably cutting corners in other places.

Data’s missing ingredient? Rhetoric.

Arguments are the glue that connects data to decisions

Data is key to decision making. Yet we are rarely faced with a situation where things can be put into such a clear logical form that we have no choice but to accept the force of evidence before us. In practice, we should always be weighing alternatives, looking for missed possibilities, and considering what else we need to figure out before we can proceed.

Arguments are the glue that connects data to decisions. And if we want good decisions to prevail, both as decision makers and as data scientists, we need to better understand how arguments function. We need to understand the best ways that arguments and data interact. The statistical tools we learn in classrooms are not sufficient alone to deal with the messiness of practical decision-making.

Examples of this fill the headlines. You can see evidence of rigid decision-making in how the American medical establishment decides what constitutes a valid study result. By custom and regulation, there is an official statistical breaking point for all studies. Below this point, a result will be acted upon. Above it, it won’t be. Cut and dried, but dangerously brittle.
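
A toy sketch makes the brittleness concrete. The article does not name the breaking point, so the 0.05 cutoff below is an assumption (the conventional value), and the function `act_on` and the two p-values are hypothetical, chosen only for illustration.

```python
# Hypothetical illustration of a rigid, all-or-nothing significance cutoff.
# ASSUMPTION: 0.05 is the conventional threshold; the article does not state it.
ALPHA = 0.05


def act_on(p_value: float, alpha: float = ALPHA) -> bool:
    """Return True when a result falls below the cutoff and would be acted upon."""
    return p_value < alpha


# Two hypothetical studies whose evidence is nearly identical.
studies = {"study_a": 0.049, "study_b": 0.051}
decisions = {name: act_on(p) for name, p in studies.items()}
print(decisions)  # {'study_a': True, 'study_b': False}
```

Evidence that differs by only 0.002 lands on opposite sides of the line, so the rule gives opposite answers where a weighing of alternatives would treat the two studies almost the same.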

How do you become a data scientist? Well, it depends

My obsession with data and user needs is now focused on the many paths toward data science.

Thanksgiving 2012

Over Thanksgiving, Richie and Violet asked me if I preferred the iPhone or the Galaxy SIII. I have both. It is a long story. My response was, “It depends.” Richie, who would probably bleed Apple if you cut him, was very dissatisfied with my answer. Violet was more diplomatic. Yet, it does depend. It depends on what the user wants to use the device for.

I say, “It depends” a lot in my life.

Both in my personal life and in my work life … well, it really is all one life, isn’t it? With my work over the past decade or so, I have been obsessive about being user-focused. I spend a lot of time thinking about whom a product, feature, or service is for and how they will use it. Not how I want them to use it, but how they want to use it and what problem they are trying to solve with it.

Before I joined O’Reilly, I was obsessively focused on the audience for my data analysis. C-level execs look for different kinds of insights than a director of engineering does. A field sales rep looks for different insights than a software developer does. Understanding more about who the user or audience was for a data project enabled me to map the insights to the user’s role, their priorities, and how they wanted to use the data. Because, you know what isn’t too great? Spending a significant amount of time working on something that doesn’t get used, or that isn’t what someone needed to do their job.

Data science in the natural sciences

Big data is shaping diverse fields, showing that past predictions from data-driven natural sciences are now coming to pass.

Recently I find myself having conversations with people from increasingly diverse fields, both at Columbia and in local startups, about how their work is becoming “data-informed” or “data-driven,” and about the challenges posed by applied computational statistics or big data.

A view from health and biology in the 1990s

In discussions with, for example, New York City journalists, physicists, or even former students now working in advertising or social media analytics, I’ve been struck by how many of the technical challenges and lessons learned are reminiscent of those faced in the health and biology communities over the last 15 years, when these fields experienced their own data-driven revolutions and wrestled with many of the problems now faced by people in other fields of research or industry.

It was around then, as I was working on my PhD thesis, that sequencing technologies became sufficient to reveal the entire genomes of simple organisms and, not long thereafter, the first draft of the human genome. This advance in sequencing technologies made possible the “high throughput” quantification of, for example,

  • the dynamic activity of all the genes in an organism; or
  • the set of all protein-protein interactions in an organism; or even
  • statistical comparative genomics revealing how small differences in genotype correlate with disease or other phenotypes.

These advances required the formation of multidisciplinary collaborations and multi-departmental initiatives, as well as advances in technologies for dealing with massive datasets and in statistical and mathematical methods for making sense of copious natural data.

Now available: “Planning for Big Data”

A free handbook for anybody wanting to understand and use big data.

"Planning for Big Data" is a new book that helps you understand what big data is, why it matters, and where to get started.

Visualization of the Week: Visualizing the Strata Conference

The Information Lab visualizes the Strata Conference's attendees.

This week's visualization comes from The Information Lab and shows who was at the Strata Conference, how far they traveled, and the data their companies produce.

Strata Week: Datasift lets you mine two years of Twitter data

Datasift offers more access to the Twitter archive, and a proposal for a data school.

In this week's data news, Datasift will offer deeper access to old tweets, and P2PU and the Open Knowledge Foundation announce a School of Data.

Strata Week: Your personal automated data scientist

Wolfram releases a pro tool, protecting data during times of need, and new doubts about dating services.

Wolfram|Alpha launches a pro version of its computational knowledge engine, guidelines emerge for protecting the data of people in crisis, and researchers cast doubt on dating sites' matchmaking algorithms.