Lessons from the design community for developing data-driven applications
When you hear someone say, “that is a nice infographic” or “check out this sweet dashboard,” the implication is usually that it is “well-designed.” But creating accessible (or, for the cynical, “pretty”) content is only part of what makes good design powerful. The design process is geared toward solving specific problems. That process has been formalized in many ways (e.g., IDEO’s Human-Centered Design, Marc Hassenzahl’s User Experience Design, or Braden Kowitz’s Story-Centered Design), but the basic idea is that you have to explore the breadth of what is possible before you can isolate truly innovative ideas. We at Datascope Analytics argue that the same is true of designing effective data science tools, dashboards, engines, and the like: in order to design effective dashboards, you must know what is possible.
By Dean Malmgren and Jon Wettersten
There’s a lot of hype around “Big Data” these days. Don’t believe us? None other than the venerable Harvard Business Review named “data scientist” the “Sexiest Job of the 21st Century” only 13 years into it. Seriously. Some of these accolades are deserved. It’s now decidedly cheaper to store data than to analyze it, a considerable change from 10 or 20 years ago. Other aspects, however, are less deserved. In isolation, big data and data scientists don’t hold some magic formula that’s going to save the world, radically transform businesses, or eliminate poverty. Solving problems is decidedly different from amassing a data set the size of 200 trillion Moby Dicks or setting a team of nerds loose on the data. Problem solving requires not only a high-level conceptual understanding of a challenge, but also a deep understanding of its nuances, how those nuances affect businesses, governments, and societies, and—don’t forget—the creativity to address them.
In our experience, solving problems with data requires a diversity of thought and an approach that balances number crunching with thoughtful design to solve targeted problems. Ironically, we don’t believe this means it’s important to have an army of PhDs with deep knowledge of every topic under the sun. Rather, we find it’s important to have multidisciplinary teams of curious, thoughtful, and motivated learners with a broad range of interests who aren’t afraid to immerse themselves in a totally ambiguous topic. With this common vision, IDEO and Datascope Analytics decided to embark on an experiment and integrate our teams to collaborate on a few big data projects over the last year. We thought we’d share a few things we’ve learned along the way.
By John Russell
When I came to work on the Cloudera Impala project, I found many things that were familiar from my previous experience with relational databases, UNIX systems, and the open source world. Yet other aspects were all new to me. I know from documenting both enterprise software and open source projects that it’s a special challenge when those two aspects converge. A lot of new users come in with 95% of the information they need, but they don’t know where the missing or outdated 5% is. One mistaken assumption or unfamiliar buzzword can make someone feel like a complete beginner. That’s why I was happy to have the opportunity to write this overview article, with room to explore how users from all kinds of backgrounds can understand and start using the Cloudera Impala product.
For database users, the Apache Hadoop ecosystem can feel like a new world:
- Sysadmins don’t bat an eye when you say you want to work on terabytes or petabytes of data.
- A networked cluster of machines isn’t a complicated or scary proposition. Instead, it’s the standard environment you ask an intern to set up on their first day as a training exercise.
- All the related open source projects aren’t an either-or proposition. You work with a dozen components that all interoperate, stringing them together like a UNIX toolchain.
By Julie Yoo, Chief Product Officer at Kyruus
Once upon a time, a world-renowned surgeon, Dr. Michael DeBakey, was summoned by the President when the Shah of Iran, a figure of political and strategic importance, fell ill with an enlarged spleen due to cancer. Dr. DeBakey was whisked away to Egypt to meet the Shah, made a swift diagnosis, and recommended an immediate operation to remove the spleen. The surgery lasted 80 minutes; the spleen, which had grown to 10 times its normal size, was removed, and the Shah made a positive recovery in the days following the surgery – that is, until he took a turn for the worse, and ultimately died from surgical complications a few weeks later. 
Sounds like a routine surgery gone awry, yes? But consider this: Dr. DeBakey was a cardiovascular surgeon – in other words, a surgeon who specialized in operating on the heart and blood vessels, not the spleen. He was best known for his open-heart bypass techniques, and the vast majority of his peer-reviewed articles concern cardiology-related operating techniques. High profile or not, why was a cardiovascular surgeon selected to perform an abdominal surgery?
A game changer for marketers: pinpointing what a customer wants, when they want it, and how they want to hear about it
My two-and-a-half-year-old daughter loves the Mickey Mouse Clubhouse. She watches episodes on TV and on our iPad. She wears Minnie Mouse flip flops and giggles just about every time she sees anything with Mickey, Daisy, Goofy…you get the idea. And when she’s old enough to go to Disney World, Minnie might walk right up to her and say “Hi Jemma!” and give her a big hug.
Creating a personal interaction between a child and a beloved Disney character exemplifies the company’s recent initiative to deliver a personalized, hassle-free experience at its theme parks. With the wireless tracking wristband, the ‘MagicBand,’ families are able to reserve spots in lines for popular attractions, purchase items at the parks, and unlock their hotel rooms. The MagicBand is part of the MyMagic+ system, which enables Disney to collect data on visitors’ purchasing habits and real-time location, among other things. Disney will use this vast trove of information to deliver a personalized experience at the parks and to tailor marketing messages and promotions.
Results from a survey of analytics professionals
Analyzing the Analyzers: An Introspective Survey of Data Scientists and Their Work is the result of applying the methods of data science to our own professional community. My co-authors (Sean Murphy and Marck Vaisman) and I run professional Meetup groups for statistical and analytics professionals in the Washington, DC area. In the course of organizing Data Science DC, Data Business DC, and Statistical Programming DC, and serving on the board of Data Community DC, we meet a lot of people, many of whom either call themselves “data scientists” or aspire to do so. But these people have substantially different education, experiences, aptitudes, and attitudes. Why are they all using the same label?
We believe that this new job title or career path of “data science” came about because people were dissatisfied with existing ways of describing their roles and their work. But is everyone’s converging on “data scientist” a sign of progress, or just a source of confusion?
In the spring of 2012, we observed that this new, vaguely defined career, although tremendously exciting and fulfilling for all of us, was impaired by unclear communication, unrealistic expectations, and missed opportunities. Something had to be done. As data scientists, we thought that a natural way to bring more clarity to the issue would be to collect some data, so we developed a survey and recruited hundreds of participants. Our analysis focused on finding underlying explanatory structure in the results that would help us improve communication, expectations, and opportunities for and about data scientists.
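To make “finding underlying explanatory structure” concrete, here is a minimal, hypothetical sketch of one common approach: clustering survey respondents by their self-rated skills so that groups of similar practitioners emerge. The skill categories, ratings, and the use of k-means here are all illustrative assumptions, not the survey’s actual data or the authors’ actual method.

```python
# Hypothetical sketch: clustering survey respondents by self-rated skills.
# The skill names, ratings, and choice of k-means are invented for
# illustration; this is NOT the Analyzing the Analyzers data or pipeline.
import numpy as np

skills = ["statistics", "programming", "business", "machine_learning"]

# Each row is one respondent's self-ratings (0-5) on the four skills.
responses = np.array([
    [5, 2, 1, 3],   # statistics-heavy respondents
    [4, 1, 2, 2],
    [1, 5, 1, 4],   # programming-heavy respondents
    [2, 5, 2, 5],
    [1, 2, 5, 1],   # business-heavy respondents
    [2, 1, 4, 1],
], dtype=float)

def kmeans(X, k, iters=50):
    """Plain NumPy k-means with deterministic initialization
    (first k rows), so the sketch needs no extra dependencies."""
    centers = X[:k].copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        # Assign each respondent to the nearest cluster center.
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(dists, axis=1)
        # Move each center to the mean of its assigned respondents.
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

labels, centers = kmeans(responses, k=3)
for j, center in enumerate(centers):
    print(f"cluster {j}: dominant skill = {skills[int(np.argmax(center))]}")
```

On this toy data the three pairs of respondents fall into three clusters, each dominated by a different skill. In practice one would also have to choose the number of clusters, normalize ratings, and validate that the groups are stable, which is where most of the real analytical work lies.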
Opportunity to share your data stories with Brett Goldstein and Q. Ethan McCallum
On Goldstein, McCallum, and their upcoming book, Making Analytics Work: Case by Case
By Alex Howard
People have been crunching numbers to understand government since the first time an official used an abacus to compare one season’s grain harvest against another. Tracking and comparing data is part of how we’ve been understanding our world for millennia. In the 21st century, organizations in all sectors are transitioning from paper records to massive databases. Instead of inscribing tablets, we’re browsing real-time data dashboards on them. Using modern data analytics to make sense of all of those numbers is now the task of scientists, journalists and, intriguingly, public officials. That’s the context in which I first encountered Brett Goldstein, when I talked with him about his work as Chicago’s chief data officer. Goldstein has been a key part of Chicago’s data-driven approach to open government since Mayor Rahm Emanuel was elected in February 2011. He and Chicago CTO John Tolva have been breaking new ground in an emerging global discussion around how cities understand, govern and regulate themselves.
I saw Goldstein share his ideas for data analytics in person at last year’s Strata Conference in New York City, where he and Q. Ethan McCallum, the author of the Bad Data Handbook, talked about text mining and civic engagement. Their thinking on big data in the public sector is helping to inform other cities that want to follow in Chicago’s footsteps. Urban predictive analytics are making sense of what residents are doing, where and when — and what they want from their governments. Both men have steadily earned excellent reputations: Goldstein as a public servant and McCallum as a trusted authority in the field.