ENTRIES TAGGED "data analysis"

The Role of Big Data in Personalizing the Healthcare Experience: Mobile

Sensors, games, and social networking all create change in health and fitness

This article was written with Ellen M. Martin and Tobi Skotnes. Dr. Feldman will deliver a webinar on this topic on September 18 and will speak about it at the Strata Rx conference.

Cheaper, faster, better technology is enabling nearly one in four people around the world to connect with each other anytime, anywhere, as online social networks have changed the way we live, work and play. In healthcare, the data generated by mobile phones and sensors can give us new information about ourselves, extend the reach of our healers and help to accelerate a societal shift towards greater personal engagement in healthcare.

Cancer and Clinical Trials: The Role of Big Data In Personalizing the Health Experience

Big Data and analytics are the foundation of personalized medicine

This article was written with Ellen M. Martin and Tobi Skotnes. Dr. Feldman will deliver a webinar on this topic on September 18 and will speak about it at the Strata Rx conference.

Despite considerable progress in prevention and treatment, cancer remains the second leading cause of death in the United States. Even with the $50 billion pharmaceutical companies spend on research and development every year, any given cancer drug is ineffective in 75% of the patients receiving it. Typically, oncologists start patients on the cheapest likely chemotherapy (or the one their formulary suggests first) and in the 75% likelihood of non-response, iterate with increasingly expensive drugs until they find one that works, or until the patient dies. This process is inefficient and expensive, and subjects patients to unnecessary side effects, as well as causing them to lose precious time in their fight against a progressive disease. The vision is to enable oncologists to prescribe the right chemical the first time–one that will kill the target cancer cells with the least collateral damage to the patient.

How data can improve cancer treatment

Big data is enabling a new understanding of the molecular biology of cancer. The focus has changed over the last 20 years from the location of the tumor in the body (e.g., breast, colon or blood), to the effect of the individual’s genetics, especially the genetics of that individual’s cancer cells, on her response to treatment and sensitivity to side effects. For example, researchers have to date identified four distinct cell genotypes of breast cancer; identifying the cancer genotype allows the oncologist to prescribe the most effective available drug first.

Herceptin, the first drug developed to target a particular cancer genotype (HER2), rapidly demonstrated both the promise and the limitations of this approach. (Among the limitations, HER2 is only one of four known and many unknown breast cancer genotypes, and treatment selects for populations of resistant cancer cells, so the cancer can return in a more virulent form.)

Data Science for Business

What business leaders need to know about data and data analysis to drive their businesses forward.

A couple of years ago, Claudia Perlich introduced me to Foster Provost, her PhD adviser. Foster showed me the book he was writing with Tom Fawcett and using in his teaching at NYU.

Foster and Tom have a long history of applying data to practical business problems. Their book, which evolved into Data Science for Business, was different from all the other data science books I’ve seen. It wasn’t about tools: Hadoop and R are scarcely mentioned, if at all. It wasn’t about coding: business students don’t need to learn how to implement machine learning algorithms in Python. It is about business: specifically, it’s about the data analytic thinking that business people need to work with data effectively.

Data analytic thinking means knowing what questions to ask, how to ask those questions, and whether the answers you get make sense. Business leaders don’t (and shouldn’t) do the data analysis themselves. But in this data-driven age, it’s critically important for business leaders to understand how to work with the data scientists on their teams. In today’s business world, it’s essential to understand which algorithms are used for different applications, how statistics are used to create models of human and economic behavior, overfitting and its symptoms, and much more. You might not need to know how to implement a machine learning algorithm, but you do need to understand the ideas the data scientists on your team are using.

The goal of data science is putting data to work. That’s what Data Science for Business is all about, and the reason I’m excited to see us publishing it. There are many books about data science, and an increasing number of undergraduate and graduate programs in data science. But I haven’t seen anything that teaches data science for the leaders who will be using data to drive their businesses forward.

Genomics and the Role of Big Data in Personalizing the Healthcare Experience

Increasingly available data spurs organizations to make analysis easier

This article was written with Ellen M. Martin and Tobi Skotnes. Dr. Feldman will deliver a webinar on this topic on September 18 and will speak about it at the Strata Rx conference.

Genomics is making headlines in both academia and the celebrity world. With intense media coverage of Angelina Jolie’s recent double mastectomy after genetic tests revealed that she was predisposed to breast cancer, genetic testing and genomics have been propelled to the front of many more minds.

In this new data field, companies are taking a variety of approaches to collecting data, analyzing it, and turning it into usable information.

Pricing decisions are going to be made whether you have analytics behind it or not

Strata Community Profile on Jon Higbie, Managing Partner and Chief Scientist of Revenue Analytics

Jon Higbie

In his role as chief scientist at Atlanta-based consulting firm Revenue Analytics, Jon Higbie helps clients make sound pricing decisions for everything from hotel rooms, to movie theater popcorn, to that carton of OJ in the fridge.

And in the ever-growing field of data science where start-ups dominate much of the conversation, the 7-year-old company has a longevity that few others can claim just yet. They’ve been around the block a few times, and count behemoth companies like Coca-Cola and IHG among their clients.

We spoke recently about how revenue and pricing strategies have changed in recent years in response to the greater transparency of the internet, and the complex data algorithms that go into creating a simple glass of orange juice.

Big data comes to the big screen

Using data science to predict the Oscars

By Michael Gold, Farsite

Sophisticated algorithms are not going to write the perfect script or crawl YouTube to find the next Justin Bieber (that last one I think we can all be thankful for!). But a model can predict the probability of a nominee winning the Oscar, and our model recently showed Argo overtaking Lincoln as the likely winner of Best Picture. Every day on FarsiteForecast.com we’ve been describing applications of data science for the media and entertainment industry, illustrating how our models work, and updating the likely winners based on the outcomes of the Awards Season leading up to the Oscars.

Just as predictive analytics provides valuable decision-making tools in sectors from retail to healthcare to advocacy, data science can also empower smarter decisions for entertainment executives, which led us to launch the Oscar forecasting project. While the potential for data science to impact any organization is as unique as each company itself, we thought we’d offer a few use cases that have wide application for media and entertainment organizations.

Narrative reports vs dashboards

A deconstructed web analytics report shows what the dashboard missed.

We can all agree that in 2013 web analytics is still a nightmare, right?

The last few years have brought about an enormous expansion in the top of the web analytics information overload funnel, and today I can discover just about any aspect of my web traffic that piques my curiosity.

I know how much traffic I’m getting, who told them to come here, how they got here, how long they’re staying, what they’re looking at, what they’re using to look at it, where they’re from, and just about anything else I want to know about them. If I don’t like what I’m looking at, I can customize everything from my dashboard to reports to parameters within those reports.

What none of this tells me is how I can be more successful at turning the words I put on the Internet into dollars in my pocket.

Now, I know what you’re thinking: “It’s all there! More information than you could ever figure out what to do with.”

The problem with that is that it’s all there. It’s more information than I could ever figure out what to do with.

Four data themes to watch from Strata + Hadoop World 2012

In-memory data storage, SQL, data preparation and asking the right questions all emerged as key trends at Strata + Hadoop World.

At our successful Strata + Hadoop World conference (including successfully avoiding Sandy), a few themes emerged that resonated with my interests and experience as a hands-on data analyst and as a researcher who tracks technology adoption trends. Keep in mind that these themes reflect my personal biases. Others will have their own takeaways from the conference.

1. In-memory data storage for faster queries and visualization

Interactive or real-time query for large datasets is seen as a key to analyst productivity (real-time as in query times fast enough to keep the user in the flow of analysis, from sub-second to less than a few minutes). The existing large-scale data management schemes aren’t fast enough, and they reduce analytical effectiveness when users can’t explore the data by quickly iterating through various query schemes. We see companies with large data stores building out their own in-memory tools, e.g., Dremel at Google, Druid at Metamarkets, and Sting at Netflix, alongside new tools such as Cloudera’s Impala (announced at the conference), Spark from UC Berkeley’s AMPLab, SAP HANA, and Platfora.

We saw this coming a few years ago when analysts we pay attention to started building their own in-memory data store sandboxes, often in key/value data management tools like Redis, when trying to make sense of new, large-scale data stores. I know from my own work that there’s no better way to explore a new or unstructured data set than to be able to quickly run off a series of iterative queries, each informed by the last.
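As a rough illustration of that kind of sandbox, the sketch below loads a tiny, made-up set of page-view records into an in-memory store and runs two queries, the second chosen from the result of the first. It uses Python's built-in sqlite3 module with a ":memory:" database as a stand-in; it is not Redis or any of the tools named above, and the table, column names, and helper functions are assumptions made for the example.

```python
import sqlite3

# Hypothetical sample data: (user_id, page, seconds_on_page).
ROWS = [
    ("u1", "/home", 12.0),
    ("u1", "/pricing", 45.5),
    ("u2", "/home", 3.0),
    ("u3", "/pricing", 60.0),
    ("u3", "/docs", 30.0),
]


def load_events(rows):
    """Create an in-memory SQLite database and load the rows into it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (user_id TEXT, page TEXT, seconds REAL)"
    )
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    return conn


def top_pages(conn):
    """First query: pages ranked by view count (ties broken by page name)."""
    return conn.execute(
        "SELECT page, COUNT(*) AS views FROM events "
        "GROUP BY page ORDER BY views DESC, page"
    ).fetchall()


def avg_seconds(conn, page):
    """Follow-up query: average time spent on one page (None if no rows)."""
    return conn.execute(
        "SELECT AVG(seconds) FROM events WHERE page = ?", (page,)
    ).fetchone()[0]


conn = load_events(ROWS)
ranking = top_pages(conn)
# Let the first result decide what to look at next.
busiest_page = ranking[0][0]
print(ranking)
print(busiest_page, avg_seconds(conn, busiest_page))
```

Because everything lives in memory, each follow-up question costs one more short query rather than a new batch job, which is the property the tools above aim to provide at a much larger scale.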

Now available: Big Data Now 2012 Edition

O'Reilly's annual data anthology explores the maturation of big data and data science.

In the first edition of our free Big Data Now anthology, the O’Reilly team tracked the birth and early development of data tools and data science. Now, with the second edition, we’re seeing what happens when big data grows up: how it’s being applied, where it’s playing a role, and the consequences, good and bad alike, of data’s ascendance.

We’ve organized the 2012 edition of Big Data Now into five areas:

Getting Up to Speed With Big Data — Essential information on the structures and definitions of big data.

Big Data Tools, Techniques, and Strategies — Expert guidance for turning big data theories into big data products.

The Application of Big Data — Examples of big data in action, including a look at the downside of data.

What to Watch for in Big Data — Thoughts on how big data will evolve and the role it will play across industries and domains.

Big Data and Health Care — A special section exploring the possibilities that arise when data and health care come together.

You can download free editions of Big Data Now 2012 in PDF, Mobi and EPUB formats here. The 2011 edition is also available.

Deconstructing a Twitter spam attack

Data analysis shows the structure of a network can separate true influencers from fake accounts.

There has been a lot of discussion recently about the effect fake Twitter accounts have on brands trying to keep track of social media engagement. A recent tweet spam attack offers an instructive example.

On the morning of October 1, the delegates attending the Strata Conference in London started to notice that a considerable number of spam tweets were being sent using the #strataconf hashtag. Using a tool developed by Bloom Agency, with data from DataSift, an analysis has been done that sheds light on the spam attack directed at the conference.

The following diagram shows a snapshot of the Twitter conversation after a few tweets had been received containing the #strataconf hashtag. Each red or blue line represents a connection between two Twitter accounts and shows how information flowed as a result of the tweet being sent. By 11 a.m., individual communities had started to emerge that were talking to each other about the conference, and these can clearly be seen in the diagram.

Strataconf tweeting communities
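To make the idea of reading network structure concrete, here is a minimal sketch in Python. It is not the Bloom Agency tool or the DataSift data: the account names and edges are invented, and the rule for flagging accounts (many outgoing mentions, none incoming) is an assumed heuristic chosen only to illustrate how structure alone can separate accounts that talk to each other from accounts that only broadcast.

```python
from collections import Counter, defaultdict

# Hypothetical edges: (source, target) means source mentioned or retweeted target.
edges = [
    ("alice", "bob"), ("bob", "alice"), ("carol", "alice"), ("bob", "carol"),
    ("dave", "erin"), ("erin", "dave"),
    ("spam1", "alice"), ("spam1", "bob"), ("spam1", "dave"),
]


def flag_suspects(edges, min_out=3):
    """Assumed heuristic: accounts that contact at least `min_out` others
    but are never mentioned or retweeted by anyone."""
    out_deg = Counter(src for src, _ in edges)
    in_deg = Counter(dst for _, dst in edges)
    return sorted(
        acct for acct, n in out_deg.items()
        if n >= min_out and in_deg[acct] == 0
    )


def find_communities(edges, exclude=()):
    """Group accounts into connected components, ignoring edge direction
    and skipping any edge that touches an excluded account."""
    neighbors = defaultdict(set)
    for src, dst in edges:
        if src in exclude or dst in exclude:
            continue
        neighbors[src].add(dst)
        neighbors[dst].add(src)

    seen = set()
    communities = []
    for start in sorted(neighbors):
        if start in seen:
            continue
        # Depth-first walk to collect everything reachable from `start`.
        component = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbors[node] - component)
        seen |= component
        communities.append(sorted(component))
    return communities


suspects = flag_suspects(edges)
communities = find_communities(edges, exclude=set(suspects))
print("suspects:", suspects)
print("communities:", communities)
```

In this toy data the broadcasting account would otherwise link the two groups into one; dropping it before grouping leaves the separate conversations visible. A real analysis would need a less crude rule, since a new but genuine attendee can also have no incoming mentions.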
