Big Data may seem like a familiar concept to those working in IT, but for most executives it’s difficult to imagine just how much Big Data impacts business on a daily basis. Most companies already collect customer data, ranging from purchase habits to social media interactions, but few translate their data into actionable business insights. By applying advanced analytics to Big Data, companies can identify patterns and make predictions from huge amounts of information that a single human analyst could never see, let alone understand.
Machine Learning, the core technology behind this type of Big Data analytics, comprises a collection of algorithms designed to uncover patterns that classical statistical methods often fail to detect. Procedures such as k-means clustering, support vector machines, Bayesian networks, and decision trees are flexible and adapt to nonlinear and high-dimensional data structures. This flexibility comes at a price, however. Expert users must decide in advance on a host of parameter settings: kernel types, numbers of clusters, prior probabilities, and so on. The complexity of these decisions often eludes the average analyst. Furthermore, Machine Learning algorithms rest on assumptions similar to those required for classical statistical analysis, so outliers, missing values, and unusual distributions can invalidate the conclusions drawn from Machine Learning applications.
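To make the parameter-choice problem concrete, here is a minimal sketch of plain k-means (Lloyd's algorithm) in Python. The function name `kmeans`, the toy points, and the seeded random initialization are illustrative assumptions, not any particular product's implementation. Note that the caller must supply the number of clusters `k` up front, which is exactly the kind of decision described above.

```python
import random


def kmeans(points, k, iterations=100, seed=0):
    """Plain Lloyd's k-means on 2-D points; k must be chosen by the caller in advance."""
    rng = random.Random(seed)
    # Initialize centroids by sampling k distinct input points (a simple, common choice).
    centroids = rng.sample(points, k)
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid (squared Euclidean distance).
        clusters = [[] for _ in range(k)]
        for x, y in points:
            nearest = min(
                range(k),
                key=lambda c: (x - centroids[c][0]) ** 2 + (y - centroids[c][1]) ** 2,
            )
            clusters[nearest].append((x, y))
        # Update step: move each centroid to the mean of its cluster; keep it if the cluster is empty.
        new_centroids = [
            (sum(p[0] for p in cl) / len(cl), sum(p[1] for p in cl) / len(cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        # Stop once the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical toy data: two well-separated groups of three points each.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(pts, k=2)
print(sorted(centroids))
```

With `k=2` the sketch recovers the two obvious groups. Called with `k=3` on the same points, it would still return three centroids without any warning that the chosen `k` might not suit the data; the algorithm trusts whatever setting it is given.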
However, Machine Learning software currently on the market can address this general problem: how to help analysts draw conclusions that are supportable and understandable in real-world environments. The software not only chooses appropriate models for analysis but can also evaluate those models with dedicated algorithms that detect misspecifications, outliers, and other anomalies. In basic terms, it does not require data modelers or business executives to know in advance what to ask of their data. As a result, marketers, executives, and students (relative data novices) have been able to build specialized models such as ordinal logistic regression, zero-inflated Poisson regression, or spectral clustering without worrying about the underlying mathematics. At the other end of the expertise scale, these tools offer statisticians and data scientists a second opinion on their analyses, uncovering anomalies and violations of assumptions they may have missed.
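The text does not say which algorithms such software uses to detect outliers, so the following is only a stand-in for the idea of checking a fitted model: a minimal Python sketch that fits an ordinary least-squares line and then flags points whose residuals have a large modified z-score (a median-absolute-deviation rule with the commonly used 3.5 cutoff). The function names `fit_line` and `flag_outliers` and the toy data are hypothetical.

```python
from statistics import median


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def flag_outliers(xs, ys, threshold=3.5):
    """Return indices whose OLS residuals have a modified z-score above threshold."""
    a, b = fit_line(xs, ys)
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    # Median and median absolute deviation are robust to the very outliers we are hunting.
    med = median(residuals)
    mad = median(abs(r - med) for r in residuals)
    if mad == 0:
        return []  # residuals too uniform to score
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score under normality.
    return [i for i, r in enumerate(residuals) if 0.6745 * abs(r - med) / mad > threshold]


# Hypothetical data: roughly y = 2x + 1, with one corrupted reading at index 7.
xs = list(range(10))
ys = [1, 3.2, 4.9, 7.1, 9.0, 10.8, 13.1, 30.0, 17.0, 19.1]
print(flag_outliers(xs, ys))  # -> [7]
```

The single bad reading drags the fitted slope from roughly 2 to about 2.45, which is the kind of distortion that can invalidate conclusions; a residual check like this is one simple way to surface it before trusting the model.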
The implications of these capabilities are immense for any business or organization collecting Big Data. For example, Machine Learning helps the SETI Institute (a Skytree customer) analyze an enormous volume of radio signals from space and identify anomalies that could signal the presence of intelligent alien life. For more down-to-earth applications, the same anomaly-detection technology that SETI uses can predict fraudulent credit behavior and surface the most actionable leads in sales data.
Additionally, Machine Learning can deliver enormous ROI for subscription-based companies and online retailers, which generate massive amounts of data on users' web behavior and purchasing patterns. It can segment customers into groups, predict which customers are most likely to churn, and serve up the most relevant marketing message based on a customer's social graph and past web behavior.
To see Machine Learning in action and learn how Skytree Adviser and its simple interface benefit a wide range of audiences looking to get more from their Big Data, check out my tutorial at Strata Santa Clara on February 11.
Leland Wilkinson, Ph.D., is vice president of data visualization for Skytree, the machine learning company, and also serves as an adjunct professor of computer science at the University of Illinois at Chicago. Previously, Wilkinson was adjunct professor of statistics at Northwestern University and President of SYSTAT Inc., a statistical software company he founded in 1984.