ENTRIES TAGGED "algorithms"
The importance of data science tools that let organizations easily combine, deploy, and maintain algorithms
Data science often depends on data pipelines, that involve acquiring, transforming, and loading data. (If you’re fortunate most of the data you need is already in usable form.) Data needs to be assembled and wrangled, before it can be visualized and analyzed. Many companies have data engineers (adept at using workflow tools like Azkaban and Oozie), who manage1 pipelines for data scientists and analysts.
A workflow tool for data analysts: Chronos from airbnb
A raw bash scheduler written in Scala, Chronos is flexible, fault-tolerant2, and distributed (it’s built on top of Mesos). What’s most interesting is that it makes the creation and maintenance of complex workflows more accessible: at least within airbnb, it’s heavily used by analysts.
Job orchestration and scheduling tools contain features that data scientists would appreciate. They make it easy for users to express dependencies (start a job upon the completion of another job), and retries (particularly in cloud computing settings, jobs can fail for a variety of reasons). Chronos comes with a web UI designed to let business analysts3 define, execute, and monitor workflows: a zoomable DAG highlights failed jobs and displays stats that can be used to identify bottlenecks. Chronos lets you include asynchronous jobs – a nice feature for data science pipelines that involve long-running calculations. It also lets you easily define repeating jobs over a finite time interval, something that comes in handy for short-lived4 experiments (e.g. A/B tests or multi-armed bandits).
Alyona Medelyan and Anna Divoli on the opportunities in chaotic data.
Alyona Medelyan and Anna Divoli are inventing tools to help companies contend with vast quantities of fuzzy data. They discuss their work and what lies ahead for big data in this interview.
Sentiment analysis gives algorithmic trading an edge
Sorting through thousands of news stories and categorizing information based on mood and tone creates useful data points for financial systems.
Algorithms go awry on Amazon, the future of Hadoop at Yahoo, and the Supreme Court mulls data mining
In this Strata Week: Algorithm pricing on Amazon pushes the price of a biology book to astronomical levels, Yahoo weighs the future of Hadoop, and the Supreme Court hears arguments about a Vermont law restricting the data mining of prescription records.
One of the largest gatherings of mathematicians, the joint meetings of the AMS/MAA/SIAM, took place last week in San Francisco. Knowing that there were going to be over 6,000 pure and applied mathematicians at Moscone West, I took some time off from work and attended several sessions. Below are a few (somewhat technical) highlights. It’s the only conference I’ve attended where the person managing the press room, was also working on some equations in-between helping the media.