Here’s a look at the latest data news and developments that caught my eye.
Algorithms to sniff out disloyal troops
The US Army is vexed by the problem of troops who become disaffected and, by extension, a risk to the operations they’re involved in. Through a DARPA-sponsored research project, the Army hopes to use big data analytics to identify individuals likely to pose a threat.
In an announcement for an investigative “industry day” for the ADAMS (Anomaly Detection at Multiple Scales) project, DARPA framed the problem:
Each time we see an incident like a soldier in good mental health becoming homicidal or suicidal or an innocent insider becoming malicious we wonder why we didn’t see it coming. When we look through the evidence after the fact, we often find a trail — sometimes even an “obvious” one.
… The focus is on malevolent insiders that started out as “good guys.” The specific goal of ADAMS is to detect anomalous behaviors before or shortly after they turn. Operators in the counter-intelligence community are the target end-users for ADAMS insider threat detection technology.
Reporting on ADAMS, Wired notes that much remains unresolved:
All this suggests the blind are still leading the blind when it comes to stopping internal military subversion. It’s far from clear what kind of data — troops’ e-mail? web trails? book orders? — DARPA would use to ferret out troops who pose a risk to themselves or others.
CouchDB in the movies
A NoSQL document database, CouchDB is known for its support of replication and synchronization. Notably used in Ubuntu’s personal cloud technology for synchronization, CouchDB provides a great substrate for replicating data services that are periodically disconnected.
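As a rough illustration of what replication looks like from a client's point of view, here is a minimal Python sketch that builds a request for CouchDB's `_replicate` HTTP endpoint. The server address, the database names, and the helper name `build_replication_request` are assumptions made for the example. The request is only constructed, not sent, so the sketch runs without a CouchDB server.

```python
import json
import urllib.request


def build_replication_request(server, source, target, continuous=False):
    """Build (but do not send) a POST to CouchDB's /_replicate endpoint.

    `server`, `source` and `target` are hypothetical values supplied by the
    caller; `source` and `target` may be local database names or full URLs.
    """
    body = {"source": source, "target": target}
    if continuous:
        # Ask the server to keep replicating as new changes arrive.
        body["continuous"] = True
    return urllib.request.Request(
        url=server.rstrip("/") + "/_replicate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical example: push a local "notes" database to a remote server.
replication_request = build_replication_request(
    "http://localhost:5984",
    "notes",
    "http://example.org:5984/notes",
    continuous=True,
)
# urllib.request.urlopen(replication_request) would actually send it.
```

Because replication is triggered per pair of databases, a device that has been offline can simply issue the same request again when it reconnects.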
These features for synchronization and replication made CouchDB an attractive solution for the needs of Novacut, a new open source video editing solution inspired by the decentralized version control systems available to programmers. By using cloud storage and sharing metadata documenting changes made to a video, many people can work on a single video in a distributed fashion.
In the words of the Novacut home page:
Such an editor will help artists reduce costs, work faster, and collaborate with the right people. Such an editor will help independent TV and film succeed. We want artists to win!
Riak adds full-text searching
A NoSQL database modeled on Amazon’s Dynamo architecture, Riak is maturing rapidly. Riak provides a decentralized key-value store, a MapReduce engine, and an HTTP/JSON query interface. Aimed at deployment in web applications, Riak’s architecture is scalable and fault tolerant.
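To make the key-value and HTTP/JSON aspects concrete, the following Python sketch builds requests that store and fetch a JSON value by bucket and key. It assumes the `/riak/<bucket>/<key>` URL layout and the default port 8098 of Riak releases from this period. The bucket, key, and helper names are hypothetical, and nothing is sent over the network.

```python
import json
import urllib.request
from urllib.parse import quote


def riak_object_url(base, bucket, key):
    """Return the URL of one object, assuming the /riak/<bucket>/<key> layout."""
    return "{}/riak/{}/{}".format(
        base.rstrip("/"), quote(bucket, safe=""), quote(key, safe="")
    )


def build_store_request(base, bucket, key, value):
    """Build (but do not send) a PUT that stores `value` as JSON under bucket/key."""
    return urllib.request.Request(
        url=riak_object_url(base, bucket, key),
        data=json.dumps(value).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def build_fetch_request(base, bucket, key):
    """Build (but do not send) a GET that reads the object back."""
    return urllib.request.Request(url=riak_object_url(base, bucket, key), method="GET")


# Hypothetical node address, bucket and key, for illustration only.
store_request = build_store_request(
    "http://localhost:8098", "users", "alice", {"name": "Alice", "plan": "free"}
)
fetch_request = build_fetch_request("http://localhost:8098", "users", "alice")
```

Since every object is addressed by a plain URL, any HTTP client or load balancer can sit in front of the cluster, which suits the web-application deployments Riak is aimed at.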
Basho, the creators of Riak, announced the release of Riak 0.13, including a full-text search engine, Riak Search. Like the database itself, Riak Search is scalable and fault tolerant, operating in real time.
Riak’s 0.13 release also improves the performance of its MapReduce functionality, and is generally less resource-hungry than previous versions. Both open source and enterprise editions are available.
Hadoop World announcements
This week saw Cloudera’s Hadoop World conference in New York, and a burst of product announcements from companies in the Hadoop ecosystem.
- SD Times reports that Revolution Analytics, a commercial vendor of R software and support, has hired the author of the RHIPE package, which integrates R with Hadoop. In hiring Saptarshi Guha, Revolution Analytics joins the trend of analytics and data vendors getting behind Hadoop as a common substrate for their platforms.
- Karmasphere announced the release of the Professional Edition of their Karmasphere Studio product, a graphical environment that eases the development, debugging, deployment and monitoring of Hadoop jobs. The Professional Edition adds graphical instrumentation and rule-based diagnostic tools for monitoring performance.
- Datameer announced the first formal release of their Datameer Analytics Solution (DAS), a spreadsheet-driven interface for data analysis with Hadoop. DAS facilitates end-to-end big data processing, from import through to reporting.
- Quest Software announced OraOop, a connector that allows rapid and scalable data transfer between Oracle databases and Hadoop. OraOop takes the form of a plugin to Cloudera’s Hadoop database connector, Sqoop, offering improved performance when used with Oracle.
- Also connecting its database to Hadoop is Membase (formerly Northscale). Membase is a fast key-value data store. Through integration with Cloudera’s Hadoop Distribution, data can be accumulated in Membase and streamed through to Hadoop for processing. A Sqoop-derived connector also allows the rapid transfer of data between the two systems.
Send us news
Email us news, tips and interesting tidbits at email@example.com.