ENTRIES TAGGED "data storage"
In-memory data storage, SQL, data preparation and asking the right questions all emerged as key trends at Strata + Hadoop World.
At our successful Strata + Hadoop World conference (which also managed to dodge Hurricane Sandy), a few themes emerged that resonated with my interests and experience as a hands-on data analyst and as a researcher who tracks technology adoption trends. Keep in mind that these themes reflect my personal biases; others will have a different take on their own key takeaways from the conference.
1. In-memory data storage for faster queries and visualization
Interactive or real-time querying of large datasets is seen as key to analyst productivity (real-time meaning query times fast enough to keep the user in the flow of analysis, from sub-second to a few minutes). Existing large-scale data management systems aren’t fast enough, and they reduce analytical effectiveness when users can’t explore the data by quickly iterating through queries. We see companies with large data stores building their own in-memory tools — Dremel at Google, Druid at Metamarkets, and Sting at Netflix — alongside new tools such as Cloudera’s Impala (announced at the conference), UC Berkeley AMPLab’s Spark, SAP HANA, and Platfora.
We saw this coming a few years ago, when analysts we pay attention to started building their own in-memory data store sandboxes — often in key/value tools like Redis — while trying to make sense of new, large-scale data stores. I know from my own work that there’s no better way to explore a new or unstructured dataset than to quickly run a series of iterative queries, each informed by the last.
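To make the iterative-query workflow concrete, here is a minimal sketch using Python’s built-in sqlite3 with an in-memory database as a stand-in for the in-memory stores mentioned above; the events table, its columns, and the sample rows are all invented for illustration:

```python
import sqlite3

# Hypothetical dataset: load a small sample into an in-memory SQLite
# database so each follow-up query returns instantly, keeping the
# analyst "in the flow" of exploration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, action TEXT, ms INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [(1, "click", 120), (1, "view", 45), (2, "click", 300), (2, "view", 80)],
)

# First pass: get a broad picture — average latency by action.
for action, avg_ms in conn.execute(
    "SELECT action, AVG(ms) FROM events GROUP BY events.action"
):
    print(action, avg_ms)

# Second pass, informed by the first: drill into one user's clicks.
rows = conn.execute(
    "SELECT ms FROM events WHERE user_id = 1 AND action = 'click'"
).fetchall()
print(rows)
```

The point isn’t SQLite itself — any in-memory store with sub-second query response supports this loop of query, inspect, refine, re-query that the production tools above provide at far larger scale.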
A Canadian startup aspires to be the GitHub of datasets.
BuzzData looks to tap the gravitational pull of data, then keep people around through conversation and collaboration.
IBM is building a massive 120-petabyte array and Infochimps releases a unified geo schema.
IBM takes data storage to a whole new level (120 petabytes, to be exact), Infochimps' new API tries to make life easier for geo developers, and the "Internet of people" keeps an eye on Hurricane Irene.
Theo Schlossnagle on the state of real-time data analysis and where it needs to go.
Real-time data analysis has come a long way, but Theo Schlossnagle, principal and CEO of OmniTI, says some technology improvements are actually causing a data analysis devolution.
Jeff Jonas on data ownership, security concerns, and privacy trade-offs.
In a recent interview, Jeff Jonas, IBM distinguished engineer and chief scientist at IBM Entity Analytics, discussed the willingness of consumers to give away their data and the issues around data replication.
Companies are looking to help business clients store and analyze data.
IBM Netezza and Revolution R Enterprise announced a new partnership, which, together with recent moves by Microsoft and HP, signals a growing realization that integrating data storage and analysis provides a better client experience.