ENTRIES TAGGED "data tools"
We're launching an investigation into in-memory data technologies.
In a forthcoming report we will highlight technologies and solutions that take advantage of the decline in prices of RAM, the popularity of distributed and cloud computing systems, and the need for faster queries on large, distributed data stores. Established technology companies have had interesting offerings, but what initially caught our attention were open source projects that started gaining traction last year.
An example we frequently hear about is the demand for tools that support interactive query performance. Faster query response times translate to more engaged and productive analysts, and to real-time reports. Over the past two years, several in-memory solutions have emerged that deliver 5X-100X faster response times. A recent paper from Microsoft Research noted that even in this era of big data and Hadoop, many MapReduce jobs fit in the memory of a single server. To scale to extremely large datasets, several new systems use a combination of distributed computing (in-memory grids), compression, and (columnar) storage technologies.
Another interesting aspect of in-memory technologies is that they seem to be everywhere these days. We’re looking at tools aimed at analysts (Tableau, QlikView, TIBCO Spotfire, Platfora), databases that target specific workloads or data types (VoltDB, SAP HANA, Hekaton, Redis, Druid, Kognitio, and YarcData), frameworks for analytics (Spark/Shark, GraphLab, GridGain, Asterix/Hyracks), and the data center (RAMCloud, memory locality).
We’ll be talking to companies and hackers to get a sense of how in-memory solutions fit into their planning. Along these lines, we would love to hear what you think about the rise of these technologies, as well as applications, companies, and projects we should look at. Feel free to reach out to us on Twitter (Ben is @bigdata and Roger is @rogerm) or leave a comment on this post.
Michael Italia on making use of data collected in health care settings.
Michael Italia from Children's Hospital of Philadelphia discusses the tools and methods his team uses to manage health care data.
MetaLayer's Jonathan Gosier on data tools and the data divide.
MetaLayer's Jonathan Gosier talks about the need to democratize data tools because everyone has a big data problem.
The best data visualizations expose something new.
Effective data visualizations go beyond aesthetics; they also allow organizations to make quick and correct decisions from massive amounts of information.
Pete Warden on the upside of unstructured data.
Data scientists, it's time to welcome errors and uncertainty into your data projects. In this interview, Jetpac CTO Pete Warden discusses the advantages of unstructured data.
The Global Adaptation Index combines development indicators from 161 countries.
Speed, accessibility and open data have come together in the Global Adaptation Index, a new data browser that rates a given country's vulnerability to environmental shifts.
A new media startup tries to mine the social web for stories.
The newly launched Daily Dot is trying an experiment in community journalism, where the community is the Internet. To support this goal, they’re applying the lens of data journalism to the social web.
Twitter plans to open source its Hadoop-like data processing tool, Storm.
This week's data news includes Twitter's plans to open-source its Hadoop-like data processing tool and some of the various mapping and real-time data efforts tracking the London riots.
Questions surround the Aaron Swartz case and Microsoft wants to help scholars with big data.
Aaron Swartz faces felony charges for downloading "big data" (more than 4 million academic journal articles) from the MIT library, Microsoft's new data tool is aimed at scholars, and David Eaves looks at open data efforts in Canada.
Best practices for evaluating Hadoop and setting up an initial cluster (updated March 2012)
Focusing on the Hadoop Distributed File System (HDFS) and MapReduce, this in-depth piece — updated March 2012 — offers tips for organizations that are looking to evaluate Hadoop and deploy an initial cluster.