ENTRIES TAGGED "data tools"
Need speed for big data? Think in-memory data management
We're launching an investigation into in-memory data technologies.
By Ben Lorica and Roger Magoulas
In a forthcoming report we will highlight technologies and solutions that take advantage of the decline in prices of RAM, the popularity of distributed and cloud computing systems, and the need for faster queries on large, distributed data stores. Established technology companies have had interesting offerings, but what initially caught our attention were open source projects that started gaining traction last year.
One need we hear about frequently is the demand for tools that support interactive query performance. Faster query response times translate to more engaged and productive analysts, and to real-time reports. Over the past two years, several in-memory solutions emerged to deliver 5X-100X faster response times. A recent paper from Microsoft Research noted that even in this era of big data and Hadoop, many MapReduce jobs fit in the memory of a single server. To scale to extremely large datasets, several new systems use a combination of distributed computing (in-memory grids), compression, and (columnar) storage technologies.
Another interesting aspect of in-memory technologies is that they seem to be everywhere these days. We’re looking at tools aimed at analysts (Tableau, Qlikview, Tibco Spotfire, Platfora), databases that target specific workloads or data types (VoltDB, SAP HANA, Hekaton, Redis, Druid, Kognitio, and Yarcdata), frameworks for analytics (Spark/Shark, GraphLab, GridGain, Asterix/Hyracks), and the data center (RAMCloud, memory locality).
We’ll be talking to companies and hackers to get a sense of how in-memory solutions fit into their planning. Along these lines, we would love to hear what you think about the rise of these technologies, as well as applications, companies and projects we should look at. Feel free to reach out to us on Twitter (Ben is @bigdata and Roger is @rogerm) or leave a comment on this post.
Health records support genetics research at Children’s Hospital of Philadelphia
Michael Italia on making use of data collected in health care settings.
Michael Italia from Children's Hospital of Philadelphia discusses the tools and methods his team uses to manage health care data.
Everyone has a big data problem
MetaLayer's Jonathan Gosier on data tools and the data divide.
MetaLayer's Jonathan Gosier talks about the need to democratize data tools because everyone has a big data problem.
Why data visualization matters
The best data visualizations expose something new.
Effective data visualizations go beyond aesthetics; they also allow organizations to make quick and correct decisions from massive amounts of information.
Embracing the chaos of data
Pete Warden on the upside of unstructured data.
Data scientists, it's time to welcome errors and uncertainty into your data projects. In this interview, Jetpac CTO Pete Warden discusses the advantages of unstructured data.
Global Adaptation Index enables better data-driven decisions
The Global Adaptation Index combines development indicators from 161 countries.
Speed, accessibility and open data have come together in the Global Adaptation Index, a new data browser that rates a given country's vulnerability to environmental shifts.
The Daily Dot wants to tell the web’s story with social data journalism
A new media startup tries to mine the social web for stories.
The newly launched Daily Dot is trying an experiment in community journalism, where the community is the Internet. To support their goal, they’re applying the lens of data journalism to the social web.
Strata Week: Twitter’s coming Storm, data and maps from the London riots
Twitter plans to open source its Hadoop-like data processing tool, Storm.
This week's data news includes Twitter's plans to open-source its Hadoop-like data processing tool and some of the various mapping and real-time data efforts tracking the London riots.
Strata Week: When does data access become data theft?
Questions surround the Aaron Swartz case and Microsoft wants to help scholars with big data.
Aaron Swartz faces felony charges for downloading "big data" (more than 4 million academic journal articles) from the MIT library, Microsoft's new data tool is aimed at scholars, and David Eaves looks at open data efforts in Canada.
Get started with Hadoop: From evaluation to your first production cluster
Best practices for evaluating Hadoop and setting up an initial cluster (updated March 2012)
Focusing on the Hadoop Distributed File System (HDFS) and MapReduce, this in-depth piece — updated March 2012 — offers tips for organizations that are looking to evaluate Hadoop and deploy an initial cluster.





