ENTRIES TAGGED "data tools"
Tools, Trends, What Pays (and What Doesn't) for Data Professionals
There is no shortage of news about the importance of data or the career opportunities it offers. A discussion of modern data tools, though, can help us understand what the current data evolution is all about, and it can serve as a guide for those considering stepping into the data space or progressing within it.
In our report, the 2013 Data Science Salary Survey, we make our own data-driven contribution to the conversation: we surveyed attendees of the Strata Conference in New York and Santa Clara, California, about tool usage and salary.
Strata attendees span a wide spectrum within the data world: Hadoop experts and business leaders, software developers and analysts. By no means does everyone use data on a “Big” scale, but almost all attendees have some technical aspect to their role. Strata attendees may not represent a random sample of all professionals working with data, but they do represent a broad slice of the population. If there is a bias, it is likely toward the forefront of the data space, with attendees using the newest tools (or being very interested in learning about them).
We're launching an investigation into in-memory data technologies.
In a forthcoming report we will highlight technologies and solutions that take advantage of the decline in prices of RAM, the popularity of distributed and cloud computing systems, and the need for faster queries on large, distributed data stores. Established technology companies have had interesting offerings, but what initially caught our attention were open source projects that started gaining traction last year.
An example we frequently hear about is the demand for tools that support interactive query performance. Faster query response times translate into more engaged and productive analysts, and into real-time reports. Over the past two years, several in-memory solutions have emerged that deliver 5x–100x faster response times. A recent paper from Microsoft Research noted that even in this era of big data and Hadoop, many MapReduce jobs fit in the memory of a single server. To scale to extremely large datasets, several new systems use a combination of distributed computing (in-memory grids), compression, and columnar storage technologies.
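The appeal of the columnar, in-memory layouts mentioned above can be sketched in a few lines. This toy Python example is ours, not drawn from any of the systems named here; it simply shows why an aggregate over a single field is cheaper to compute against a dense, typed column than against full row records.

```python
# Toy illustration of row-oriented vs. column-oriented storage.
# An analytic query that aggregates one field only needs that field's
# values; a columnar layout keeps them contiguous in memory.
from array import array

# Row-oriented storage: a list of (user_id, country, amount) records.
rows = [(i, "US" if i % 2 else "DE", float(i % 100)) for i in range(100_000)]

# Column-oriented storage: one compact, typed array per field.
# Here we materialize just the "amount" column as contiguous doubles.
amounts = array("d", (r[2] for r in rows))

def total_row_store():
    # Must touch every full record to reach the third field.
    return sum(r[2] for r in rows)

def total_column_store():
    # Scans one dense array; same answer, far less data movement.
    return sum(amounts)

assert total_row_store() == total_column_store()
```

Real engines add the pieces this sketch omits: compression of each column (which contiguous, same-typed values make effective) and partitioning of columns across an in-memory grid of machines.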
Another interesting aspect of in-memory technologies is that they seem to be everywhere these days. We’re looking at tools aimed at analysts (Tableau, QlikView, TIBCO Spotfire, Platfora), databases that target specific workloads or data types (VoltDB, SAP HANA, Hekaton, Redis, Druid, Kognitio, and Yarcdata), frameworks for analytics (Spark/Shark, GraphLab, GridGain, Asterix/Hyracks), and the data center (RAMCloud, memory locality).
We’ll be talking to companies and hackers to get a sense of how in-memory solutions fit into their planning. Along these lines, we would love to hear what you think about the rise of these technologies, as well as applications, companies, and projects we should look at. Feel free to reach out to us on Twitter (Ben is @bigdata and Roger is @rogerm) or leave a comment on this post.
Michael Italia on making use of data collected in health care settings.
Michael Italia from Children's Hospital of Philadelphia discusses the tools and methods his team uses to manage health care data.
MetaLayer's Jonathan Gosier on data tools and the data divide.
MetaLayer's Jonathan Gosier talks about the need to democratize data tools because everyone has a big data problem.
The best data visualizations expose something new.
Effective data visualizations go beyond aesthetics; they also allow organizations to make quick and correct decisions from massive amounts of information.
Pete Warden on the upside of unstructured data.
Data scientists, it's time to welcome errors and uncertainty into your data projects. In this interview, Jetpac CTO Pete Warden discusses the advantages of unstructured data.
The Global Adaptation Index combines development indicators from 161 countries.
Speed, accessibility and open data have come together in the Global Adaptation Index, a new data browser that rates a given country's vulnerability to environmental shifts.
A new media startup tries to mine the social web for stories.
The newly launched Daily Dot is trying an experiment in community journalism, where the community is the Internet. To support their goal, they’re applying the lens of data journalism to the social web.
Twitter plans to open source its Hadoop-like data processing tool, Storm.
This week's data news includes Twitter's plans to open-source its Hadoop-like data processing tool and some of the various mapping and real-time data efforts tracking the London riots.
Questions surround the Aaron Swartz case and Microsoft wants to help scholars with big data.
Aaron Swartz faces felony charges for downloading "big data" (more than 4 million academic journal articles) via the MIT library, Microsoft's new data tool is aimed at scholars, and David Eaves looks at open data efforts in Canada.