ENTRIES TAGGED "data tools"

2013 Data Science Salary Survey

Tools, Trends, What Pays (and What Doesn't) for Data Professionals

There is no shortage of news about the importance of data or the career opportunities within data. Yet a discussion of modern data tools can help us understand what the current data evolution is all about, and it can also be used as a guide for those considering stepping into the data space or progressing within it.

In our report, 2013 Data Science Salary Survey, we make our own data-driven contribution to the conversation. We surveyed attendees of the Strata Conference in New York and Santa Clara, California, about their tool usage and salary.

Strata attendees span a wide spectrum within the data world: Hadoop experts and business leaders, software developers and analysts. By no means does everyone use data on a “Big” scale, but almost all attendees have some technical aspect to their role. Strata attendees may not represent a random sample of all professionals working with data, but they do represent a broad slice of the population. If there is a bias, it is likely toward the forefront of the data space, with attendees using the newest tools (or being very interested in learning about them).



Need speed for big data? Think in-memory data management

We're launching an investigation into in-memory data technologies.

By Ben Lorica and Roger Magoulas

In a forthcoming report we will highlight technologies and solutions that take advantage of the decline in prices of RAM, the popularity of distributed and cloud computing systems, and the need for faster queries on large, distributed data stores. Established technology companies have had interesting offerings, but what initially caught our attention were open source projects that started gaining traction last year.

An example we frequently hear about is the demand for tools that support interactive query performance. Faster query response times translate to more engaged and productive analysts, and to real-time reports. Over the past two years, several in-memory solutions emerged to deliver 5X-100X faster response times. A recent paper from Microsoft Research noted that even in this era of big data and Hadoop, many MapReduce jobs fit in the memory of a single server. To scale to extremely large datasets, several new systems use a combination of distributed computing (in-memory grids), compression, and (columnar) storage technologies.

Another interesting aspect of in-memory technologies is that they seem to be everywhere these days. We’re looking at tools aimed at analysts (Tableau, QlikView, Tibco Spotfire, Platfora), databases that target specific workloads or data types (VoltDB, SAP HANA, Hekaton, Redis, Druid, Kognitio, and YarcData), frameworks for analytics (Spark/Shark, GraphLab, GridGain, Asterix/Hyracks), and the data center (RAMCloud, memory locality).

We’ll be talking to companies and hackers to get a sense of how in-memory solutions fit into their planning. Along these lines, we would love to hear what you think about the rise of these technologies, as well as applications, companies and projects we should look at. Feel free to reach out to us on Twitter (Ben is @bigdata and Roger is @rogerm) or leave a comment on this post.


Health records support genetics research at Children’s Hospital of Philadelphia

Michael Italia on making use of data collected in health care settings.

Michael Italia from Children's Hospital of Philadelphia discusses the tools and methods his team uses to manage health care data.


Everyone has a big data problem

MetaLayer's Jonathan Gosier on data tools and the data divide.

MetaLayer's Jonathan Gosier talks about the need to democratize data tools because everyone has a big data problem.

Why data visualization matters

The best data visualizations expose something new.

Effective data visualizations go beyond aesthetics; they also allow organizations to make quick and correct decisions from massive amounts of information.

Embracing the chaos of data

Pete Warden on the upside of unstructured data.

Data scientists, it's time to welcome errors and uncertainty into your data projects. In this interview, Jetpac CTO Pete Warden discusses the advantages of unstructured data.

Global Adaptation Index enables better data-driven decisions

The Global Adaptation Index combines development indicators from 161 countries.

Speed, accessibility and open data have come together in the Global Adaptation Index, a new data browser that rates a given country's vulnerability to environmental shifts.

The Daily Dot wants to tell the web’s story with social data journalism

A new media startup tries to mine the social web for stories.

The newly launched Daily Dot is trying an experiment in community journalism, where the community is the Internet. To support their goal, they’re applying the lens of data journalism to the social web.

Strata Week: Twitter’s coming Storm, data and maps from the London riots

Twitter plans to open source its Hadoop-like data processing tool, Storm.

This week's data news includes Twitter's plans to open-source its Hadoop-like data processing tool and some of the various mapping and real-time data efforts tracking the London riots.

Strata Week: When does data access become data theft?

Questions surround the Aaron Swartz case and Microsoft wants to help scholars with big data.

Aaron Swartz faces felony charges for downloading "big data" (more than 4 million academic journal articles) from the MIT library, Microsoft's new data tool is aimed at scholars, and David Eaves looks at open data efforts in Canada.
