Strata Week: Dueling views on data center efficiency

The New York Times questions the environmental impact of data centers. Also, big data as hiring manager and inside Foursquare's data science.

The NYT investigates data center pollution, Google buys wind power

The New York Times (NYT) has conducted a year-long investigation into data centers and their environmental impact, and the first reports from the investigation were published this week. NYT writer James Glanz reports that the information industry's foundation, the tens of thousands of data centers required around the world to process the vast amounts of data produced by billions of users each day, “is sharply at odds with its image of sleek efficiency and environmental friendliness.” Through interviews and research, Glanz says, the NYT found data centers to be wasteful with electricity. He reports:

“Online companies typically run their facilities at maximum capacity around the clock, whatever the demand. As a result, data centers can waste 90 percent or more of the electricity they pull off the grid, The Times found. To guard against a power failure, they further rely on banks of generators that emit diesel exhaust. The pollution from data centers has increasingly been cited by the authorities for violating clean air regulations, documents show. … Worldwide, the digital warehouses use about 30 billion watts of electricity, roughly equivalent to the output of 30 nuclear power plants, according to estimates industry experts compiled for The Times. Data centers in the United States account for one-quarter to one-third of that load, the estimates show.”

Glanz also notes that only about 6 to 12% of the electricity powering data center servers is actually used to perform computations; the remainder, 88% or more, keeps idle servers standing ready for surges in site activity. You can find Glanz’s full report, along with analysis and industry interviews, here.
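For a rough sense of scale, the figures quoted above can be run through a quick back-of-the-envelope calculation. The Python sketch below uses only the numbers reported by Glanz (30 billion watts worldwide, a U.S. share of one-quarter to one-third, and 6 to 12% of server electricity going to computation); the function and constant names are illustrative and do not come from the NYT report.

```python
# Back-of-the-envelope arithmetic using only figures quoted from the NYT report.
# All names here are illustrative assumptions, not part of the report.

WORLD_LOAD_GW = 30.0  # "about 30 billion watts" expressed in gigawatts


def us_load_range_gw(world_gw=WORLD_LOAD_GW):
    """U.S. data centers: one-quarter to one-third of the worldwide load."""
    return world_gw / 4, world_gw / 3


def idle_share_range(compute_low=0.06, compute_high=0.12):
    """Share of server electricity NOT spent on computation."""
    return 1 - compute_high, 1 - compute_low


low, high = us_load_range_gw()
print(f"U.S. data centers: roughly {low:.1f} to {high:.1f} GW")

idle_low, idle_high = idle_share_range()
print(f"Server electricity not used for computation: {idle_low:.0%} to {idle_high:.0%}")
```

Under those reported figures, the U.S. share works out to between 7.5 and 10 gigawatts.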

Some have criticized the NYT investigation for lumping all data centers together and for relying on old information without looking at the advances taking place in the industry. Those advances were highlighted this week as Google announced it will be powering one of its data centers with wind-generated power. Google’s director of energy and sustainability, Rick Needham, told Robert McMillan at Wired that Google has committed to a 10-year agreement with the Grand River Dam Authority utility company for 48 megawatts of wind power for its data center in Mayes County, Oklahoma. McMillan reports that construction on a 300-megawatt facility to provide the wind energy is underway. The facility is expected to go online later this year.

Bringing big data to work

Two reports in the Wall Street Journal this week looked at the emerging roles of big data in the workplace. Rachel Emma Silverman examined how big data can help companies identify the most effective strategies for retaining employees and reducing turnover. Interestingly, predictive modeling often shows that employee pay has little to do with the overall problem. Talking with Mercer senior partner Haig Nalbantian, Silverman highlights an example from a large regional bank, a client of Nalbantian’s, that struggled with high turnover in its customer service areas:

“The bank gathered data on turnover, promotions, job changes and external pay to create a statistical model predicting why workers quit. Though the bank had used frequent pay raises to keep staff, the results showed that raising pay across the board by 10% might only shave a half point off the turnover rate. Workers felt dissatisfied, not underpaid. More rapid job changes, even without promotions or corresponding rises in pay, made it much more likely that high-performing employees would stay, Mr. Nalbantian says.”

Even when pay is the issue, big data can shed light on the most efficient solution. Las Vegas casino chain Caesars looked into the effect of pay on corporate employee retention, Silverman writes, and found turnover at around 16% for those who earned less than the midpoint in their salary ranges. “Bringing an employee’s salary up to the midpoint, the analysis found, reduced attrition to 9%,” Silverman reports. “Going higher than the midpoint, it turned out, had no benefit.”

Big data is helping companies not only to retain employees but also to find the right ones. The WSJ’s Joseph Walker reports that companies are using algorithms to assess job candidates, a process that depends more on data analysis and personality tests than on work history and interview hunches. Walker describes how the process is being used at Xerox call centers:

“Xerox is being advised by Evolv Inc., a San Francisco start-up that helps companies hire and manage hourly workers. By putting applicants through a battery of tests and then tracking their job performance, Evolv has developed a model for the ideal call-center worker. The data say that person lives near the job, has reliable transportation and uses one or more social networks, but not more than four. He or she tends not to be overly inquisitive or empathetic, but is creative.

“Applicants for the job take a 30-minute test that screens them for personality traits and puts them through scenarios they might encounter on the job. Then the program spits out a score: red for low potential, yellow for medium potential or green for high potential. Xerox accepts some yellows if it thinks it can train them, but mostly hires greens.”

Walker also looks at how smaller companies are making use of data analysis and he examines the legal risks data-based hiring can present. You can read his full report here.

A look inside the data science at Foursquare

Blake Shaw, a data scientist at Foursquare, recently spoke at DataGotham about how Foursquare data can be used to analyze the behavior of cities. Shaw looked at data gathered from millions of users in New York City to understand attributes of New York that haven’t previously been observed.

By plotting aggregated check-in points, for instance, popular city landmarks and airports can be recognized without a map. Check-ins can also be plotted over time to identify the busy periods for a category of venue, such as coffee shops, or even for a specific popular restaurant. Shaw says Foursquare wants to use these data points to build its recommendation engine. He also talks about how the data reveals the personalities of neighborhoods, how it shows the way the city behaves under certain circumstances (for every degree the temperature rises above 60 degrees, ice cream shop check-ins rise 2.1%), and how social graphs developed from the data can show the interconnected nature of people in a given area.
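The ice cream figure can be turned into a small worked example. The summary of the talk does not say whether the 2.1% rise is additive (relative to a 60-degree baseline) or compounds with each degree, so the Python sketch below shows both readings; the function name is hypothetical, and none of this is Foursquare's actual model.

```python
# Illustrative only: two possible readings of "check-ins rise 2.1% per degree
# above 60 degrees". Neither is confirmed as Foursquare's actual model.

RISE_PER_DEGREE = 0.021  # 2.1% per degree, as quoted from the talk
THRESHOLD = 60           # degrees, as quoted (Fahrenheit is an assumption)


def checkin_multiplier(temp, compound=False):
    """Estimated check-ins relative to a 60-degree day (1.0 = no change)."""
    degrees_above = max(0, temp - THRESHOLD)
    if compound:
        # Each degree adds 2.1% on top of the previous degree's level.
        return (1 + RISE_PER_DEGREE) ** degrees_above
    # Each degree adds 2.1% of the 60-degree baseline.
    return 1 + RISE_PER_DEGREE * degrees_above


for temp in (60, 70, 80, 90):
    additive = checkin_multiplier(temp)
    compounded = checkin_multiplier(temp, compound=True)
    print(f"{temp} degrees: additive x{additive:.2f}, compounded x{compounded:.2f}")
```

On either reading, a 70-degree day comes out a little over 20% busier for ice cream shops than a 60-degree day; the two interpretations only diverge noticeably at higher temperatures.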

Shaw also talks about tools Foursquare is developing to make cities in general more user-friendly.
