ENTRIES TAGGED "data product"
Electronic Arts CTO Rajat Taneja on big data's growing role in the video game world.
Electronic Arts (EA) isn’t the first company that comes to mind when you think of big data. Yet the gaming company is collecting increasing amounts of data about its online players, and as this data accumulates and gains steam, it falls under the big data category.
If a game maker like EA is considered a big data company, it could have implications for other companies we might not think of as typical big data generators. With that in mind, I got in touch with Rajat Taneja, chief technology officer at EA and a keynote speaker at the upcoming Strata Conference in California. Since Taneja came on board with EA in 2011, he’s helped steer the company’s technological initiatives, including understanding the impact this growing data store will have on the firm — both from a processing standpoint and how to use it to provide games and services customers want most. He says no matter what your company does, if you have constantly connected online services, you are very likely going to be dealing with lots of data.
Our interview follows. Read more…
O'Reilly's annual data anthology explores the maturation of big data and data science.
In the first edition of our free Big Data Now anthology, the O’Reilly team tracked the birth and early development of data tools and data science. Now, with the second edition, we’re seeing what happens when big data grows up: how it’s being applied, where it’s playing a role, and the consequences — good and bad alike — of data’s ascendance.
We’ve organized the 2012 edition of Big Data Now into five areas:
Getting Up to Speed With Big Data — Essential information on the structures and definitions of big data.
Big Data Tools, Techniques, and Strategies — Expert guidance for turning big data theories into big data products.
The Application of Big Data — Examples of big data in action, including a look at the downside of data.
What to Watch for in Big Data — Thoughts on how big data will evolve and the role it will play across industries and domains.
Big Data and Health Care — A special section exploring the possibilities that arise when data and health care come together.
The biggest threat that a data-driven world presents is an ethical one.
Since the first of our ancestors chipped stone into a weapon, technology has divided us. Seldom more than today, however: a connected, always-on society promises health, wisdom, and efficiency even as it threatens an end to privacy and the rise of prejudice masked as science.
On its surface, a data-driven society is more transparent and makes better use of its resources. By connecting human knowledge and mining it for insights, we can pinpoint problems before they become disasters, warding off disease and shining the harsh light of data on injustice and corruption. Data is making cities smarter, watering the grass roots, and improving the way we teach.
But for every accolade, there’s a cautionary tale. It’s easy to forget that data is merely a tool, and in the wrong hands, that tool can do powerful wrong. Data erodes our privacy. It predicts us, often with unerring accuracy — and treating those predictions as fact is a new, insidious form of prejudice. And it can collect the chaff of our digital lives, harvesting a picture of us we may not want others to know.
The big data movement isn’t just about knowing more things. It’s about a fundamental shift from scarcity to abundance. Most markets are defined by scarcity — the price of diamonds, or oil, or music. But when things become so cheap they’re nearly free, a funny thing happens.
Consider the advent of steam power. Economist Stanley Jevons, in what’s known as Jevons’ Paradox, observed that as the efficiency of steam engines increased, coal consumption went up. That’s not what was supposed to happen. Jevons realized that abundance creates new ways of using something. As steam became cheap, we found new ways of using it, which created demand.
The same thing is happening with data. A report that took a month to run is now just a few taps on a tablet. An unthinkably complex analysis of competitors is now a Google search. And the global distribution of multimedia content that once required a broadcast license is now an upload. Read more…
Watch live keynotes from this week's Strata Rx Conference in San Francisco.
The intersection of big data and health care was explored at the O’Reilly Strata Rx Conference. The event has concluded, but you can still access an archive of videos, photos, and speaker slides. Read more…
Quickly perform and interpret the results of routine Small Data analysis
With so much focus on Big Data, the needs of many analysts who work with Small Data tend to get ignored. The default tools for many of these users remain spreadsheets (1) and/or statistical packages, which come with a lot of features and options. However, many analysts need only a very small subset of what these tools have to offer.
Enter Statwing, a software-as-a-service provider for routine statistical analysis. While the tool is still in the early stages, it can already do many basic “data analysis” tasks.
Consider the following example of a pivot table constructed in Excel: it requires 8 mouse-clicks, if you do everything perfectly, and about 5 decisions (which variables to include, which metric to use, …).
The same task in Statwing required 4 mouse-clicks and 0 decisions! Plus it comes with visuals:
The lack of clutter and the addition of a simple “headline” (“Female tends to have much higher values for satisfaction than Male”) make the result much easier to interpret. The advanced tab contains detailed statistical analysis (in this case the p-value, counts, values). Many users get confused by the output produced by traditional statistical software. Let’s face it, many analysts have had little training in statistics. I welcome a tool that produces readily interpretable results.
The company hopes to replicate the above example across a wide variety of routine data analysis tasks. Their initial focus is on tools for (consumer) survey analysis, a potentially huge market given that online companies have made surveys so much easier to conduct. Users of Statwing pay a small monthly subscription, making it cheaper than most (2) statistical packages. In return, the intuitive UI lets analysts get their tasks done quickly. More importantly, Statwing may nurture aspiring data scientists in your organization.
(1) As this recent Strata presentation points out: Spreadsheets are the glue that keeps many organizations together.
(2) Open source tools like OpenOffice, R and Octave are free. So is the use of Google spreadsheets.
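For readers who want to see the comparison from the Statwing example in code, here is a minimal sketch of the same kind of summary: group survey responses by one variable, average a metric, and print a one-line headline. It is not Statwing's method. The data, the function names `pivot_mean` and `headline`, and the headline wording are all assumptions made for illustration, and the sketch computes no p-value, so the headline only reports which group has the higher mean.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical survey responses: (gender, satisfaction on a 1-5 scale).
responses = [
    ("Female", 5), ("Female", 4), ("Female", 5), ("Female", 4),
    ("Male", 3), ("Male", 2), ("Male", 4), ("Male", 3),
]

def pivot_mean(rows):
    """Group rows by their first field and return {group: (count, mean)}."""
    groups = defaultdict(list)
    for group, value in rows:
        groups[group].append(value)
    return {g: (len(v), mean(v)) for g, v in groups.items()}

def headline(summary, metric="satisfaction"):
    """Name the groups with the highest and lowest mean in plain language."""
    ranked = sorted(summary, key=lambda g: summary[g][1], reverse=True)
    top, bottom = ranked[0], ranked[-1]
    return f"{top} tends to have higher values for {metric} than {bottom}"

summary = pivot_mean(responses)
for group, (count, avg) in summary.items():
    print(f"{group}: n={count}, mean={avg:.2f}")
print(headline(summary))
```

A real analysis would pair the headline with a significance test before claiming a difference, which is the role the post assigns to the advanced tab.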
With a new mobile app and API, Captricity wants to build a better bridge between analog and digital.
Unlocking data from paper forms is the problem that optical character recognition (OCR) software is supposed to solve. Two issues persist, however. First, the hardware and software involved are expensive, creating challenges for cash-strapped nonprofits and government. Second, all of the information on a given document is scanned into a system, including sensitive details like Social Security numbers and other personally identifiable information. This is a particularly difficult issue with respect to health care or bringing open government to courts: privacy by obscurity will no longer apply.
The process of converting paper forms into structured data still hasn’t been significantly disrupted by the rapid growth of the Internet, distributed computing, and mobile devices. Fields that range from research science to medicine to law to education to consumer finance to government all need better, cheaper bridges from the analog to the digital sphere.
“I was looking at the information systems that were available to these low-resource organizations,” Chen said in a recent phone interview. “I saw that they’re very much bound in paper. There’s actually a lot of efforts to modernize the infrastructure and put in mobile phones. Now that there’s mobile connectivity, you can run a health clinic on solar panels and long distance Wi-Fi. At the end of the day, however, business processes are still on paper because they had to be essentially fail-proof. Technology fails all the time. From that perspective, paper is going to stick around for a very long time. If we’re really going to tackle the challenge of the availability of data, we shouldn’t necessarily be trying to change the technology infrastructure first — bringing mobile phones and iPads to where there’s paper — but really to start with solving the paper problem.”
When Chen saw that data entry was a chokepoint for digitizing health indicators, he started working on developing a better, cheaper way to ingest data on forms. Read more…
Bitsy Bentley on the work behind a good visualization and why she hopes users will take data interactions for granted.
Because of the size, complexity and density of big data, it’s not always easy to find the important insights hiding in all that information. That’s where data visualization comes into play. A great visualization creates meaning where none existed.
Bitsy Bentley (@bitsybot) is the director of data visualization at GfK Custom Research, where she works with information designers to craft meaningful data experiences for a variety of business audiences. In the following interview, she discusses the space between a “wow” response and an “aha” moment, how her team addresses privacy concerns, and why practice is vital for both visualization creators and viewers.
Bentley will explore related visualization topics during her presentation at Strata Conference + Hadoop World in New York City later this month.
Why are data visualizations an effective way to understand the underlying data?
Bitsy Bentley: There is so much beauty and richness in big datasets, and now that we have enough processing power to harness that richness, it’s little wonder that interest in data visualization is exploding. To quote John Tukey: “The greatest value of a picture is when it forces us to notice what we never expected to see.” My clients find that, whether they’re more concerned with numbers or more concerned with stories, an appropriate visual is integral to their understanding of the data.
Visualization unlocks the serendipity of data analysis. It provides a language that is less intimidating than an overwhelming array of digits. Something as simple as a set of histograms breaking down the distribution of a data store makes it easy to find irregularities and outliers in the data. Read more…
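As a rough illustration of the histogram point above, the following sketch bins a small set of numbers and prints a text histogram. The values, the bin width, and the `histogram` helper are hypothetical, chosen only to show how an isolated bar far from the rest makes an outlier visible at a glance.

```python
from collections import Counter

# Hypothetical measurements; 97 is a deliberately planted outlier.
values = [12, 14, 15, 15, 16, 17, 18, 18, 19, 21, 22, 97]

def histogram(data, bin_width):
    """Count how many values fall into each bin of the given width."""
    counts = Counter((v // bin_width) * bin_width for v in data)
    return dict(sorted(counts.items()))

width = 10
bins = histogram(values, bin_width=width)
for start, count in bins.items():
    # One '#' per value in the bin; empty bins are simply not listed.
    print(f"{start:>3}-{start + width - 1:<3} {'#' * count}")
```

Most values land in the first two bins, while the single value in the 90-99 bin stands apart from everything else.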
Look inside health data access and you'll see why "ownership" is inadequate for patient information.
Patients, doctors and providers have a unique set of privileges that do not line up exactly with a traditional concept of ownership.
MIT and Massachusetts plan a big data initiative, Cisco predicts the Internet's big data future.
MIT announces a big data research center, Cisco predicts the future of the Internet (in zettabytes), and open data startup Junar announces seed funding.
Visualizing cities' energy usage, population density, and material intensity.
This week's visualization is an interactive web-mapping tool that lets you explore energy usage, material intensity and the overall "urban metabolism" of major U.S. cities.