Mac Slocum

Mac Slocum is O'Reilly's Online Managing Editor. He's been writing, editing and producing web content in various forms since the mid '90s. He also dabbles in video interviews from time to time.

Curiosity turned loose on GitHub data

Ilya Grigorik's GitHub project shows what happens when questions, data, and tools converge.

I’m fascinated by people who:

1. Ask the question, “I wonder what happens if I do this?” and then follow it all the way through.

2. Start a project on a whim and open it up so anyone can participate.

Ilya Grigorik (@igrigorik) did both of these things, which is why our recent conversation at Strata Conference + Hadoop World was one of my favorite parts of the event.

By day, Grigorik is a developer advocate on Google’s Make the Web Fast team (he’s a perfect candidate for a future Velocity interview). On the side, he likes to track open source projects on GitHub. As he explained during our chat, this can be a time-intensive hobby:

“I follow about 3,000 open source projects, and I try to keep up with what’s going on, what are people contributing to, what are the new interesting sub-branches of work being done … The problem I ran into about six months ago was that, frankly, it was just too much to keep up with. The GitHub timeline was actually overflowing. In order to keep up, I would have to go in every four hours and scan through everything, and then repeat it. That doesn’t give you much time for sleep.” [Discussed 15 seconds into the interview.]

Grigorik built a system, including a newsletter, that lets him stay in the loop efficiently. He worked with GitHub to archive public GitHub activity, and he then made that data available in raw form and through Google BigQuery (the data is updated hourly).

This is a fun project, no doubt, but it’s also a big deal. Here’s why: When you shorten the distance between questions and answers, you empower people to ask more questions. It’s the liberation of curiosity, and that’s exactly what happened here. Read more…

Now available: Big Data Now 2012 Edition

O'Reilly's annual data anthology explores the maturation of big data and data science.

In the first edition of our free Big Data Now anthology, the O’Reilly team tracked the birth and early development of data tools and data science. Now, with the second edition, we’re seeing what happens when big data grows up: how it’s being applied, where it’s playing a role, and the consequences, good and bad alike, of data’s ascendance.

We’ve organized the 2012 edition of Big Data Now into five areas:

Getting Up to Speed With Big Data — Essential information on the structures and definitions of big data.

Big Data Tools, Techniques, and Strategies — Expert guidance for turning big data theories into big data products.

The Application of Big Data — Examples of big data in action, including a look at the downside of data.

What to Watch for in Big Data — Thoughts on how big data will evolve and the role it will play across industries and domains.

Big Data and Health Care — A special section exploring the possibilities that arise when data and health care come together.

You can download free editions of Big Data Now 2012 in PDF, Mobi and EPUB formats here. The 2011 edition is also available.

Strata Rx is a wrap

Watch live keynotes from this week's Strata Rx Conference in San Francisco.

The intersection of big data and health care was explored at the O’Reilly Strata Rx Conference. The event has concluded, but you can still access an archive of videos, photos, and speaker slides. Read more…

Live from the O’Reilly Strata Conference in London

Catch live keynotes from this week's Strata Conference in London.

Experts from across the data world are coming together at the O’Reilly Strata Conference in London this week. You can watch live keynotes from the event below (full broadcast schedule is available here).

NYC Data Week, with a scheduling twist

NYC Data Week is October 22-26. Most events are open and free, and you can add your own to the calendar.

More than 20 events are already scheduled for NYC Data Week, but there’s an interesting quirk to the planning process: it’s being crowdsourced. That means if you’ve got a New York City-based data event scheduled between October 22 and 26, you can request that it be added to the NYC Data Week lineup.

The purpose of NYC Data Week is to showcase the application of data across New York City. It includes events relevant to startups, established businesses, government, education, finance, and science. If you’re working with data, you’ll have an easy time spotting something of interest, and the crowdsourced option is available on the off chance you don’t see what you’re looking for.

Below you’ll find a few highlights from the current schedule. Most events are open to the public and free to attend. Read more…

When data disrupts health care

The convergence of data, privacy and cost has created a unique opportunity to reshape health care.

Health care appears immune to disruption. It’s a space where the stakes are high, the incumbents are entrenched, and lessons from other industries don’t always apply.

Yet, in a recent conversation between Tim O’Reilly and Roger Magoulas, it became evident that we’re approaching an unparalleled opportunity for health care change. O’Reilly and Magoulas explained how the convergence of data access, changing perspectives on privacy, and the enormous expense of care is pushing the health space toward disruption.

As always, the primary catalyst is money. The United States is facing what Magoulas called an “existential crisis in health care costs” [discussed at the 3:43 mark]. Everyone can see that the current model is unsustainable. It simply doesn’t scale. And that means we’ve arrived at a place where party lines are irrelevant and tough solutions are the only options.

“Who is it that said change happens when the pain of not changing is greater than the pain of changing?” O’Reilly asked. “We’re now reaching that point.” [3:55]

(Note: The source of that quote is hard to pin down, but the sentiment certainly applies.)

This willingness to change is shifting perspectives on health data. Some patients are making their personal data available so they and others can benefit. Magoulas noted that even health companies, which have long guarded their data, are warming to collaboration.

At the same time there’s a growing understanding that health data must be contextualized. Simply having genomic information and patient histories isn’t good enough. True insight — the kind that can improve quality of life — is only possible when datasets are combined.

Read more…

Visualization of the Week: The story behind the U.S. power grid

"America Revealed" illustrates the complexity of the United States electric power grid.

The PBS TV series "America Revealed" visualizes the creation, use and fragility of the U.S. electric power grid. It's also an example of how data and context should always go together.

Visualization of the Week: A whole new way to look at the NBA Finals

A series of basketball visualizations reveals team and player tendencies.

The New York Times uses shot selection and completion data to break down the championship matchup between the Miami Heat and Oklahoma City Thunder.


Big data in Europe

The organizers of Big Data Week discuss Europe's adoption of big data and data science.

European application of big data is ramping up, but its spread is different from the patterns seen in the U.S. In this interview, Big Data Week organizers Stewart Townsend and Carlos Somohano share the key distinctions and opportunities associated with Europe's data scene.


Strata Newsletter: November 30, 2011

The rise of Clojure and a look at the design behind data visualizations.

Highlights from the 11/30/11 edition of the Strata newsletter include: Clojure is a rising star in the data world, a look at the top data news, and a deep dive into data visualization design.
