Earthquakes are HUGE on Data.gov

Checking in on Data.gov roughly one year later

After launching just over a year ago with only 47 data sets, the “Raw Data Catalog” on Data.gov now has 2,326 entries that have been collectively downloaded almost three-quarters of a million times. Of course, even these sizable download counts understate the actual impact of this data, which is being embedded in a variety of sites and apps, like those being developed for the Health 2.0 Developer Challenge.

The big Data.gov winner so far? The Department of the Interior’s “Worldwide M1+ Earthquakes, Past 7 Days” data set. My guess is that there is some great app or visualization out there making daily use of this file. If you know what it is, report it in the comments.


Update: In the comments, Mike suggested that earthquake downloads could be driven by a recurring visualization in the Popular Mechanics iPad App. I tracked down the app’s developer, Jonathan Cousins, and he confirmed that “the app grabs data about the most recent seismic activity from USGS feeds via wifi or 3G.” I’m not quite sure how these requests are being tallied on Data.gov, but it’s a really great example of how someone is using this data to create new value.
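To make the idea of an app reading a seismic feed concrete, here is a minimal sketch of parsing a GeoRSS-style Atom feed with Python's standard library. The sample entries, titles, and coordinates are made up for illustration, and the element layout is an assumption; the real USGS feeds may be structured differently, and a real app would fetch the feed over HTTP rather than read it from a string.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed: invented entries, not real USGS data.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:georss="http://www.georss.org/georss">
  <entry>
    <title>M 2.5, Example Region</title>
    <updated>2010-08-01T12:00:00Z</updated>
    <georss:point>10.5 -20.25</georss:point>
  </entry>
  <entry>
    <title>M 4.1, Another Example Region</title>
    <updated>2010-08-01T13:30:00Z</updated>
    <georss:point>-33.0 151.5</georss:point>
  </entry>
</feed>"""

# Namespace prefixes used in the lookups below.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "georss": "http://www.georss.org/georss",
}


def parse_quakes(feed_xml):
    """Return a list of dicts with title, lat, and lon for each feed entry."""
    root = ET.fromstring(feed_xml)
    quakes = []
    for entry in root.findall("atom:entry", NS):
        title = entry.findtext("atom:title", default="", namespaces=NS)
        point = entry.findtext("georss:point", default="", namespaces=NS)
        if not point.strip():
            continue  # skip entries that carry no location
        # A georss:point holds "latitude longitude" separated by whitespace.
        lat, lon = (float(value) for value in point.split())
        quakes.append({"title": title, "lat": lat, "lon": lon})
    return quakes


for quake in parse_quakes(SAMPLE_FEED):
    print(quake["title"], quake["lat"], quake["lon"])
```

An app that polls a feed like this on every launch would generate a steady stream of requests, which is one plausible way a single entry could rack up so many "downloads."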


The top 10 data sets by download count are:

  1. Worldwide M1+ Earthquakes, Past 7 Days. 122,888 downloads. Real-time, worldwide earthquake list for the past 7 days. Department of the Interior.
  2. Latest Volumes of Foreign Relations of the United States. 10,090 downloads. The feed for the latest ten volumes of the official historical documentary record of U.S. foreign policy in the Foreign Relations of the United States series. Department of State.
  3. U.S. Overseas Loans and Grants (Greenbook). 6,670 downloads. These data are U.S. Economic and Military Assistance by country from 1946 to the present. US Agency for International Development.
  4. Child-Related Product Recalls. 2,784 downloads. Lists recalls from CPSC, the agency charged with protecting the public from unreasonable risks of serious injury or death from thousands of types of consumer products. US Consumer Product Safety Commission.
  5. Airline On-Time Performance and Causes of Flight Delays. 2,716 downloads. On-time arrival data for non-stop domestic flights by major air carriers, as well as additional items, such as departure and arrival delays, origin and destination airports, flight numbers, scheduled and actual departure and arrival times, cancelled or diverted flights, taxi-out and taxi-in times, air time, and non-stop distance. Department of Transportation.
  6. 2005 Toxics Release Inventory data for American Samoa. 2,628 downloads. The Toxics Release Inventory (TRI) is a publicly available EPA database that contains information on toxic chemical releases and waste management activities reported annually by certain industries as well as federal facilities. Environmental Protection Agency.
  7. OSHA Data Initiative – Establishment Specific Injury and Illness Rates. 2,588 downloads. The data used by OSHA to calculate establishment-specific injury and illness incidence rates. Department of Labor.
  8. 2001 Federal Register in XML. 2,506 downloads. The official daily publication for rules, proposed rules, and notices of Federal agencies and organizations, as well as executive orders and other presidential documents. National Archives and Records Administration.
  9. 2007 National RCRA Hazardous Waste Biennial Report Data Files. 2,266 downloads. Data on the generation of hazardous waste from large-quantity generators and on waste management practices from treatment, storage, and disposal facilities. Environmental Protection Agency.
  10. Residential Energy Consumption Survey (RECS) Files, All Data, 2005. 2,000 downloads. Data on the use of energy in residential housing units including physical housing unit types, appliances utilized, demographics, fuels, and other energy-use information from the Residential Energy Consumption Survey (RECS), which is conducted every four years. Department of Energy.

Here’s a breakdown of the contributions by agency:

| Agency | Data sets contributed | Downloads |
|---|---|---|
| Environmental Protection Agency | 474 | 160,716 |
| Department of Defense | 214 | 44,837 |
| Department of the Interior | 197 | 157,273 |
| Department of Commerce | 176 | 37,430 |
| Department of Health and Human Services | 144 | 43,697 |
| Executive Office of the President | 132 | 7,569 |
| Department of the Treasury | 93 | 49,859 |
| Department of Justice | 90 | 16,392 |
| Department of Energy | 86 | 12,965 |
| All remaining agencies | 740 | 209,872 |
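For readers who want to slice these numbers further, here is a small sketch that totals the download column and works out each agency's share and its average downloads per data set. The figures are copied from the table above; the variable and function names are just illustrative.

```python
# (data sets contributed, downloads), copied from the agency table above.
AGENCIES = {
    "Environmental Protection Agency": (474, 160716),
    "Department of Defense": (214, 44837),
    "Department of the Interior": (197, 157273),
    "Department of Commerce": (176, 37430),
    "Department of Health and Human Services": (144, 43697),
    "Executive Office of the President": (132, 7569),
    "Department of the Treasury": (93, 49859),
    "Department of Justice": (90, 16392),
    "Department of Energy": (86, 12965),
    "All remaining agencies": (740, 209872),
}

# Download count of the top data set, from the top-10 list.
EARTHQUAKE_DOWNLOADS = 122888


def download_shares(agencies):
    """Map each agency to its fraction of all downloads."""
    total = sum(downloads for _, downloads in agencies.values())
    return {name: downloads / total for name, (_, downloads) in agencies.items()}


def downloads_per_set(agencies):
    """Map each agency to its average downloads per contributed data set."""
    return {name: downloads / sets for name, (sets, downloads) in agencies.items()}


total_downloads = sum(downloads for _, downloads in AGENCIES.values())
shares = download_shares(AGENCIES)
per_set = downloads_per_set(AGENCIES)

print(f"Total downloads: {total_downloads:,}")
for name in sorted(shares, key=shares.get, reverse=True):
    print(f"{name}: {shares[name]:.1%} of downloads, {per_set[name]:.0f} per data set")

# How much of Interior's traffic comes from the single earthquake entry.
interior_downloads = AGENCIES["Department of the Interior"][1]
print(f"Earthquake share of Interior downloads: {EARTHQUAKE_DOWNLOADS / interior_downloads:.1%}")
```

The download column sums to 740,610, which lines up with the "almost three-quarters of a million" figure. The last line shows how heavily the Department of the Interior's total depends on the one earthquake entry.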

Finally, here’s a link to the data.gov catalog that includes the number of times each set has been downloaded. (If you’re interested in how this was done, check out “Use BeautifulSoup to parse data.gov” over on O’Reilly Answers.)
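As a rough illustration of the scraping idea, here is a sketch that pulls titles and download counts out of a catalog-style HTML table. It uses Python's built-in html.parser instead of BeautifulSoup so that it runs without third-party packages, and the markup and class names ("title", "downloads") are hypothetical; the actual data.gov pages will be structured differently.

```python
from html.parser import HTMLParser

# Hypothetical markup for illustration; not the real data.gov page structure.
SAMPLE_HTML = """
<table>
  <tr><td class="title">Worldwide M1+ Earthquakes, Past 7 Days</td><td class="downloads">122,888</td></tr>
  <tr><td class="title">Child-Related Product Recalls</td><td class="downloads">2,784</td></tr>
</table>
"""

FIELDS = ("title", "downloads")


class CatalogParser(HTMLParser):
    """Collect (title, downloads) pairs from table rows."""

    def __init__(self):
        super().__init__()
        self.rows = []       # finished (title, downloads) pairs
        self._field = None   # class of the <td> currently open, if any
        self._current = {}   # cell text collected for the row in progress

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._current = {}
        elif tag == "td":
            self._field = dict(attrs).get("class")

    def handle_data(self, data):
        # Text can arrive in several chunks, so append rather than assign.
        if self._field in FIELDS:
            self._current[self._field] = self._current.get(self._field, "") + data

    def handle_endtag(self, tag):
        if tag == "td":
            self._field = None
        elif tag == "tr" and all(field in self._current for field in FIELDS):
            # Strip thousands separators before converting the count.
            count = int(self._current["downloads"].replace(",", ""))
            self.rows.append((self._current["title"].strip(), count))


def parse_catalog(html):
    """Return a list of (title, downloads) tuples found in the HTML."""
    parser = CatalogParser()
    parser.feed(html)
    return parser.rows


for title, downloads in parse_catalog(SAMPLE_HTML):
    print(f"{downloads:>8,}  {title}")
```

From there, sorting the rows by count gives a top-10 list, and grouping them by agency gives a breakdown like the one above.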

Congrats to everyone at data.gov for creating this incredible resource for developers-at-large.

  • Marten

    Nice post, but you seem to be missing about 274,000 entries in the geodata catalog in data.gov (http://www.data.gov/catalog/geodata). These originate from http://www.geodata.gov, a geospatial catalog created some 7 years ago with very much the same goal in mind as data.gov.

    The earthquakes ‘file’ is really a GeoRSS feed that alerts you/apps after earthquakes exceed the specific magnitude. In addition to this one, DOI/USGS publishes several others. These feeds have helped greatly after the tragic Haiti and Chile earthquake events earlier this year.

    What’s interesting (and sad) is that many government agencies have published geospatial web services based on industry and general IT standards (OGC, REST, SOAP). They provide access to huge amounts of data without the need to download files (web services!). Data.gov has thus far focused only on data files. It would be great if they also started to include these web services, which allow for building apps. Then data.gov would truly become a federal information sharing platform.

    @martenhogeweg

    http://martenhogeweg.blogspot.com/2010/06/accessing-datagov-catalog-through-open.html

    http://martenhogeweg.blogspot.com/2010/07/datagov-adds-geoviewer.html

    • http://radar.oreilly.com/andrewo Andrew Odewahn

      Great points, Marten. I’d noticed that they segmented the site into the “Raw Data” catalog (which is the catalog I used) and the Geodata catalog (which I did not include). I should have been clearer. But it’s interesting that the earthquake feed shows up in the raw data catalog. Perhaps that’s an oversight? Also, I’m curious how this was counted as a “download” at all on data.gov.

      As far as the point you raise on web services — YES! I find myself a bit perplexed on the focus on files, versus web services or feeds. It seems like that would make them much more accessible for developers to do interesting things with. There are third parties republishing this stuff already (like infochimps), so maybe that type of “value add” is something they’re looking to the private sector to support. Or, maybe the focus was just on getting data out of the agencies as quickly as possible, and files were the best route. (Which seems like a very pragmatic approach.)

      I’ve heard from several sources that the govt. geo community is very well organized and has been sharing data for a long time, not just at the federal level, but at the state and local levels as well. Any thoughts on why?

  • Mike

    The earthquake traffic might be (partly) due to Popular Mechanics. Their iPad edition has a great data visualization that’s continuously updated from live data:

    http://itunes.apple.com/us/app/popular-mechanics-interactive/id378868851?mt=8

  • Marten

    @Andrew, the separation of catalogs is historic. data.gov did indeed start with just the one ‘raw’ data catalog. In June 2009 we supported the integration with an interoperable catalog service geodata.gov had been providing. At the time it might have been easier to integrate geodata as a separate catalog. Indeed, some of the ‘raw’ data (like the earthquakes) is considered geodata by the geo community.

    Why does the geo community appear to be well organized? Hmmm, perhaps because the types of problems tackled or geospatial analyses performed required integration of many types of data typically managed by different agencies at multiple levels of government. Think city planning, disaster response, climate change, etc.

    Hence an early need for data exchange that resulted in:

    - standardization of file formats (de facto standards like the Esri Shapefile format http://www.esri.com/library/whitepapers/pdfs/shapefile.pdf or de jure standards like SDTS http://mcmcweb.er.usgs.gov/sdts/, both of which have been around for quite some time)

    - documentation of shared content in standardized metadata descriptions (as created by FGDC http://www.fgdc.gov/metadata)

    - and consensus-based service interfaces (such as those defined by the Open Geospatial Consortium http://www.opengeospatial.org/).

  • Xavier Badosa

    “Or, maybe the focus was just on getting data out of the agencies as quickly as possible, and files were the best route.”

    That was the approach. The Data.gov Concept of Operations Draft (Dec. 3, 2009) promised APIs in 6 months

    http://tr.im/HyOi (Word doc)

    but I’m not aware of any progress in this field (there has been more progress, for example, in the semantic web field, also mentioned in the Draft).

    This is the relevant paragraph:

    “Developers will interact with Data.gov through multiple Application Programming Interfaces (APIs), as shown below in Figure 11. The APIs will give programmatic access to the Data.gov catalog entries and the data within the shared data storage service. These APIs are a near-term objective and are expected to be developed over the next six months.”

  • ian

    Big Data is great, but unless you can visualize/analyze it, the question is moot. We have tornadoes, earthquakes, coastal vulnerability to sea level rise and others from the doom & gloom collection. Play around with geofactfinder.com and feedback welcome!

  • Joseph Kelly

    Infochimps has a dataset of tweets from the Chilean earthquake this year: http://infochimps.org/search?query=earthquake

    Not only is data.gov impressive on its own, but when its data can be a resource for smashing against other datasets, it becomes immensely more powerful.

  • Marten

    @Xavier, yes, there was definitely a push for large numbers, and if you chop up a national dataset into census block groups/counties/zip/etc. you easily increase the numbers. But most of the data available now through data.gov has been made available from the various federal agencies for years. Low-hanging fruit, so to speak…

    But if you look at, for example, http://services.nationalmap.gov/ArcGIS/rest/services you’ll see that federal agencies are indeed working on open web services at a national scale.

    It’s just a matter of data.gov allowing the registration and discovery of these, and through its geodata catalog this is technically possible today without any work on the data.gov side!

    @Joseph, yes indeed, and services like the above are not just capable of generating a picture of a map, but also support interaction through OGC WMS, OGC KML, JSON, and SOAP, thus giving developers with different preferences and programming environments the kind of access to data that suits each of them best.

  • http://semanticommunity.net/ Brand Niemann

    You have provided a great service by extracting the Data.gov Catalog so that it could be reused in other tools like Spotfire for more advanced analytics – see http://epadata.wik.is/EPA_Data.Gov_Inventory and http://gaininitiative.wik.is/. There are other Web pages this needs to be applied to, e.g. http://www.it.ojp.gov/framesets/iepd-clearinghouse-noClose.htm. Can you do this and post the Excel.zip file? Thanks, Brand

  • http://semanticommunity.net/ Brand Niemann

    And thanks to you I made this post to the Data.gov Ideascale recently – see http://datagov.ideascale.com/a/dtd/Build-Data-Catalogs-in-the-Cloud-in-Support-of-Data.gov—EPA-s-Strategic-Data-Action-Plan/81609-6440

    Do you know of a good tool to build a spreadsheet that contains an index to the files on your hard drive or at least a subdirectory folder?

  • http://www.igsenergy.com/Residential Natural Gas Company

    There aren’t any that I know of.