NASA’s Earth Observation data, collected continuously from satellites, aircraft, and ground-based missions for more than half a century, constitute an invaluable record of Earth processes and a critical resource for scientists and researchers. The techniques and strategies NASA developed for processing, organizing, archiving, and disseminating these data have produced a national network of interconnected data repositories, along with systems that deliver these data efficiently in a wide range of formats to users around the world.
Managing NASA’s Earth observing data is the responsibility of the Earth Observing System Data and Information System (EOSDIS), which provides end-to-end capabilities for these data. According to EOSDIS metrics for 2014, EOSDIS manages more than 9 petabytes (PB) of data. To put this into perspective, 1 PB is equivalent to about 20 million four-drawer filing cabinets filled with text. Even one unit down, at the terabyte (TB), which is roughly one-thousandth of a petabyte, the volumes are large: 10 TB can hold the entire printed collection of the Library of Congress. Every day, EOSDIS adds about 6.4 TB of data to its archives and distributes almost 28 TB of data to an average of 11,000 unique users around the world.
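For readers who want to see how these figures relate to one another, the short Python sketch below works through the arithmetic. It uses only the 2014 numbers quoted above. The function names are illustrative, the unit conversions assume decimal (SI) units, and the yearly figure assumes the daily ingest rate stays constant, which the metrics themselves do not claim.

```python
# Figures quoted in the text from the EOSDIS metrics for 2014.
ARCHIVE_PB = 9.0               # total holdings, "more than 9 PB"
DAILY_INGEST_TB = 6.4          # data added to the archive per day
DAILY_DISTRIBUTION_TB = 28.0   # data distributed per day, "almost 28 TB"
DAILY_USERS = 11_000           # average unique users per day

# Assumption: decimal (SI) units, so 1 PB = 1,000 TB and 1 TB = 1,000 GB.
TB_PER_PB = 1_000
GB_PER_TB = 1_000


def annual_ingest_pb(daily_tb, days=365):
    """Archive growth over a year, assuming a constant daily ingest rate."""
    return daily_tb * days / TB_PER_PB


def average_gb_per_user(daily_tb, users):
    """Mean volume distributed per user per day (a plain average)."""
    return daily_tb * GB_PER_TB / users


def distribution_to_ingest_ratio(distributed_tb, ingested_tb):
    """How many terabytes go out for every terabyte that comes in."""
    return distributed_tb / ingested_tb


if __name__ == "__main__":
    growth = annual_ingest_pb(DAILY_INGEST_TB)
    per_user = average_gb_per_user(DAILY_DISTRIBUTION_TB, DAILY_USERS)
    ratio = distribution_to_ingest_ratio(DAILY_DISTRIBUTION_TB, DAILY_INGEST_TB)
    print(f"Approximate archive growth per year: {growth:.2f} PB")
    print(f"Average distributed per user per day: {per_user:.1f} GB")
    print(f"Distributed vs. ingested volume: {ratio:.1f} to 1")
```

Under these assumptions the archive grows by roughly 2.3 PB a year, each user receives about 2.5 GB a day on average, and a little more than four times as much data leaves the system as enters it. These are rough averages only; actual usage is likely to vary widely from user to user.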
“You can look at EOSDIS as a giant library,” says Kevin Murphy, who served as EOSDIS System Architect and is now the NASA Program Executive for the Earth Science Data Systems (ESDS) Program. “This means you need to know where all the data are and then you have to process the data to make sure they are all consistent and make sure you’re not making changes to the data.”
The foundations of this giant data library are the EOSDIS Distributed Active Archive Centers (DAACs). Because of the size of the data holdings and the breadth of science disciplines represented, EOSDIS data collections are stored in discipline-specific DAACs (Figure 1). For example, the Land Processes DAAC (LP DAAC) in Sioux Falls, South Dakota, is home to NASA Earth science data related to surface reflectance, radiance, and temperature; topography; radiation budget; ecosystem variables; land cover; and vegetation indices. The DAACs provide concierge-style data support to NASA’s Earth science customers, an important service given the complexity of remote sensing data.