Clouds in the sky constantly grow and shrink as they adjust to evolving atmospheric conditions. Like an atmospheric cloud, a cloud computing environment can easily adjust to evolving conditions, expanding or contracting as needed based on data storage requirements and the needs of data users. This flexibility makes the commercial cloud a viable option for archiving and disseminating large volumes of data, or for managing data holdings that are expected to change rapidly over a short period.
NASA’s Earth Observing System Data and Information System (EOSDIS) is responsible for a data collection that is both large in volume and projected to grow rapidly over the next several years. From its current size of almost 22 petabytes (PB), the volume of data in the EOSDIS archive is expected to increase to almost 247 PB by 2025, according to estimates by the Earth Science Data Systems (ESDS) Program.
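As a rough check on the scale of this projection, going from about 22 PB to about 247 PB is roughly an elevenfold increase. Assuming, for illustration only, that the 22 PB figure describes the archive in 2017 (so that n = 8 years remain until 2025), the implied compound annual growth rate r is approximately:

$$
\frac{V_{2025}}{V_{\text{current}}} = \frac{247\ \text{PB}}{22\ \text{PB}} \approx 11.2,
\qquad
r = \left(\frac{V_{2025}}{V_{\text{current}}}\right)^{1/n} - 1 \approx 11.2^{1/8} - 1 \approx 0.35
$$

That is, about 35% growth per year under this assumed baseline year; a different baseline would change r but not the elevenfold total.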
To prepare for this tremendous growth and provide efficient access to these data, EOSDIS is investigating how to evolve its data and services to run in the commercial cloud. As part of these efforts, staff at the Earth Science Data and Information System (ESDIS) Project (which manages EOSDIS data) are prototyping and testing how EOSDIS data collections can be archived collectively and disseminated in the cloud. Befitting the cloud environment, this prototype is called Cumulus.
A central feature of Cumulus is a cloud-based framework for data ingest, archive, distribution, and management, which are the core activities of the discipline-specific Distributed Active Archive Centers (DAACs). The overall Cumulus goal is to provide the following functionality in the commercial cloud:
- Data acquisition from data providers (such as NASA science teams),
- Data ingest (including validation and processing),
- Harvest, creation, and publication of dataset metadata to the Common Metadata Repository (CMR),
- Storage and distribution of data, including disaster recovery, and
- Publication of metrics to the ESDIS Metrics System (EMS), which collects and organizes various metrics from the DAACs and other data providers.
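The functionality above can be pictured as a chain of stages that each incoming file passes through. The Python sketch below is purely illustrative: `Granule`, the stage functions, and the stage order are hypothetical and are not the Cumulus API, and the CMR and EMS steps are stubbed out as history entries rather than real service calls.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Granule:
    """Hypothetical stand-in for one incoming data file."""
    name: str
    checksum_ok: bool
    history: List[str] = field(default_factory=list)


def acquire(g: Granule) -> Granule:
    # Receive the file from a data provider (e.g., a science team).
    g.history.append("acquired")
    return g


def ingest(g: Granule) -> Granule:
    # Validate before anything is archived; reject bad files early.
    if not g.checksum_ok:
        raise ValueError(f"{g.name}: validation failed")
    g.history.append("ingested")
    return g


def archive(g: Granule) -> Granule:
    # Stand-in for storing the file for distribution and disaster recovery.
    g.history.append("archived")
    return g


def publish_metadata(g: Granule) -> Granule:
    # Stand-in for publishing metadata to a catalog such as the CMR.
    g.history.append("metadata_published")
    return g


def report_metrics(g: Granule) -> Granule:
    # Stand-in for sending usage metrics to a system such as the EMS.
    g.history.append("metrics_reported")
    return g


# Assumed stage order; a real system may order or parallelize these differently.
PIPELINE: List[Callable[[Granule], Granule]] = [
    acquire, ingest, archive, publish_metadata, report_metrics,
]


def run(g: Granule) -> Granule:
    """Apply each stage in order; any exception stops the pipeline."""
    for stage in PIPELINE:
        g = stage(g)
    return g
```

For example, `run(Granule("example.nc", checksum_ok=True)).history` lists the five stage names in order, while a granule with `checksum_ok=False` stops at ingest with a `ValueError`. Keeping each stage as a separate function means a stage can be swapped or rerun without touching the others.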
The DAACs would still serve as gateways to EOSDIS Earth science data and continue to provide a wide range of support services for data users. EOSDIS data users likely would notice no difference when searching for and downloading data through their discipline-specific DAACs and Earthdata Search, even when those data happen to be stored in the cloud.
Selected EOSDIS data and services are already operating in Amazon Web Services (AWS), which is currently the only NASA-approved commercial cloud provider. Earthdata Search and the CMR moved to the cloud in September 2016 and April 2017, respectively. Global Imagery Browse Services (GIBS), which provides access to more than 400 satellite imagery products that can be viewed using client applications such as Worldview, is expected to begin moving to the cloud as a prototype in 2018. The next step in this evolution is to prototype and test EOSDIS data collections in the cloud, an important undertaking given the significant growth expected in the EOSDIS archive.