Evolving NASA Earth Science Data and Services to the Cloud
NASA’s Earth Observing System Data and Information System (EOSDIS) is in the middle of a critical project to prototype, test, and evaluate a significant change in the way data users access and use NASA Earth Observation (EO) data. Ironically, if this change is implemented, data users likely will not even notice it. What they will notice is more efficient access to more data and the ability to do more with these data.
The change being considered is moving EOSDIS data to the cloud. This move would not only be a logical technical evolution for EOSDIS, but also a proactive effort to provide broader access to a data archive that is expected to grow significantly over the next several years.
Between 2017 and 2022, the ingest rate of data into the EOSDIS archive is projected to grow from the current 3.9 petabytes (PB) per year to as much as 47.7 PB per year, according to estimates from NASA’s Earth Science Data Systems Program. As this ingest rate increases, the volume of data in the EOSDIS archive also is expected to grow—from nearly 22 PB today to more than 37 PB by 2020; by 2025, the volume of data in the EOSDIS archive is expected to be more than 246 PB.
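To put these projections in perspective, the implied compound annual growth rate can be worked out from the figures quoted above. The sketch below is only an illustration: it assumes "today" means 2017 and treats the quoted values as exact endpoints, even though the article gives them as "as much as" and "more than" estimates.

```python
def cagr(start: float, end: float, years: float) -> float:
    """Compound annual growth rate between two values over a span of years."""
    return (end / start) ** (1.0 / years) - 1.0


# Endpoints taken from the projections above (PB per year for ingest, PB for volume).
# Assumption: "today" is 2017, and the quoted figures are treated as exact values.
projections = {
    "ingest rate, 2017-2022": (3.9, 47.7, 5),
    "archive volume, 2017-2020": (22.0, 37.0, 3),
    "archive volume, 2020-2025": (37.0, 246.0, 5),
}

for label, (start, end, years) in projections.items():
    print(f"{label}: {cagr(start, end, years):.0%} per year")
# ingest rate, 2017-2022: 65% per year
# archive volume, 2017-2020: 19% per year
# archive volume, 2020-2025: 46% per year
```

Under these assumptions, the ingest rate would grow by roughly two-thirds every year, and archive growth would accelerate sharply after 2020 as the higher ingest rates accumulate.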
This anticipated growth in both the EOSDIS data ingest rate and the overall archive volume poses new challenges for distributing and analyzing data that currently are stored and disseminated through physical servers on-premises at EOSDIS Distributed Active Archive Centers (DAACs). To address these challenges, EOSDIS is examining the effectiveness of using a commercial cloud to ingest, archive, process, distribute, and manage the anticipated large volumes of new mission data. Moving the EOSDIS archive collectively into the cloud will, for the first time, put NASA EO data “close to compute” (that is, alongside the computing resources used to analyze them), improving the management and accessibility of these data while also expediting science discovery for data users. Key EOSDIS services such as the Common Metadata Repository (CMR) and Earthdata Search already are in the cloud; moving EOSDIS data to the cloud is a logical next step in this evolution.
EOSDIS has numerous motivations for investigating the use of the cloud for NASA EO data, including:
- The expected significant growth of data in the EOSDIS archive and the NASA requirement to provide these data efficiently and rapidly to worldwide data users;
- The need for EOSDIS to have a cost-effective, flexible, and scalable data system with ingest, archive, processing, and distribution solutions that can keep pace with mission advancements and capabilities; and
- The need for data users to efficiently access and process significantly larger data volumes from multiple sources.
The complexity of EO data makes it impossible to simply move these data en masse into the cloud. Instead, EOSDIS is developing new technologies, services, and architectures, all of which must be thoroughly tested and evaluated, to ensure that these data work seamlessly in this environment.
Before looking at the potential benefits the cloud brings to EOSDIS and EOSDIS data users, it is important to understand what is meant by “cloud computing.” The National Institute of Standards and Technology (NIST) defines cloud computing as “a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.” As NIST notes, cloud computing includes five essential characteristics: on-demand self-service, broad network access, resource pooling, rapid elasticity, and measured service.
The cloud can be commercial (i.e., a pay-per-use system provided through a public cloud provider such as Amazon Web Services [AWS] or Microsoft Azure) or on-premises (e.g., a local system housed at a business, organization, or agency). A key advantage of a commercial cloud is the vastly greater capacity it provides compared with an on-premises system.
For NASA EO data, a commercial public cloud offers several advantages. For one, a commercial system allows non-NASA users to access NASA-managed archives without the need to download data, an important consideration for enabling research using the tremendous EOSDIS archive. In addition, having EOSDIS data in the cloud enables these data to be stored collectively and accessed more easily by users, who can then more efficiently use data from multiple DAACs. Finally, a commercial public cloud does not require NASA credentials to process data, which ensures that these data continue to meet the NASA mandate of providing full and open data access.
Having EOSDIS data in the cloud will not change existing methods of user interaction with these data; it will, however, offer new methods of access not otherwise possible with on-premises platforms. To see how this might work, we will first look at how the current EOSDIS data system provides data archiving and dissemination and then examine the potential benefits the cloud provides.
Under NASA’s full and open data policy, all NASA mission data (along with the algorithms, metadata, and documentation associated with these data) must be freely available and provided to the public as soon as possible following a checkout period to ensure data accuracy and validity; there is no period of exclusive data use. NASA’s EOSDIS provides end-to-end capabilities for managing NASA’s EO data, including data archive, management, and distribution; information management; product generation; and user support services. These services are managed by NASA’s Earth Science Data and Information System (ESDIS) Project.
EOSDIS data currently are stored on-premises at 12 discipline-specific Distributed Active Archive Centers (DAACs), which archive and disseminate data. In addition, more than a dozen Science Investigator-led Processing Systems, or SIPS, also process mission-specific data and deliver these data to the appropriate DAAC for archiving and dissemination.
EOSDIS is in the middle of a year-long effort testing various ways data might be archived and disseminated in the cloud. EOSDIS requires that any cloud system be able to provide services in four key areas:
- Data archive: The system must preserve and protect NASA EO data;
- Data management: The system must support the development and execution of information lifecycle management for NASA mission-based Earth science data sets;
- Data ingest: The system must support multi-mission, multi-discipline data ingest; and
- Data distribution: The system must support distribution of data, subsetting, and visualization, and must be adaptable to future technologies.
Ideally, the system also will enable large-scale data analytics for data users.
NASA’s Office of the Chief Information Officer (OCIO) has chosen Amazon Web Services (AWS) as the source of general-purpose cloud services for NASA, and EOSDIS and the DAACs are building and testing prototypes to ensure that EOSDIS data and services will work successfully on this commercial cloud platform.
Having EOSDIS data in the cloud brings numerous benefits for both data users and EOSDIS, including:
- Easy access: Data users will be able to access data directly in the cloud, removing the need to download volumes of data for use;
- Rapid deployment: With an established EOSDIS cloud platform, data users can bring their algorithms and processing software to the cloud and work directly with the data in the cloud, simplifying procurement and hardware support while expediting science discovery;
- Scalability: The size and use of the archive can expand easily and rapidly as needed;
- Flexibility: Mission needs can dictate options for selecting operating systems, programming languages, databases, and other criteria to enable the best use of mission data; and
- Cost effectiveness: EOSDIS and NASA pay only for the storage and services actually used. Along with scalability benefits, this allows the amount of storage or services to be continually adjusted to ensure that data and services are effectively provided at the lowest possible cost to NASA and EOSDIS.
With the expected significant growth of NASA EO data archived by EOSDIS, moving these data to a commercial cloud can provide greater efficiency for storing and disseminating these data. It’s important to note that even if EOSDIS data are stored and disseminated in the cloud, key aspects of EOSDIS data and services will not change. For example, the DAACs will still serve as gateways to these data and provide a wide range of support services for data users. In fact, it’s likely that EOSDIS data users will not notice any difference in their interactions with the DAACs when searching for and downloading data stored in the cloud. What data users will notice is improved access to data and the ability to more efficiently use larger data sets for a broader range of research.
EOSDIS is still evaluating the technical and architectural aspects of this significant evolution, and this will continue over the next several months. The needs of data users remain a top EOSDIS priority, and the innovative technologies being developed by EOSDIS to archive and disseminate data in the cloud may soon enable data users to do more with this valuable resource than ever before.
Learn more about EOSDIS cloud efforts
EOSDIS cloud page: https://earthdata.nasa.gov/cloud
NASA Point of Contact: Mark McInerney, ESDIS Project Deputy Project Manager/Technical, Mark.McInerney@nasa.gov
Last Updated: May 18, 2017 at 12:03 PM EDT