Between the end of September 2018 and November 2019, the volume of data in NASA’s Earth Observing System Data and Information System (EOSDIS) collection grew from 27.4 petabytes (PB) to more than 33 PB. This growth is expected not only to continue but to accelerate, as several upcoming Earth observing missions will add a tremendous amount of new data to the EOSDIS collection over the next five years.
To meet NASA’s charge to provide these data freely to users worldwide, NASA’s Earth Science Data and Information System (ESDIS) Project launched Cumulus, a multi-year effort to develop a cloud-based framework for data ingest, archive, distribution, and management. For more information about overall efforts to host EOSDIS data in the commercial cloud, please see the EOSDIS Cloud Evolution page.
Cumulus is an open-source workflow system specific to the Earth science archive domain. The system is intended for use by the EOSDIS Distributed Active Archive Centers (DAACs) as they migrate their archived data to the Amazon Web Services (AWS) commercial cloud, which has been approved for use by NASA's Office of the Chief Information Officer. The Cumulus core team comprises several contributing members, including developers from the Land Processes DAAC (LP DAAC) and the National Snow and Ice Data Center DAAC (NSIDC DAAC). This diverse composition not only provides different perspectives but also brings decades of archival experience to bear on the challenges of the cloud migration effort.
Significant accomplishments by the ESDIS Cumulus Core team and the EOSDIS DAACs in 2019 advanced this effort. These included new features and capabilities, along with enhancements that make Cumulus more robust and secure.
Throughout 2019, the Cumulus Core team focused on meeting the needs of two specific user communities: integrators and operators.
Integrators are typically software developers who are beginning to deploy and use the system to develop product workflows. They drive the technical capabilities of the system by focusing on scalability, robustness, and security. These highly technical users depend on well-documented, well-tested, and easily extended interfaces and application programming interfaces (APIs). The Cumulus Core team continues to refine the system based on the experience and feedback of integrators.
Operators, on the other hand, are the day-to-day users of the Cumulus ingest, archive, and distribution system. They ensure that NASA’s EOSDIS data are processed properly into the system as they arrive and are then archived and maintained through vigilant data stewardship. Operators require streamlined, intuitive tools that provide relevant and actionable dashboards, metrics, and alerts. Throughout 2019, the Cumulus Core team developed new systems and enhanced existing ones to provide these tools and metrics.