NASA’s Earth Science Data and Information System (ESDIS) Project and its constituent Distributed Active Archive Centers (DAACs) continue to move the data in NASA’s Earth Observing System Data and Information System (EOSDIS) collection from physical servers into the cloud. This effort is called Cumulus and has been detailed in earlier articles in this series. The benefits of this evolution to worldwide EOSDIS data users are significant and include the ability to work with more data more efficiently than ever before.
A key element in this process is determining user requirements to gain a better understanding of how users will interact with data in the cloud, the types of analyses they intend to conduct, and options for architecting the EOSDIS cloud environment to best facilitate data use. This is an important undertaking since the EOSDIS data collection is about to become much larger.
The collection’s volume, about 27.5 petabytes (PB) at the end of Fiscal Year 2018, is forecast to grow to as much as 250 PB by 2025. This is due to the extremely high volume of data expected from upcoming missions such as the joint NASA, French, Canadian, and United Kingdom Surface Water and Ocean Topography (SWOT) mission and the joint NASA-Indian Space Research Organisation Synthetic Aperture Radar (NISAR) mission, both of which are currently scheduled for launch in 2021. NISAR, for example, is expected to generate approximately 3 terabytes (TB) of Level 0 data each day, which is equivalent to about 3,000 gigabytes (GB). For comparison, the five instruments aboard the Terra Earth observing satellite generate about 195 GB of Level 0 data each day, according to NASA’s Earth Observing System. For most data users, the current practice of downloading data onto an individual machine for analysis simply won’t work for data collections this large, collections that earn the name “Big Data.”
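To put these figures side by side, the short Python sketch below works through the arithmetic. The volumes are the ones quoted above, in decimal units (1 TB = 1,000 GB). The 100 megabit-per-second download link is purely an illustrative assumption and does not come from ESDIS, and the function names are hypothetical.

```python
# Figures quoted in the article (decimal units: 1 TB = 1,000 GB).
CURRENT_VOLUME_PB = 27.5     # EOSDIS volume at the end of Fiscal Year 2018
FORECAST_VOLUME_PB = 250.0   # upper end of the forecast for 2025
NISAR_L0_TB_PER_DAY = 3.0    # expected NISAR Level 0 output per day
TERRA_L0_GB_PER_DAY = 195.0  # Terra Level 0 output per day

# Assumption for illustration only; not a figure from the article.
ASSUMED_LINK_MBIT_PER_S = 100.0


def growth_factor(current_pb: float, forecast_pb: float) -> float:
    """How many times larger the forecast volume is than the current one."""
    return forecast_pb / current_pb


def daily_volume_ratio(nisar_tb: float, terra_gb: float) -> float:
    """Ratio of NISAR to Terra daily Level 0 volume, compared in GB."""
    return (nisar_tb * 1000.0) / terra_gb


def download_hours(volume_tb: float, link_mbit_per_s: float) -> float:
    """Hours needed to transfer volume_tb over a link of the given speed."""
    bits = volume_tb * 1e12 * 8                # terabytes -> bytes -> bits
    seconds = bits / (link_mbit_per_s * 1e6)   # megabits/s -> bits/s
    return seconds / 3600.0


if __name__ == "__main__":
    growth = growth_factor(CURRENT_VOLUME_PB, FORECAST_VOLUME_PB)
    ratio = daily_volume_ratio(NISAR_L0_TB_PER_DAY, TERRA_L0_GB_PER_DAY)
    hours = download_hours(NISAR_L0_TB_PER_DAY, ASSUMED_LINK_MBIT_PER_S)
    print(f"Forecast growth of the collection: about {growth:.1f}x")
    print(f"NISAR vs. Terra daily Level 0 volume: about {ratio:.1f}x")
    print(f"Hours to download one day of NISAR Level 0 data: about {hours:.0f}")
```

Under that assumed link speed, a single day of NISAR Level 0 data would take roughly 67 hours to download, longer than the day it took to collect. A faster connection shortens the transfer but does not change the overall picture.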
A primary objective of hosting EOSDIS data in the cloud is to “level the playing field” so anyone can work with these Big Data collections. The ideal user experience (UX) allows data users to work next to EOSDIS data in the cloud, meaning that a user can simply point their analysis software to a data location in the cloud and begin analyzing without the need to transfer or download data. After completing their analyses, a user can view or download the results. An integral part of facilitating this is preprocessing these data into Analysis Ready Data (ARD), which enables end-users to begin working with data immediately.
This would be a straightforward process if all EOSDIS data users interacted with data the same way. However, the millions of individual EOSDIS data users will interact with cloud-based data in different ways, depending on their research and analysis requirements and on their level of experience working with EOSDIS data. Some will conduct all their work inside the cloud, some will download data for analysis outside the cloud, and some will take a hybrid approach, working partly inside and partly outside the cloud. The ESDIS Project must be aware of these uses and have the data architecture and systems ready to support them.
Specifically, ESDIS and the DAACs are collecting end-user input to determine:
- What kind of analyses will be conducted?
- What do data users consider ARD and how much preprocessing can ESDIS do?
- Where will users analyze these data: in the cloud, outside of the cloud, or somewhere in between?
- What support products (such as primers, webpages, webinars, or tutorials) will users need?
Sources for this information include the annual EOSDIS American Customer Satisfaction Index (ACSI) surveys, feedback from webinars, various early-adopter programs, interaction with data users at applications workshops and science meetings, and input from DAAC User Working Groups.