The joint NASA/ISRO (Indian Space Research Organisation) Synthetic Aperture Radar (NISAR) mission, currently scheduled to launch later in 2024, will be a landmark event. The NISAR satellite will carry L- and S-band synthetic aperture radars (SAR) designed to systematically map Earth and measure changes on the planet's surface, including movements as small as a centimeter.
Further, NISAR is expected to generate as much as 140 petabytes (PB) of data over its scheduled three-year mission. For comparison, the total volume of data in NASA's Earth Observing System Data and Information System (EOSDIS) archive was about 116 PB at the end of February 2024, according to monthly metrics from the agency's Earth Science Data and Information System (ESDIS) Project. On a daily basis, NISAR is expected to produce close to 85 terabytes (TB) of data, far more than any NASA Earth observing mission currently in operation.
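To put these figures in perspective, the short Python calculation below converts the quoted daily volume into an average sustained data rate and compares the projected mission total with the February 2024 archive size. It is a back-of-the-envelope sketch that assumes decimal units (1 TB = 10^12 bytes, 1 PB = 10^15 bytes) and treats the approximate figures from the text as exact.

```python
# Back-of-the-envelope scale check for the NISAR figures quoted in the text.
# Assumption: decimal (SI) units, 1 TB = 10**12 bytes and 1 PB = 10**15 bytes.
TB = 10**12
PB = 10**15
SECONDS_PER_DAY = 24 * 60 * 60

daily_volume_bytes = 85 * TB          # ~85 TB per day (from the text)
mission_volume_bytes = 140 * PB       # up to 140 PB over the mission (from the text)
eosdis_archive_bytes = 116 * PB       # EOSDIS archive, end of February 2024 (from the text)

# Average sustained rate needed to move one day's output within that day.
bytes_per_second = daily_volume_bytes / SECONDS_PER_DAY
gigabits_per_second = bytes_per_second * 8 / 10**9

# How the projected mission total compares with the existing archive.
ratio_to_archive = mission_volume_bytes / eosdis_archive_bytes

print(f"{bytes_per_second / 10**9:.2f} GB/s sustained")
print(f"{gigabits_per_second:.2f} Gbit/s sustained")
print(f"mission total is {ratio_to_archive:.2f}x the Feb 2024 archive")
```

The daily figure alone works out to roughly 0.98 GB (about 7.87 Gbit) of new data every second, averaged around the clock, and the projected mission total is about 1.21 times the size of the entire EOSDIS archive as of February 2024.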
Yet, while such a large amount of data is a boon to scientists, it presents something of a challenge to both ESDIS, which is responsible for processing, archiving, and distributing Earth science data, and NASA's Alaska Satellite Facility Distributed Active Archive Center (ASF DAAC), which will archive and distribute NISAR data.
"Preparing the ESDIS enterprise for NISAR meant reimagining almost everything we had spent decades mastering," said Dana Shum, ESDIS deputy project manager of mission services. "Our on-premises enterprise architecture was not designed to handle the vast volumes of data that NISAR was projected to produce, and our download-first mentality—that had served our users well for decades—would not work for NISAR users due to the size of the data files."
That "reimagining" took the form of Getting Ready for NISAR (GRFN, pronounced "Griffin"), a proof-of-concept pathfinder initiative launched in 2016. GRFN sought to create a high-volume interface between a science data system (SDS) and a DAAC for ingesting, archiving, and distributing large datasets entirely within a commercial cloud environment.
By the end of its first year, GRFN featured a NASA Jet Propulsion Laboratory (JPL)-built SDS in the Amazon Web Services (AWS) cloud. This SDS used SAR data products from the ESA (European Space Agency) Sentinel-1 mission to generate interferograms and deliver them to ASF DAAC. An interferogram is an image formed by combining two SAR images of the same area, acquired at different times, so that differences in the radar signal's phase reveal surface displacement or motion. During the same period, ASF DAAC personnel developed procedures to manage every aspect of the data lifecycle for the products in AWS, including ingest, storage, discovery, distribution, and on-demand product generation. The result was a system that not only let data users become familiar with SAR products in the cloud, but also gave ESDIS a better understanding of the costs of storing and distributing the large files and high data volumes typical of SAR data in the cloud.
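The core of interferogram formation can be sketched in a few lines. The Python example below shows the standard repeat-pass technique: multiply each pixel of one co-registered complex SAR image by the complex conjugate of the matching pixel in the other, then read off the phase difference. It is a minimal illustration, not the GRFN SDS's implementation. It assumes the two images are already co-registered, uses plain lists of complex numbers in place of image arrays, and omits the removal of topographic and orbital phase, filtering, and phase unwrapping that real processing requires. The Sentinel-1 wavelength is derived from its C-band center frequency of about 5.405 GHz.

```python
import cmath
import math

# Radar wavelength in meters: speed of light divided by Sentinel-1's
# C-band center frequency (~5.405 GHz), i.e. roughly 5.55 cm.
SENTINEL1_WAVELENGTH_M = 299_792_458 / 5.405e9


def interferogram(reference, secondary):
    """Form an interferogram from two co-registered single-look complex
    images, given here as equal-length lists of complex pixel values.

    Each output pixel is reference * conj(secondary), so its phase is the
    phase difference between the two acquisitions.
    """
    if len(reference) != len(secondary):
        raise ValueError("images must be co-registered and the same size")
    return [r * s.conjugate() for r, s in zip(reference, secondary)]


def wrapped_phase(ifg):
    """Interferometric phase of each pixel in radians, wrapped to [-pi, pi]."""
    return [cmath.phase(z) for z in ifg]


def los_displacement(phase, wavelength=SENTINEL1_WAVELENGTH_M):
    """Convert an (already unwrapped) phase difference in radians into
    line-of-sight displacement in meters for repeat-pass InSAR.

    The signal travels to the ground and back, so one full 2*pi cycle
    corresponds to half a wavelength of motion. The sign convention
    (toward vs. away from the satellite) differs between processors.
    """
    return -wavelength * phase / (4 * math.pi)
```

For example, a pixel whose phase is 0.5 rad in the reference image and 0.2 rad in the secondary image yields a wrapped interferometric phase of about 0.3 rad. Because one full fringe (2π) corresponds to half a wavelength, roughly 2.8 cm of line-of-sight motion for Sentinel-1, fractions of a fringe correspond to centimeter-scale or smaller surface changes.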
"One of the primary concerns we approached early in the mission-planning phase was having the science data processing system located at JPL and the archive located at the Alaska Satellite Facility in the same region of the AWS cloud," said Karen Michael, ESDIS mission system manager. "This is important because moving data between different regions of the AWS cloud (East versus West) results in egress costs. Co-locating the SDS and archive in the same region of the cloud lets us avoid paying those egress costs. The other advantage to handling the large volumes of NISAR data in the cloud is to offer services in the cloud so that scientists and users of the data [can] perform their analysis on the data without [the data] ever having to leave the cloud."
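The cost reasoning in this quote can be made concrete with a toy model. The Python sketch below is an assumption-laden illustration, not AWS's actual pricing model: it treats transfers within one region as free and bills cross-region transfers at a flat per-gigabyte rate supplied by the caller. The bucket names, the region assignments, and the 0.02 USD/GB rate are all hypothetical placeholders; only the ~85 TB/day volume comes from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bucket:
    """A hypothetical cloud storage location, identified by its region."""
    name: str
    region: str


def transfer_cost_usd(volume_gb, source, destination, inter_region_rate_per_gb):
    """Estimate the charge for moving `volume_gb` gigabytes between buckets.

    Simplified model (assumption): same-region moves cost nothing, and
    cross-region moves are billed per gigabyte at a flat rate. Real cloud
    pricing has more tiers and exceptions than this.
    """
    if source.region == destination.region:
        return 0.0
    return volume_gb * inter_region_rate_per_gb


# Hypothetical placement: SDS output and archive co-located vs. split.
sds_output = Bucket("hypothetical-sds-output", "us-west-2")
archive_same = Bucket("hypothetical-archive", "us-west-2")
archive_other = Bucket("hypothetical-archive-east", "us-east-1")

daily_gb = 85_000        # ~85 TB/day from the text, in decimal GB
placeholder_rate = 0.02  # hypothetical USD per GB, NOT a quoted AWS price

print(transfer_cost_usd(daily_gb, sds_output, archive_same, placeholder_rate))  # 0.0
print(transfer_cost_usd(daily_gb, sds_output, archive_other, placeholder_rate))
```

Under these placeholder assumptions, splitting the SDS and the archive across regions would cost about 1,700 USD for a single day of NISAR-scale output, while co-locating them costs nothing in this model. The point is the shape of the trade-off, not the specific dollar figure.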
Along with demonstrating the viability of using the commercial cloud for SAR missions like NISAR, GRFN also showcased the tools and techniques that would later be applied to other data-intensive endeavors, such as NASA’s Surface Water and Ocean Topography (SWOT) mission.
For example, the GRFN SDS relied on Cumulus, an ESDIS-created cloud-based framework that provides data acquisition (from providers including NASA science teams), ingest (including validation and processing), publication of dataset metadata to NASA’s Common Metadata Repository (CMR), storage, distribution, and publication of metrics to the ESDIS Metrics System (EMS) entirely within the cloud. Further, Cumulus is integrated with the NASA-Compliant General Application Platform (NGAP), a custom-built cloud-optimized platform that provides highly flexible cloud-native infrastructure, NASA-compliant information technology security controls, networking services, and cost control in AWS.
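The responsibilities listed above (ingest with validation, storage, metadata publication, and metrics reporting) can be pictured as an ordered pipeline that each data product passes through. The Python sketch below is a hypothetical illustration of that idea only. Its class and function names are invented and are not Cumulus's API; an in-memory dict and lists stand in for cloud storage, CMR, and EMS; MD5 is chosen arbitrarily for the checksum; and acquisition and distribution are omitted.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Granule:
    """A hypothetical data product moving through the pipeline."""
    granule_id: str
    payload: bytes
    expected_md5: str
    history: list = field(default_factory=list)


def ingest(granule):
    # Ingest includes validation; here, a simple checksum comparison.
    actual = hashlib.md5(granule.payload).hexdigest()
    if actual != granule.expected_md5:
        raise ValueError(f"checksum mismatch for {granule.granule_id}")
    granule.history.append("ingested")
    return granule


def archive(granule, store):
    # Stand-in for writing the object to cloud storage.
    store[granule.granule_id] = granule.payload
    granule.history.append("archived")
    return granule


def publish_metadata(granule, catalog):
    # Stand-in for publishing a metadata record to a catalog such as CMR.
    catalog.append({"granule_id": granule.granule_id, "size": len(granule.payload)})
    granule.history.append("metadata_published")
    return granule


def report_metrics(granule, metrics):
    # Stand-in for sending ingest metrics to a metrics system such as EMS.
    metrics["ingested_bytes"] = metrics.get("ingested_bytes", 0) + len(granule.payload)
    granule.history.append("metrics_reported")
    return granule


def run_pipeline(granule, store, catalog, metrics):
    """Run each stage in order. A failed validation raises before the
    granule is archived, published, or counted."""
    granule = ingest(granule)
    granule = archive(granule, store)
    granule = publish_metadata(granule, catalog)
    return report_metrics(granule, metrics)
```

Ordering validation first means a corrupted product never reaches storage or the catalog, which is the kind of guarantee an ingest framework needs when data arrives at NISAR-scale volumes.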
"GRFN was a multiyear cooperative effort to get ready for NISAR and really helped us learn how to archive and distribute large-scale Earth science data from the commercial cloud," said Drew Kittel, ESDIS science data systems engineer. "Not just from a technical perspective, but from administrative, security, and cost control perspectives as well. The confluence of three data-intensive missions—Sentinel-1, SWOT, and NISAR—raised the issues about managing data egress in a cloud environment while staying within budget and remaining faithful to our promise of free and open data."
By laying the groundwork for processing, archiving, and disseminating mission data using the commercial cloud well in advance of the NISAR launch, the efforts of ESDIS, ASF DAAC, and JPL ensure that data users will be well positioned to take full advantage of the tremendous amount of data expected from the ground-breaking NISAR mission and from missions yet to come.