In addition to providing an overview of the water cycle and the application of relevant datasets, MacManus and Martinez showed attendees how to access, read, and subset historical and near real-time data and use these data with other socioeconomic administrative data to identify sites at risk of flooding and calculate the potential number of vulnerable people in each site.
"We wanted to show [workshop attendees] how to use both historical data and near real-time [NRT] data to determine the areas that are most vulnerable to flooding or to identify areas that may need assistance following recent floods to better inform decision-making," said Martinez. "The first thing we did was access historical NASA data through the AWS [Amazon Web Services] cloud. We then used an application programming interface system with NASA Earthdata Download tokens to access the NRT data."
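The exposure calculation described above, combining a flood layer with population and administrative-boundary data, can be illustrated with a minimal sketch. This is not the workshop's code: the function name `exposed_population`, the `depth_threshold` parameter, and the tiny 3x3 grids are all hypothetical, and the values are synthetic. It assumes the three layers have already been subset and aligned to the same grid.

```python
from collections import defaultdict


def exposed_population(flood_depth, population, admin_ids, depth_threshold=0.0):
    """Sum the population of flooded cells per administrative unit.

    All three inputs are equally sized 2-D lists (rows of cells).
    A cell counts as flooded when its depth exceeds depth_threshold.
    """
    totals = defaultdict(float)
    for depth_row, pop_row, id_row in zip(flood_depth, population, admin_ids):
        for depth, pop, unit in zip(depth_row, pop_row, id_row):
            if depth > depth_threshold:
                totals[unit] += pop
    return dict(totals)


# Synthetic, illustrative grids (not real NASA or SEDAC data).
flood_depth = [
    [0.0, 0.2, 0.5],
    [0.0, 0.0, 0.3],
    [0.0, 0.0, 0.0],
]
population = [
    [10, 20, 30],
    [40, 50, 60],
    [70, 80, 90],
]
admin_ids = [
    ["A", "A", "B"],
    ["A", "B", "B"],
    ["C", "C", "C"],
]

result = exposed_population(flood_depth, population, admin_ids)
print(result)
```

Units with no flooded cells (here, unit "C") do not appear in the result. In practice the gridded layers would likely be handled with array libraries rather than nested lists, but the per-unit aggregation logic is the same.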
Workshop Benefits
For MacManus, the workshop was noteworthy not only for what the attendees learned, but how.
"We built this online curriculum using Jupyter Notebooks to teach people about data science. As part of that work, we partnered with 2i2c [the International Interactive Computing Collaboration], which provided an interactive computing environment to support the delivery of the NASA Transform to Open Science Training (TOPST) water resources module," said MacManus. "Because of that, over 100 participants were able to participate in the workshop and process large amounts of NASA Earth science data."
TOPST is a NASA initiative designed to teach the data science lifecycle using data from NASA's Earth Science Division and to foster an inclusive culture of open science. It is part of the larger NASA Transform to Open Science (TOPS) initiative, which aims to rapidly move agencies, organizations, and communities toward that culture. Open science and the provision of unrestricted access to agency Earth science data are cornerstones of NASA Earth Science Data Systems (ESDS) Program operations.
The SEDAC workshop also provided a good opportunity to test how the Jupyter Notebooks performed when used at scale in a virtual workshop environment.
"Working with 2i2c allowed us to provide access to a live computing environment through a link," said Martinez. "We told the attendees, 'Just go to this link,' and it essentially gave them their own virtual computer pre-loaded with the lesson. All they really had to do was press play and follow along."
This approach proved beneficial for those who are interested in learning how to work with NASA Earth science data in a cloud environment but do not yet have the skills to do so.
"We understood that there were participants who did not know how to code but were still interested in learning, or wanted to learn how to use NASA data," Martinez said. "We formatted the lessons almost like a narrative that let them see what the code is, explain what the code is doing, and then witness the results."
The workshop’s computing environment also helped participants meet the challenges of working with large datasets.
"There's always a barrier to working with these data, whether it's people not knowing how to access them or not having the computing capacity to load the datasets," Martinez added. "This was one way we could kind of lower that barrier, even if it's just within a training environment, to let people interact with and visualize data from very large datasets."
Promoting Open Science
Minimizing barriers wasn't just a technical consideration for MacManus and Martinez. As a TOPST training module, the workshop also sought to promote the principles of open science, which seek to eliminate barriers of a different sort.
"In addition to advocating for science that is collaborative and transparent, TOPS emphasizes inclusion and accessibility, meaning that the content of the workshop is presented in a way that it makes sense to different groups of people," MacManus said.
To achieve these aims, MacManus and Martinez posted a YouTube video of the workshop. A Zenodo entry contains meeting artifacts, including English and Spanish translations of the module lessons, and an extensive collection of files and images from the lessons is available on GitHub (the datasets themselves are too large to host there).
"Our goal is to provide different avenues and different modes of learning for people," said Martinez. "We want to make sure we can accommodate anyone who is interested in this project and wants to contribute."
MacManus and Martinez also wanted to ensure workshop participants gained a greater awareness of how working with NASA Earth science data in the cloud can aid decision-making before and after a natural disaster.
"I don't think [the participants] left with the knowledge to stand up their own cloud solution and start working with data," MacManus said. "But they certainly left with an understanding of how to access the specific NASA datasets that we were working with and the advantages of using NASA Earth science data in a cloud environment."
Learning Resources