Like finding a needle in a haystack. Until the 20th century, the search for specific facts or knowledge within a large data collection was a human endeavor; sometimes successful, sometimes frustratingly tedious. This undertaking changed significantly with the ability to digitally collect, store, process, and analyze larger and larger data collections. Today, scientists are searching for ever smaller needles (granules of data) in ever-larger haystacks (Big Data collections).
SpaceML: Rise of the Machine (Learning)
Global satellite imagery is one such haystack. A sensor such as the Moderate Resolution Imaging Spectroradiometer (MODIS) aboard NASA’s Terra and Aqua satellites observes the entire globe – an area of approximately 197 million square miles – every one to two days. More than 20 years of MODIS global imagery is available through NASA’s Global Imagery Browse Services (GIBS) for interactive exploration using the NASA Worldview visualization application. MODIS imagery is an excellent resource for a broad look at planetary change or for tracking the movement of a hurricane or other natural event as it unfolds over several days.
But what about the needles within this imagery haystack? What about searching for sand dunes of a particular shape to explore changes in desert wind direction? Or differences in the orientation of cloud streets to study atmospheric circulation? For a human, searching for these needles in years of MODIS or other sensor imagery is a time-consuming (and budget-consuming) process that can delay scientific exploration.
One solution is to let machines do the work. This is the goal of a global citizen science initiative called SpaceML, which applies artificial intelligence (AI) and machine learning (ML) to space science and exploration. Undertaken by the NASA-supported Frontier Development Lab (FDL), SpaceML is showing the possibilities of applying AI and ML to open science using open-source software. One SpaceML project, the NASA GIBS Worldview Similarity Search, employs AI and ML to create an imagery search pipeline for the discovery of patterns within GIBS imagery viewed using Worldview.
“This is the best of both worlds,” says Ryan Boller, the GIBS/Worldview product owner at NASA’s Earth Science Data and Information System (ESDIS) Project. “You see as a human what is interesting, then you let the AI take over for the heavy lifting. I think of it as finding a lot more needles in a haystack than you ever could do practically.”
Deus ex machina
Machine learning is a type of artificial intelligence in which algorithms learn relationships between input data and output results. Machines excel at rapidly and efficiently discovering patterns and trends hidden within vast quantities of data, catching minutiae that human analysts can easily miss.
For example, let’s say you want to find images of islands in Worldview. During a recent demonstration of the GIBS/Worldview imagery pipeline, a machine was trained to search through five million tiles of Earth imagery for islands, starting with a single seed image of an island. Approximately 1,000 islands were identified in just 52 minutes. Done manually, this effort would take an estimated 7,000 hours (assuming five seconds to evaluate and label each image tile) and could cost as much as $105,000 (assuming $15 per hour).
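As a quick sanity check, those figures follow directly from the assumptions stated above (five seconds per tile, $15 per hour); the short sketch below simply reproduces the arithmetic:

```python
# Back-of-the-envelope check of the manual-labeling estimate quoted above.
tiles = 5_000_000        # image tiles searched in the demonstration
sec_per_tile = 5         # assumed seconds to evaluate and label one tile
rate_per_hour = 15       # assumed labor cost in US dollars per hour

hours = tiles * sec_per_tile / 3600   # about 6,944 hours, i.e. roughly 7,000
cost = 7_000 * rate_per_hour          # $105,000 at the rounded figure

print(f"~{hours:,.0f} hours of labeling, roughly ${cost:,}")
```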
“Once you have the ability for a network to start understanding what it’s looking at, it can start doing marvelous things and really become a tool for science and a tool for discovery,” says FDL Director James Parr. “AI is incredibly liberating for a scientist and removes the labor-intensive aspect of scientific discovery. This is an area in which machine learning can make a massive amount of difference.”
The FDL, created as an initiative of NASA’s Office of the Chief Technologist, is an applied research accelerator based at NASA’s Ames Research Center in Silicon Valley, CA. Through collaborations within NASA as well as with academia and Silicon Valley companies, the FDL works to further NASA’s AI efforts. Parr describes SpaceML as a democratization of machine learning, making it available to the entire scientific community through open-source tools and components. As he puts it, SpaceML is about figuring out how to deploy AI at broad scale.
"AI is like a telescope for data -- a datascope," Parr says. "This is what AI is enabling, especially when you help networks understand the science. This is also why AI needs to be open-source; we believe in this extremely strongly. We're making everything open so others can follow and improve on our work."
The SpaceML initiative began in the summer of 2020 as an extension of the FDL. Its proposal was simple: Provide opportunities for citizen scientists to get involved in efforts to apply ML to NASA data. The NASA GIBS Worldview Similarity Search is one of seven SpaceML projects applying AI and ML to NASA data in efforts ranging from Earth science to space weather.
Anirudh Koul, an AI scientist at Pinterest and the FDL machine learning lead who is coordinating the GIBS Worldview Similarity Search project, describes four fundamental guides for SpaceML:
- All project code must be open source;
- All developed projects must be of readily deployable quality upon completion;
- Whatever is produced must be easy to use without AI or programming knowledge and without barriers to adoption; and
- The final product must be interdisciplinary and able to be widely applied.
“Our goal is high-quality open-source [code and products],” Koul says. “This work should be the benchmark for other open-source efforts to look up to.”
The provision of open data and open-source software to enable open science is also a primary objective of NASA’s Earth Science Data Systems (ESDS) Program. NASA’s Interagency Implementation and Advanced Concepts Team (IMPACT), an ESDS component, works to maximize the scientific return of NASA’s missions and experiments for scientists, decision makers, and society. One key IMPACT focus area is exploring AI and ML partnerships to apply these techniques in novel ways to challenges in data discovery, access, and use. After seeing an early demonstration of the GIBS/Worldview imagery search pipeline, representatives from IMPACT realized the effort brought viable open science and open-source solutions to problems IMPACT had been working to address.
“This is not the way NASA normally does innovation,” says Rahul Ramachandran, the NASA IMPACT manager, in a SpaceML video demonstration. “This was a sprint with a global team working on a problem. This effort has gone way beyond any of our expectations.”
Making ML magic
The GIBS Worldview Similarity Search project involves close to 30 developers working in teams at locations including Singapore, Korea, Germany, India, Canada, Mexico, and the U.S. Team members are mainly high school graduates and undergraduate computer science majors, along with at least two teachers transitioning their careers to computer science. Mentors from Silicon Valley powerhouses such as Twitter, NVIDIA, Netflix, Pinterest, and Square are also supporting individual teams.
Koul describes the GIBS/Worldview imagery pipeline as an “integration story.” Each team conducts research and development on one tool that handles a specific aspect of Machine Learning Operations, or MLOps. There’s the GIBS Downloader, developed by students at Carnegie Mellon University and the University of California, Berkeley, which reduces imagery acquisition to a single elegant line of code. There’s the Swipe Labeler tool, developed in Sacramento, CA, and at Anna University in Chennai, India, which makes labeling a large collection of images as easy as swiping left (not applicable) or right (applicable). A tool to remove clouds from imagery viewed in Worldview was developed by students at Ludwig Maximilian University in Munich, Germany, and at the University of Virginia. These individual tools are integrated to form a single pipeline.
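To give a sense of what the GIBS Downloader automates, here is a minimal sketch of fetching a single MODIS true-color tile from the public GIBS Web Map Tile Service (WMTS). The layer name, tile matrix set, date, and tile address below are illustrative assumptions based on GIBS’s documented REST URL pattern; this is not the project’s actual code:

```python
# Minimal sketch: fetch one MODIS Terra true-color tile from NASA GIBS via WMTS.
# The URL template and layer/tile values are illustrative assumptions; the
# SpaceML GIBS Downloader wraps requests like this for the user.
import requests

GIBS_WMTS = ("https://gibs.earthdata.nasa.gov/wmts/epsg4326/best/"
             "{layer}/default/{date}/{matrix_set}/{zoom}/{row}/{col}.jpg")

url = GIBS_WMTS.format(
    layer="MODIS_Terra_CorrectedReflectance_TrueColor",  # daily true-color layer
    date="2021-06-01",       # imagery date (YYYY-MM-DD)
    matrix_set="250m",       # tile matrix set assumed for this layer
    zoom=3, row=2, col=5,    # tile address within the matrix (example values)
)

response = requests.get(url, timeout=30)
response.raise_for_status()

with open("gibs_tile.jpg", "wb") as f:
    f.write(response.content)   # a single JPEG imagery tile
```

The project’s tool presumably layers batching across dates and regions on top of requests like this, so users never have to assemble tile URLs by hand.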
A key element of the pipeline, and of machine learning more broadly, is Self-Supervised Learning (SSL). SSL is a subfield of machine learning focused on learning representations of unlabeled images, such as daily global satellite imagery. It is useful for reverse image searching as well as image categorization and filtering, especially when it would be impractical for a human to inspect each image due to cost or time.
In this workflow, the machine is provided a representative selection of labeled data, such as a distinctive cloud pattern, and tasked with searching a collection of unlabeled images for images that share the visual properties of the seed image. On initial runs, the machine may identify images of similar cloud patterns, but it might also identify objects that merely look similar to the labeled cloud pattern, such as a snow-covered mountain or sea ice. Images that are identified correctly are added to a model store, which the machine draws on to find similar images with greater accuracy. As the collection of labeled data grows, the performance of the model improves. Eventually, the machine is supervising its own learning and correctly identifying a majority of the images sought by the investigator. This unsupervised learning is what FDL's Parr calls the "secret sauce" behind this effort.
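Conceptually, once each tile has been encoded as an embedding vector (in the pipeline this encoding comes from a self-supervised model trained on unlabeled imagery), finding “more images like this one” reduces to nearest-neighbor ranking around the seed image. The sketch below illustrates only that ranking step, with placeholder random embeddings; the array names, dimensions, and helper function are assumptions for illustration, not the project’s code:

```python
# Minimal sketch of the similarity-search step: rank unlabeled tiles by cosine
# similarity to a seed image, using embedding vectors. In the real pipeline the
# embeddings come from a self-supervised encoder; here they are random stand-ins.
import numpy as np

rng = np.random.default_rng(0)
tile_embeddings = rng.normal(size=(5_000, 128))   # one 128-d vector per tile (placeholder)
seed_embedding = rng.normal(size=128)             # embedding of the seed image (placeholder)

def cosine_rank(seed, tiles, top_k=10):
    """Return indices of the top_k tiles most similar to the seed embedding."""
    tiles_norm = tiles / np.linalg.norm(tiles, axis=1, keepdims=True)
    seed_norm = seed / np.linalg.norm(seed)
    scores = tiles_norm @ seed_norm               # cosine similarity per tile
    return np.argsort(scores)[::-1][:top_k]

candidates = cosine_rank(seed_embedding, tile_embeddings)
# Candidates a human confirms (e.g., via the Swipe Labeler) can be added to the
# labeled set, and the search repeated as the model improves.
print(candidates)
```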
An important aspect of the imagery pipeline is that the individual tools are designed to be used in other applications. As Koul points out, someone interested in flooding could use the GIBS Downloader or the Self-Supervised Learner and then build their own tools on top to acquire satellite data for floods. The Swipe Labeler tool, for example, can be used on any image dataset to significantly reduce labeling effort by swiping images right or left based on whether a dataset image is relevant.
“We onboarded someone with a keen interest in exploring galaxies,” says Koul. “It took them just two days to adopt our packages to work with imagery from NASA’s Hubble Space Telescope. The code was there, we just needed to join the right parts of the pipeline. The point being that this work is interdisciplinary and can be widely applied.”
To GitHub – and beyond
As Worldview product owner Boller points out, the SpaceML work developing the GIBS/Worldview imagery pipeline adds a new dimension to how the GIBS/Worldview system can be used. Worldview was designed for fully human-driven exploration: the user has to know what to look for and where to look, whether the search is for an ongoing hurricane or an erupting volcano. By combining AI with a human in the loop, Worldview users can quickly search for a particular feature – from the shape of a sand dune to the extent of sea ice – in more than 20 years of global satellite imagery.
“I’m just blown away with what’s possible in what they’ve done,” Boller says. “You can immediately see the appeal of how science users could really take advantage of this and how it lowers the barriers to finding interesting phenomena on a large scale.”
As the project tools mature, they are being put on the GitHub code hosting platform for use by anyone. Koul expects all tools to be openly available on GitHub by the end of June. From there, Koul notes, the sky is the limit.
“I encourage my mentees to open-source high-quality, tested code that everyone else can build upon,” he says. “If we do that, then our multiplicative impact stretches everyone’s outcomes further than it ever could go if they did it alone. Open collaboration moves technology faster than a Saturn V rocket.”
FDL Director Parr also sees an exciting future for the application of open-source AI and ML efforts to openly available NASA data.
“Unsupervised [machine] learning is still magic,” he says. “Once you have the ability for a network to start understanding what it’s looking at, it can then start doing marvelous things and really become a tool for science and a tool for discovery. This is where we want to get to – can AI help win a Nobel Prize?”
NASA GIBS Worldview Similarity Search technical contributors include Surya Ambardar, Aaron Banze, Esther Cao, Dharini Chandrasekaran, Sarah Chen, Daniela Fragoso, Siddha Ganju, Erin Gao, Rajeev Godse, Nathan Hilton, Meher Anand Kasam, Mandeep Khokhar, Suhas Kotha, Anirudh Koul, Ajay Krishnan, Yujeong Zoe Lee, Mike Levy, Fernando Lisboa, Subhiksha Muthukrishnan, Tarun Narayanan, Deep Patel, Stefan Pessolano, Jenessa Peterson, Satyarth Praveen, Kai Priester, Sumanth Ramesh, Navya Reddy Sandadi, Leo Silverberg, Abhigya Sodani, Walker Stevens, Sherin Thomas, Rudy Venguswamy, Shivam Verma, and Udara Weerasinghe.