NASA Earth Exchange: Improving Access to Large-Scale Data Analytics Infrastructure
Ramakrishna Nemani - PI, NASA's Ames Research Center
The overall goal of the proposed project is to enhance access to, and analysis of, very large datasets by Earth science researchers using NASA's Earth Exchange (NEX) (Nemani et al., 2011). NEX is a collaborative platform that brings together a state-of-the-art computing facility with large volumes (hundreds of terabytes) of NASA satellite and climate data, as well as a number of ecosystem and climate models. NEX facilitates end-to-end execution of Earth science research projects, including data acquisition, process execution, and result sharing. The core component of the NEX analytics services will be SciDB, an advanced analytics platform that provides massively scalable complex analytics capabilities with data versioning to support the needs of scientific applications. SciDB is an open source software platform that runs on a grid of commodity hardware or in a cloud. Data on NEX are currently organized using spatio-temporal schemas in relational and NoSQL datastores that enable NEX users to search for data based on a number of different criteria. At present, mostly metadata are stored, and users have to rely on existing tools and utilities to access and process the information within the individual datasets, which is at times limiting, especially during the exploratory phase of scientific research. To ease adding new datasets to SciDB, we will first develop a set of loaders that will facilitate the import and transformation of NEX-held datasets into the internal SciDB structures. Because there are a number of similarities in the formats of Earth science datasets on NEX, there will be a high degree of reuse of the loaders. We will work with the NEX science team and the NEX User Working Group to prioritize the datasets and loaders that will need to be developed initially.
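As an illustration of what such a loader might do, the following minimal sketch flattens a small gridded time series into coordinate-tagged records and serializes them as CSV, a form that array databases can commonly bulk-load. It is not the project's loader and does not call any SciDB API; the names grid_to_records and records_to_csv, the (t, y, x) index layout, and the fill value of -9999.0 are all assumptions made for the example.

```python
import csv
import io
from typing import Iterable, Iterator, List, Optional, Tuple

# Hypothetical in-memory grid, indexed as grid[t][y][x].
Grid = List[List[List[Optional[float]]]]
Record = Tuple[int, int, int, float]


def grid_to_records(grid: Grid, fill_value: float = -9999.0) -> Iterator[Record]:
    """Yield (t, y, x, value) for each valid cell, skipping missing data.

    The fill value is an assumption; real datasets declare their own.
    """
    for t, plane in enumerate(grid):
        for y, row in enumerate(plane):
            for x, value in enumerate(row):
                if value is None or value == fill_value:
                    continue  # sparse output: empty cells are simply omitted
                yield (t, y, x, value)


def records_to_csv(records: Iterable[Record]) -> str:
    """Serialize records as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["t", "y", "x", "value"])
    writer.writerows(records)
    return buf.getvalue()


# Two time steps of a 2 x 2 grid with some missing cells.
sample_grid: Grid = [
    [[0.5, -9999.0], [0.25, 0.75]],
    [[None, 0.125], [0.0, -9999.0]],
]
sample_csv = records_to_csv(grid_to_records(sample_grid))
```

Because the flattening step depends only on the grid shape and the fill-value convention, a format-specific reader could in principle be swapped in ahead of it, which is one way the loader reuse mentioned above might be realized.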
Because we want to integrate not only full datasets but also information such as features and anomalies currently being produced by a number of existing NEX projects, we propose to develop a set of tools that will ease the integration of such results with the analytics infrastructure. This will greatly enhance scientists' ability to go beyond their existing datasets and readily corroborate results within multi-mission, multi-instrument global datasets. Finally, we will capture the analytics processes and queries in the existing NEX workflow and provenance system, which will further enhance process reuse as well as the traceability and visualization of results. The proposed integration effort will greatly enhance the data analysis capabilities and services available to current and future research on NEX.
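To make the idea of capturing queries for provenance more concrete, here is a minimal sketch of one way a query could be recorded so that it is traceable and reusable. It does not reflect the actual NEX workflow and provenance system; ProvenanceRecord, capture_query, the field names, and the dataset names are hypothetical, and the query text is a placeholder rather than real SciDB syntax.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class ProvenanceRecord:
    """Hypothetical record describing one submitted analytics query."""
    query_id: str
    query_text: str
    input_datasets: Tuple[str, ...]
    submitted_by: str
    submitted_at: str  # ISO 8601 timestamp supplied by the caller


def capture_query(query_text: str, input_datasets: Iterable[str],
                  submitted_by: str, submitted_at: str) -> ProvenanceRecord:
    """Build a record whose ID depends only on the query and its inputs."""
    # Collapse whitespace so cosmetic differences do not change the ID.
    normalized = " ".join(query_text.split())
    inputs = tuple(sorted(input_datasets))
    payload = json.dumps({"query": normalized, "inputs": inputs}, sort_keys=True)
    query_id = hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]
    return ProvenanceRecord(query_id, normalized, inputs, submitted_by, submitted_at)


record = capture_query(
    "placeholder   query text",       # stands in for a real analytics query
    ["dataset_b", "dataset_a"],       # hypothetical dataset names
    "example_user",
    "2018-10-19T14:47:00-04:00",
)
record_json = json.dumps(asdict(record), sort_keys=True)
```

Deriving the identifier from the normalized query and its sorted inputs means that two users who run the same analysis obtain the same ID, which is one simple way to support the process reuse described above.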
Last Updated: Oct 19, 2018 at 2:47 PM EDT