ESDS Program

GeoWeaver: Building An Open-Source Platform for Enabling Ad Hoc Management, Open Sharing, and Robust Reuse of NASA Earth Data-Driven Hybrid AI Workflows

Principal Investigator: Ziheng Sun (George Mason University)

GeoWeaver is a lightweight workflow management system that aims to help Earth scientists integrate Python code and shell scripts, use all available computing resources, create FAIRable workflows in an intuitive interface, and share them with other users as complete packages. It is a productivity tool that keeps a permanent record of every execution, which helps avoid wasted time and duplicated investigation work. It is also a collaboration tool that connects community members who have various technical backgrounds.

Project Objectives

  • GeoWeaver will make artificial intelligence (AI) workflows Findable, Accessible, Interoperable, and Reusable (FAIR) by linking all the processes and scripts into an intuitive, shareable pipeline
  • GeoWeaver will boost the productivity of the Earth science community, especially in meeting the AI research agenda, by avoiding duplicated investigation work through standardized, comprehensive knowledge transfer among researchers

GeoWeaver will make Earth science workflows tangible, so beginners can get on board quickly and all users can more easily understand existing work and move from learners to contributors in less time (GeoWeaver Java Repository).

Community Multiscale Air Quality (CMAQ) AI operation site.

We spent most of the project developing GeoWeaver to address technical issues and feature requests from users. We also developed pygeoweaver, which is needed by the majority of community members who use Python. The software is now running smoothly for CMAQ AI and other AI workflows, such as severe weather event (SWE) forecasting and ocean eddy detection.

The key features of GeoWeaver include:

  • Executing processes on any chosen host, either locally or remotely
  • Recording the history of both code and logs of every execution automatically
  • Decoupling workflows from datasets and computing platforms to keep them clean, safe, and portable
  • Wrapping everything into a simple, understandable zip file that can be shared easily via channels like Slack, email, or social media
  • Condensing thousands of lines of code into an intuitive graph and allowing users to browse and edit in the same view
  • Giving scientists confidence that their work will always be in a permanent, safe place and won’t disappear if remote servers go away
  • Keeping team members on the same page, where each can be assigned different processes while working on the same workflow or project
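
Two of these features, automatic history recording and zip packaging, can be illustrated with a minimal Python sketch. This is not GeoWeaver's implementation or its actual package layout: the function names (run_and_record, export_package), the record fields, and the file names inside the archive are all hypothetical, chosen only to show the idea of keeping code, logs, and the workflow graph together.

```python
import hashlib
import io
import json
import subprocess
import sys
import zipfile
from datetime import datetime, timezone


def run_and_record(name, code, history):
    """Run a Python snippet in a subprocess and append a history record.

    Hypothetical helper: the record stores the exact code that ran,
    a hash of that code, the captured log, and a UTC timestamp.
    """
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
    )
    record = {
        "process": name,
        "code": code,
        "code_sha256": hashlib.sha256(code.encode("utf-8")).hexdigest(),
        "log": result.stdout + result.stderr,
        "exit_code": result.returncode,
        "finished_at": datetime.now(timezone.utc).isoformat(),
    }
    history.append(record)
    return record


def export_package(processes, edges, history):
    """Bundle code, the workflow graph, and the history into zip bytes.

    The archive layout (code/, workflow.json, history.json) is an
    assumption made for this sketch only.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, code in processes.items():
            archive.writestr(f"code/{name}.py", code)
        archive.writestr("workflow.json", json.dumps({"edges": edges}, indent=2))
        archive.writestr("history.json", json.dumps(history, indent=2))
    return buffer.getvalue()


# A toy two-step workflow: "prepare" runs before "train".
processes = {
    "prepare": "print('prepared inputs')",
    "train": "print('trained model')",
}
edges = [["prepare", "train"]]

history = []
for name in ("prepare", "train"):
    run_and_record(name, processes[name], history)

package = export_package(processes, edges, history)
```

Because the package holds only code, the graph, and the execution history, it stays independent of any dataset or host, which mirrors the decoupling described in the list above.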

The team has also dedicated a great deal of effort to outreach, sponsoring Earth Science Information Partners (ESIP) mini grants and working with domain scientists to identify the challenges of FAIR Earth AI workflows and promote the use of GeoWeaver. One project has already used GeoWeaver to create its snow workflow and presented it at the American Geophysical Union (AGU). We will continue to work closely with that team and hope to use the advantages of GeoWeaver to turn small prototype projects into reliably running sites that offer continuous operational snow products. We are also attending the Openscapes cohort and hope to connect with NASA scientists who can adopt GeoWeaver to make their workflows FAIR and tangible, both on high performance computing (HPC) systems and in the cloud, and to benefit from the features our AI workflows now rely on.

We have worked closely with NASA's Earth Science Data System Working Group (ESDSWG) and the ESIP machine learning cluster to share our experiences. The paper A Review of Earth Artificial Intelligence was written to dispel the fog surrounding Earth AI by surveying representative AI research across all the major spheres of the Earth system. It has been one of the most popular papers in the journal Computers & Geosciences.

Major Accomplishments

  • Released GeoWeaver 1.0.0-rc10, which is ready for use
  • Released pygeoweaver 0.6.6, which can be installed with pip install pygeoweaver
  • Developed the CMAQ AI operational workflow, which now runs daily

For More Information

GeoWeaver Python Repository

GeoWeaver online demo site

Publications and Presentations

Sidik, S. (2023). Welcome to a New Era in Geosciences Data Management. EOS. doi:10.1029/2023EO230121

Sun, Z., Sandoval, L., Crystal-Ornelas, R., Mousavi, S.M., Wang, J., Lin, C., Cristea, N., et al. (2022). A review of earth artificial intelligence. Computers & Geosciences. 105034. doi:10.1016/j.cageo.2022.105034

Ahmed, A., Sun, Z., & Tong, D. (2022). Evaluating Machine Learning and Remote Sensing in Monitoring NO2 Emission of Power Plants. Remote Sensing, 14(3): 729. doi:10.3390/rs14030729

Suresh, K.A., Sun, Z., & Ma, X. (2022). Geoweaver_cwl: Translate workflows from Geoweaver into Common Workflow Language (CWL) to improve Interoperability. In AGU Fall Meeting Abstracts, vol. 2022, pp. IN22B-0311.

Sun, Z., Nicoleta, S., Cristea, C., Yang, K., Alnuaim, A., Bikshapathireddy, L.C.G., John, A., Pflug, J., et al. (2022). Making Machine Learning-based Snow Water Equivalent Forecasting Research Productive and Reusable by GeoWeaver. In AGU Fall Meeting Abstracts, vol. 2022, pp. IN23A-04.
