Principal Investigator: Nga Chung (NASA's Jet Propulsion Laboratory)

The Cloud-based Data Match-Up Service (CDMS) is a collaborative effort between NASA JPL, the Center for Ocean-Atmospheric Prediction Studies (COAPS), the National Center for Atmospheric Research (NCAR), and Saildrone. CDMS is an extension of the Distributed Oceanographic Match-Up Service (DOMS) which was originally funded by the NASA Advanced Information Systems Technology (AIST) Program. CDMS provides a mechanism for users to input geospatial and temporal references for satellite observations and receive the in situ or satellite observations that are matched to the primary satellite data within selectable temporal and spatial search domains. The CDMS software stack is available via the Apache Science Data Analytics Platform (SDAP).
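The matching idea described above can be illustrated with a minimal sketch. It is not the CDMS implementation: the `Observation` record, the `match_up` function, and the brute-force search are assumptions made for illustration, and a production service would use spatial indexing rather than comparing every pair.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


@dataclass(frozen=True)
class Observation:
    """A single point observation (hypothetical record layout)."""
    time_s: float  # seconds since an arbitrary epoch
    lat: float     # degrees north
    lon: float     # degrees east
    value: float   # measured quantity, e.g. sea surface temperature


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def match_up(primary, secondary, time_tol_s, radius_tol_m):
    """Pair each primary observation with every secondary observation
    that falls inside both the time window and the search radius."""
    matches = {}
    for i, p in enumerate(primary):
        matches[i] = [
            s for s in secondary
            if abs(s.time_s - p.time_s) <= time_tol_s
            and haversine_m(p.lat, p.lon, s.lat, s.lon) <= radius_tol_m
        ]
    return matches


# Illustrative data only: one satellite point and three in situ points.
satellite = [Observation(0.0, 30.0, -140.0, 18.2)]
in_situ = [
    Observation(600.0, 30.005, -140.0, 18.0),   # close in space and time
    Observation(600.0, 31.0, -140.0, 17.5),     # about one degree away
    Observation(90_000.0, 30.0, -140.0, 18.1),  # same place, 25 hours later
]
result = match_up(satellite, in_situ, time_tol_s=86_400, radius_tol_m=1_000)
```

With a one-day time tolerance and a 1 km radius, only the first in situ point is paired with the satellite point; widening either tolerance admits more candidates.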

Project Objectives

  • Deliver a production-ready, near real-time and delayed-mode match-up service in the cloud to address calibration and validation (cal/val) and science use cases
  • Formalize architecture and information model for in situ and satellite data nodes to efficiently onboard additional datasets via NASA's Physical Oceanography Distributed Active Archive Center (PO.DAAC) and remote data provider hosts
  • Demonstrate a range of representative match-up scenarios and validate CDMS collocation results
  • Demonstrate CDMS match-up application programming interface (API) calls and outputs using Jupyter notebooks

The oceanographic community needs a generalized data collocation capability for satellite and/or in situ observations that is publicly accessible and provides flexibility and reproducibility for use cases ranging from open science to satellite retrieval cal/val. With an exponential increase in the volume of satellite data, the CDMS architecture is designed to be scalable and to leverage the elasticity of the cloud. The differences between remote sensing data at various data processing levels and the heterogeneous nature of in situ data make developing a generic system challenging, so CDMS is designed to make onboarding new datasets efficient.

CDMS eliminates the need for one-off match-up programs that require satellite and in situ data to be downloaded locally: the computation occurs in the cloud, and CDMS connects to remote data providers via a common set of interfaces and protocols. CDMS exposes a number of HTTPS API endpoints, allowing users to execute match-up requests from their desired programming environment, such as Jupyter notebooks, or from a Swagger user interface (UI). Match-up results are returned in JSON format and stored as a JSON blob with an associated job ID in Apache Cassandra. A separate API endpoint can be invoked with a specified job ID to return match-up results as JSON, as a flat CSV file, or as a NetCDF file employing Group structures.
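The two-step pattern described above (submit a match-up request, then fetch results by job ID in a chosen format) might look like the following sketch. The host name, the endpoint paths `/match_spark` and `/cdmsresults`, the query parameter names, and the dataset names are all assumptions for illustration; the Swagger UI of an actual deployment is the authority on the real interface. The code only builds URLs and makes no network calls.

```python
from urllib.parse import urlencode

# Hypothetical host; substitute the base URL of a real deployment.
BASE_URL = "https://cdms.example.org"

# Output formats named in the text above.
RESULT_FORMATS = {"JSON", "CSV", "NETCDF"}


def build_matchup_url(primary, secondary, start_time, end_time,
                      bbox, time_tol_s, radius_tol_m):
    """Build a match-up request URL.

    The path and parameter names are assumed for illustration,
    not a confirmed API.
    """
    params = {
        "primary": primary,        # primary (satellite) dataset name
        "secondary": secondary,    # in situ or satellite dataset to match
        "startTime": start_time,   # ISO 8601 start of the search period
        "endTime": end_time,       # ISO 8601 end of the search period
        "b": ",".join(str(v) for v in bbox),  # west,south,east,north
        "tt": time_tol_s,          # temporal tolerance in seconds
        "rt": radius_tol_m,        # spatial tolerance in metres
    }
    return f"{BASE_URL}/match_spark?{urlencode(params)}"


def build_results_url(job_id, output="JSON"):
    """Build a URL that retrieves stored results for a job ID."""
    if output not in RESULT_FORMATS:
        raise ValueError(f"unsupported output format: {output}")
    return f"{BASE_URL}/cdmsresults?{urlencode({'id': job_id, 'output': output})}"


# Placeholder dataset names and job ID, for illustration only.
url = build_matchup_url(
    primary="SATELLITE_DATASET",
    secondary="INSITU_DATASET",
    start_time="2018-01-01T00:00:00Z",
    end_time="2018-01-31T23:59:59Z",
    bbox=(-140.0, 25.0, -120.0, 40.0),
    time_tol_s=86_400,
    radius_tol_m=1_000,
)
results_url = build_results_url("example-job-id", output="CSV")
```

In a notebook, the resulting URLs could be passed to any HTTP client; keeping URL construction separate from the request makes the parameters easy to inspect and record for reproducibility.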

The CDMS architecture is made up of distributed in situ and satellite data nodes. To leverage the scalability of the cloud, satellite data are split into tiles and stored either in Apache Cassandra, a distributed NoSQL database, or in Amazon Web Services Simple Storage Service (AWS S3). Metadata for each tile, which include time, spatial bounding box, provenance, and summary statistics, are indexed in Apache Solr or Elasticsearch to enable high-performance spatiotemporal lookup. Due to the unique characteristics of in situ data, CDMS stores in situ data in Apache Parquet format in AWS S3, since Parquet has been shown to be the most suitable cloud-optimized format for in situ data.
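The role of the tile metadata index can be sketched with a small in-memory stand-in. In CDMS this lookup is handled by Apache Solr or Elasticsearch; the `TileMeta` record and `find_tiles` function below are hypothetical and show only the overlap test that such an index answers: which tiles intersect a requested time range and bounding box.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TileMeta:
    """Metadata for one satellite data tile (hypothetical layout)."""
    tile_id: str
    t_min: float  # earliest observation time, seconds since epoch
    t_max: float  # latest observation time, seconds since epoch
    west: float   # bounding box edges in degrees
    south: float
    east: float
    north: float


def overlaps(tile, t_start, t_end, west, south, east, north):
    """True if the tile intersects the query time range and bounding box.

    Simplification: boxes that cross the antimeridian are not handled.
    """
    return (
        tile.t_min <= t_end and tile.t_max >= t_start
        and tile.west <= east and tile.east >= west
        and tile.south <= north and tile.north >= south
    )


def find_tiles(index, t_start, t_end, west, south, east, north):
    """Return the IDs of tiles whose metadata intersects the query.

    A linear scan stands in for the search engine's indexed lookup.
    """
    return [
        tile.tile_id for tile in index
        if overlaps(tile, t_start, t_end, west, south, east, north)
    ]


# Illustrative index of three tiles.
index = [
    TileMeta("tile-a", 0, 100, -140, 20, -130, 30),
    TileMeta("tile-b", 50, 150, -130, 20, -120, 30),
    TileMeta("tile-c", 200, 300, -140, 20, -130, 30),
]
hits = find_tiles(index, t_start=60, t_end=120,
                  west=-135, south=22, east=-132, north=25)
```

Only the tiles returned by the lookup need to be fetched from Cassandra or S3, which is what keeps a match-up request from scanning the full dataset.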

Major Accomplishments

  • Redesigned the in situ support capability to store in situ data in Apache Parquet format in AWS S3 and to use Elasticsearch as a metadata store that keeps track of the Parquet schema and partition information
  • Added support for Level 2 satellite-to-in situ and Level 2 to Level 4 satellite-to-satellite data match-up
  • Added support for large match-up requests
  • Performed validation of data match-up results by comparing outputs from CDMS against outputs from one-off match-up algorithm implementations

For More Information

Science Data Analytics Platform (SDAP)

CDMS Notebooks on GitHub

Publications and Presentations

Chung, N., Huang, T., Tsontos, V., Perez, S., Phyo, W., Kang, J., Kuttruff, R., Smith, S., Lovett, A., Cram, T., Ji, Z., & Sparling, K. (2023). Cloud-based Data Match-Up Service (CDMS). Earth Science Data System Technology Spotlight 2023-05-15.

Chung, N., Huang, T., Tsontos, V., Perez, S., Phyo, W., Kang, J., Kuttruff, R., Smith, S., Lovett, A., Cram, T., Ji, Z., & Sparling, K. (2023). Development of a Cloud-based Data Match-Up Service (CDMS) in Support of Ocean Science Applications. 2023 Earth Science Data System Working Groups Meeting.

Smith, S.R., Bourassa, M.A., Elya, J., Huang, T., Gill, K.M., Greguska, F.R., III, Chung, N.T., Tsontos, V., Holt, B., Cram, T., & Ji, Z. (2022). The Distributed Oceanographic Match-Up Service. In Big Data Analytics in Earth, Atmospheric, and Ocean Sciences (editors T. Huang, T.C. Vance, and C. Lynnes). doi:10.1002/9781119467557.ch11

Phyo, W., Chung, N., Huang, T., Tsontos, V., Perez, S., Rodriguez, J., Kuttruff, R., Smith, S., Gethers, J., Cram, T., Ji, Z., & Sparling, K. (2022). Enhancing the Interoperability and Reusability of In Situ Oceanographic Data Through the Cloud-based Data Match-Up Service (CDMS). 2022 American Geophysical Union Fall Meeting.

Perez, S., Chung, N., Huang, T., Tsontos, V., Phyo, W., Rodriguez, J., Kuttruff, R., Smith, S., Gethers, J., Cram, T., Ji, Z., & Sparling, K. (2022). Demonstration of a Cloud-based Data Match-Up Service (CDMS) in Support of Ocean Science Applications. 2022 American Geophysical Union Fall Meeting.

Chung, N., Cram, T., Smith, S., Tsontos, V., Huang, T., Sparling, K., Perez, S., Phyo, W., Ji, Z., & Kuttruff, R. (2022). Development of a Cloud-based Data Match-Up Service (CDMS) in Support of Ocean Science Applications. OCEANS 2022, Hampton Roads, VA, USA, pp. 1-6. doi:10.1109/OCEANS47191.2022.9977163

Chung, N., Huang, T., Tsontos, V., Perez, S., Phyo, W., Rodriguez, J., Kuttruff, R., Smith, S., Gethers, J., Cram, T., Ji, Z., Sparling, K., & Wang, J. (2022). Cloud-based Data Match-Up Service (CDMS) and AI/ML. 2022 ESIP Summer Meeting.

Chung, N., Huang, T., Tsontos, V., Perez, S., Phyo, W., Rodriguez, J., Kuttruff, R., Smith, S., Gethers, J., Cram, T., Ji, Z., & Sparling, K. (2022). Cloud-based Data Match-Up Service (CDMS). 2022 ESIP Summer Meeting. doi:10.6084/m9.figshare.20293998.v1

Perez, S., Chung, N., Huang, T., Tsontos, V., Phyo, W., Smith, S., McMillan, H., Cram, T., Ji, Z., & Sparling, K. (2022). Development of a Cloud-based Data Match-Up Service (CDMS) in Support of Ocean Science and Applications. 2022 Earth Science Data System Working Groups Meeting. doi:10.6084/m9.figshare.19584085.v1

Chung, N., Huang, T., Tsontos, V., Perez, S., Phyo, W., Garde, J., Smith, S., McMillan, H., Cram, T., Ji, Z., & Sparling, K. (2022). Development of a Cloud-based Data Match-Up Service (CDMS) in Support of Ocean Science Applications. 2022 Ocean Sciences Meeting.

Perez, S., Chung, N., Huang, T., Tsontos, V., Phyo, W., Smith, S., McMillan, H., Cram, T., Ji, Z., & Sparling, K. (2021). Development of a Cloud-based Data Match-Up Service (CDMS) in Support of Ocean Science and Applications. 2021 American Geophysical Union Fall Meeting.

Chung, N., Huang, T., Tsontos, V., Perez, S., Phyo, W., Smith, S., McMillan, H., Cram, T., Ji, Z., & Sparling, K. (2021). Cloud-based Data Match Up Service. 2021 ESIP Summer Meeting.

Chung, N., Huang, T., Tsontos, V., Perez, S., Phyo, W., Smith, S., McMillan, H., Cram, T., Ji, Z., & Sparling, K. (2021). Cloud-based Data Match Up Service. 2021 Earth Science Data System Working Groups Meeting.