Analysis and Review of CMR Project

NASA's Common Metadata Repository (CMR) is the centralized repository for NASA’s Earth science metadata and a key vehicle for data search and discovery. To give researchers a more effective and consistent experience with this repository, the quality of the metadata currently in CMR needs to be improved.

The Analysis and Review of CMR (ARC) project helps address this need by conducting quality assessments of NASA’s metadata records in CMR. These records correspond to approximately 8,000 datasets collected from Earth-observing satellites, airborne platforms, and in situ instruments. High-quality metadata records are important because it is the content of these records that is indexed for search on the web, connecting users to data.

This metadata improvement task is an ongoing, collaborative effort between NASA's Distributed Active Archive Centers (DAACs), the CMR team, and the ARC team, which is part of NASA’s Interagency Implementation and Advanced Concepts Team (IMPACT).

The ARC project contributes to Earth science data curation and stewardship activities by conducting metadata quality evaluations of records stored within CMR. The ARC team responsibilities include:

•    Reviewing metadata for quality from both the scientific and user perspectives
•    Identifying opportunities for improvement in the metadata records
•    Working with the data archives to resolve any and all identified issues
•    Developing methods to automate quality evaluation checks, and
•    Developing processes to minimize detected issues in the future.
The ultimate goal of the ARC project is to ensure that all records currently in CMR, as well as all future records, meet a minimum quality requirement. This commitment to quality helps keep data consistently accessible and discoverable for users.

Detailed Quality Assessments of all NASA Metadata Records in the CMR

NASA’s holdings in CMR currently comprise approximately 8,000 datasets (also called collections). Each collection metadata record, along with one randomly selected file-level metadata record (granule) from that collection, is assessed against a set of quality criteria using a combination of automated and manual methods. ARC identifies opportunities for improvement in each record and works with the EOSDIS data providers (i.e., the DAACs), who are the stewards of the records, to resolve all identified issues.
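The per-collection sampling step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not ARC's actual tooling: it assumes the collection and granule identifiers are already available as an in-memory mapping (in practice they would be retrieved from CMR), and the concept IDs in the example are made up.

```python
from __future__ import annotations

import random


def sample_granules(collections: dict[str, list[str]], seed: int | None = None) -> dict[str, str | None]:
    """Pick one granule ID at random from each collection.

    `collections` maps a collection ID to the list of its granule IDs.
    Collections that have no granules map to None.
    """
    # A seeded generator makes a given review batch reproducible.
    rng = random.Random(seed)
    return {
        collection_id: rng.choice(granules) if granules else None
        for collection_id, granules in collections.items()
    }


# Hypothetical concept IDs, for illustration only.
example = {
    "C0000000001-EXAMPLE": ["G0000000001-EXAMPLE", "G0000000002-EXAMPLE"],
    "C0000000002-EXAMPLE": ["G0000000003-EXAMPLE"],
    "C0000000003-EXAMPLE": [],
}
print(sample_granules(example, seed=42))
```

Passing a fixed seed is a design choice for traceability: rerunning the sampler with the same seed reproduces the same granule selection, which makes it easier to revisit a reviewed record later.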

ARC documents these opportunities in reports that include actionable recommendations. Typical opportunities for improvement include:

•    updating information that has become outdated;
•    adding contextual information to a record to make it more informative to general/non-expert users;
•    including links to all resources and tools relevant to a dataset;
•    improving the consistency of content between sets of related records; and
•    adopting newly developed metadata elements that make the data easier to find, access, or understand how to use.

Develop a Metadata Quality Framework

ARC has developed a metadata quality framework to systematically assess metadata records. The framework defines quality criteria, which helps ensure consistent reporting and makes the assessment process transparent to the DAACs. It also provides a baseline for generating quantitative metadata quality metrics that demonstrate improvement over time. The ARC metadata quality framework is described in detail in the following publication:

Bugbee, K., le Roux, J., Sisco, A., Kaulfus, A., Staton, P., Woods, C., Dixon, V., Lynnes, C. and Ramachandran, R., 2021. "Improving Discovery and Use of NASA’s Earth Observation Data Through Metadata Quality Assessments," Data Science Journal, 20(1), p.17. doi.org/10.5334/dsj-2021-017
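As a rough illustration of how such a baseline can be turned into quantitative metrics, the Python sketch below tallies open findings by priority and reports the share of records with no high-priority findings, so that two assessment rounds can be compared. The priority labels ("high", "medium", "low"), the record IDs, and the choice of metric are assumptions made for this example; the framework's actual criteria and metrics are defined in the publication above.

```python
from __future__ import annotations

from collections import Counter


def quality_metrics(findings: dict[str, list[str]]) -> dict[str, float | int]:
    """Summarize open findings across a set of records.

    `findings` maps a record ID to the (assumed) priority labels of its open
    findings, e.g. ["high", "low"]. An empty list means no open findings.
    """
    counts = Counter(label for labels in findings.values() for label in labels)
    total = len(findings)
    without_high = sum(1 for labels in findings.values() if "high" not in labels)
    return {
        "records": total,
        "high": counts["high"],
        "medium": counts["medium"],
        "low": counts["low"],
        # Share of records with no open high-priority findings.
        "pct_without_high": 100.0 * without_high / total if total else 0.0,
    }


# Hypothetical findings before and after a round of DAAC updates.
before = {"rec-1": ["high", "medium"], "rec-2": ["low"], "rec-3": ["high"], "rec-4": []}
after = {"rec-1": ["medium"], "rec-2": [], "rec-3": ["high"], "rec-4": []}
print(quality_metrics(before)["pct_without_high"])  # 50.0
print(quality_metrics(after)["pct_without_high"])   # 75.0
```

Comparing the same summary before and after a round of updates gives a simple, repeatable measure of improvement; a real framework would likely track several such indicators rather than one.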

Collaboration and Communication

The ARC team has established a collaborative approach for improving metadata quality and contributing to the broader metadata quality community by:

•    collaborating with DAAC metadata curators through the metadata curation dashboard;
•    improving metadata documentation, in collaboration with the CMR team, to make metadata curation easier;
•    sharing relevant code, including automated quality checks; and
•    reporting lessons learned to both NASA's Earth Science Data and Information System (ESDIS) Project and the broader community.

The ARC team also strives to make information about its processes as open and transparent as possible and offers the following resources:

•    specific metadata quality criteria and best practices documentation, openly available on the Earthdata wiki;
•    open source code for the CMR Metadata Curation Dashboard, the tool used to facilitate the assessment process and to generate the metadata quality reports shared with the DAACs; and
•    open source code for ARC’s suite of automated metadata quality checks.
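To give a sense of what an automated metadata quality check looks like, here is a minimal Python sketch that inspects a single collection record represented as a dictionary. The field names loosely follow NASA's UMM-C JSON model (ShortName, Version, EntryTitle, Abstract, RelatedUrls), but the specific checks, the required-field subset, and the abstract-length threshold are hypothetical and are not ARC's published checks.

```python
from __future__ import annotations

from urllib.parse import urlparse

# Hypothetical rule set, for illustration only.
REQUIRED_FIELDS = ("ShortName", "Version", "EntryTitle", "Abstract")
MIN_ABSTRACT_CHARS = 100


def check_record(record: dict) -> list[str]:
    """Return human-readable findings for one collection record."""
    findings = []
    # Required fields must be present and non-empty strings.
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if not isinstance(value, str) or not value.strip():
            findings.append(f"{field}: missing or empty")
    # A very short abstract is unlikely to help non-expert users.
    abstract = record.get("Abstract")
    if isinstance(abstract, str) and 0 < len(abstract.strip()) < MIN_ABSTRACT_CHARS:
        findings.append(f"Abstract: shorter than {MIN_ABSTRACT_CHARS} characters")
    # Every related link should be an absolute https URL.
    for i, related in enumerate(record.get("RelatedUrls", [])):
        url = related.get("URL", "")
        parsed = urlparse(url)
        if parsed.scheme != "https" or not parsed.netloc:
            findings.append(f"RelatedUrls[{i}]: not a valid https URL: {url!r}")
    return findings


# Made-up record for illustration.
record = {
    "ShortName": "EXAMPLE_L2",
    "Version": "1",
    "EntryTitle": "Example Level 2 Product",
    "Abstract": "Too short.",
    "RelatedUrls": [
        {"URL": "https://example.com/data", "Type": "GET DATA"},
        {"URL": "http://example.com/guide", "Type": "VIEW RELATED INFORMATION"},
    ],
}
for finding in check_record(record):
    print(finding)
```

Checks like these catch mechanical problems consistently across thousands of records, which leaves reviewers free to focus on the judgment calls (scientific accuracy, clarity for non-expert users) that still require manual review.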

The DCD team collaborates with the IMPACT Satellite Needs Working Group (SNWG) project to serve as a resource to other agencies in need of Earth observation data. This includes communicating data needs and concerns expressed by external government agencies regarding NASA Earth science data to the appropriate points of contact within NASA, and connecting agencies with the Earth observation data they need to meet their key objectives.

The DCD team also upholds NASA's commitment to the Geospatial Data Act of 2018 by supporting the use of geospatial data standards, working with Geoplatform.gov to make NASA's metadata more discoverable, and collaborating with other agencies to integrate NASA's geospatial data into their processes and workflows.