NASA makes nearly 8,000 Earth science data products freely and openly available to all users. NASA is committed to accelerating open science by making these key data products easier to discover and use. It pursues this goal by providing and maintaining high-quality metadata through the Common Metadata Repository (CMR).
Through its Analysis and Review of CMR (ARC) project, NASA's Interagency Implementation and Advanced Concepts Team (IMPACT) conducts metadata quality assessments of CMR records. The ARC team performs these assessments using a framework of automated and manual checks on metadata attributes. The results are shared with data providers, who use them to improve their metadata quality over time.
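The automated half of such a framework can be pictured as a list of small check functions, each inspecting one metadata attribute and returning a flag when something looks wrong. The sketch below is a hypothetical illustration, not pyQuARC's API: it assumes a record is a flat Python dictionary, and the field names (`Abstract`, `Version`), the 100-character threshold, and the `run_checks` helper are all invented for this example.

```python
from typing import Callable, Optional

# Hypothetical check signature: take a metadata record (a plain dict here)
# and return a human-readable flag, or None if the record passes.
Check = Callable[[dict], Optional[str]]


def abstract_present(record: dict) -> Optional[str]:
    """Flag a missing or empty abstract."""
    if not (record.get("Abstract") or "").strip():
        return "Abstract is missing or empty."
    return None


def abstract_length(record: dict, min_chars: int = 100) -> Optional[str]:
    """Flag abstracts that are probably too short (threshold is arbitrary)."""
    abstract = (record.get("Abstract") or "").strip()
    if abstract and len(abstract) < min_chars:
        return f"Abstract is shorter than {min_chars} characters."
    return None


def version_present(record: dict) -> Optional[str]:
    """Flag a record that does not state a version."""
    if not record.get("Version"):
        return "Version is missing."
    return None


CHECKS: list[Check] = [abstract_present, abstract_length, version_present]


def run_checks(record: dict, checks: list[Check] = CHECKS) -> list[str]:
    """Apply every check in order and collect flags for a human reviewer."""
    flags = []
    for check in checks:
        message = check(record)
        if message is not None:
            flags.append(message)
    return flags


record = {"ShortName": "EXAMPLE_PRODUCT", "Abstract": "Surface temperature."}
for flag in run_checks(record):
    print(flag)
```

Keeping each check as an independent function means a flag points to exactly one attribute, and checks that need human judgment (such as whether an abstract is accurate) simply stay out of the automated list.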
To support this work, IMPACT developed pyQuARC, an open source, Python-based library that automates as much of the ARC assessment framework as possible and reflects the ARC team's commitment to open science and open source software. The tool reads and evaluates metadata records, focusing on their consistency and robustness. pyQuARC also checks that information common to both the data product metadata and the file-level metadata is consistent and compatible. For example, one check ensures that the spatial extent specified in the data product’s metadata encompasses the cumulative extent of the individual data files.
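The spatial-extent check described above can be sketched with simple bounding boxes: compute the smallest box enclosing all file-level boxes, then test whether the product-level box contains it. This is a hypothetical illustration rather than pyQuARC's implementation; it assumes each extent is a west/south/east/north box in decimal degrees and ignores boxes that cross the antimeridian, which a real check would need to handle. The `BBox`, `cumulative_extent`, and `encompasses` names are invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    """Bounding box in decimal degrees (assumes no antimeridian crossing)."""
    west: float
    south: float
    east: float
    north: float


def cumulative_extent(boxes: list[BBox]) -> BBox:
    """Return the smallest box that encloses every file-level box."""
    if not boxes:
        raise ValueError("need at least one file-level extent")
    return BBox(
        west=min(b.west for b in boxes),
        south=min(b.south for b in boxes),
        east=max(b.east for b in boxes),
        north=max(b.north for b in boxes),
    )


def encompasses(outer: BBox, inner: BBox) -> bool:
    """True if `outer` fully contains `inner`."""
    return (
        outer.west <= inner.west
        and outer.south <= inner.south
        and outer.east >= inner.east
        and outer.north >= inner.north
    )


product = BBox(-180.0, -60.0, 180.0, 60.0)
files = [BBox(-120.0, 10.0, -60.0, 50.0), BBox(30.0, -70.0, 90.0, 0.0)]

total = cumulative_extent(files)
print(total)  # BBox(west=-120.0, south=-70.0, east=90.0, north=50.0)
print(encompasses(product, total))  # False: one file reaches 70°S
```

Here the check fails because the second file extends farther south than the product-level extent claims, which is exactly the kind of inconsistency a reviewer would want flagged.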
pyQuARC flags opportunities to improve or add contextual metadata that helps users find relevant data products. Automating these checks also frees human evaluators to focus on more sophisticated assessments, such as whether an abstract accurately describes the data and provides the correct context. A well-documented data product with detailed, sufficient metadata is easier to find, understand, and use. Because pyQuARC is open source, data providers can adapt and customize it so that its quality checks evolve with their needs, including checks on metadata not included in CMR.