IMPACT has released pyQuARC, an automated assessment tool built to improve the quality of Earth observation metadata. The tool was developed as part of the Analysis and Review of the CMR (ARC) project, which is tasked with assessing NASA’s metadata records in the Common Metadata Repository (CMR) for correctness, completeness, and consistency. pyQuARC incorporates a robust set of metadata quality criteria developed by the ARC team and is the culmination of the lessons learned in automating metadata quality assessment to the greatest extent possible.
Ensuring high-quality metadata records is essential to scientific research, as ARC team lead Jeanné le Roux explains:
"Metadata management can have a direct effect on a scientist’s experience in finding, accessing, and using data. Since metadata is the connection point between users and data, metadata that is well maintained helps lower barriers to data use."
High-quality metadata that includes a direct data access point and ample contextual information about the data (such as user documentation and compatible software) helps scientists get to the actual science faster, rather than spending time hunting for information and resources. Rich metadata also allows for more complex and niche searches across ever-increasing data volumes.
Metadata records important information about the data, such as the date and time a camera captures when a picture is taken. The metadata, rather than the data itself, is what is indexed by online data catalogs and other applications that connect users to data. Inaccurate metadata can connect users with data that does not, in fact, match their search criteria; incomplete metadata can make data difficult or impossible to find. Given the importance of high-quality metadata, it must be regularly assessed and updated as needed.
pyQuARC is a Python package that streamlines metadata quality assessment by running automated quality checks. It employs a metadata quality assessment framework that specifies a common set of assessment criteria. In addition to basic validation checks (e.g., adherence to the metadata schema, use of controlled vocabularies, and link checking), pyQuARC flags opportunities to improve or add contextual metadata information in order to help the user connect to, access, and better understand relevant data products. pyQuARC also checks that information common to both the data product metadata and the corresponding file-level metadata is consistent and compatible.
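To make these categories of checks concrete, here is a minimal Python sketch of what a required-field check, a controlled-vocabulary check, a contextual suggestion, and a product-to-file consistency check might look like. It is an illustration only and does not use or reproduce pyQuARC's actual API: the field names, the vocabulary, and the functions `check_collection` and `check_consistency` are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical field names and vocabulary, chosen only for illustration.
REQUIRED_FIELDS = ("ShortName", "Abstract", "TemporalExtent")
ALLOWED_PROCESSING_LEVELS = {"0", "1A", "1B", "2", "3", "4"}


@dataclass
class Finding:
    field: str
    severity: str  # "error" for failed validation, "suggestion" for improvements
    message: str


def check_collection(record: dict) -> list[Finding]:
    """Run basic validation and contextual checks on one product-level record."""
    findings = []
    # Validation: every required field must be present and non-empty.
    for name in REQUIRED_FIELDS:
        if not record.get(name):
            findings.append(Finding(name, "error", "required field is missing or empty"))
    # Validation: a provided value must come from the controlled vocabulary.
    level = record.get("ProcessingLevel")
    if level is not None and level not in ALLOWED_PROCESSING_LEVELS:
        findings.append(
            Finding("ProcessingLevel", "error", f"{level!r} is not in the controlled vocabulary")
        )
    # Contextual suggestion: not an error, but an opportunity to help users.
    if not record.get("DocumentationURL"):
        findings.append(
            Finding("DocumentationURL", "suggestion", "consider linking user documentation")
        )
    return findings


def check_consistency(collection: dict, granule: dict) -> list[Finding]:
    """Check that a file-level record agrees with its parent product record."""
    findings = []
    if collection.get("ShortName") != granule.get("CollectionShortName"):
        findings.append(
            Finding("CollectionShortName", "error", "does not match the product ShortName")
        )
    return findings


if __name__ == "__main__":
    collection = {
        "ShortName": "EXAMPLE_L2",
        "Abstract": "",
        "TemporalExtent": "2020-01-01/2020-12-31",
        "ProcessingLevel": "2B",
    }
    granule = {"CollectionShortName": "EXAMPLE_L2"}
    for finding in check_collection(collection) + check_consistency(collection, granule):
        print(f"[{finding.severity}] {finding.field}: {finding.message}")
```

Separating "error" findings from "suggestion" findings mirrors the distinction drawn above between basic validation and opportunities to enrich contextual information; a real assessment framework would define many more criteria than this sketch does.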