Summary
These documents (ESDS RFCs 033, 034 and 039) provide a set of recommendations regarding data quality for producers and distributors of Earth science data. The focus of these documents is on collecting and conveying data quality information to end users rather than providing a precise definition of data quality. The recommendations highlight issues regarding capturing, describing and conveying information about the quality of datasets held at NASA's Earth Observing System Data and Information System (EOSDIS) Distributed Active Archive Centers (DAACs).
Data Quality Working Group’s Comprehensive Recommendations for Data Producers and Distributors (ESDS-RFC-033) summarizes the approach and outcomes of the Data Quality Working Group (DQWG), one of NASA’s Earth Science Data System Working Groups (ESDSWG), during 2014-2018, following the collection and analysis of 16 use cases. The document describes the use cases analyzed, the data quality information management lifecycle, and primary focus areas for data quality. It provides a total of 93 recommendations, grouped into 7 categories, as well as proposed solutions and implementation strategies.
High Priority Data Quality Recommendations for Data Producers and Distributors (ESDS-RFC-034) highlights a subset of high-priority recommendations that the Earth Science Data and Information System (ESDIS) Project can use to plan and coordinate concrete actions to be taken by data producers and distributors. It points to existing potential solutions that can be adopted across EOSDIS and NASA's Earth science research community.
Reuse Readiness Assessment of Data Quality Software Products (ESDS-RFC-039) provides information to EOSDIS and NASA's Earth science research community about the reuse readiness of various data quality software products that were identified by the ESDSWG DQWG as helpful for managing data quality information. This document offers insight into some of the implementation issues that should be considered when planning to adopt software products, as well as a guide for software developers to produce reusable software.
While these documents targeted NASA Earth science data, other organizations may also benefit from the methodology described here and the resulting recommendations for improvement.
Status
Data Quality Working Group’s Comprehensive Recommendations for Data Producers and Distributors (ESDS-RFC-033) was recommended for use in NASA Earth Science Data Systems in September 2019.
High Priority Data Quality Recommendations for Data Producers and Distributors (ESDS-RFC-034) was recommended for use in NASA Earth Science Data Systems in April 2019.
Reuse Readiness Assessment of Data Quality Software Products (ESDS-RFC-039) was recommended for use in NASA Earth Science Data Systems in September 2019.
Recommendations Documents | |
User Resources | Solutions Master List |
NASA Earth Science Community Recommendations for Use
Strengths
Recommendations made by the ESDSWG DQWG help data producers and distributors to better capture, describe, and enable the use of data quality information. They also help NASA's ESDIS Project and Earth science programs to identify actions that encourage and support data producers and distributors in addressing data quality challenges. These recommendations are based on real use cases within NASA's Earth science community. They are comprehensive (a total of 93 individual recommendations, consolidated into 12 high-level recommendations) and organized into 6 implementation strategies to facilitate understanding and adoption. The recommendations have been further described and prioritized into what has been termed Prioritized Recommended Implementation Actions (PRIA). In addition to recommendations, the DQWG also identified concrete existing implementation solutions. Reviewers found that many of the recommendations are highly relevant to their work and could address the challenges they’ve encountered.
Weaknesses
Data quality is a big challenge to tackle. Though recommendations made by the DQWG cover a wide range of topics and aspects, they still lack concrete example implementation solutions for different types of applications with specific types of Earth science data, e.g. airborne and in-situ observations. There is also a lack of concrete implementation guidance on leveraging the ISO 19115 and 19157 standards for specific types of Earth science data.
Applicability
The recommendations made by the DQWG are applicable to different types of users (e.g. data producers, distributors, and program managers) in a wide range of areas of NASA's Earth science community. The DQWG purposely generalized the recommendations to ensure that the concepts and philosophies behind them are applicable to the many types of applications and observation systems for Earth science data. Some of the recommendations may prove relevant even beyond NASA.
Limitations
Due to limited resources and the broad and complex scope of data quality itself, the DQWG had to limit the scope of its work so that its recommendations could be both relevant to NASA Earth science and sustained during the DQWG’s managed periods of study and analysis. For example, the use cases collected and the concrete example implementation solutions identified are limited to the remote sensing and data assimilation missions and projects within NASA. Though the general concepts of the DQWG recommendations apply to other disciplines and observation systems, e.g. airborne and in-situ, the concrete implementation solutions and guidance provided for these communities are limited and will need additional investigation and resources to be more fully addressed.
Reviewer comments
Strengths
“In particular, we support the idea of a science review board/team to advise data producers on quality and usability of the data set as it is being developed. I believe that this would help catch data quality issues at an earlier stage, and would make re-processing less likely and/or less difficult.”
“The [recommendations] document will be very helpful in shaping our total data support architecture and operations. There is a lot here to re-read and the document is an excellent reference for supporting our initiatives in the future. A streamlined version with different examples from different DAACs regarding their implementation strategies would be nice too (Executive Summary with real world DAAC examples of successful approaches etc.).”
“A lot of these recommendations are already somewhat in place and standardizing them would help make these practices more efficient and clear up any uncertainties between data producers and DAACs”.
“These recommendations highlight the need for improvement in particular areas where we face the most challenges such as DAAC-PI communication, metadata validation, and quality information consolidation. More specific standards should be set between the data producers/curators and archives/distributors once validation tools and other mechanisms are put in place. For instance, what kind of metadata validation will be done prior to the DAACs receiving the data? It would be helpful if the quality information consolidation and data validation processes carried out by the data producers/curators are well-communicated upon delivery of the data to the DAACs.”
Weaknesses
“CF is a problem for field campaigns... CMR standards do not address certain issues for in situ measurements”.
“This may be different for different types of measurements. Some [recommendations deemed] important to passive remote sensing measurements may not be relevant for in-situ measurements”.
"It is clear that the [recommendations] are not applicable to some data producing operations. As an airborne field study data manager, I could not see how to apply many of the LHFs to airborne in-situ measurements. I would like to suggest this document to discuss more general concepts and provide some detailed implementation examples. This will help others to develop similar implementations. It is also clear that the recommendations are relevant to the data sets used to develop the use case.”
“[The term] 'dataset limitations' is definitely data use dependent! Limitation to one type of data use may not matter with the other type of use. The description of "dataset limitation" needs to be related to specific types of data uses. This cannot be stated in generic terms.”
“Someone needs to develop implementation plans for different types of measurements to ISO [metadata] standards.”