This document nominates the Climate and Forecast (CF) Metadata Conventions for adoption as a NASA Earth Science Data Systems community standard. The CF Metadata Conventions are intended to promote interoperability among data providers, data users, and data services by providing a clear and unambiguous standard for representing geolocations and times of Earth science data, physical quantities that the data represent, and other ancillary information useful in interpreting the data or comparing it with data from other sources.
Status
The Climate and Forecast (CF) Metadata Conventions were approved in April 2010 as a standard recommended for use in NASA Earth Science Data Systems. At that time, the current version of CF was 1.4. The CF conventions are regularly updated through a robust and well-documented community consensus process, so ESCO recommends using the most current released version.
NASA Earth Science Community Recommendations for Use
Strengths
The CF metadata conventions are used to describe attributes of various in-situ, airborne, and satellite datasets and earth system model outputs. Due to the needs of the different user groups, a wide collection of variable attributes is available in the CF conventions. As stated eloquently by reviewers:
“One of the major strengths of CF conventions is that the attributes are both machine and human readable form”
“CF is very practical, function(al), and to-the-point”
Because of this, interoperability is easily achieved when CF metadata conventions are used. Many open-source and commercial data visualization clients can make use of the CF attributes in both netCDF and HDF data files. In addition, CF conventions are widely used across various disciplines for the numerical model outputs that feed the Intergovernmental Panel on Climate Change (IPCC) Assessment Reports. In the words of one reviewer:
“The strength of the CF convention is its consensus approach bridging across several earth system modeling communities. Data sets which adhere to the CF standard and make wide use of its optional attributes contain rather well-structured descriptions of the dataset content which enable flexible distribution of datasets and also ease manual sharing of data, because the description is complete”
A few of the reviewers pointed out that CF allows reading of the axes, coordinate systems, data units, temporal attributes, and maxima and minima, and is therefore more comprehensive than some of the other metadata conventions. Reviewers also commented that CF provides additional descriptive data that are useful to many users. According to one reviewer:
“Inclusion of these local metadata elements should make the product more self-descriptive, and thus enable users to more easily grasp the import of the product content”
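As a rough illustration of what reviewers mean by attributes that are both machine and human readable, the following minimal Python sketch uses plain dictionaries as a stand-in for the variables of a netCDF or HDF file. The variable names, values, and helper functions (`find_axes`, `describe`) are hypothetical; only the attribute names (`standard_name`, `long_name`, `units`, `axis`, `valid_min`, `valid_max`) follow CF and netCDF usage.

```python
# Plain dictionaries stand in for the variables of a netCDF or HDF file.
# Variable names and values are made up for illustration.
variables = {
    "time": {"standard_name": "time",
             "units": "days since 2000-01-01 00:00:00", "axis": "T"},
    "lat": {"standard_name": "latitude", "units": "degrees_north", "axis": "Y"},
    "lon": {"standard_name": "longitude", "units": "degrees_east", "axis": "X"},
    "sst": {
        "standard_name": "sea_surface_temperature",
        "long_name": "Sea surface temperature",
        "units": "K",
        "valid_min": 270.0,
        "valid_max": 310.0,
    },
}


def find_axes(variables):
    """Map each declared axis letter (X, Y, Z, T) to its variable name."""
    return {attrs["axis"]: name
            for name, attrs in variables.items() if "axis" in attrs}


def describe(name, attrs):
    """Build a human-readable label from the same attributes a tool would read."""
    label = attrs.get("long_name") or attrs.get("standard_name", name)
    return f"{label} [{attrs.get('units', 'no units')}]"


axes = find_axes(variables)
summary = describe("sst", variables["sst"])
print(axes)
print(summary)
```

A real client would read these attributes from the file through a netCDF or HDF library rather than from hard-coded dictionaries, but the lookup logic would be similar.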
While there were some complaints, almost all of the data providers agreed that the benefits outweigh the challenges, which led them to use the CF conventions.
Weaknesses
There were two common weaknesses of the CF Metadata conventions mentioned by the reviewers.
- Since version 1.6, CF has added support for “Discrete Sampling Geometries”, which can be leveraged to handle feature data types including point, time series, trajectory, profile, time series profile, and trajectory profile.
- The CF conventions were originally tied to netCDF files and therefore implementing them with HDF files has been a challenge. In particular, there are not many examples and one group mentioned that it took them a fair amount of time to implement these conventions in HDF files.
A few reviewers also mentioned that while the standard names table is comprehensive, its current organization makes it time-consuming to find the right name. Additionally, the units attribute in the CF conventions relies on the UDUNITS package, which two reviewers found poorly described and unclear. Overall, most reviewers found the RFC to be complete, clear, and concise for implementation of the CF metadata conventions.
Applicability
The CF metadata conventions are widely used across multiple disciplines and, as the reviewers suggested, are “crucial to promoting interoperability”. Based on the reviews received, the CF conventions are used by many of NASA's Distributed Active Archive Centers (DAACs), which together hold many petabytes of Earth science data. In particular, one review from a NASA DAAC said the following:
“In general, this specification is useful for our needs, supplying excellent context for the conventions and the philosophy behind them. It is particularly useful for most Level 3 gridded and Level 2 swath data.”
It was also mentioned that the CF metadata provide valuable information about each array in the product, and that CF-compliant metadata may therefore be more valuable to researchers who use the products directly. Additionally, software libraries and tools exist to support netCDF and HDF file creation and modification with the CF conventions. According to one reviewer:
“… many Data Access Protocol (DAP) visualization client tools, especially Java visualization client tools (IDV and Panoply) almost strictly follow CF conventions”
The CF conventions are particularly relevant to NASA Earth Science Datasets. These conventions facilitate interoperability between NASA HDF data and available netCDF visualization tools. One reviewer summarizes this fact as follows:
“It greatly helped us to achieve the interoperability of NASA HDF data with existing netCDF tools”
Another mentioned:
“The CF Conventions are critical to ensure interoperability of data stored in the netCDF file format”
Limitations
While the CF conventions are widely used and have strong community support, the RFC reviewers identified important limitations. These are summarized as follows:
- These conventions work very well for two-dimensional data. In the third dimension, however, one is limited to vertical height, depth, pressure, sigma, density, temperature, or time. A remote sensing dimension such as band number cannot currently be represented as the third dimension, which limits the CF metadata conventions. In addition, one reviewer mentioned that vertical coordinate systems such as those used by remote sensing data sets from vertical sounders are problematic.
- The standard names table has names from various disciplines, but it is not a complete set and probably never will be. While this may be an obvious limitation, user and data provider groups can ask the CF conventions committee to add more appropriate names to the table. Some reviewers pointed out that linking the CF standard names to existing ontologies could be beneficial.
- The HDF implementation of CF metadata conventions is time-consuming, and there is a lack of clarity in handling 2-dimensional latitude and longitude arrays.
- Geo-location of data is subject to the following limitations identified by the reviewers: (1) since there is no way to store projection information using CF conventions, the latitude and longitude values for all points have to be stored, greatly increasing the size of the data files; (2) datum information is not defined in the CF conventions; (3) latitude/longitude pairs do not indicate whether they define the center of a pixel or one of its corners; and (4) missing values are not allowed inside the latitude and longitude arrays.
- The CF conventions provide descriptive metadata best suited for data in a collection, but not the collection itself, and therefore may not be sufficient for data discovery and cataloguing.
- The CF conventions document depends on the UDUNITS software package to provide canonical units. However, this dependence is poorly explained and leads to confusion when dealing with some measurement units. In addition, some units are not available in the UDUNITS package.
- As mentioned earlier, the standard names table could use better organization, perhaps in a hierarchical structure with semantic relationships based on an ontological approach.
- Too many attributes are optional and too few data sets contain the attributes that are needed to fully understand the meaning and structure of the data. If major data providers and operational centers would get together and define an additional set of rules based on the CF convention (so that various optional arguments become mandatory in their context) one should have all that is needed.
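The last limitation above, that optional attributes could be made mandatory within a given community, can be sketched as a small check. This is a minimal illustration under stated assumptions, not an existing CF tool: the profile `REQUIRED_BY_PROFILE`, the function `missing_attributes`, and the sample variables are all hypothetical, and plain dictionaries again stand in for file metadata.

```python
# Hypothetical project profile: attributes that CF leaves optional or only
# recommends, but that this imaginary project requires on every data variable.
REQUIRED_BY_PROFILE = ("standard_name", "long_name", "cell_methods")


def missing_attributes(variables, required=REQUIRED_BY_PROFILE):
    """Return {variable name: [missing attribute names]} for incomplete variables."""
    report = {}
    for name, attrs in variables.items():
        missing = [key for key in required if key not in attrs]
        if missing:
            report[name] = missing
    return report


# Made-up metadata for two data variables.
variables = {
    "precip": {
        "standard_name": "precipitation_flux",
        "long_name": "Precipitation",
        "units": "kg m-2 s-1",
        "cell_methods": "time: mean",
    },
    "tas": {"standard_name": "air_temperature", "units": "K"},
}

report = missing_attributes(variables)
print(report)
```

A data provider could run a check like this before distribution, so that files meeting the base conventions but lacking the agreed extra attributes are caught early.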
Despite this long list of limitations, the reviewers overwhelmingly recommend that CF should be a NASA Earth science data system standard because the benefits of using these conventions far outweigh the limitations. In addition, the CF metadata conventions have a large user base with very good support and tools both inside and outside of NASA.