Summary
This document nominates the NetCDF File Format document [1] for adoption as a NASA Earth Science Data System (ESDS) community standard. The NetCDF File Format document specifies netCDF file format variants in a way that is independent of I/O libraries designed to read and write netCDF data. The purpose of netCDF is to provide a data model, software libraries, and machine-independent data format for geoscience data. Together, the netCDF interfaces, libraries, and format support the creation, access, and sharing of scientific data.
With suitable community conventions, netCDF can help improve interoperability among data providers, data users, and data services.
Status
The NetCDF Classic and 64-bit Offset File Formats document is an approved standard; it was recommended for use in ESDS in August 2009.
NASA Earth Science Community Recommendations for Use
Strengths
According to the reviewers, a major strength of netCDF classic is that it has fostered data interoperability and exchange through its self-describing file format, platform-independent architecture, and robust access methods. Its file format and metadata attributes are simple enough to be easily understood and applied, yet robust enough to describe and store multidimensional arrays of different data types (e.g., float and integer arrays) in the same file. For example, one pertinent comment stated:
"Using netCDF has huge interoperability and data exchange benefits. Having a de facto standard in a scientific field saves a truly enormous amount of time when trying to exchange data, work with others, or just get things done. The self-describing ability of netCDF files makes it much less likely for important information to be separated from the file...."
Although the request for comments (RFC) was written specifically to evaluate the file format specification, most of the responses were based on experience accessing and managing netCDF classic data, or building tools for these data with existing libraries. Only three or four reviewers had implemented the specification directly, but all of them indicated that the RFC was technically sound. For example, one such reviewer stated:
"If you wanted to write a native python library, for example, to read netCDF3, this is exactly what you'd want, so it's very useful."
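As an illustration of what such a native reader starts with, the following is a minimal sketch in Python of parsing the first eight bytes of a classic-family file: the three characters "CDF", a one-byte version number (1 for classic, 2 for 64-bit offset), and a four-byte big-endian record count. The function name read_magic and the VERSIONS table are hypothetical names chosen for this example, not part of any netCDF library, and the sketch deliberately stops before the dimension, attribute, and variable lists that a real reader would also need to parse.

```python
import struct

# Version byte values for the two variants covered by this standard.
VERSIONS = {1: "classic", 2: "64-bit offset"}


def read_magic(header: bytes):
    """Return (variant, numrecs) from the first 8 bytes of a file.

    Hypothetical helper for illustration only; it does not parse the
    dimension, attribute, or variable lists that follow.
    """
    if len(header) < 8:
        raise ValueError("need at least 8 bytes of header")
    if header[:3] != b"CDF":
        raise ValueError("not a netCDF classic-family file")
    version = header[3]
    if version not in VERSIONS:
        raise ValueError(f"unsupported version byte {version}")
    # numrecs is stored as a 4-byte big-endian unsigned integer.
    (numrecs,) = struct.unpack(">I", header[4:8])
    return VERSIONS[version], numrecs


# Example with a hand-built header: classic variant, 5 records.
print(read_magic(b"CDF\x01\x00\x00\x00\x05"))
```

With a real file, the same function could be called on the result of reading the first eight bytes in binary mode.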
Furthermore, even among those reviewers who evidently had not implemented the specification directly, many commented positively on the clarity of the RFC. For example, here are two independent comments:
"No part of the specification is ambiguous or poorly explained."
"The specification is clear."
Weaknesses
The primary weaknesses identified by the community were that 1) netCDF classic does not support internal compression of data variables, 2) there is a limit on the size of arrays (about 2 GB), and 3) there is no support for 64-bit integers. On the whole, the reviewers found very few weaknesses or limitations in netCDF classic; however, a few reviewers expressed interest in and support for the emerging netCDF-4 specification, with its added benefits of internal compression, handling of large files, and a more flexible data model. One reviewer commented:
"Your timing for this seems strange, considering that netcdf-4 has now been released. How will your efforts interact with that? I'm surprised you've not asked any questions about whether people are intending to move wholesale into version 4 or will be sticking with classic format. That certainly seems relevant."
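To make the array size and integer type weaknesses concrete, the sketch below estimates the uncompressed size of an array and compares it with the limit. It assumes that "about 2 GB" can be approximated as 2**31 bytes; the specification itself gives the exact rules, which depend on the format variant and on whether the variable uses the record dimension. The names TYPE_SIZES, LIMIT_BYTES, array_bytes, and fits_classic are hypothetical and exist only for this example.

```python
from math import prod

# Byte widths of the six external data types in netCDF classic.
# Note that there is no 64-bit integer type in this table.
TYPE_SIZES = {"byte": 1, "char": 1, "short": 2, "int": 4, "float": 4, "double": 8}

# Assumption for illustration: treat "about 2 GB" as 2**31 bytes.
LIMIT_BYTES = 2**31


def array_bytes(shape, nc_type):
    """Uncompressed size in bytes of an array with the given shape and type."""
    return prod(shape) * TYPE_SIZES[nc_type]


def fits_classic(shape, nc_type):
    """True if the array is smaller than the assumed limit."""
    return array_bytes(shape, nc_type) < LIMIT_BYTES


# A 0.01-degree global grid of floats: 18000 x 36000 x 4 bytes.
print(array_bytes((18000, 36000), "float"))   # 2592000000
print(fits_classic((18000, 36000), "float"))  # False
print(fits_classic((1800, 3600), "float"))    # True
```

Because the classic format stores data uncompressed, the size computed here is also roughly what the array occupies on disk.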
Applicability
The netCDF file format is a popular format for distributing satellite products and other Earth science data. For instance, the upcoming Ocean Surface Topography Mission (OSTM)/Jason-2 will distribute its products in this format. Based on the reviewers' responses, the volume of data distributed in this format runs into the tens of terabytes, coming from both past and existing NASA/NOAA missions. One reviewer stressed that netCDF classic still has a long lifetime:
"We will support netCDF3/classic with our data server for a long, long time."
NetCDF has a number of APIs, libraries, and tools for accessing the data in a straightforward manner (its APIs are simpler and more straightforward than those for HDF). The reviewers were nearly unanimous in reporting positive experiences. A typical response on the usefulness of netCDF was:
“Simple API. Wide support. Free as in beer and speech. The benefits of this combination are not surpassed by any other geoscience data format.”
Many reviewers commented on the wide variety of third-party applications and tools, such as MATLAB, Interactive Data Language (IDL), Ferret, and ncview, that can be used to interrogate or visualize data in netCDF files, and on the ease of doing so. Only one reviewer out of twenty complained about the tools and services available for accessing netCDF files (that reviewer appeared to prefer GeoTIFF).
Limitations
As noted in the Weaknesses section, netCDF classic imposes some limits on file and array sizes, and the absence of internal compression limits the format's usefulness for large data archives. Some reviewers also stressed that the benefits of interoperability and self-description are tightly coupled to the presence of Climate and Forecast (CF) metadata in the file:
"Note that most of the benefits confer only when CF-1 is also added. Although in theory, CF-1 can apply to other formats, in reality it is implemented fully only in netCDF. However, this allows data to be easily imported and viewed into a number of very powerful analysis and visualization tools."