netCDF-4/HDF5 File Format
This document nominates the netCDF-4/HDF5 File Format for adoption as a NASA ESDS community standard. It specifies the netCDF-4/HDF5 file format independent of the netCDF I/O libraries designed to read and write netCDF-4/HDF5 data. The netCDF-4/HDF5 file format enables the expansion of the netCDF model, libraries, and machine-independent data format for geoscience data. Together the netCDF interfaces, libraries, and formats support the creation, access, and sharing of scientific data.
With suitable community conventions, the netCDF-4/HDF5 data format can help improve the capability to read and share important scientific data among data providers, data users, and data services.
The ESDS-RFC-022 Technical Working Group (TWG) has conducted a review of ESDS-RFC-022 – “netCDF-4/HDF5 File Format” and reached the following conclusion:
That the Standards Process Group should forward ESDS-RFC-022 to NASA Earth Science Division with the recommendation that it be endorsed as a NASA Standard.
The TWG bases its recommendation on positive comments from the Earth Science community, including users from NASA, NOAA and academia, and an analysis of the following factors in a NASA context:
Strengths – netCDF-4 is straightforward to use relative to HDF5, with a gentler learning curve. The many tools available enhance ease of use. Use of the HDF5 storage layer in netCDF-4 software provides features for improved performance, such as compression, parallel I/O, relaxed size limits, chunking, and endianness control.
Weaknesses – Installation and set-up could be improved. HDF5 users point out that more manual intervention is required for installing netCDF-4/HDF5 than for HDF5 alone. NetCDF users point out that for netCDF-4, multiple software libraries must be installed (netCDF, HDF5, possibly other supporting libraries), rather than the one software library required for netCDF-3.
Applicability – netCDF-4 handles many data types and structures needed for Earth science. Those already using HDF tools can access netCDF-4 data using the HDF5 API rather than the netCDF API. Those who have not been using HDF tools welcome access to much of the power of HDF via the simpler netCDF API. Community reviewers of the RFC cite many terabytes of data in netCDF-4, with thousands of users.
Limitations – One reviewer noted some internal inconsistencies between the netCDF-4 specification and the DAP library implementation. A more significant issue is that support for Windows users lags significantly behind support for Linux users. This appears to be a problem for both the HDF5 and netCDF-4 software libraries.
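To make the chunking and compression benefits listed under Strengths more concrete, the following is a minimal conceptual sketch in Python. It is not the HDF5 or netCDF-4 implementation: it uses only the standard zlib module, and the function names write_chunks and read_range are hypothetical. It illustrates the general idea that when data are split into chunks and each chunk is deflated independently, a reader that wants a subset only has to decompress the chunks overlapping that subset.

```python
import zlib


def write_chunks(values: bytes, chunk_size: int, level: int = 4) -> list:
    """Split a byte buffer into fixed-size chunks and deflate each one independently.

    Hypothetical helper for illustration; real HDF5 chunks are multi-dimensional
    and are indexed inside the file rather than held in a Python list.
    """
    return [
        zlib.compress(values[i:i + chunk_size], level)
        for i in range(0, len(values), chunk_size)
    ]


def read_range(chunks: list, chunk_size: int, start: int, stop: int) -> bytes:
    """Return bytes [start, stop), decompressing only the chunks that overlap the range."""
    if start >= stop:
        return b""
    first = start // chunk_size          # index of the first chunk touched
    last = (stop - 1) // chunk_size      # index of the last chunk touched
    data = b"".join(zlib.decompress(c) for c in chunks[first:last + 1])
    offset = first * chunk_size          # absolute position of data[0]
    return data[start - offset:stop - offset]
```

In this sketch, reading a small range from a large buffer touches only a few compressed chunks, which is the same trade-off that makes the choice of chunk shape important for subset access patterns in chunked, compressed files.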
NetCDF-4 implements the netCDF classic and enhanced data models using HDF5 as the storage layer. The netCDF-4/HDF5 format specification document references both HDF5 and netCDF Classic RFCs, and describes netCDF-4 in terms of these standards. A complete specification of HDF5 files is provided in ESDS-RFC-007 “HDF5 Data Model, File Format and Library – HDF5 1.6”. The netCDF classic file format is specified in ESDS-RFC-011 “NetCDF Classic and 64-bit Offset File Formats”. This document, therefore, describes the new features available with netCDF-4, and the additional data types and structures in the netCDF-4 enhanced data model.
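Because netCDF-4 files use HDF5 as the storage layer while classic files use their own layout, the two families can be told apart by their leading bytes. The sketch below is a minimal illustration in Python, assuming only the published signatures: HDF5 files begin with the 8-byte signature \x89HDF\r\n\x1a\n, and netCDF classic and 64-bit offset files begin with "CDF" followed by a version byte of 1 or 2. The function names classify_header and sniff_file are hypothetical. The sketch checks offset 0 only, so it does not handle HDF5 files that carry a user block, and an HDF5 signature alone does not prove that a file follows the netCDF-4 conventions.

```python
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def classify_header(head: bytes) -> str:
    """Classify a file from its first bytes (hypothetical helper for illustration)."""
    if head[:8] == HDF5_SIGNATURE:
        # HDF5 container; a netCDF-4 file is an HDF5 file with extra conventions.
        return "hdf5"
    if head[:3] == b"CDF":
        version = head[3:4]
        if version == b"\x01":
            return "netcdf-classic"
        if version == b"\x02":
            return "netcdf-64bit-offset"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read the first 8 bytes of a file and classify them."""
    with open(path, "rb") as f:
        return classify_header(f.read(8))
```

A tool that must accept both classic and netCDF-4 inputs could use a check of this kind to decide which reader to dispatch to, although in practice the netCDF library performs this detection itself when opening a file.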
In addition to specifying the netCDF-4 data format, the RFC provides instructions for writing a netCDF-4 compliant file using the HDF5 application programming interface (API). These instructions, included in the document as Appendix B, will allow HDF5 users to create data files usable by the netCDF community.
We note that netCDF-4 users are encouraged to continue to use the netCDF Classic data model (compatible with netCDF-3 and earlier versions), because most existing netCDF software handles only the classic data model. For more complex data sets, the enhanced data model available with netCDF-4 offers more expressive power, such as user-defined types and multiple unlimited dimensions.
One reviewer comments: “My sense is that one factor in the success of netcdf version 4 specifically is that the relatively exotic new capabilities (for example, nested user-defined types) are little used. I think it would behoove user communities to identify applications that require some of the new capabilities and preemptively define standards before a rats nest of different idiosyncratic approaches develops. That really is the danger that the new abilities of netcdf-4 poses, in my opinion. Much of the benefits of easily interchangeable data and quick import into analysis programs will evaporate if you just have to spend time decoding the complicated new netcdf-4 user types.”