Review of netCDF version 3 implementation and operational suitability

NASA's Earth Science Data Systems Standards Process Group (SPG) is considering the Network Common Data Form version 3 (netCDF classic) for adoption as a community standard. You are invited to review this Request for Comments (RFC) in the context of your implementation experience with this data format specification and its suitability for operational use. Only answer questions applicable to your experience. Please send your completed review to: spg-rfc-011@lists.nasa.gov.

Implementation Experience questions:

  1. (Your background) Describe in a sentence or two your overall implementation experience related to the proposed specification. (e.g., specification implementer, tools developer, data provider, scientific analyst, science user, etc.) Have you directly implemented the netCDF classic format specification or modified a netCDF classic library using the specification? Did you use pre-existing software, and if so, what did you use?

    I use the netcdf libraries both as a scientist performing large-scale data analysis and as head of a group that provides data over the internet (~20 TB), almost all of which is in netcdf format. We also use Ferret and Matlab heavily in-house to access netcdf files.

  2. (Completeness) Does the specification provide all the detail you need to implement it in software? (e.g., to read or write a data file; to implement or modify the library, a profile or extension; or develop a tool such as a format translator) If not, describe what is missing in the specification.

    Yes, I feel it does. This is aided by the fact that there is a reference implementation and that libraries exist for many languages, as well as by the availability of example programs.

  3. (Accuracy) Do any parts of the specification contain inaccuracies, or internal inconsistencies? If so, please provide details.

    Not that I am aware of.

  4. (Clarity) Is any part of the specification ambiguous, or poorly explained? If so, please provide details.

    I find the specification much clearer than most.

  5. (Balance) Does the standard describe the right set of concepts and data types, and enable the appropriate data operations for its intended users? Is this set of concepts and data types an overly broad set (requiring excessive complexity) or narrowly simplistic set?

    There is a saying that you should strive to make everything as simple as possible but no simpler, and the classic netcdf format achieves this quite well. It was designed around the specific needs of a community of users, and it meets those needs very well, with a level of simplicity that makes it usable by both sophisticated and novice users. As can be seen with the new netcdf format (netcdf4), an increase in the number of data types comes at the expense of increased complexity. Also, netcdf allows conventions to be defined (such as CF) that give semantic meaning to the files and increase their usefulness.

  6. (Usefulness) How well does this specification meet your information sharing needs? (e.g., does it work well with the data types and data manipulations in your application? Does it properly represent your datasets? What are the pros and cons of this data format?)

    Netcdf has worked excellently for handling our gridded and time series data. We are strong believers in self-documenting formats, of which netcdf is one. That, combined with the ability to do random access on the coordinates, as well as strides and mappings, at very rapid speeds, makes it an excellent toolkit for actual use. Also, tools like NCO, the OPeNDAP netcdf library, and the THREDDS Data Server take advantage of and extend the features of the netcdf classic format, increasing its usefulness.

  7. (Implementation) What implementation challenges does the proposed standard present? (e.g., does it require advanced processing power, large amounts of memory, complex configuration, etc.? Does it scale to a production environment?)

    We are a 24/7 shop, handling about 20 GB of data a day, all of which ends up in netcdf classic, and we serve about 20 TB online. We have found that netcdf scales very well in both directions: it works in our production environment as well as on small computers. If there is one complaint about the traditional format, it is that there is no built-in compression, so file sizes can be larger than with some competing formats.

  8. (Flexibility) In what software environment(s) have you used netCDF classic (e.g., Solaris, Linux, Windows, Mac OS X)? Have you implemented, tested or deployed netCDF classic or packages other than those provided by the original netCDF classic developers?

    We have used netcdf in all of these environments, and the fact that netcdf works across many operating systems and computer languages is one of its major attractions. We, or the users we support, access netcdf files in Matlab, R, IDL, Mathematica, Ferret, and NCO, among others. We have also written our own programs for access and, with ASA, have developed an ArcGIS plugin for remote access.

Operational Suitability questions:

  1. Do you currently use or plan to use netCDF classic in a production setting? What types of applications do you use with netCDF classic? Is netCDF classic applicable to your applications (e.g., Does it work well with the data types and data manipulations in your application?)

    See answers above. We are a near-real-time, operational provider of data. Our data services are based on the netcdf classic format.

  2. Why do you choose to use netCDF classic over other data formats for your applications?

    Self-documentation, speed, data types that mirror those of our data, portability across operating systems and languages, ease of use, semantic conventions, and software tools that meet both our needs and those of our users.

  3. Have you or your users encountered any difficulty when using some of the data access or visualization tools (e.g., IDL, GrADS, etc.) on netCDF classic data files? If you have, please provide a brief description of your experience.

    Not really. Some users will always have some trouble if they are not very familiar with the data itself, the program they are using, or both, and it is at that level that we have had problems.

  4. Does the netCDF file format meet your requirements for storing and accessing data? (e.g., Can it handle the data types in your applications?)

    It does for most of our data, which are gridded datasets that follow CF conventions. For in-situ data, there are some limitations that may well be worked out with the netcdf4 format, but we find similar limitations in most other solutions for in-situ data.

  5. What operational challenges or limitations does netCDF classic present? (e.g., Does it take a long time to learn how to use it? Does it require advanced processing power, large amounts of memory, complex configuration, etc.?)

    Storage is fortunately relatively cheap, but given the amount of data we have, the lack of internal compression is an issue. The netcdf4 libraries are adding, or already provide, compression for data that uses the netcdf3 (classic) data model.

  6. What benefits does netCDF classic present? Do the benefits of netCDF classic outweigh the challenges? (e.g., Does it offer the flexibility you want to package the data types in your applications? Does it facilitate interdisciplinary studies?)

    We find very little downside to the format. It is widely known and understood in the metocean community. The self-documenting feature makes it a perfect format for sharing files with colleagues and researchers elsewhere.

  7. How much data do/will you provide or archive in netCDF classic? (number of distinct data products or data sets, total data volume, number of files.)

    We have roughly 20 TB of data in netcdf classic format. The total number of datasets is roughly 300, and the number of separate files is in the hundreds of thousands.

  8. How many users do you have or expect to have for data in netCDF classic, and what is your expected user community?

    We provide oceanographic data worldwide, mostly data relevant to the study and management of marine resources. While the rate of our data requests varies, we have achieved rates of over one million data requests in a week through our web-based and web-services-based data access systems.

  9. (User comments) Any additional comments, observations or criticisms of netCDF classic and the RFC can be provided here.

    I hope it is clear that we are big fans of the netcdf classic format, in particular when used with a well-defined convention that meets the needs of a community. I strongly recommend the adoption of netcdf classic as a NASA standard.