Review of netCDF version 3 implementation and operational suitability

NASA's Earth Science Data Systems Standards Process Group (SPG) is considering the network Common Data Form version 3 (netCDF classic) for adoption as a community standard. You are invited to review this Request for Comments (RFC) in the context of your implementation experience with this data format specification and its suitability for operational use. Only answer questions applicable to your experience. Please send your completed review to: spg-rfc-011@lists.nasa.gov.

Implementation Experience questions:

  1. (Your background) Describe in a sentence or two your overall implementation experience related to the proposed specification. (e.g., specification implementer, tools developer, data provider, scientific analyst, science user, etc.) Have you directly implemented the netCDF classic format specification or modified a netCDF classic library using the specification? Did you use pre-existing software, and if so, what did you use?

    LASP has some experience with netCDF. Multiple personnel at LASP are data providers, science analysts, and science users. Nearly all experience with netCDF at LASP involves reading and writing files with the IDL DLM (dynamically loadable module).

  2. (Completeness) Does the specification provide all the detail you need to implement it in software? (e.g., to read or write a data file; to implement or modify the library, a profile or extension; or develop a tool such as a format translator) If not, describe what is missing in the specification.

    Because we work through the IDL library and common semi-generic reader/writer software, we do not regularly access the low-level specification details. Nearly all of the low-level details are hidden from general users. However, the specification lacks a basic checksum algorithm. The absence of a basic test for data file self-consistency prevents netCDF from being an archival-quality data format. An optional compression extension to the specification could also be very useful in certain circumstances.

  3. (Accuracy) Do any parts of the specification contain inaccuracies, or internal inconsistencies? If so, please provide details.
  4. (Clarity) Is any part of the specification ambiguous, or poorly explained? If so, please provide details.
  5. (Balance) Does the standard describe the right set of concepts and data types, and enable the appropriate data operations for its intended users? Is this set of concepts and data types an overly broad set (requiring excessive complexity) or narrowly simplistic set?
  6. (Usefulness) How well does this specification meet your information sharing needs? (e.g., does it work well with the data types and data manipulations in your application? Does it properly represent your datasets? What are the pros and cons of this data format?)
  7. (Implementation) What implementation challenges does the proposed standard present? (e.g., does it require advanced processing power, large amounts of memory, complex configuration, etc.? Does it scale to a production environment?)
  8. (Flexibility) In what software environment(s) have you used netCDF classic (e.g., Solaris, Linux, Windows, Mac OS X)? Have you implemented, tested or deployed netCDF classic libraries or packages other than those provided by the original netCDF classic developers?

    We have used netCDF classic under Linux, Mac, and Solaris.
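
    Note on question 2 (Completeness): because the specification defines no checksum, a self-consistency test has to live outside the file. The following is a minimal sketch of one possible workaround, assuming a sidecar file that holds a SHA-256 digest of the netCDF file. It is written in Python rather than IDL for illustration only. The sidecar naming convention (".sha256") and the function names are hypothetical; they are not part of netCDF or of our production software.

```python
import hashlib
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in fixed-size chunks so large files need little memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_sidecar(path):
    """Write '<path>.sha256' (hypothetical convention) next to the file."""
    sidecar = Path(str(path) + ".sha256")
    sidecar.write_text(file_sha256(path) + "\n")
    return sidecar


def verify_sidecar(path):
    """Return True if the file still matches its recorded digest."""
    sidecar = Path(str(path) + ".sha256")
    expected = sidecar.read_text().strip()
    return file_sha256(path) == expected
```

    A data provider would call write_sidecar() once when a product file is generated, and an archive or user would call verify_sidecar() after each transfer. This detects corruption but, unlike a checksum built into the format, it is lost if the sidecar is separated from the data file.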

Operational Suitability questions:

  1. Do you currently use or plan to use netCDF classic in a production setting? What types of applications do you use with netCDF classic? Is netCDF classic applicable to your applications (e.g., Does it work well with the data types and data manipulations in your application?)

    LASP uses netCDF classic in the routine production of science data products for NASA's TIMED-SEE instrument, and also for all science products generated by the AIM spacecraft. TIMED-SEE data processing is performed in IDL.

  2. Why do you choose to use netCDF classic over other data formats for your applications?

    The netCDF classic format is a NASA requirement for the TIMED mission.

  3. Have you or your users encountered any difficulty when using some of the data access or visualization tools (e.g., IDL, GrADS, etc.) on netCDF classic data files? If you have, please provide a brief description of your experience.

    It would be nice if netCDF supported more complex data structures, such as ragged arrays. We use netCDF files to store data from the AIM mission, and the data are not always in regular 2-D arrays. With that dataset, we end up adding a lot of fill data to comply with netCDF. We probably would have been better off with a more flexible format, but it was a mission requirement to use netCDF and we did not run into data complexity issues until late in our development.

  4. Does the netCDF file format meet your requirements for storing and accessing data? (e.g., Can it handle the data types in your applications?)

    The netCDF file format meets the requirements for TIMED-SEE. However, the absence of a checksum capability prevents it from being an acceptable archival format.

  5. What operational challenges or limitations does netCDF classic present? (e.g., Does it take a long time to learn how to use it? Does it require advanced processing power, large amounts of memory, complex configuration, etc.)

    Our generic IDL reader software is very flexible, but the penalty is performance and memory usage. Files that exceed 1 GB are difficult to work with because of the overhead that this flexibility introduces.

  6. What benefits does netCDF classic present? Do the benefits of netCDF classic outweigh the challenges? (e.g., Does it offer the flexibility you want to package the data types in your applications? Does it facilitate interdisciplinary studies?)

    One major benefit is the platform-independence of the format.

  7. How much data do/will you provide or archive in netCDF classic? (number of distinct data products or data sets, total data volume, number of files.)

    All TIMED-SEE data products are stored in netCDF. There are currently 8 unique daily data products plus 6 merged products for TIMED-SEE in netCDF. The mission continues to collect data, so we can only estimate the total volume, which is currently about 200 GB.

  8. How many users do you have or expect to have for data in netCDF classic, and what is your expected user community?

    We have about 150 registered users of TIMED-SEE data products. It is reasonable to assume that there are at least that many unregistered users, since registration is optional.

  9. (User comments) Any additional comments, observations or criticisms of netCDF classic and the RFC can be provided here.

    It would be nice if there were some way to specify the netCDF content list in one place. For example, PDF uses a table-like object at the end of the file that contains start and stop offset locations for each of its object entries. By reading that table, you can identify the content and even convey the intended organization or relationships between variables.
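
    Note on question 3 (difficulties with ragged data): the fill-data overhead we describe for AIM can be illustrated with a minimal sketch. It assumes each record is a list of values of varying length that must be padded to a rectangular array before it can be written as a fixed-shape netCDF variable. It is written in Python rather than IDL for illustration only. The fill value and the function names are hypothetical, and the sample rows are made up rather than taken from AIM data.

```python
FILL_VALUE = -9999.0  # hypothetical fill value, not the one used for AIM


def pad_ragged(rows, fill=FILL_VALUE):
    """Pad variable-length rows to the length of the longest row."""
    width = max((len(row) for row in rows), default=0)
    return [list(row) + [fill] * (width - len(row)) for row in rows]


def fill_fraction(rows):
    """Return the fraction of the padded array occupied by fill values."""
    width = max((len(row) for row in rows), default=0)
    total = width * len(rows)
    if total == 0:
        return 0.0
    used = sum(len(row) for row in rows)
    return (total - used) / total


# Made-up example: three records with 3, 1, and 2 samples.
ragged = [[1.0, 2.0, 3.0], [4.0], [5.0, 6.0]]
padded = pad_ragged(ragged)
overhead = fill_fraction(ragged)
```

    In this made-up example, three of the nine stored elements are fill values. The overhead grows with the spread between the shortest and longest record, which is why a format with native ragged-array support would have suited that dataset better.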