Review of HDF5 operational readiness:
NASA's Earth Science Data Systems Standards Process Group (SPG) is considering HDF5 for adoption as a community standard. This is the second review of HDF5, this one focusing on its readiness for operational use. The questions below are provided to guide feedback from data systems, application providers, instrument teams, and others. You only need to answer the questions applicable to you. Please send comments to email@example.com.
- Describe in a sentence or two your overall experience related to HDF5 (e.g., science data provider, science data systems, software tools developer, science data user).
I am a software tools developer and science data user. My experience as an end user has been good. My HDF5 experience as a software developer has been frustrating: the HDF5 developers are chronically late in meeting their own release schedules.
- Do you currently use or plan to use HDF5 in a production setting? What types of applications do you use with HDF5? Is HDF5 applicable to your applications (e.g., Does it work well with the data types and data manipulations in your application?)
I currently use HDF5 data with NCAR graphics and NCAR command language (NCL). I have adapted the netCDF Operators (NCO) to use netCDF4, which employs an HDF5 backend accessible through the standard netCDF API. I have no plans to use the HDF5 API directly. NCO is used extensively in production settings in most climate modeling centers.
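Because netCDF4 stores its data in HDF5 on disk, the two storage layers can be told apart by their file signatures alone. The following sketch (plain Python, no netCDF or HDF5 libraries required; the helper name is my own, not part of any API) classifies a file header by its leading magic bytes:

```python
# Distinguish classic netCDF from HDF5-backed netCDF4 by file signature.
# Classic netCDF files begin with "CDF" followed by a version byte;
# HDF5 files (and therefore netCDF4 files) begin with the 8-byte
# HDF5 superblock signature \x89HDF\r\n\x1a\n.
HDF5_MAGIC = b"\x89HDF\r\n\x1a\n"
NETCDF_CLASSIC_MAGIC = b"CDF"

def storage_format(header: bytes) -> str:
    """Classify a file's storage layer from its first bytes (hypothetical helper)."""
    if header.startswith(HDF5_MAGIC):
        return "netCDF4/HDF5"
    if header.startswith(NETCDF_CLASSIC_MAGIC):
        return "netCDF classic"
    return "unknown"

# Example headers as they would appear at offset 0 of each file type.
print(storage_format(b"\x89HDF\r\n\x1a\n" + b"\x00" * 8))  # netCDF4/HDF5
print(storage_format(b"CDF\x01" + b"\x00" * 12))           # netCDF classic
```

(In a full HDF5 file the superblock may also sit at offsets 512, 1024, and so on; checking offset 0 suffices for typical netCDF4 output.)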
- Why do you choose to use HDF5 over other data formats for your applications?
I prefer netCDF for my applications because of its ease of use. I am only developing HDF5 software because netCDF4 is moving to HDF5 for the storage layer. The primary benefit for my purposes is the HDF5 implementation of MPI I/O and packing, both of which netCDF currently lacks.
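Packing here refers to the scale-and-offset convention: storing floating-point values as small integers together with `scale_factor` and `add_offset` attributes, trading precision for space. A minimal sketch of the arithmetic, in plain Python with hypothetical helper names (no netCDF or HDF5 API is used):

```python
# Scale/offset packing: store floats as small integers plus two
# attributes (scale_factor, add_offset).
# pack:   p = round((x - add_offset) / scale_factor)
# unpack: x ~= p * scale_factor + add_offset

def pack(values, scale_factor, add_offset):
    """Quantize floats to integers under the scale/offset convention."""
    return [round((x - add_offset) / scale_factor) for x in values]

def unpack(packed, scale_factor, add_offset):
    """Recover approximate floats from packed integers."""
    return [p * scale_factor + add_offset for p in packed]

# Pack Kelvin temperatures into small integers at 0.01 K resolution.
scale, offset = 0.01, 273.15
temps = [273.15, 293.4, 250.0]
p = pack(temps, scale, offset)
print(p)                          # [0, 2025, -2315]
print(unpack(p, scale, offset))
```

The packed integers fit in 16 bits, roughly halving storage relative to 32-bit floats for this kind of data.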
- Have you or your users encountered any difficulty when using some of the data access or visualization tools (e.g., IDL, GrADS) on HDF5 data files? If you have, please provide a brief description of your experience.
Nothing too serious.
- Does the performance of HDF5 you have experienced meet your requirements? (e.g., Can it handle the data types in your applications? Does it take a long time to read and write HDF5 files?)
All I know is that the netCDF4-alpha releases (which are based on HDF5 1.7.x) produce files much more bloated than their netCDF3 counterparts. We have not yet done extensive benchmarking of read/write times.
- What operational challenges or limitations does HDF5 present? (e.g., Does it take a long time to learn how to use it? Does it require advanced processing power, large amounts of memory, complex configuration, etc)
HDF5 has a reputation for being very hard to learn. Because of this, I will probably never try to learn the native HDF5 interface. I suspect most physical scientists will never use the HDF5 API in their models, whereas a good number do use the netCDF API because it is relatively easy to learn.
- What benefits does HDF5 present? Do the benefits of HDF5 outweigh the challenges? (e.g., Does it offer the flexibility you want to package the data types in your applications? Does it facilitate interdisciplinary studies?)
MPI I/O and packing are the main benefits for NCO software development.
- How much data do/will you provide or archive in HDF5? (number of distinct data products or data sets, total data volume, number of files.)
- How many users do you have or expect to have for data in HDF5, and what is your expected user community?
Everyone who uses NCO will produce netCDF4/HDF5 files once their data sources (e.g., climate models) switch to netCDF4. My guess is that this will take about five years and involve about 1,000 users.