The fidelity of geoscientific model results is increasingly evaluated by comparison with products derived from NASA satellite measurements. The satellite data are archived in HDF-EOS format, which is now a superset of the netCDF format employed by most geoscientific models. Putting NASA-generated (HDF-EOS) data and model-generated (netCDF) data on a common grid and in the same format for numerical comparison can be arduous because of data format incompatibilities. Moreover, some analysis tools for netCDF data have no counterparts or equivalents for HDF-EOS data. Many researchers want a common toolkit for HDF-EOS and netCDF data that would
1. simplify and accelerate the independent analysis of both data formats (HDF-EOS and netCDF),
2. exploit the strengths of netCDF's underlying HDF data format with easy-to-use netCDF tools,
3. ease evaluation of model predictions (in netCDF format) against NASA-generated data (in HDF-EOS format).
The primary purpose of this project is to simplify the workflow involved in intercomparing HDF-EOS data with model results in netCDF format. It will do so in a user-friendly and transparent way by improving the netCDF Operators (NCO), robust components of the scientific data analysis software stack already employed at most Earth science modeling centers. The key NCO improvement will be support for group hierarchies. Groups are nestable namespaces that allow for hierarchical storage (the ``H'' in HDF). Using groups to store ensembles of observations and predictions would vastly simplify and accelerate the characterization, evaluation, and intercomparison of multiple geophysical observations and simulations.
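The Python sketch below makes ``nestable namespaces'' concrete. It is not NCO code and not the netCDF or HDF5 API. A hypothetical `Group` class holds variables and child groups, and a slash-separated path addresses any variable. The group names (`obs/satellite`, `model/model_A`, `model/model_B`) and the variable name `snc` are illustrative assumptions, and the stored strings are labels rather than data.

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    """Hypothetical in-memory stand-in for a netCDF-4/HDF5 group (not a real library API)."""
    name: str
    variables: dict = field(default_factory=dict)  # variable name -> payload
    groups: dict = field(default_factory=dict)     # subgroup name -> Group

    def add_group(self, name):
        # Create (or reuse) a child namespace nested inside this group.
        return self.groups.setdefault(name, Group(name))

    def resolve(self, path):
        # Follow a slash-separated path such as "/obs/satellite/snc" down the tree.
        *group_names, var_name = path.strip("/").split("/")
        node = self
        for g in group_names:
            node = node.groups[g]
        return node.variables[var_name]

    def walk(self, prefix=""):
        # Yield the full path of every variable, depth-first; the root is named "/".
        here = "" if self.name == "/" else f"{prefix}/{self.name}"
        for v in self.variables:
            yield f"{here}/{v}"
        for child in self.groups.values():
            yield from child.walk(here)


# Illustrative hierarchy: one observation group and two model groups,
# all reusing the same (hypothetical) variable name "snc" without collision.
root = Group("/")
root.add_group("obs").add_group("satellite").variables["snc"] = "satellite retrieval"
model = root.add_group("model")
model.add_group("model_A").variables["snc"] = "model_A simulation"
model.add_group("model_B").variables["snc"] = "model_B simulation"

print(sorted(root.walk()))
# ['/model/model_A/snc', '/model/model_B/snc', '/obs/satellite/snc']
print(root.resolve("/model/model_B/snc"))
# model_B simulation
```

Each group is its own namespace, so all three ensemble members can keep the variable name `snc`. A ``flat'' dataset would instead need mangled names such as `snc_obs` or `snc_model_A`.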
Until now, NCO could not exploit groups in this way because it supports only ``flat'' datasets. We will demonstrate the benefit of group support by applying the improved NCO to a prototypical, NASA-relevant Earth System Science research problem. That problem is to characterize, evaluate, and intercompare Earth System Model-simulated and NASA-retrieved snow cover and albedo trends and variability in the CMIP5 models to be used in the IPCC AR5 climate assessment. For over fifteen years, NCO has been a robust element of the scientific software stack used by Earth Science researchers inside and outside of NASA. Researchers worldwide employ NCO's user-friendly commands, honed through years of open-source developer-user feedback, to process terascale model datasets, often in preparation for comparison with HDF data. However, there is not yet an NCO equivalent for processing HDF-EOS data. This is partly because NCO does not yet understand all of the powerful HDF capabilities now accessible through the netCDF API.
The project will remedy much, though not all, of this deficiency. The primary outcome will be that NCO can process the ever-growing body of HDF-EOS and netCDF data that use groups to organize and contain data. The proposed work directly responds to the ACCESS call to increase use of EOS data by the climate modeling community analyzing and evaluating the CMIP5 simulations. Moreover, the improved NCO capabilities will apply to all geophysical data archived in HDF-EOS and netCDF formats. The significance of the proposed work is expected to be greatest for applied science researchers who wish to exploit NASA data more fully to evaluate model simulations. The PI is a long-standing climate modeler, software developer, and NASA-funded researcher who understands many of the barriers to model evaluation and who has developed, in the form of NCO, an elegant solution to some of them. The PI participates in the relevant geoscientific communities. This includes serving as a reviewer for the ESDS Standards Process Group and for the last two IPCC climate assessments, and contributing to the development of ESG-supported models such as the Community Earth System Model.
Charlie Zender, PI, University of California, Irvine
Deployed at GES DISC