
Summary

Zarr is a specification for storing and accessing multi-dimensional array data. It was developed with cloud environments (i.e., object storage) in mind, and the specification is optimized for cloud access through capabilities such as metadata consolidation and the ability to chunk along any dimension.
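
As a rough illustration of the chunking and metadata consolidation mentioned above, the following sketch builds a minimal Zarr V2 array metadata document, lists the chunk keys a store would hold, and gathers metadata into one consolidated document. It uses only the Python standard library rather than any Zarr library. The variable name `temperature`, the shape, and the chunk sizes are hypothetical, and the consolidated layout is intended to mirror the `.zmetadata` convention used by the zarr-python library, so treat its exact key names as an assumption.

```python
import json
from itertools import product
from math import ceil


def zarray_metadata(shape, chunks, dtype="<f4"):
    """Build a minimal Zarr V2 array metadata document (the `.zarray` object)."""
    return {
        "zarr_format": 2,
        "shape": list(shape),
        "chunks": list(chunks),
        "dtype": dtype,
        "compressor": None,  # uncompressed, to keep the sketch simple
        "fill_value": None,
        "filters": None,
        "order": "C",
    }


def chunk_keys(shape, chunks, separator="."):
    """List the store keys of every chunk, e.g. '0.0.0', '1.0.0', ..."""
    # Number of chunks along each dimension, rounding up for a partial edge chunk.
    grid = [ceil(s / c) for s, c in zip(shape, chunks)]
    return [
        separator.join(map(str, idx))
        for idx in product(*(range(n) for n in grid))
    ]


def consolidate(arrays):
    """Gather per-array metadata into one document so a client needs one read."""
    return {
        "zarr_consolidated_format": 1,  # assumed to match zarr-python's convention
        "metadata": {f"{name}/.zarray": meta for name, meta in arrays.items()},
    }


# Hypothetical daily global grid: (time, lat, lon), chunked only along time.
meta = zarray_metadata(shape=(365, 180, 360), chunks=(30, 180, 360))
keys = chunk_keys(meta["shape"], meta["chunks"])
consolidated = consolidate({"temperature": meta})

print(len(keys), keys[0], keys[-1])
print(json.dumps(consolidated, indent=2))
```

Chunking only along the first dimension, as here, favors reading whole maps for a range of times; chunking along latitude and longitude instead would favor time-series reads at a point. The specification permits either choice.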

Status

Zarr Storage Specification V2 is an approved data format convention for use in NASA Earth Science Data Systems (ESDS).

ESCO RFC: ESDS-RFC-048 – Zarr Storage Specification Version 2: Cloud-optimized persistence using Zarr.
DOI: https://doi.org/10.5067/DOC/ESCO/ESDS-RFC-048v1
Suggested Citation: Newman, D. J. (2024). Zarr storage specification version 2: Cloud-optimized persistence using Zarr. NASA Earth Science Data and Information System Standards Coordination Office. https://doi.org/10.5067/DOC/ESCO/ESDS-RFC-048v1
Specification: Zarr Storage Specification Version 2

User Resources

Zarr Homepage
Zarr Documentation

NASA Earth Science Community Recommendations for Use

Strengths

Zarr is growing in use and popularity in the Earth science community, especially among Python users, because the widely used Xarray library supports reading from and writing to Zarr stores. Dask can also be used with Zarr in parallel computing applications. Active users in the open source Pangeo project have started to adopt Zarr as a data format, typically accessing it through Xarray. The Pangeo community is a well-respected and active group of developers and contributors that has made major advances in scientific computing with Big Data in the cloud.

Weaknesses

Zarr is still a relatively new and rapidly evolving storage specification, and it is not inherently designed for geospatial applications; it is a generalized specification for n-dimensional data. This convention approval is based on the current understanding of Zarr version 2, but version 3 is under development. Related activities such as the GeoZarr extension are intended to address issues with metadata consistency and standardization.

Applicability

Zarr can be used for any multi-dimensional Earth science product in the NASA product portfolio. It has been integrated as a capability at NASA's Goddard Earth Sciences Data and Information Services Center (GES DISC) in preparation for an entirely cloud-based version of its popular Giovanni service.

Limitations

Although the Zarr specification is generally well defined, it does not standardize metadata contents to align with accepted conventions such as CF. The GeoZarr extension is addressing this gap, along with the need to store GeoTIFF-like datasets as Zarr stores.

For NASA science data systems and data archives, there is no clear paradigm for updating and modifying a Zarr data store as new data from ongoing satellite missions become available or as data are retroactively republished.
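
To make the update problem concrete, the sketch below estimates which chunk objects an append along one dimension would touch in a Zarr V2 store. It is an illustration only and not a recommended update procedure. It assumes the default "." chunk key separator, that only one axis grows, and that a partially filled edge chunk must be rewritten because chunks are stored at full chunk size. The function names and array dimensions are hypothetical.

```python
from itertools import product
from math import ceil


def chunk_grid(shape, chunks):
    """Number of chunks along each dimension, counting partial edge chunks."""
    return tuple(ceil(s / c) for s, c in zip(shape, chunks))


def keys_touched_by_append(old_shape, new_shape, chunks, axis=0):
    """Return (rewritten, created) chunk keys when an array grows along `axis`.

    Assumes every other dimension keeps its length.
    """
    old_grid = chunk_grid(old_shape, chunks)
    new_grid = chunk_grid(new_shape, chunks)

    # The last existing chunk along the axis is only partly filled when the
    # old length is not a multiple of the chunk length.
    partial = old_shape[axis] % chunks[axis] != 0
    first_touched = old_grid[axis] - 1 if partial else old_grid[axis]

    rewritten, created = [], []
    for idx in product(*(range(n) for n in new_grid)):
        if idx[axis] < first_touched:
            continue  # chunk lies entirely in the untouched, older region
        key = ".".join(map(str, idx))
        if idx[axis] < old_grid[axis]:
            rewritten.append(key)  # existing object must be replaced
        else:
            created.append(key)  # new object
    return rewritten, created


# Hypothetical example: a year of daily grids gains 31 more days.
rewritten, created = keys_touched_by_append(
    old_shape=(365, 180, 360),
    new_shape=(396, 180, 360),
    chunks=(30, 180, 360),
)
print("rewritten:", rewritten)
print("created:", created)
```

In addition to the chunk objects, the `shape` field in the array's `.zarray` metadata (and any consolidated copy of it) would need to be updated, so readers could see an inconsistent store unless these writes are coordinated.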

In addition, converting historical HDF/netCDF data to Zarr storage will substantially increase the size of the NASA Earth Science archive, because the original files cannot be deleted for provenance and preservation reasons.