"The other big thing that the COG standard does is reorganize the internal content of the file to basically make it as easy as possible to get as much information as you can in as few requests as possible," said Shiklomanov.
The standard does this by addressing two key issues. First, when downloading from the internet, there is always latency (a time delay) for each trip back and forth to a server to retrieve data. Second, metadata are usually tagged and kept with their associated data. For example, if imagery has multiple bands, the file for each band must be explored to see its metadata. This means it can take a long time to identify, get, sort, and select the exact data that you want from a large dataset.
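A rough back-of-the-envelope sketch shows why the number of round trips matters so much. The latency, bandwidth, and byte counts below are made-up illustrative values rather than measurements, and `estimate_seconds` is a hypothetical helper, not part of any COG tool.

```python
def estimate_seconds(round_trips: int, total_bytes: int,
                     latency_s: float = 0.1,
                     bandwidth_bps: float = 10e6) -> float:
    """Rough model: every request pays one latency penalty, plus transfer time."""
    transfer_s = total_bytes * 8 / bandwidth_bps
    return round_trips * latency_s + transfer_s


# Hypothetical scenario: the same 50 kB of metadata, fetched two ways.
many_small = estimate_seconds(round_trips=12, total_bytes=50_000)
one_block = estimate_seconds(round_trips=1, total_bytes=50_000)

print(f"12 separate requests: {many_small:.2f} s")
print(f"1 combined request:   {one_block:.2f} s")
```

Under these assumed numbers the transfer time is identical in both cases; the difference comes entirely from paying the latency penalty twelve times instead of once.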
"Cloud optimized GeoTIFFs do things differently by presenting all the metadata up front. So, the first time you go to grab the file, you request all the metadata, which then come in one single block. In a single request you can understand the entire file," said Shiklomanov. "Then, if you want to grab imagery for a location from a particular light band, you just make one additional request and ultimately only pay that time penalty twice."
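The two-request pattern described in the quote can be illustrated with a toy model. This is a simplified sketch, not the real TIFF byte layout: the fixed-size JSON header, the tile names, and the `CountingRangeReader` class are all assumptions made for illustration. The one real-world detail is the HTTP `Range` header format, which uses an inclusive end byte.

```python
import json

HEADER_BLOCK = 1024  # hypothetical fixed-size metadata block at the file start


class CountingRangeReader:
    """Stands in for a server that answers byte-range requests."""

    def __init__(self, blob: bytes):
        self._blob = blob
        self.requests = 0

    def read_range(self, start: int, length: int) -> bytes:
        self.requests += 1  # each call models one round trip
        return self._blob[start:start + length]


def range_header(start: int, length: int) -> str:
    """HTTP Range header value for this read (end byte is inclusive)."""
    return f"bytes={start}-{start + length - 1}"


def build_toy_file(tiles: dict) -> bytes:
    """Pack tiles behind one metadata block listing each tile's offset and size."""
    index, body, offset = {}, b"", HEADER_BLOCK
    for name, data in tiles.items():
        index[name] = [offset, len(data)]
        body += data
        offset += len(data)
    header = json.dumps(index).encode()
    if len(header) > HEADER_BLOCK:
        raise ValueError("metadata does not fit in the header block")
    return header.ljust(HEADER_BLOCK, b" ") + body


def fetch_tile(reader: CountingRangeReader, name: str) -> bytes:
    # Request 1: all metadata arrives in a single block.
    index = json.loads(reader.read_range(0, HEADER_BLOCK))
    offset, length = index[name]
    # Request 2: only the bytes of the tile we actually want.
    return reader.read_range(offset, length)


toy = build_toy_file({"red/tile_0": b"R" * 10, "nir/tile_0": b"N" * 20})
reader = CountingRangeReader(toy)
tile = fetch_tile(reader, "nir/tile_0")
print(len(tile), "bytes fetched in", reader.requests, "requests")
```

Against a real server, each `read_range` call would correspond to an HTTP GET carrying a header such as `Range: bytes=0-1023`. In practice, libraries such as GDAL handle this on the user's behalf.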
In addition to allowing users to easily read metadata and download tiles of a COG, the standard has other significant benefits:
- The COG standard supports parallel access to different parts of an image, enabling even faster access to large datasets; this allows users to easily scale up their data access and analysis workflows by simply adding more computing resources
- COGs are backward compatible, so tools that work with regular GeoTIFF files still work with COG files
- The COG ecosystem is growing rapidly with many tools, libraries, and services, including the Geospatial Data Abstraction Library (GDAL), QGIS, and ArcGIS, being compatible with the standard
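The parallel-access benefit in the first bullet can be sketched with Python's standard library. This is a minimal illustration under stated assumptions: the in-memory `blob` stands in for a remote file, `read_range` is a hypothetical placeholder for a byte-range request, and the byte ranges are invented.

```python
from concurrent.futures import ThreadPoolExecutor


def read_range(blob: bytes, start: int, length: int) -> bytes:
    """Placeholder for one independent byte-range request."""
    return blob[start:start + length]


def fetch_tiles_parallel(blob: bytes, ranges, max_workers: int = 4):
    """Fetch several byte ranges concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda r: read_range(blob, r[0], r[1]), ranges))


blob = bytes(range(256))                    # stand-in for a remote file
ranges = [(0, 4), (100, 3), (250, 6)]       # hypothetical (offset, length) pairs
tiles = fetch_tiles_parallel(blob, ranges)
print([len(t) for t in tiles])
```

Because each tile occupies its own byte range, the requests do not depend on one another, which is what allows more workers to be added as the workload grows.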
Speaking of ArcGIS, one field embracing the COG standard is geographic information systems (GIS). GeoTIFF, in general, is the preferred format for GIS users because it is a simple, easy way to visualize imagery and raster data; almost any GIS tool can ingest GeoTIFF without error.
"In the most basic sense it is almost a guaranteed 'drag and drop' process versus some of the more complex scientific data file types, which may be supported but may not be translated correctly, projected to display on a map, or have the metadata readily available because of the file structure," said Leah Schwizer, team lead for NASA's Earth Science Data Systems GIS Team (EGIST). "COGs possess efficiencies for doing retrieval, visualization, analysis, embedding, and integration with apps that work in the cloud."
Implementing the COG Standard
Implementation of the COG standard is already in full swing for NASA Earth science data.
"The COG standard aligns very well with where Earthdata is heading: the Earthdata Cloud Evolution," said Dr. Yaxing Wei, lead scientist for NASA's Oak Ridge National Laboratory Distributed Active Archive Center (ORNL DAAC). ORNL DAAC is one of the key data archives implementing the COG standard at NASA. "The standard will help with achieving one of the initiative's main objectives, which is efficient and scalable in-cloud data access and analysis."
For example, users can already search for data products in COG format by using the data format search filter in Earthdata Search. COG was recognized as an emerging standard by the NASA Earth Science Data and Information System (ESDIS) Project Standards Coordination Office (ESCO), and COG is one of the formats supported by NASA's Earthdata GIS (EGIS) and the Visualization, Exploration, and Data Analysis (VEDA) project. Many DAACs have also started adopting COG as the standard option for all new data in GeoTIFF format.
Wei points out that while COG addresses many important bottlenecks and makes working with GeoTIFF files more efficient, it also has limitations to consider. For example, COG is not the best format for all data, such as multidimensional raster data. In these cases, Zarr and netCDF-4/HDF5 are more suitable formats.
With these considerations in mind, for many users and uses the new COG standard offers great enhancements to Earth science and GIS research and analysis. Through the standard, users will enjoy the benefits of georeferenced data in the cloud: easier pathways to selecting precisely the data they need, efficiency and flexibility in access, in-place cloud analysis, and downloading in a modern format that is being widely adopted yet is still backward compatible with most GIS software.