STARE: SpatioTemporal Adaptive-Resolution Encoding to Unify Diverse Earth Science Data for Integrative Analysis
Principal Investigator (PI): Michael Rilee, Rilee Systems Technologies, LLC.
OBJECTIVES: With SpatioTemporal Adaptive-Resolution Encoding (STARE) we address Focus 2.1.3 "Cloud Optimized Preprocessing and Data Transformation." Earth Science data processing currently relies on large, centralized archives that provide exceptional browse and search capabilities. Researchers use these capabilities to identify data, then download it in file form to local compute and storage resources for preprocessing and integration prior to analysis. This data flow forces end-users to devote scarce resources to the transfer, storage, and management of archived data, and to maintain specialist expertise in the many kinds of data sets in their research domain. We propose to simplify this flow by moving preprocessing activities to the archived data, eliminating the costs of transferring data and of creating and maintaining redundant, idiosyncratic local archives built by researchers who are generally neither archivists nor the expert producers of the original data. With STARE providing a unifying platform for diverse data models (swath, point, grid), Earth Observing System Data and Information System (EOSDIS) data archives will be able to produce higher-level products made to order for end-user researchers.
METHODS: The critical new technology we apply is STARE. STARE's spatial component (SC) descends from the Hierarchical Triangular Mesh (HTM) spherical indexing originally developed for the Sloan Digital Sky Survey, in which storage and computational efficiency were key. The STARE/SC recursively divides the Earth's surface into a set of quad-trees, allowing any point on Earth to be identified with a single number. The STARE temporal component (TC) has similar properties. For observations, these STARE indices contain both location and resolution information. This promotes efficient data placement on distributed cloud resources and minimizes costly data transport between nodes for operations such as joining, intersecting, (conditional) subsetting, and re-gridding diverse datasets.
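To make the quad-tree idea concrete, the following is a minimal Python sketch of an HTM-style index. It is an illustration only, not the STARE library or its actual encoding: the function names (`encode`, `contains`), the choice of eight octant root triangles, the maximum level of 27, and the bit layout (location bits left-justified, resolution level in the low 5 bits) are all assumptions made for this example.

```python
import math

MAX_LEVEL = 27  # assumed deepest subdivision level for this toy layout


def _norm(v):
    n = math.sqrt(sum(x * x for x in v))
    return tuple(x / n for x in v)


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _mid(a, b):
    # Midpoint of an edge, pushed back onto the unit sphere.
    return _norm(tuple(x + y for x, y in zip(a, b)))


def _inside_score(tri, p):
    # Non-negative when p lies inside the spherical triangle.
    a, b, c = tri
    if _dot(_cross(a, b), c) < 0:  # enforce counterclockwise order
        b, c = c, b
    return min(_dot(_cross(a, b), p),
               _dot(_cross(b, c), p),
               _dot(_cross(c, a), p))


# Eight root triangles: one per octant of the sphere.
ROOTS = [((sx, 0.0, 0.0), (0.0, sy, 0.0), (0.0, 0.0, sz))
         for sx in (1.0, -1.0) for sy in (1.0, -1.0) for sz in (1.0, -1.0)]


def encode(lat_deg, lon_deg, level):
    """Return one integer holding both the cell path and the level."""
    if not 0 <= level <= MAX_LEVEL:
        raise ValueError("level must be between 0 and MAX_LEVEL")
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    p = (math.cos(lat) * math.cos(lon),
         math.cos(lat) * math.sin(lon),
         math.sin(lat))
    code = max(range(8), key=lambda i: _inside_score(ROOTS[i], p))
    tri = ROOTS[code]
    for _ in range(level):
        a, b, c = tri
        w0, w1, w2 = _mid(b, c), _mid(a, c), _mid(a, b)
        kids = [(a, w2, w1), (b, w0, w2), (c, w1, w0), (w0, w1, w2)]
        k = max(range(4), key=lambda i: _inside_score(kids[i], p))
        code = code * 4 + k  # two more location bits per level
        tri = kids[k]
    code <<= 2 * (MAX_LEVEL - level)  # left-justify the location bits
    return (code << 5) | level        # low 5 bits carry the resolution


def contains(coarse, fine):
    """True if the cell `coarse` contains the cell `fine`."""
    lc, lf = coarse & 31, fine & 31
    if lc > lf:
        return False
    shift = 5 + 2 * (MAX_LEVEL - lc)
    return (coarse >> shift) == (fine >> shift)
```

Because a finer cell's location bits extend those of every coarser cell that contains it, containment reduces to an integer shift and compare. For example, `contains(encode(38.9, -77.0, 5), encode(38.9, -77.0, 10))` checks whether a level-5 cell covers a level-10 cell around the same point.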
STARE automatically co-aligns diverse data in the cloud, placing spatiotemporally close data on the same compute/storage node for a relatively small cost in metadata. STARE thus allows diverse data to be efficiently integrated for analysis in the cloud, providing a foundation on which existing tools and processing methods can be placed. Much capability (e.g., preprocessing, searching, and visualization) has already been developed to support researchers' use of Earth Science data. In the course of the proposed work, we will show how existing tools and methods benefit from the STARE-enabled platform. This can be through tight integration, as has been done, for example, by incorporating STARE into the distributed array database SciDB along with re-gridding functions and fast, parallel geographic intersections. Alternatively, current tools may simply be applied to the results of STARE-enabled distributed processing, e.g., fast granule intersection, in a more conventional but cloud-based data processing flow.
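The placement idea can be sketched in a few lines of Python. This is a hedged illustration rather than STARE's or SciDB's actual mechanism: the cell names (a two-character root such as "N3" followed by one digit per subdivision level), the `PLACEMENT_LEVEL` constant, the CRC-based node assignment, and the function names are all hypothetical. The point is only that records from different datasets that share a coarse cell prefix land on the same node, so overlapping cells can be joined without moving data between nodes.

```python
import zlib
from collections import defaultdict

PLACEMENT_LEVEL = 2  # assumed: subdivision digits used to choose a node


def placement_key(cell):
    # Root name is 2 characters, followed by one digit per level.
    # Cells coarser than PLACEMENT_LEVEL would need replication across
    # nodes; this sketch does not handle that case.
    return cell[: 2 + PLACEMENT_LEVEL]


def node_for(cell, n_nodes):
    # Deterministic hash of the coarse prefix picks the node.
    return zlib.crc32(placement_key(cell).encode()) % n_nodes


def place(records, n_nodes):
    """Distribute (dataset, cell, value) records across nodes."""
    nodes = defaultdict(list)
    for rec in records:
        nodes[node_for(rec[1], n_nodes)].append(rec)
    return nodes


def local_join(node_records):
    """Pair records from different datasets whose cells overlap,
    i.e. one cell name is a prefix of the other."""
    pairs = []
    for i, rec_a in enumerate(node_records):
        for rec_b in node_records[i + 1:]:
            overlap = (rec_a[1].startswith(rec_b[1])
                       or rec_b[1].startswith(rec_a[1]))
            if rec_a[0] != rec_b[0] and overlap:
                pairs.append((rec_a, rec_b))
    return pairs


# Made-up records from three data models.
records = [
    ("swath", "N3012031", 1.0),
    ("grid", "N30120", 2.0),
    ("point", "S1203312", 3.0),
]
nodes = place(records, n_nodes=4)
joined = [pair for recs in nodes.values() for pair in local_join(recs)]
```

Here the swath and grid records share the prefix "N301", so they are assigned to the same node and are paired by the node-local join; the point record lies in an unrelated cell and joins with nothing.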
SIGNIFICANCE: As a unifying platform, STARE supports conventional processing, analysis, and visualization tools, creating the opportunity to greatly increase the amount of data researchers can use. In the longer term, as tools evolve to take greater advantage of STARE's integrative capabilities, we can move from the current focus on expensive, low-level manipulation of data files to interacting with Earth Science data at a higher level, with query-based declarative tools and user interfaces that favor scientific inquiry over data management. At the very least, STARE helps automate critical spatiotemporal functions while making efficient use of cloud computing, promising to eliminate the need for researchers to devote time, money, and expertise to the redundant transfer of archived data to their own local systems. The time and effort saved improve the scientific quality and productivity of current researchers and reduce the cost of entry for others who might seek value from EOSDIS data resources.
Last Updated: Jun 10, 2019 at 10:30 AM EDT