Many capabilities (e.g., preprocessing, searching, and visualization) have been developed to support researchers’ use of Earth science data. In this project, existing tools and methods benefit from the STARE-enabled platform in two ways. The first is tight integration, as has been done, for example, by incorporating STARE into the distributed array database SciDB, along with re-gridding functions and fast, parallel geographic intersections (as illustrated in the image on the right). The second is applying current tools to the results of STARE-enabled distributed processing (e.g., fast granule intersection) in a more conventional, but cloud-based, data processing flow. When needed, sidecar files with STARE indexes can be used to bring the benefits of STARE to legacy file formats.
Significance
As a unifying platform, STARE supports conventional processing, analysis, and visualization tools, creating the opportunity to massively increase the amount of data researchers can use. In the longer term, as tools evolve to take greater advantage of STARE’s integrative capabilities, researchers can move from the current focus on expensive low-level manipulation of data files to interacting with Earth science data at a higher level, using query-based declarative tools and user interfaces that favor scientific inquiry over data management. STARE helps automate critical spatiotemporal functions while making efficient use of cloud computing, which will help eliminate the need for researchers to devote time, money, and expertise to the redundant transfer of archived data to local systems. The time and effort saved improve scientific quality and researcher productivity and reduce the cost of entry for others using EOSDIS data resources.
STARE has demonstrated its potential to address challenges associated with the variety and volume of Big Data. It is also adaptive to different compute-storage architectures. The technology will reduce processing time and is poised to flip the notorious 80/20 dilemma plaguing data science endeavors – where 80% of a researcher’s time is spent finding, cleaning, and reorganizing huge amounts of data and 20% is spent on actual data analysis.
Project Accomplishments
Year one:
- Core STARE library and API functions established.
- Many science usability functions, mostly spatial, implemented.
- PySTARE functional for experimental scientific work.
- OPeNDAP Hyrax integration started.
- UCSB snow cover science use case in progress.
Year two:
- Basic cloud services in place. The STARE library and PySTARE API are usable and in relatively stable development.
- OPeNDAP added an initial set of STARE-aware functions and is ready for testing in an infusion environment, as is the STAREmaster georeferencing file (sidecar) tool, which is a key component for deployment.
- STAREPandas is ready for relatively modest datasets and will be improved for better scalability, including through the use of STAREmaster sidecars.
- A basic set of STARE tutorials created. The project provisioned a JupyterHub with STARE-indexed data and tools to aid the development, education, and training of STARE-based integrative analysis techniques.
- STARE tested in science use cases (ongoing).
Publications & Presentations (listed alphabetically)
Project Year One
Bauer, M., Kuo, K. S., Oloso, A. & Rilee, M. L. (2018). “Exploring the Spatio-temporal Connectivity of Blizzard Conditions and Mid-latitude Cyclones: A Template for a Process-based Workflow.” American Geophysical Union (AGU) Fall Meeting, Washington, D.C. Session IN24A-08, 11 December 2018.
Kuo, K.S. et al. (2019). “Best-value Data-intensive Analysis Architecture Deduced Using ‘Geo-lly’ Beans.” Earth Science Information Partners (ESIP) Summer Meeting, Tacoma, WA. 15-19 July 2019.
Kuo, K.S. et al. (2019). “STARE and data packaging.” ESIP Summer Meeting, Tacoma, WA. 15-19 July 2019.
Kuo, K.S., Yu, H., Pan, Y. & Rilee, M. (2019). “Leveraging STARE for Co-aligned Data Locality with netCDF and Python MPI.” IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Yokohama, Japan. Session THP1.PT: Big Data and Machine Learning - New Trends in Remote Sensing I, 1 August 2019.
Rilee, M.L. & Kuo, K.S. (2018). “The Impact on Quality and Uncertainty of Regridding Diverse Earth Science Data for Integrative Analysis.” AGU Fall Meeting, Washington, D.C. Session IN43C-0916, 13 December 2018.
Rilee, M., Kuo, K.S., Frew, J., Griessbaum, N., Gallagher, J. & Neumiller, K. (2019). “STARE Compatibility.” ESIP Summer Meeting, Tacoma, WA. 15-19 July 2019.
Project Year Two
Gallagher, J., Hartnett, E., Rilee, M. & Kuo, K.S. (2020). “STARE Companion Files for NASA Earth Science Data (Vision Paper).” International Conference on Advances in Geographic Information Systems (ACM SIGSPATIAL 2020), Seattle, WA, 3-6 November 2020.
Griessbaum, N., Frew, J., Gallagher, J., Rilee, M. & Kuo, K.S. (2020). “Solving Science Use Cases with STARE (Demo Paper).” ACM SIGSPATIAL 2020, Seattle, WA, 3-6 November 2020.
Griessbaum, N., Frew, J., Rilee, M., Kuo, K.S., Gallagher, J. & Neumiller, K. (2020). “STARE data frames for geospatial analysis - a high level STARE interface.” ESIP Winter Meeting, Bethesda, MD, 2-7 January 2020.
Kuo, K.S. & Rilee, M. (2020). “Analytics Optimized Geoscience Data Store with STARE-based Packaging.” 22nd EGU General Assembly, held online 4-8 May 2020.
Kuo, K.S. & Rilee, M.L. (2019). “Supporting Efficient Parallel Processing for Integrative Analysis in Cloud with STARE-based Hierarchical Packaging.” AGU Fall Meeting 2019, San Francisco, CA. Poster: IN11D-0691.
Kuo, K.S., Yu, H., Rilee, M.L., Pan, Y. & Wang, J. (2019). “STARE-based Interactive Analytics for Earth Science Big Data.” AGU Fall Meeting 2019, San Francisco, CA. Poster: IN13B-0717.
Rilee, M., Griessbaum, N., Kuo, K.S., Frew, J. & Wolfe, R. (2020). “STARE-based integrative analysis of diverse data using DASK Parallel Programming (Demo Paper).” ACM SIGSPATIAL, Seattle, WA, 3-6 November 2020 [doi:10.1145/3397536.3422346].
Rilee, M., Kuo, K.S., Frew, J., Gallagher, J., Griessbaum, N., Neumiller, K. & Wolfe, R. (2020). “STARE into the future of geodata integrative analysis.” Earth Science Informatics, accepted.
Rilee, M., Kuo, K.S., Frew, J., Griessbaum, N. & Gallagher, J. (2020). “STARE towards integrative analysis with minimized data wrangling hassle.” IGARSS 2020, virtual symposium. Paper TU2.R7.8, 29 September 2020.
Rilee, M., Kuo, K.S., Gallagher, J., Frew, J., Griessbaum, N., Hartnett, E., Wolfe, R., Heber, G. & Khalsa, S.J. (2020). “STARE-PODS: A Versatile data store leveraging the hdf virtual object layer for compatibility.” ESIP Summer Meeting (virtual), 14-24 July 2020.
Rilee, M., Kuo, K.S., Gallagher, J., Frew, J., Griessbaum, N., Neumiller, K., Wolfe, R., Yu, H. & Clark, P. (2019). “STARE for scalable unification of diverse data within Earth, Space, and Planetary Science.” 2019 AGU Fall Meeting, San Francisco, CA. Poster: IN31B-0791.