A New Paradigm for Managing Data and Information
By Rachel Hauser
Faced with the daunting task of managing the information gathered by the Large-Scale Biosphere-Atmosphere Experiment in Amazonia - Ecology (LBA-E), Diane Wickland, manager of NASA's Terrestrial Ecology Program, wanted a tool that would not require a large, centralized data management infrastructure. She sought a system that would allow quick identification of and access to rudimentary field data, and that would put more responsibility and control in the hands of the scientific investigators collecting this information, without unduly increasing their burden. Paul Kanciruk, a Program Manager at Oak Ridge National Laboratory (ORNL), left the LBA-E meeting thinking about the problem. "At this point all I knew about HTML was how to spell it, but I knew that there must be a way to integrate metadata (data about data) from the investigators' web servers to efficiently create the system Wickland desired," Kanciruk said.
At a subsequent meeting, discussions with LBA-E Project Manager Don Deering brought out the merit of a web-based, distributed data system. "So, on the drive back from this meeting I stopped for lunch with a colleague, Merilyn Gentry; we did some brainstorming, and the architecture for the Mercury system was designed in about an hour. It's really a simple concept."
Mercury supplies investigators with a program, the Metadata Editor, that helps them describe their data (that is, define its metadata). The system then automatically gathers the metadata from the investigators' web sites, organizes it into a searchable index at ORNL, and lets anyone with a web browser search that index for data of interest.
The Metadata Editor guides investigators through creating the metadata record. Project leads must fill in a minimum set of required fields, which give users the basic facts they need when searching for data. To help them, the editor provides a tutorial, lists of descriptive terms to choose from for some fields, and a reference page that clearly defines each field.
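The article does not list the editor's fields, so the following is only a minimal Python sketch of the kind of check a metadata editor like this performs: it flags a record that leaves a required field blank or uses a keyword outside a controlled list of descriptive terms. The names in `REQUIRED_FIELDS` and the terms in `ALLOWED_KEYWORDS` are hypothetical placeholders, not Mercury's actual schema.

```python
# Hypothetical required fields; the real Metadata Editor's field set is not given in the article.
REQUIRED_FIELDS = ("title", "investigator", "data_url", "keywords")

# Hypothetical controlled vocabulary standing in for the editor's lists of descriptive terms.
ALLOWED_KEYWORDS = {"biomass", "carbon flux", "land cover", "soil moisture", "trace gases"}


def validate_record(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record can be published."""
    problems = []
    for name in REQUIRED_FIELDS:
        value = record.get(name)
        # None, "", [] and whitespace-only strings all count as "not filled in".
        if not value or (isinstance(value, str) and not value.strip()):
            problems.append(f"missing required field: {name}")
    # Every keyword must come from the controlled list so searches match consistently.
    for keyword in record.get("keywords") or []:
        if keyword not in ALLOWED_KEYWORDS:
            problems.append(f"keyword not in controlled list: {keyword}")
    return problems


good = {"title": "Tower CO2 fluxes", "investigator": "A. Researcher",
        "data_url": "http://example.org/flux.csv", "keywords": ["carbon flux"]}
print(validate_record(good))  # []

bad = {"title": "", "keywords": ["weather"]}
print(len(validate_record(bad)))  # 4
```

Restricting some fields to a fixed vocabulary is what lets records written independently by many investigators still be searched together later.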
"We have an editing tool that makes it easy for scientists to provide data. We take all of the programming out of it. We are getting the scientists to document the data as they first acquire it," said Kanciruk. "Typically, providing data sets to the public is a very labor-intensive and expensive endeavor. People at the data centers must process and organize the data as well as create searchable keywords. By distributing the workload, the actual monetary and labor resources are also distributed."
Investigators supplying data sets need only make their metadata and data available on the World Wide Web. Each night, an automated "harvester" visits every participating web site, retrieves its metadata, and deposits it at a central site.
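The nightly harvest can be pictured with the minimal Python sketch below. It assumes, hypothetically, that each participating site publishes its metadata as a JSON list of records at a known URL; the article does not describe Mercury's actual publication format or harvester code. Each record is tagged with the site it came from so users can later be linked back to the investigator, and an unreachable site is recorded and skipped rather than stopping the whole run.

```python
import json
import urllib.request
from typing import Callable, Iterable

# Hypothetical assumption: each site serves its metadata as a JSON list of records.


def fetch_json(url: str, timeout: float = 30.0) -> list[dict]:
    """Download and parse one site's published metadata."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def harvest(site_urls: Iterable[str],
            fetch: Callable[[str], list[dict]] = fetch_json) -> tuple[list[dict], dict[str, str]]:
    """Visit every registered site once and collect its metadata records."""
    collected: list[dict] = []
    failures: dict[str, str] = {}
    for url in site_urls:
        try:
            records = fetch(url)
        except Exception as exc:  # network errors, malformed JSON, etc.
            failures[url] = str(exc)
            continue
        for record in records:
            # Remember the source site so search results can link back to it.
            collected.append({**record, "source_site": url})
    return collected, failures


# Offline demonstration: a stand-in fetch function replaces real HTTP requests.
fake_sites = {
    "http://site-a.example/meta.json": [{"title": "A"}],
    "http://site-b.example/meta.json": [{"title": "B1"}, {"title": "B2"}],
}


def fake_fetch(url: str) -> list[dict]:
    if url not in fake_sites:
        raise OSError("unreachable")
    return fake_sites[url]


records, failures = harvest(list(fake_sites) + ["http://down.example/meta.json"], fake_fetch)
print(len(records))  # 3
print(failures)  # {'http://down.example/meta.json': 'unreachable'}
```

In practice a scheduler such as cron could run `harvest()` once a night over the list of registered sites and hand the collected records to the indexer.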
"If a scientist put a data set out tonight, it would be part of the system by the next morning. Each night, after the system has harvested the metadata, the index is rebuilt from scratch. So if the science investigator chooses to make a data set invisible to other users, it is taken out of the index by the next day. This is important because it allows people to pull data from public view if they later realize that there is some problem with it," said Kanciruk.Users searching the central index can search for data in a variety of ways, including free text of data set documentation, and are linked back the web sites of the participating investigators for the full documentation and the data. Each project using Mercury can have its own custom interface, but data from different projects can be searched simultaneously, if their metadata are similar enough.
Mercury uses commercial off-the-shelf software extensively and adds custom-written components when needed. It also supports metadata standards and is interoperable with several international data-sharing initiatives.
An initial test of Mercury began in late spring 1998. The International Geosphere-Biosphere Programme (IGBP) tested it until September 1998, when it officially adopted Mercury as its system.
Currently, NASA's LBA Ecology Project and the Earth Observing System Land Validation team have adopted Mercury. While "Mercury" remains the core system's name, the LBA groups in Brazil and at Goddard have since renamed their version "Beija-flor." The new name suits a system that moves from site to site much as a hummingbird moves from flower to flower: "beija-flor" translates to "flower kisser," the Portuguese name for the hummingbird, Brazil's national bird.
"We have been asked by NASA to expand Beija-flor to NASA's Earth Science Information Partners (ESIP) Program," said Kanciruk. "The groups that have used the system have reacted favorably to the system. It has some nice characteristics, it is inexpensive, and creating metadata is straightforward."
"What Beija-flor does well is connect data from dispersed groups. Other systems are designed to run software applications that subset data and run analyses. Beija-flor doesn't overlap these other systems, it complements them," said Kanciruk.
For more information
NASA Oak Ridge National Laboratory Distributed Active Archive Center (ORNL DAAC)
Large-Scale Biosphere-Atmosphere Experiment in Amazonia (LBA-ECO)
About the remote sensing data used
Sensor: Large-Scale Biosphere-Atmosphere Experiment in Amazonia (LBA-E)
Parameter: web-based data system
DAAC: NASA Oak Ridge National Laboratory Distributed Active Archive Center (ORNL DAAC)
Last Updated: Oct 9, 2019 at 4:09 PM EDT