
BERT-E: An Earth Science-Focused Language Model

The BERT-E project is an effort by NASA’s Machine Learning team to develop an industry-standard language model for Earth science based on transformers.

The machine learning team, led by NASA's Interagency Implementation and Advanced Concepts Team (IMPACT), fine-tuned SciBERT, a BERT (Bidirectional Encoder Representations from Transformers) model pre-trained on scientific text, with an additional layer to create BERT-E, a domain-specific Earth science model.

IMPACT has used BERT-E to develop the GCMD Keyword Recommender, a tool that suggests Global Change Master Directory (GCMD) keywords to data curators, with predictions based on sentences and labels in existing dataset descriptions.

Figure: The GCMD Keyword Recommender architecture, with input sentences on the left, the BERT-E model in the center, and the resulting labels on the right.
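A recommender like the one pictured can be framed as multi-label classification: the model produces a raw score for each candidate GCMD keyword, and keywords whose probability clears a threshold are suggested to the curator. The following is a minimal sketch of that final selection step only; the function names, threshold, keyword strings, and logit values are illustrative assumptions, not BERT-E's actual output or API.

```python
import math

def sigmoid(x: float) -> float:
    """Map a raw logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def recommend_keywords(logits: dict[str, float], threshold: float = 0.5) -> list[str]:
    """Return keywords whose predicted probability exceeds the threshold,
    ordered from most to least confident. (Hypothetical post-processing
    step; a real model would supply the logits.)"""
    scored = {kw: sigmoid(z) for kw, z in logits.items()}
    picked = [kw for kw, p in scored.items() if p > threshold]
    return sorted(picked, key=lambda kw: scored[kw], reverse=True)

# Illustrative logits for one dataset-description sentence.
logits = {
    "EARTH SCIENCE > ATMOSPHERE > PRECIPITATION": 2.3,
    "EARTH SCIENCE > OCEANS > SEA SURFACE TEMPERATURE": -1.1,
    "EARTH SCIENCE > ATMOSPHERE > CLOUDS": 0.4,
}
print(recommend_keywords(logits))
```

In this framing, each keyword is an independent binary decision, which is why a sigmoid per label is used rather than a softmax over all labels: a description can legitimately carry several GCMD keywords at once.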

Looking forward, the IMPACT machine learning team envisions applying BERT-E to additional tasks such as graph convolutions and satellite/data product recommendations.

A public BERT-E GitHub repository will be released soon.
