Of Mice and Machines: Using Machine Learning to Study Space Radiation in Mice

Longtime readers of this blog know that IMPACT’s research team explores diverse topics in Earth science. Recently, however, researchers from the machine learning team expanded their focus to collaborate with NASA’s Biological and Physical Sciences (BPS) division. To better understand the physiological changes caused by spaceflight, mice are sent to the International Space Station and housed there for varying periods before returning to Earth.

Two NASA resources support this effort. NASA GeneLab is an open-source data repository for “omics” data from spaceflight and space-relevant biological experiments (omics is a multidisciplinary field encompassing genomics, epigenomics, transcriptomics, proteomics, and metabolomics), while the NASA Ames Life Sciences Data Archive (ALSDA) curates and shares the non-omics and environmental data from those experiments.

In order to facilitate future algorithm development for space biology, and to encourage standardized benchmarking of methods for specific problems, IMPACT researchers Nishan (Nish) Pantha and Vishal Perekadan leveraged two common types of biological data from GeneLab and ALSDA: gene expression (synthetic RNA transcript data) and microscopy (space radiation immune cell microscopy benchmark dataset). Nish and Vishal presented their research last December at the American Geophysical Union 2022 Fall Meeting in Chicago.

Nish’s work leveraged machine learning (ML) algorithms to identify important genes contributing to the physical characteristics of rodents, while Vishal’s work compared convolutional neural network (CNN) and support vector machine (SVM) approaches for space radiation classification.

Nish and his IMPACT teammates developed an ensemble-based gene-ranking pipeline called GeneRanker that used eXtreme Gradient Boosting (XGBoost) based models to identify the most important genes contributing to a target attribute. Vishal’s team used a dataset of ML-ready microscopy images of irradiated cells, with labels indicating the radiation type and dosage of exposure. Their efforts focused on creating multiple benchmark models, starting with simple thresholding techniques and then advancing to deep learning for classification by radiation type and dosage.
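As a rough illustration of the simplest kind of baseline mentioned above, the sketch below fits a single threshold on one scalar feature per image. Everything in it is an assumption made for illustration: the post does not say which quantity the team thresholded, so the `bright_fraction` feature, the function names, and the tiny toy images are hypothetical and are not the team's benchmark code or data.

```python
def bright_fraction(image, pixel_cutoff):
    """Fraction of pixels brighter than pixel_cutoff in a 2D list image.

    Hypothetical feature: a stand-in for whatever scalar a real
    thresholding baseline would compute from a microscopy image.
    """
    pixels = [p for row in image for p in row]
    return sum(p > pixel_cutoff for p in pixels) / len(pixels)


def fit_threshold(values, labels):
    """Pick the cutoff on a scalar feature that maximizes training accuracy.

    values: one float per image; labels: 0 or 1 per image.
    An image is predicted as class 1 when its value exceeds the cutoff.
    """
    ordered = sorted(set(values))
    # Candidate cutoffs: one below the minimum, midpoints between
    # consecutive distinct values, and one above the maximum.
    candidates = [ordered[0] - 1.0]
    candidates += [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]
    candidates.append(ordered[-1] + 1.0)

    best_cutoff, best_acc = candidates[0], -1.0
    for cutoff in candidates:
        correct = sum((v > cutoff) == bool(y) for v, y in zip(values, labels))
        acc = correct / len(values)
        if acc > best_acc:
            best_cutoff, best_acc = cutoff, acc
    return best_cutoff, best_acc


# Toy 3x3 "images" (made up): label 1 marks the images with more bright pixels.
toy_images = [
    [[0, 0, 0], [0, 9, 0], [0, 0, 0]],
    [[0, 0, 0], [0, 0, 0], [0, 0, 0]],
    [[9, 9, 0], [0, 9, 0], [0, 0, 9]],
    [[9, 0, 9], [9, 9, 0], [0, 9, 0]],
]
toy_labels = [0, 0, 1, 1]

features = [bright_fraction(img, pixel_cutoff=5) for img in toy_images]
cutoff, train_acc = fit_threshold(features, toy_labels)
print(f"cutoff={cutoff:.3f}, training accuracy={train_acc:.2f}")
```

A baseline like this gives later models a floor to beat: if a CNN cannot outperform one well-chosen threshold, the extra complexity is hard to justify.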

Nish predicts his research will help the BPS team establish an initial benchmark for gene expression and could encourage the community to explore other paradigms pertaining to the feature ranking problem. Moreover, bioinformatics scientists will be able to leverage the ranking pipeline and clustering algorithms. These IMPACT teams have developed interpretable feature selection methods and ranking algorithms, particularly using ensembles of tree-based models.
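To make the idea of ensemble-based feature ranking more concrete, here is a minimal, dependency-free sketch. It is not GeneRanker and it does not use XGBoost: a simple class-mean gap is swapped in as the importance score so the example runs with the standard library alone, and the bootstrap-and-average-ranks scheme, the function names, and the synthetic data are all assumptions for illustration.

```python
import random
from statistics import mean


def score_features(X, y):
    """Score each feature by the absolute gap between the two class means.

    A deliberately simple stand-in for a model-based importance score
    (for example, one taken from a tree-based model).
    """
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(abs(mean(pos) - mean(neg)))
    return scores


def ensemble_rank(X, y, n_rounds=25, seed=0):
    """Rank features by their average rank over bootstrap resamples.

    Returns feature indices ordered from most to least important.
    """
    rng = random.Random(seed)
    n = len(X)
    rank_sums = [0] * len(X[0])
    rounds_used = 0
    for _ in range(n_rounds):
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip a resample that is missing one of the classes
        Xs = [X[i] for i in idx]
        scores = score_features(Xs, ys)
        order = sorted(range(len(scores)), key=lambda j: -scores[j])
        for rank, j in enumerate(order):
            rank_sums[j] += rank  # rank 0 is the best in this round
        rounds_used += 1
    if rounds_used == 0:
        raise ValueError("no usable bootstrap resample contained both classes")
    mean_ranks = [s / rounds_used for s in rank_sums]
    return sorted(range(len(mean_ranks)), key=lambda j: mean_ranks[j])


# Synthetic data (made up): 40 samples, 4 "genes"; only gene 2 tracks the label.
data_rng = random.Random(1)
y = [i % 2 for i in range(40)]
X = [
    [data_rng.gauss(0, 1), data_rng.gauss(0, 1),
     data_rng.gauss(3 * label, 1), data_rng.gauss(0, 1)]
    for label in y
]

ranking = ensemble_rank(X, y)
print("features ordered by importance:", ranking)
```

Averaging ranks across resamples, rather than trusting one fit, is one common way to make a ranking less sensitive to which samples happen to be in the training set, which matters when there are far more features than samples.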

Figure: Most important genes for gender identified using the ranking algorithm (bar graph).

For Nish, this line of research is interesting in multiple ways. First, bioinformatics is a topic he had never explored before, and he wanted to apply his knowledge of machine learning and data science to problems relating to gene identification. Second, he found the dataset interesting because it is high-dimensional (25,000 input gene features) yet built on a low sample size (a balanced, 6,000-member synthetic dataset that closely matches the distribution of the 112 original samples), which is particularly challenging. Finally, he was interested in exploring a method that did not rely on neural networks and could yield a clearer, more explainable approach to problem solving.

Vishal’s research established a benchmark model for radiation type and dosage classification on mouse cell nuclei. In the future, the space radiation community could look into the features used by deep learning models to predict radiation and dosage levels from cell images and compare results with traditional approaches used in radiation biology. BPS intends to publicize the dataset split that the team used through an open registry.

Asked about his interest in the subject, Vishal responded that he had always wanted to apply his machine learning knowledge to biological datasets, making this field of research a perfect fit for his professional and personal interests. He was also curious to know if deep learning-based models would find features for classifying radiation type and dosage that the space radiation community had never considered before.

Figure: Test accuracy of different CNN models on 0 Gy images.

In the future, Nish, a graduate student in UAH’s computer science department, plans to extend this research in his master’s thesis, in which he aims to explore neural network-based feature attribution and selection methodologies. Nish’s team intends to open-source the code, while BPS will open-source the dataset.

Similarly, Vishal says all the benchmark models, as well as the data split his team used, will be made public by BPS through an open registry. His team is also collaborating with the BPS team on a manuscript, which will be submitted to a journal in the near future.

Deep learning approaches can be used for more than detecting or classifying objects and topics of interest; they can also be used to glean meaningful insights from datasets. Nish, Vishal, and their teams collaborated with BPS researchers to establish a benchmark process for extracting meaningful information from the mouse datasets. We expect these benchmarks to be valuable for furthering understanding of the data.

Additional details are available on GitHub and on the scientific poster.

View LinkedIn profiles for Nish and Vishal.

This work was funded by NASA’s Science Mission Directorate’s Open Source Science AI Initiative.

We would like to thank Lauren Sanders and Sylvain Costes from NASA Biological & Physical Sciences for providing training datasets and helpful feedback.

More information about IMPACT can be found at NASA Earthdata and the IMPACT project website.
