Articles | Journal of Emerging Investigators

Assessing and Improving Machine Learning Model Predictions of Polymer Glass Transition Temperatures

Ramprasad et al. | Mar 18, 2020

In this study, the authors test whether providing a larger dataset of glass transition temperatures (T_g) to train the machine-learning platform Polymer Genome would improve its accuracy. Polymer Genome is a machine-learning-based informatics platform for polymer property prediction, and T_g is one property needed to design new polymers in silico. They found that training the model with their larger, curated dataset improved the algorithm's T_g predictions, a valuable improvement to this useful platform.

Depression detection in social media text: leveraging machine learning for effective screening

Shin et al. | Mar 25, 2025

Depression affects millions globally, yet identifying symptoms remains challenging. This study explored detecting depression-related patterns in social media texts using natural language processing and machine learning algorithms, including decision trees and random forests. Our findings suggest that analyzing online text activity can serve as a viable method for screening mental disorders, potentially improving diagnosis accuracy by incorporating both physical and psychological indicators.

Optimizing data augmentation to improve machine learning accuracy on endemic frog calls

Anand et al. | Mar 09, 2025

The mountain chain of the Western Ghats on the Indian peninsula, a UNESCO World Heritage site, is home to about 200 frog species, 89 of which are endemic. Because vocalizations are distinctive to each frog species, they can be used for species recognition. Manually surveying frogs at night during the rain in elephant and big cat forests is difficult, so being able to autonomously record ambient soundscapes and identify species is essential. An effective machine learning (ML) species classifier requires substantial training data from this area, but the available dataset of frog vocalizations from this region has few audio recordings per species, so the ML model’s performance must be improved with limited data. The goal of this study was to assess data augmentation techniques on this dataset. We analyzed the effects of four data augmentation techniques (Time Shifting, Noise Injection, Spectral Augmentation, and Test-Time Augmentation), individually and in combination, on the frog vocalization data and the public environmental sounds dataset (ESC-50). The combined data augmentation techniques yielded larger relative accuracy gains as the size of the dataset decreased. The combination of all four techniques improved the ML model’s classification accuracy on the frog calls dataset by 94%. This study established a data augmentation approach to maximize classification accuracy with sparse frog call recordings, making it possible to build a real-world automated system for identifying frog species in the field. Such a system can significantly help in the conservation of frog species in this vital biodiversity hotspot.
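As a rough illustration of two of the techniques named in this abstract, the sketch below applies time shifting and noise injection to a toy waveform. It is not the authors' implementation: the function names (`time_shift`, `inject_noise`, `augment`), the circular shift, the Gaussian noise model, and the parameter values are all assumptions chosen for brevity.

```python
import random


def time_shift(samples, shift):
    """Circularly shift a waveform by `shift` samples (positive = later in time).

    Wrapping around the clip boundary is an assumption; padding with
    silence would be an equally reasonable choice.
    """
    if not samples:
        return []
    k = shift % len(samples)
    return samples[-k:] + samples[:-k] if k else list(samples)


def inject_noise(samples, noise_std, rng):
    """Add zero-mean Gaussian noise with standard deviation `noise_std`."""
    return [s + rng.gauss(0.0, noise_std) for s in samples]


def augment(samples, max_shift, noise_std, seed=0):
    """Produce one augmented copy: a random shift followed by added noise."""
    rng = random.Random(seed)  # seeded so the example is repeatable
    shift = rng.randint(-max_shift, max_shift)
    return inject_noise(time_shift(samples, shift), noise_std, rng)


# Toy "recording": a single short pulse followed by silence.
clip = [0.0, 0.1, 0.5, 1.0, 0.5, 0.1, 0.0, 0.0]
shifted = time_shift(clip, 2)
augmented = augment(clip, max_shift=3, noise_std=0.01)
```

Each call to `augment` with a different seed gives another variant of the same clip, which is how a small set of recordings can be expanded into a larger training set.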

Monitoring drought using explainable statistical machine learning models

Cheung et al. | Oct 28, 2024

Droughts have a wide range of effects, from ecosystems failing and crops dying, to increased illness and decreased water quality. Drought prediction is important because it can help communities, businesses, and governments plan and prepare for these detrimental effects. This study predicts drought conditions by using predictable weather patterns in machine learning models.

Cardiovascular Disease Prediction Using Supervised Ensemble Machine Learning and Shapley Values

Shah et al. | Aug 06, 2024

The authors test the effectiveness of machine learning to predict onset of cardiovascular disease.

Automated classification of nebulae using deep learning & machine learning for enhanced discovery

Nair et al. | Feb 01, 2024

There are believed to be ~20,000 nebulae in the Milky Way Galaxy. However, humans have cataloged only ~1,800 of them, even though we have gathered 1.3 million nebula images. Classification of nebulae is important because it helps scientists understand the chemical composition of a nebula, which in turn helps them understand the material of the original star. Our research on nebulae classification aims to make the process of classifying new nebulae faster and more accurate using a hybrid of deep learning and machine learning techniques.

Prediction of preclinical Aβ deposit in Alzheimer’s disease mice using EEG and machine learning

Igarashi et al. | Nov 29, 2022

Alzheimer’s disease (AD) is a common disease affecting 6 million people in the U.S., but no cure exists. To develop therapies for AD, it is critical to detect amyloid-β protein in the brain at an early stage of the disease because the accumulation of amyloid-β over 20 years is believed to cause memory impairment. However, it is difficult to examine amyloid-β in patients’ brains. In this study, we hypothesized that we could accurately predict the presence of amyloid-β using EEG data and machine learning.

Predicting asthma-related emergency department visits and hospitalizations with machine learning techniques

Chatterjee et al. | Oct 25, 2021

Seeking to investigate the effects of ambient pollutants on human respiratory health, the authors used machine learning to examine asthma in Los Angeles County, an area with substantial pollution. By using machine learning models and classification techniques, the authors identified that nitrogen dioxide and ozone levels were significantly correlated with asthma hospitalizations. Based on an identified seasonal surge in asthma hospitalizations, the authors suggest future directions to improve machine learning modeling to investigate these relationships.

Advancing pediatric cancer predictions through generative artificial intelligence and machine learning

Yadav et al. | Dec 21, 2024

Pediatric cancers pose unique challenges due to their rarity and distinct biological factors, emphasizing the need for accurate survival prediction to guide treatment. This study integrated generative AI and machine learning, including synthetic data, to analyze 9,184 pediatric cancer patients, identifying age at diagnosis, cancer types, and anatomical sites as significant survival predictors. The findings highlight the potential of AI-driven approaches to improve survival prediction and inform personalized treatment strategies, with broader implications for innovative healthcare applications.

Identifying factors, such as low sleep quality, that predict suicidal thoughts using machine learning

Dong et al. | Apr 30, 2024

Sadly, around 800,000 people die by suicide worldwide each year. Dong and Pearce analyze health survey data to identify associations between suicidal ideation and relevant variables, such as sleep quality, hopelessness, and anxious behavior.
