Evaluating the predicted eruption times of geysers in Yellowstone National Park
The authors compare the predicted versus actual geyser eruption times for the Old Faithful and Beehive Geysers at Yellowstone National Park.
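Comparing predicted with actual eruption times comes down to computing an error for each eruption and summarizing those errors. The sketch below is a minimal illustration, not the authors' method. It assumes predictions and observations are paired as `datetime` objects and reports the mean signed error, the mean absolute error, and the share of eruptions that fall within a tolerance window. The function names, the 10-minute default window, and the sample times are all hypothetical.

```python
from datetime import datetime
from typing import Dict, List


def prediction_errors(predicted: List[datetime], actual: List[datetime]) -> List[float]:
    """Signed error in minutes for each eruption (actual minus predicted).

    A positive value means the geyser erupted later than predicted.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    return [(a - p).total_seconds() / 60.0 for p, a in zip(predicted, actual)]


def summarize_errors(errors: List[float], window_min: float = 10.0) -> Dict[str, float]:
    """Mean signed error, mean absolute error, and share of eruptions within +/- window_min.

    The 10-minute default is an illustrative assumption, not a value from the study.
    """
    if not errors:
        raise ValueError("need at least one eruption")
    n = len(errors)
    return {
        "mean_error": sum(errors) / n,
        "mean_abs_error": sum(abs(e) for e in errors) / n,
        "within_window": sum(1 for e in errors if abs(e) <= window_min) / n,
    }


# Made-up times for illustration only; these are not data from the study.
predicted = [datetime(2024, 6, 1, 12, 0), datetime(2024, 6, 1, 13, 30)]
actual = [datetime(2024, 6, 1, 12, 5), datetime(2024, 6, 1, 13, 22)]
errors = prediction_errors(predicted, actual)
print(errors)  # [5.0, -8.0]
print(summarize_errors(errors))
```

The signed mean reveals systematic bias (predictions that run consistently early or late), while the mean absolute error and the within-window share describe how useful the predictions are to a visitor.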
An explainable model for content moderation
The authors examined the ability of machine learning algorithms to interpret language, given their increasing use in moderating content on social media. Using an explainable model, they achieved 81% accuracy in distinguishing fake from real news based on the language of posts alone.
An analysis of junior rower performance and how it is affected by rower's features
In this study, motivated by the growing participation of high school students in indoor rowing, the authors analyzed World Indoor Rowing Championship data. Statistical analysis identified two key features that determine a rower's performance and revealed increasing competitiveness in nearly all categories considered. They conclude by offering a 2000-meter ergometer time distribution that junior rowers can use to assess their current performance relative to world competition.
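A time distribution like the one described above is typically used by locating a single rower's time within it. The sketch below is a hedged illustration, not the authors' method. It assumes the distribution is available as a list of 2000-meter times written as `m:ss.s` and reports the percentage of the field that the rower beats. The function names and the sample field are hypothetical.

```python
from typing import List


def erg_time_to_seconds(time_str: str) -> float:
    """Convert an ergometer time such as '6:45.3' (minutes:seconds) to seconds."""
    minutes, seconds = time_str.split(":")
    return int(minutes) * 60 + float(seconds)


def percent_faster_than(rower_time: str, field_times: List[str]) -> float:
    """Percentage of the field whose 2000 m time is strictly slower than the rower's."""
    if not field_times:
        raise ValueError("field_times must not be empty")
    mine = erg_time_to_seconds(rower_time)
    field = [erg_time_to_seconds(t) for t in field_times]
    # A larger time means a slower row, so count entries greater than ours.
    slower = sum(1 for t in field if t > mine)
    return 100.0 * slower / len(field)


# Hypothetical field of 2000 m times, not values from the study.
field = ["6:30.0", "6:45.0", "7:00.0", "7:15.0"]
print(percent_faster_than("6:50.0", field))  # 50.0
```

In practice the field would be filtered to the rower's own category (for example age group, sex, and weight class) before comparison, since the summary notes that competitiveness differs across categories.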
A comparative analysis of machine learning approaches to predict brain tumors using MRI
The authors apply machine learning to MRI scans of brain tissue to predict tumor onset, as an avenue for early detection of brain cancer.
Using economic indicators to create an empirical model of inflation
Here, seeking to understand how 50 of the most important economic indicators correlate with inflation, the authors used a rolling linear regression to identify the indicators most strongly correlated with the month-over-month, seasonally adjusted Consumer Price Index (CPI). They ultimately concluded that the average gasoline price, the U.S. import price index, and 5-year market-expected inflation had the most significant correlation with the CPI.
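A rolling linear regression refits the same model on successive windows of consecutive observations, so an indicator's relationship with CPI can be tracked over time instead of being summarized by a single fit. The sketch below is one simple way to do this under stated assumptions, not the authors' specification. It uses a univariate least-squares fit per window and ranks indicators by their mean absolute correlation across windows, a stand-in for the significance testing the summary mentions. The window length, function names, and toy series are all hypothetical.

```python
from typing import Dict, List, Tuple


def ols_fit(x: List[float], y: List[float]) -> Tuple[float, float, float]:
    """Least-squares fit of y = intercept + slope * x; returns (slope, intercept, r)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("a window has zero variance; correlation is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / (sxx * syy) ** 0.5  # Pearson correlation coefficient
    return slope, intercept, r


def rolling_regression(x: List[float], y: List[float], window: int) -> List[Tuple[int, float, float]]:
    """Refit y on x over each run of `window` consecutive months; returns (start, slope, r)."""
    if len(x) != len(y):
        raise ValueError("series must have the same length")
    if not 2 <= window <= len(x):
        raise ValueError("window must be between 2 and the series length")
    results = []
    for start in range(len(x) - window + 1):
        slope, _, r = ols_fit(x[start:start + window], y[start:start + window])
        results.append((start, slope, r))
    return results


def rank_indicators(indicators: Dict[str, List[float]], cpi: List[float], window: int) -> List[Tuple[str, float]]:
    """Rank indicators by their mean |r| with CPI across all rolling windows."""
    scores = {}
    for name, series in indicators.items():
        fits = rolling_regression(series, cpi, window)
        scores[name] = sum(abs(r) for _, _, r in fits) / len(fits)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy series for illustration only; not the indicators or CPI data used in the study.
cpi = [3, 5, 7, 9, 11, 13, 15, 17]
indicators = {
    "indicator_a": [1, 2, 3, 4, 5, 6, 7, 8],
    "indicator_b": [3, 1, 4, 1, 5, 9, 2, 6],
}
print(rank_indicators(indicators, cpi, window=4))
```

Here `indicator_a` tracks the toy CPI exactly, so every window gives a correlation of 1 and it ranks first. A real analysis would add p-values or confidence intervals for each window's slope before calling a correlation significant.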
Predicting sickle cell vaso-occlusion by microscopic imaging and modeling
The authors use blood smears from individuals with sickle cell disease to correlate sickle cell frequency with the occurrence of vaso-occlusive crises.
Predicting the spread speed of red imported fire ants under different temperature conditions in China
The authors looked at non-natural factors that influenced the spread rate of fire ants in multiple cities in China.
Predicting voting and union support in certification elections: Evidence from Starbucks workers, 2021-2024
The authors looked at unionization petitions from Starbucks workers between August 2021 and July 2024 to determine what factors influence votes for or against unionization.
Predicting and explaining illicit financial flows in developing countries: A machine learning approach
The authors looked at the ability of different machine learning algorithms to predict the level of financial corruption in different countries.
Predicting smoking status based on RNA sequencing data
Given an association between nicotine addiction and gene expression, we hypothesized that genes commonly associated with smoking status would be differentially expressed between smokers and non-smokers. To test this, we analyzed two publicly available datasets that profiled RNA gene expression in brain (nucleus accumbens) and lung tissue taken from patients identified as smokers or non-smokers. We found statistically significant differences in the expression of dozens of genes between smokers and non-smokers. To test whether gene expression can predict whether a patient is a smoker or a non-smoker, we trained logistic regression and random forest classification models on the gene expression data. The random forest classifier trained on lung tissue data gave the most robust results, with area under the curve (AUC) values consistently between 0.82 and 0.93. Both models trained on nucleus accumbens data performed worse; with random forest, AUC values were consistently between 0.65 and 0.70. These results suggest that gene expression can be used to predict smoking status with traditional machine learning models. Additionally, based on our random forest model, we propose KCNJ3 and TXLNGY as two candidate markers of smoking status. These findings, together with other genes identified in this study, present promising avenues for advancing applications related to the genetic foundation of smoking-related characteristics.
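The AUC values reported above summarize how well a classifier's scores separate smokers from non-smokers. As a hedged illustration (not the authors' code), the sketch below computes AUC from predicted scores using its pairwise (Mann-Whitney) interpretation: the probability that a randomly chosen positive example scores higher than a randomly chosen negative one, with ties counted as half. This equals the area under the ROC curve. In practice a library routine such as scikit-learn's `roc_auc_score` would typically be used. The function name and the example labels and scores are hypothetical.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    labels: 1 for the positive class (e.g. smoker), 0 for the negative class.
    scores: the model's predicted probability or score for the positive class.
    Ties between a positive and a negative count as half a win.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels and classifier scores, not results from the study.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))  # 0.75
```

An AUC of 0.5 corresponds to a random ranking and 1.0 to perfect separation. This puts the reported ranges of 0.82 to 0.93 (lung) and 0.65 to 0.70 (nucleus accumbens) in context.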