Implementing machine learning algorithms on criminal databases to develop a criminal activity index

(1) Aspiring Scholars Directed Research Program, Fremont, California; Irvington High School, Fremont, California, (2) Aspiring Scholars Directed Research Program, Fremont, California; Mission San Jose High School, Fremont, California, (3) Aspiring Scholars Directed Research Program, Fremont, California; Branham High School, San Jose, California, (4) Aspiring Scholars Directed Research Program, Fremont, California; Dougherty Valley High School, San Ramon, California, (5) Aspiring Scholars Directed Research Program, Fremont, California; West Windsor-Plainsboro High School North, Plainsboro, New Jersey, (6) Aspiring Scholars Directed Research Program, Fremont, California; Valley Christian High School, San Jose, California, (7) Aspiring Scholars Directed Research Program, Fremont, California; Westwood High School, Westwood, California, (8) Aspiring Scholars Directed Research Program, Fremont, California

https://doi.org/10.59720/22-250

Criminal activity is a major concern in today's society. Local police agencies collect and analyze vast amounts of data to prevent future crimes and protect vulnerable populations. However, although data are publicly available through sources such as the Open Justice California website, there is still no efficient way to relay this information to the public in a digestible format. Our research aims to bridge this gap by using machine learning techniques to correlate crime data with a range of explanatory factors. We first employed a clustering algorithm to normalize the data based on population. We then tested five different predictive algorithms to determine the most effective machine learning model. Of the models we trained, a neural network was the most accurate at predicting crime rates. Additionally, we hypothesized that higher median income, lower population density, shorter unemployment duration, and lower median age would be associated with lower crime rates, and that these associations would be statistically significant. Our results show that median income, population, and unemployment duration all have a significant correlation with crime rates in California, while median age does not.
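The abstract does not say how statistical significance was assessed. The sketch below shows one hypothetical way to test whether an explanatory factor such as median income is significantly correlated with crime rates. It computes a crime rate per 100,000 residents (a plain per-capita normalization, not the clustering step the authors describe), the Pearson correlation coefficient, and a two-sided permutation p-value. The function names, the 10,000-permutation default, and the toy values are illustrative assumptions, not the authors' code or data.

```python
import math
import random


def per_capita_rate(crimes, population, per=100_000):
    """Crimes per `per` residents.

    A plain per-capita normalization used here as an assumption; it is NOT
    the clustering-based normalization described in the paper.
    """
    return crimes / population * per


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


def permutation_p_value(xs, ys, n_perm=10_000, seed=0):
    """Two-sided permutation test of H0: no association between xs and ys.

    Shuffling ys breaks any pairing with xs; the p-value is the share of
    shuffles whose |r| is at least as large as the observed |r|.
    """
    rng = random.Random(seed)  # fixed seed so results are reproducible
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # small tolerance so floating-point ties still count as hits
        if abs(pearson_r(xs, shuffled)) >= observed - 1e-12:
            hits += 1
    # +1 in numerator and denominator accounts for the observed pairing
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy values for six hypothetical counties (NOT data from the study).
    median_income = [40, 55, 70, 85, 100, 115]  # thousands of USD
    crimes = [450, 410, 350, 305, 250, 215]
    population = [50_000] * 6
    rates = [per_capita_rate(c, p) for c, p in zip(crimes, population)]
    r = pearson_r(median_income, rates)
    p = permutation_p_value(median_income, rates)
    print(f"r = {r:.3f}, permutation p = {p:.4f}")
```

With these toy values the script reports a strongly negative r and a small p-value. Under this sketch, a p-value below a chosen threshold such as 0.05 would mark the association as statistically significant. A permutation test avoids assuming normally distributed data, at the cost of extra computation.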
