A comparative analysis of machine learning approaches for prediction of breast cancer
(1) Plano West Senior High School, Plano, Texas
Breast cancer is one of the most serious diseases affecting women's health. Among women, breast cancer death rates are higher than those for any other cancer, aside from lung cancer. Machine learning and deep learning techniques can be used to predict the early onset of breast cancer. The main objective of this analysis was to determine whether machine learning algorithms can predict the onset of breast cancer with more than 90% accuracy. Four supervised machine learning algorithms, Gaussian Naïve Bayes, K Nearest Neighbor, Random Forest, and Logistic Regression, were considered because they cover a wide variety of classification methods and provide high accuracy and performance. We hypothesized that all four algorithms would provide accurate results, and that Random Forest and Logistic Regression would provide better accuracy and performance than Gaussian Naïve Bayes and K Nearest Neighbor. The Wisconsin Breast Cancer dataset from the UC Irvine repository was used to compare the four algorithms. Based on the results, the Random Forest algorithm performed best among the four in malignant prediction (accuracy = 98%), and the Logistic Regression algorithm performed best among the four in benign prediction (accuracy = 99%). All four algorithms performed well in distinguishing benign from malignant cancer, each scoring above 90% as measured by F1-score. The study results can be used for further research on cancer prediction using machine learning algorithms.
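The comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the study's actual pipeline: it assumes scikit-learn and its bundled copy of the Wisconsin Diagnostic dataset (`load_breast_cancer`), which may differ from the version of the UC Irvine dataset used in the study. The 75/25 stratified split, the feature scaling for K Nearest Neighbor and Logistic Regression, and all hyperparameters are assumptions chosen for illustration, so the scores it prints should not be expected to match the figures reported here.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: scikit-learn's bundled Wisconsin Diagnostic dataset,
# where target 0 = malignant and 1 = benign.
X, y = load_breast_cancer(return_X_y=True)

# Assumption: a 75/25 stratified split with a fixed seed for repeatability.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# The four classifiers named in the abstract. Hyperparameters are
# illustrative defaults, not the values used in the study.
models = {
    "Gaussian Naive Bayes": GaussianNB(),
    "K Nearest Neighbor": make_pipeline(
        StandardScaler(), KNeighborsClassifier(n_neighbors=5)
    ),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        # Per-class F1: treat each class in turn as the positive label.
        "f1_malignant": f1_score(y_test, y_pred, pos_label=0),
        "f1_benign": f1_score(y_test, y_pred, pos_label=1),
    }

for name, scores in results.items():
    summary = ", ".join(f"{key}={value:.3f}" for key, value in scores.items())
    print(f"{name}: {summary}")
```

Reporting F1 separately for the malignant and benign classes mirrors the per-class comparison in the abstract; overall accuracy alone would hide a model that does well on one class and poorly on the other.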