Breast Cancer Prediction Based On Gene Expression Data Using Interpretable Machine Learning Techniques
Publisher
Scientific Reports
Abstract
Breast cancer remains a global health burden, with deaths from the disease on the rise.
Accurately predicting and diagnosing breast cancer is important for treatment development
and survival of patients. This study aimed to accurately predict breast cancer using a dataset
comprising 1208 observations and 3602 genes. The study employed feature selection techniques to
identify the most influential predictive genes for breast cancer using machine learning (ML) models.
The study used k-nearest neighbors (KNN), random forests (RF), and a support vector machine
(SVM). Furthermore, the study employed feature- and model-based importance measures and explainable ML
methods, including Shapley values, partial dependence plots (PDPs), and accumulated local effects (ALE)
plots, to explain the gene importance rankings from the ML methods. Shapley values highlighted
the significance of some of the genes in predicting cancer presence. Model-based feature ranking
techniques, particularly the Leaving-One-Covariate-In (LOCI) method, identified the ten most critical
genes for predicting tumor cases. The LOCI rankings from the SVM and RF methods were aligned.
Additionally, visualization methods such as PDPs and ALE plots showed how changes in an individual
feature affect predictions and how that feature interacts with other genes. By combining feature selection
techniques and explainable ML methods, this study has demonstrated the interpretability and
reliability of machine learning models for breast cancer prediction, emphasizing the importance of
incorporating explainable ML approaches for medical decision-making.
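The model-based ranking idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes that LOCI means fitting a model on each covariate alone and ranking covariates by held-out accuracy; the abstract does not spell out the procedure, so this reading is an assumption. The data are synthetic, and the helper names (`knn_predict`, `loci_scores`, `make_synthetic`) are hypothetical, not taken from the study.

```python
import random

def knn_predict(train_x, train_y, x, k=5):
    """Majority vote among the k nearest training points (single feature)."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    votes = sum(train_y[i] for i in nearest)
    return 1 if 2 * votes > k else 0

def loci_scores(X_train, y_train, X_test, y_test, k=5):
    """Held-out accuracy of a KNN model fitted on each feature alone.

    Assumed reading of LOCI: one covariate is kept in, all others left out.
    """
    scores = []
    for j in range(len(X_train[0])):
        col_train = [row[j] for row in X_train]
        correct = sum(
            knn_predict(col_train, y_train, row[j], k) == label
            for row, label in zip(X_test, y_test)
        )
        scores.append(correct / len(y_test))
    return scores

def make_synthetic(n, seed):
    """Hypothetical data: feature 0 tracks the label, features 1-3 are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        informative = rng.gauss(2.0 * label, 0.5)
        noise = [rng.gauss(0.0, 1.0) for _ in range(3)]
        X.append([informative] + noise)
        y.append(label)
    return X, y

X_train, y_train = make_synthetic(200, seed=0)
X_test, y_test = make_synthetic(100, seed=1)

scores = loci_scores(X_train, y_train, X_test, y_test)
# Feature indices ordered from most to least predictive on their own.
ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
print(ranking, scores)
```

With real expression data, each column would be a gene and the top entries of `ranking` would play the role of the most critical genes. A single train/test split is used here for brevity; cross-validation would give more stable scores.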
Description
Research Article
Citation
Kallah-Dagadu, G., Mohammed, M., Nasejje, J. B., Mchunu, N. N., Twabi, H. S., Batidzirai, J. M., ... & Maposa, I. (2025). Breast cancer prediction based on gene expression data using interpretable machine learning techniques. Scientific Reports, 15(1), 7594.
