Breast Cancer Prediction Based On Gene Expression Data Using Interpretable Machine Learning Techniques

dc.contributor.author: Kallah-Dagadu, G.
dc.contributor.author: Mohammed, M.
dc.contributor.author: Nasejje, J.B.
dc.contributor.author: Mchunu, N.N.
dc.contributor.author: Twabi, H.S.
dc.contributor.author: Batidzirai, J.M.
dc.contributor.author: Singini, G.C.
dc.contributor.author: et al.
dc.date.accessioned: 2025-11-11T12:05:16Z
dc.date.issued: 2025-03-04
dc.description: Research Article
dc.description.abstract: Breast cancer remains a global health burden, and deaths from the disease are increasing. Accurate prediction and diagnosis of breast cancer are important for treatment development and patient survival. This study aimed to predict breast cancer accurately using a dataset comprising 1208 observations and 3602 genes. The study employed feature selection techniques to identify the most influential predictive genes for breast cancer using machine learning (ML) models: K-nearest neighbors (KNN), random forests (RF), and a support vector machine (SVM). Furthermore, the study employed feature- and model-based importance and explainable ML methods, including Shapley values, partial dependence plots (PDPs), and accumulated local effects (ALE) plots, to explain the genes’ importance ranking from the ML methods. Shapley values highlighted the significance of some of the genes in predicting cancer presence. Model-based feature ranking techniques, particularly the Leaving-One-Covariate-In (LOCI) method, identified the ten most critical genes for predicting tumor cases. The LOCI rankings from the SVM and RF methods were aligned. Additionally, visualization methods such as PDPs and ALE plots demonstrated how individual feature changes affect predictions and interactions with other genes. By combining feature selection techniques and explainable ML methods, this study has demonstrated the interpretability and reliability of machine learning models for breast cancer prediction, emphasizing the importance of incorporating explainable ML approaches for medical decision-making.
dc.description.sponsorship: Research reported in this publication was supported by the Fogarty International Center of the National Institutes of Health under Award Number D43TW010547. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
dc.identifier.citation: Kallah-Dagadu, G., Mohammed, M., Nasejje, J. B., Mchunu, N. N., Twabi, H. S., Batidzirai, J. M., ... & Maposa, I. (2025). Breast cancer prediction based on gene expression data using interpretable machine learning techniques. Scientific Reports, 15(1), 7594.
dc.identifier.uri: https://doi.org/10.1038/s41598-025-85323-5
dc.identifier.uri: https://ugspace.ug.edu.gh/handle/123456789/44125
dc.language.iso: en
dc.publisher: Scientific Reports
dc.subject: Breast cancer
dc.subject: Prediction
dc.subject: Machine learning
dc.subject: Interpretable machine learning
dc.title: Breast Cancer Prediction Based On Gene Expression Data Using Interpretable Machine Learning Techniques
dc.type: Article
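
The abstract names the Leaving-One-Covariate-In (LOCI) method for ranking genes but does not spell out the procedure. The sketch below illustrates one common reading of the idea: fit a model on each feature alone and rank features by how much that single-feature model improves on a featureless baseline. Everything here is an assumption for illustration only: the data are synthetic, the gene names (`gene_A`, `gene_B`, `gene_C`) are hypothetical, and the small pure-Python KNN classifier stands in for the study's KNN, RF, and SVM models rather than reproducing the authors' implementation.

```python
import random


def knn_predict(train_x, train_y, x, k=5):
    """Majority vote among the k training points closest to x (one feature)."""
    nearest = sorted(zip(train_x, train_y), key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > len(nearest) else 0


def cv_accuracy(xs, ys, k=5, folds=5):
    """Cross-validated accuracy of a KNN classifier that sees one feature."""
    n = len(xs)
    correct = 0
    for f in range(folds):
        test_idx = set(range(f, n, folds))  # every folds-th sample is held out
        train_x = [xs[i] for i in range(n) if i not in test_idx]
        train_y = [ys[i] for i in range(n) if i not in test_idx]
        correct += sum(
            knn_predict(train_x, train_y, xs[i], k) == ys[i] for i in test_idx
        )
    return correct / n


def loci_ranking(columns, ys):
    """Rank features by the accuracy gain of a single-feature model over a
    featureless majority-class baseline (assumed reading of LOCI)."""
    baseline = max(sum(ys), len(ys) - sum(ys)) / len(ys)
    gains = {name: cv_accuracy(col, ys) - baseline for name, col in columns.items()}
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)


# Synthetic stand-in for expression data: 200 samples, binary tumor label.
rng = random.Random(0)
labels = [i % 2 for i in range(200)]
columns = {
    "gene_A": [rng.gauss(2.0 * y, 1.0) for y in labels],  # strongly informative
    "gene_B": [rng.gauss(0.5 * y, 1.0) for y in labels],  # weakly informative
    "gene_C": [rng.gauss(0.0, 1.0) for _ in labels],      # pure noise
}

ranking = loci_ranking(columns, labels)
for name, gain in ranking:
    print(f"{name}: accuracy gain over baseline = {gain:.3f}")
```

Each feature is scored in isolation, so this kind of ranking says nothing about interactions between genes; the abstract relies on PDPs and ALE plots for that part of the interpretation.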

Files

Original bundle

Name: Breast cancer prediction based.pdf
Size: 2.91 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission