Empirical exploration of whale optimisation algorithm for heart disease prediction
Publisher
Scientific Reports
Abstract
Heart diseases are the leading cause of death worldwide, necessitating precise predictive models
for early risk assessment. Much existing research has focused on improving model accuracy on
single datasets, often neglecting comprehensive evaluation metrics and the use of
different datasets from the same domain (heart disease). This research introduces a heart disease risk
prediction approach by harnessing the whale optimization algorithm (WOA) for feature selection and
implementing a comprehensive evaluation framework. The study uses five distinct datasets:
a combined dataset comprising the Cleveland, Long Beach VA, Switzerland, and
Hungarian heart disease datasets, and the Z-AlizadehSani, Framingham, South African, and
Cleveland heart datasets. WOA-guided feature selection identifies optimal features, which are
then passed to ten classification models. Comprehensive model evaluation reveals significant
improvements across critical performance metrics, including accuracy, precision, recall, F1 score,
and the area under the receiver operating characteristic curve. The resulting models consistently
outperform state-of-the-art methods evaluated on the same datasets, validating the effectiveness of our
methodology. The comprehensive evaluation framework provides a robust assessment of the model’s
adaptability, underscoring the WOA’s effectiveness in identifying optimal features across multiple datasets
in the same domain.

Heart disease (HD) is of utmost importance because of the heart’s critical role among human organs. HD
has high death rates worldwide, with approximately 17.9 million people dying from heart conditions in 2019 [1].
Heart diseases account for 32% of global deaths, with heart attacks and strokes alone making up more than 85% of
recorded deaths. Over 75% of cardiovascular deaths in 2019 occurred in underdeveloped nations, accounting for
38% of deaths under 70 years of age [1].
Since cardiovascular diseases can be fatal, early detection enables medical
professionals to provide timely care to patients and avert death.
Owing to a scarcity of modern examination tools and medical experts, conventional medical methods
for diagnosing heart diseases are challenging, complicated, time-consuming, and expensive, making diagnosis difficult and sometimes unavailable, especially in developing countries [2].
Machine and deep learning methods have recently been used to analyze clinical data and make predictions [3].
Machine learning (ML) provides a cost-efficient alternative in which already collected patient data serve as a
data mine for predictive analysis for diagnostic purposes. To improve the accuracy of ML models, some
existing works have focused on using various classifiers or their enhanced forms [4–7]. Related works confirm that
feature selection reduces data dimensionality and improves model performance significantly [8].
Hence, some studies have sought to improve performance by varying the feature selection method [9,10].
However, some works that use feature selection still retain redundant features, which affects the metrics
they report. This becomes apparent when wrapper methods are used in place of filter methods, and when embedded
methods are used in place of filter and wrapper methods. It also explains why works that include feature
selection record better performance on some datasets only if the technique is efficient. In addition, although
researchers do not state why some existing works omit specific metrics, studies such as Hicks et al. [11]
have posited that, in a clinical setting, a subset of metrics may give an erroneous picture of how a model performs
and does not enable holistic evaluation of model performance. There is therefore room for more scientific work on
feature selection methods capable of improving metrics other than accuracy. Reporting multiple metrics helps to affirm
the reliability of a model’s performance, since the absence of multiple evaluation metrics indicates
an unbalanced model that cannot be thoroughly assessed.
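
The risk of reporting only a subset of metrics can be illustrated with a small worked example. The sketch below is not taken from the study: the function name `classification_metrics` and the 100-patient cohort are hypothetical, chosen only to show how accuracy alone can look acceptable while recall reveals that no diseased patient is detected. The area under the ROC curve is omitted because it needs predicted scores rather than hard labels.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = disease)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical imbalanced cohort: 90 healthy patients, 10 with heart disease.
y_true = [0] * 90 + [1] * 10
# A degenerate model that labels every patient as healthy.
y_pred = [0] * 100

metrics = classification_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

In this made-up case the model is right for 90 of 100 patients, yet it misses every patient with heart disease, which only recall and F1 expose.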
This study proposes the whale optimization algorithm (WOA) as a swarm-inspired feature selection
algorithm for selecting relevant dataset features, applied to five (5) heart datasets and ten (10) models
(classical ML, ensemble, and deep learning models). The approach contributes to the body of knowledge in the heart
disease domain by providing a comprehensive assessment across five different datasets (in the same domain), ten
different models, and five evaluation metrics. The proposed methodology also validates the robustness of the
WOA on five datasets of variable sizes in the same domain, in contrast to most works, which do not
test their methodologies on multiple datasets in the same domain.
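
To make the idea of WOA-based feature selection concrete, here is a minimal Python sketch of a generic binary WOA wrapper. It is an illustration under stated assumptions, not the study's implementation: the names `woa_feature_selection`, `toy_fitness`, and `INFORMATIVE` are hypothetical, positions are binarised with a simple 0.5 threshold, the coefficients A and C are drawn once per whale rather than per dimension, and the toy fitness stands in for what would normally be a classifier's validation score on the selected features.

```python
import math
import random


def woa_feature_selection(fitness, n_features, n_whales=10, n_iter=30, seed=0):
    """Maximise fitness(mask) over 0/1 feature masks with a basic WOA loop."""
    rng = random.Random(seed)

    def to_mask(pos):
        # Assumption: a feature is selected when its position exceeds 0.5.
        return [1 if x > 0.5 else 0 for x in pos]

    whales = [[rng.random() for _ in range(n_features)] for _ in range(n_whales)]
    best_pos = max(whales, key=lambda w: fitness(to_mask(w)))[:]
    best_fit = fitness(to_mask(best_pos))

    for t in range(n_iter):
        a = 2.0 * (1 - t / n_iter)          # decreases linearly from 2 towards 0
        for i, x in enumerate(whales):
            A = 2 * a * rng.random() - a    # per-whale scalars (simplification)
            C = 2 * rng.random()
            if rng.random() < 0.5:
                # Encircle the best whale (|A| < 1) or explore around a random one.
                ref = best_pos if abs(A) < 1 else whales[rng.randrange(n_whales)]
                new = [r - A * abs(C * r - xi) for r, xi in zip(ref, x)]
            else:
                # Spiral update around the best whale, with shape constant b = 1.
                l = rng.uniform(-1, 1)
                spiral = math.exp(l) * math.cos(2 * math.pi * l)
                new = [abs(bi - xi) * spiral + bi for bi, xi in zip(best_pos, x)]
            whales[i] = [min(1.0, max(0.0, v)) for v in new]  # keep within [0, 1]
            fit = fitness(to_mask(whales[i]))
            if fit > best_fit:              # greedy update of the best solution
                best_fit, best_pos = fit, whales[i][:]
    return to_mask(best_pos), best_fit


# Hypothetical stand-in for a classifier score: reward three "informative"
# features and penalise every other selected feature.
INFORMATIVE = {0, 2, 5}


def toy_fitness(mask):
    hits = sum(mask[i] for i in INFORMATIVE)
    extras = sum(mask) - hits
    return hits - 0.5 * extras


mask, score = woa_feature_selection(toy_fitness, n_features=8)
print("selected feature indices:", [i for i, m in enumerate(mask) if m])
print("fitness:", score)
```

In a real pipeline the fitness function would typically train and validate a classifier on the columns picked by the mask, so each evaluation is far more expensive than in this toy setting.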
Description
Research Article
