Expert Systems With Applications 258 (2024) 125133

Contents lists available at ScienceDirect: Expert Systems With Applications. Journal homepage: www.elsevier.com/locate/eswa

Enhancing corporate bankruptcy prediction via a hybrid genetic algorithm and domain adaptation learning architecture

T. Ansah-Narh a,b,c,∗, E.N.N. Nortey b, E. Proven-Adzri a, R. Opoku-Sarkodie d

a Ghana Space Science and Technology Institute, Ghana Atomic Energy Commission, P. O. Box LG 80, Legon-Accra, Ghana
b Department of Statistics and Actuarial Science, University of Ghana, P. O. Box LG 115, Legon-Accra, Ghana
c School of Technology, Ghana Institute of Management and Public Administration, P. O. Box AH 50, Achimota-Accra, Ghana
d Department of Information Technology and Mathematical Sciences, Methodist University Ghana, P. O. Box DC 940, Dansoman-Accra, Ghana

A R T I C L E  I N F O

Dataset link: Taiwanese Bankruptcy Prediction dataset, Polish Companies Bankruptcy data

Keywords: Bankruptcy prediction; Financial ratios; Genetic algorithm; Domain adaptation learning; Data distribution shifts; Bayesian optimisation

A B S T R A C T

In the contemporary business landscape, accurately evaluating a company's financial health is essential for stakeholders to mitigate risks and avert bankruptcy. This study presents an innovative approach to improving business bankruptcy prediction through the hybrid integration of Domain Adaptation Learning (DAL) and Genetic Algorithm (GA) techniques. The hybrid model harnesses DAL to address distributional changes in real-world scenarios and utilises GA's proficiency in feature selection. Six machine learning models are rigorously evaluated against the proposed hybrid model: Random Forest (RF), Support Vector Machine (SVM), Logistic Regression (LR), Gradient Boosting (GB), k-Nearest Neighbours (k-NN), and Stacking Ensemble (SE).
Our hybrid model performs well on imbalanced target datasets under the Area Under the Precision–Recall Curve metric: 0.93 (RF), 0.93 (SVM), 0.89 (LR), 0.91 (GB), 0.88 (k-NN), and 0.92 (SE). These findings highlight the model's ability to overcome the limitations of traditional approaches, offering a more reliable predictive framework for stakeholders to make informed decisions and proactively manage financial stability. Future research directions may explore the applicability of this hybrid model across different industries and the integration of additional techniques to further enhance its performance.

1. Introduction

Examining a company's financial performance is an important task, as it plays a pivotal role in determining its strengths and weaknesses. A company's daily transaction records serve as a valuable source of information for decision-making, especially when focusing on scenarios that lead to bankruptcy. When a company experiences financial distress, it undergoes a gradual evolution, initially with limited liquidity and eventually leading to bankruptcy (Fahlevi & Marlinah, 2018). In today's business environment, there has been a marked increase in the number of companies facing financial failure and subsequent liquidation. A relevant example is the financial sector reforms that began in Ghana in 2017, which resulted in the central bank revoking the licenses of 23 universal banks and 388 microfinance and microcredit companies.1 Also, because privately held enterprises frequently lack the trustworthy and open financial statements of publicly audited organisations, recent research by da Silva Mattos and Shasha (2024) has shown how difficult it is to predict insolvency for these types of businesses. Managing these less trustworthy reports presents special difficulties for stakeholders as

∗ Corresponding author at: Ghana Space Science and Technology Institute, Ghana Atomic Energy Commission, P. O. Box LG 80, Legon-Accra, Ghana.
E-mail addresses: theophilus.ansah-narh@gaec.gov.gh (T. Ansah-Narh), ennnortey@ug.edu.gh (E.N.N. Nortey), emmanuel.proven-adzri@gaec.gov.gh (E. Proven-Adzri), rsarkodie@mucg.edu.gh (R. Opoku-Sarkodie).
1 https://www.bog.gov.gh/wp-content/uploads/2019/08/Revocation-of-Licenses-of-SDIs-16.8.19.pdf
Dataset links: https://archive.ics.uci.edu/ml/datasets/Taiwanese+Bankruptcy+Prediction; https://archive.ics.uci.edu/dataset/365/polish+companies+bankruptcy+data
https://doi.org/10.1016/j.eswa.2024.125133
Received 6 March 2024; Received in revised form 31 July 2024; Accepted 15 August 2024; Available online 20 August 2024
0957-4174/© 2024 Elsevier Ltd. All rights are reserved, including those for text and data mining, AI training, and similar technologies.

a result. These case studies emphasise how vital it is to carry out in-depth research in order to offer information that will help pertinent stakeholders prevent business defaults. In fact, the Ghana case serves as a compelling illustration of the challenges faced by companies in dynamic economic environments, highlighting the critical need for predictive models that can adapt to evolving industry landscapes. While the data used in this study originates from the Taiwan bankruptcy prediction dataset, it also incorporates the Polish Companies Bankruptcy data and considers the Ghanaian context, highlighting the global issue of corporate bankruptcy and the need for adaptable predictive models across diverse financial environments. To address this urgent research need, it is important to advance our understanding of the complex dynamics that lead to corporate bankruptcy. This study aims to focus on the following primary research question: How can the accuracy and adaptability of bankruptcy prediction models be enhanced to effectively handle distributional changes in real-world scenarios? The recent financial crisis highlights the importance of continually re-evaluating and refining methods to improve predictive power. Given the complexity
of the global business environment, a comprehensive investigation of the factors that influence a company's financial position is necessary. Additionally, the landscape of financial analysis is constantly changing due to technological advances, so it is essential to utilise innovative methodologies. Previous research has mostly concentrated on using structural and statistical approaches to predict insolvency.
The latter, which is the subject of this study, uses traditional machine learning models such as k-nearest neighbours (Chen et al., 2011; Li & Wang, 2017), discriminant analysis (Altman, 1968; Kliestik, Vrbka, & Rowland, 2018), logit models (Chi & Tang, 2006; Li, Lee, Zhou, & Sun, 2011), artificial neural networks (ANN) (Odom & Sharda, 1990; Zhang, Hu, Patuwo, & Indro, 1999), and decision trees (Olson, Delen, & Meng, 2012; Syed Nor, Ismail, & Yap, 2019). For instance, the study by Min, Lee, and Han (2006) discusses how bankruptcy prediction affects bank lending decisions and profitability. In contrast to logistic regression (LR) and neural networks, it emphasises the recent application of support vector machines (SVM) in this field and shows its promising outcomes. The article highlights the growing application of genetic algorithms (GA) in conjunction with other AI methods such as neural networks and case-based reasoning. It does, however, note the paucity of research on combining GA and SVM, in spite of its promise for practical applications. To improve bankruptcy prediction, that study uses GA to optimise two factors simultaneously, feature subset selection and SVM parameter settings, in order to increase SVM performance. These models seek to pinpoint the relevant financial factors that directly impact bankruptcy prediction. The former strategy, on the other hand, entails complex accounting ratio forecasts and a thorough comprehension of the economic subtleties of the organisation being studied.

While the concept of ANN dates back approximately eight decades to McCulloch and Pitts (1943)'s threshold logic-based model designed to emulate the human brain, the modern environment is characterised by the incorporation of high-performance computing systems that are driving AI into the mainstream.
The computational design of ANN is based on interconnected neurons, where each connection facilitates the transmission of signals from one neuron to another. The receiving neuron processes the signal and subsequently transmits the processed information to other interconnected neurons. The organisation of these neurons typically involves layers, with the first layer serving as the input and the last layer as the output. Sandwiched between these are hidden layers, which can be shallow or deep in design. The versatility of ANN architecture allows it to handle data ranging from single to multiple dimensions, making it applicable across a broad spectrum of cases.

The efficiency of AI applications, particularly in bankruptcy prediction, has substantially increased due to the growth of large and diverse datasets, both structured and unstructured. Investigations by Awoyemi, Adetunmbi, and Oluwadare (2017), Kristóf and Virág (2020), Sharma, Banerjee, Tiwari, and Patni (2021), and Tripathi, Edla, Cheruku, and Kuppili (2019) into bankruptcy prediction demonstrate this efficiency. In this domain, the term "prediction" is often used interchangeably with "classification" because the ultimate goal is to determine whether a company will likely face financial distress or bankruptcy. Table 1 provides a comparative analysis of various studies in bankruptcy prediction, highlighting the diversity in methods, datasets, and results.

However, the AI approach to bankruptcy prediction described above relies on assumptions inherent in classical machine learning. One key assumption is that the training and test sets come from the same distribution, meaning a model trained on labelled data is expected to perform effectively on test data. This assumption may not always hold in real-world applications where training and test data can come from different distributions.
Discrepancies can arise from various factors, such as differences in the origins of the training and test sets or an outdated training set due to changes in data patterns over time. In instances where there is a disparity across domain distributions, blindly applying the trained model to a new dataset can lead to a decline in performance. Addressing this challenge falls within the realm of domain adaptation, a subfield within machine learning (Farahani, Voghoei, Rasheed, & Arabnia, 2021; Guan & Liu, 2021; Jiang & Zhai, 2007). The primary objective of domain adaptation is to mitigate issues arising from differing distributions by aligning them, thus enabling the trained model to generalise effectively within the domain of interest. This alignment process is crucial for ensuring the robustness and applicability of the predictive model in real-world scenarios.

In light of the complex issues mentioned, this study attempts to investigate a novel approach to reduce the effects of distributional changes. The principal aim is to facilitate the creation of bankruptcy prediction models with increased flexibility and robustness under dynamic and changing conditions. As a result, this work presents a hybrid model that combines the strengths of Genetic Algorithm (GA) and Domain Adaptation Learning (DAL). This strategy aims to create a more reliable and adaptable predictive model for bankruptcy analysis by combining the best features of both approaches. The GA employed utilises a heuristic search-based scheme to extract relevant financial features from the original dataset and feed them into the proposed DAL pipeline for bankruptcy prediction. Keep in mind that the computational method used by the optimisation solver in GA is based on biological evolution and follows the guidelines of the natural evolution process (Ghanea-Hercock, 2003). This methodology finds applications across diverse fields, showcasing its versatility.
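The evolutionary loop just described, selection, crossover, and mutation over candidate solutions, can be made concrete with a small sketch. Everything below is an illustrative assumption rather than this paper's implementation: synthetic data stand in for financial ratios, a binary mask encodes the selected feature subset, one extra gene indexes a hypothetical `C_GRID` for an SVM-based fitness (echoing the GA-plus-SVM scheme of Min, Lee, and Han (2006) discussed earlier), and truncation selection with one-point crossover keeps the loop short.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Synthetic stand-in for a table of financial ratios (10 candidate features).
X, y = make_classification(n_samples=200, n_features=10, n_informative=4,
                           random_state=0)
C_GRID = [0.1, 1.0, 10.0]  # hypothetical SVM regularisation grid

def fitness(chrom):
    """Chromosome = 10-bit feature mask + 1 gene indexing C_GRID.
    Fitness = cross-validated accuracy of an SVM on the masked features."""
    mask = chrom[:10].astype(bool)
    if not mask.any():
        return 0.0
    clf = SVC(C=C_GRID[chrom[10]], kernel="rbf")
    return cross_val_score(clf, X[:, mask], y, cv=3).mean()

def evolve(pop_size=12, gens=5, p_mut=0.1):
    pop = [np.append(rng.integers(0, 2, 10), rng.integers(0, 3))
           for _ in range(pop_size)]
    for _ in range(gens):
        order = np.argsort([fitness(c) for c in pop])[::-1]
        parents = [pop[i] for i in order[:pop_size // 2]]  # truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.choice(len(parents), size=2, replace=False)
            cut = rng.integers(1, 10)                      # one-point crossover
            child = np.append(parents[a][:cut], parents[b][cut:])
            flip = rng.random(10) < p_mut                  # bit-flip mutation on the mask
            child[:10][flip] = 1 - child[:10][flip]
            children.append(child)
        pop = parents + children
    scores = [fitness(c) for c in pop]
    best = pop[int(np.argmax(scores))]
    return best, max(scores)

best, score = evolve()
selected = np.flatnonzero(best[:10])  # indices of the chosen financial features
```

In a full pipeline, `selected` would then restrict the feature set passed to the downstream predictor; population size, generations, and operators would of course be tuned rather than fixed as here.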
Notably, it has been extensively used in pattern recognition (Alsultanny & Aqel, 2003; Kim, Park, Yang, & Sim, 2006; Maulik & Bandyopadhyay, 2000; Pal & Wang, 1996), route optimisation (Inagaki, Haseyama, & Kitajima, 1999), network intrusion detection (Li, 2004), and image processing (Bhanu, Lee, & Ming, 1995; Saitoh, 1999). On the other hand, DAL is a machine learning paradigm that aims to address the challenges that arise when a model trained in one domain (the source domain²) is deployed to another related domain (the target domain³). The core principle of domain adaptation pipelines is to leverage knowledge gained from the source domain to improve the generalisation and performance of the model in the target domain. This becomes very important when there are distributional changes between training data and deployment data. Therefore, for the purpose of this work, we seek to perform the following tasks:

i. Create a spatial distribution model to visually represent correlations between financial variables and gain a comprehensive understanding of hidden patterns within the original dataset.
ii. Mitigate biases in the source domain dataset through the use of simulation techniques, ensuring the robustness and reliability of the analysis.
iii. Systematically identify and select financial features considered crucial for predicting corporate bankruptcy within the source domain, thereby improving the precision of the subsequent modelling process.
iv. Utilise the financial features selected in Task (iii) within the source domain to apply domain adaptation techniques, ensuring the model's robustness in the face of variations in data distributions. This crucial step enhances the model's applicability to real-world scenarios beyond the training data by addressing potential disparities in data distribution.
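One standard way to realise the source-to-target alignment idea described above is correlation alignment (CORAL), which transforms source features so their second-order statistics match the target domain. The sketch below is a generic illustration on synthetic data, not the pipeline used in this paper.

```python
import numpy as np

def coral(Xs, Xt, eps=1e-6):
    """CORAL: whiten the source features, then re-colour them with the
    target covariance so second-order statistics of both domains match."""
    Cs = np.cov(Xs, rowvar=False) + eps * np.eye(Xs.shape[1])
    Ct = np.cov(Xt, rowvar=False) + eps * np.eye(Xt.shape[1])

    def mat_pow(C, p):
        # Matrix power via eigendecomposition (C is symmetric positive definite).
        w, V = np.linalg.eigh(C)
        return (V * w ** p) @ V.T

    aligned = (Xs - Xs.mean(0)) @ mat_pow(Cs, -0.5) @ mat_pow(Ct, 0.5)
    return aligned + Xt.mean(0)   # also move the source onto the target mean

rng = np.random.default_rng(2)
Xs = rng.normal(size=(300, 5))                                  # source domain
Xt = rng.normal(size=(300, 5)) @ np.diag([3., 1., 1., 1., 1.])  # shifted target
Xs_aligned = coral(Xs, Xt)
# After alignment, cov(Xs_aligned) matches cov(Xt), so a classifier trained on
# the aligned source data transfers better to the target domain.
```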
We organise the remaining sections of the paper as follows. Section 2 discusses the data acquisition process, emphasising the dataset's relevance, coverage of financial information, and the rigorous data-gathering procedures followed to ensure accuracy and applicability. Section 3 focuses on the classification measures used in the study to evaluate the performance of the proposed hybrid model for bankruptcy prediction. It outlines the metrics and evaluation criteria employed to assess the model's predictive accuracy, adaptability, and generalisation capabilities, and provides insights into how the model's performance is measured and analysed in the context of imbalanced target datasets. Section 4 delves into the implications of the results, providing a detailed analysis of how the hybrid model addresses the challenges of traditional approaches in bankruptcy prediction. The final section (Section 5) summarises the key findings and contributions of the study, emphasising the significance of the hybrid model in enhancing corporate bankruptcy prediction. It discusses the implications of the research for financial risk management and decision-making, highlighting the potential impact of the hybrid model on stakeholders in the business landscape. The conclusion also outlines future research directions and areas for further exploration to enhance the performance and usefulness of the hybrid model in different industries and scenarios.

2 The source domain refers to the domain from which the training data is obtained to build a machine learning model.
3 The target domain is the domain where the model's performance is evaluated. The target data is mostly imbalanced.

Table 1
Summary of various bankruptcy prediction models and their respective performance metrics.

| Author | Dataset | Method a | ACC (%) | Obs. period | Attributes | Pred. type | Limitations |
|---|---|---|---|---|---|---|---|
| Almaskati, Bird, Yeung, and Lu (2021) | S&P firms | ALT b, OHL c, ZMJ d, SHW e, BSH f, CHS g, PRM h | 0.82, 0.73, 0.82, 0.79, 0.73, 0.80, 0.81 | 2005–2015 | 19 | Bankruptcy | Impact of specific governance variables; comparison of different non-parametric methods; temporal changes in governance impact |
| Liang, Tsai, Dai, and Eberle (2018) | Taiwanese, Chinese, Australian, German | SVM i, KNN j, MLP k, CART l, Bayes m | 77.60–91.27, 69.40–90.39, 71.40–89.38, 73.20–93.00, 70.70–88.82 | 2005–2015 | 95, 45, 14, 24 | Bankruptcy, Bankruptcy, Credit, Credit | Exploration of new classifier ensembles; Type I error reduction; dataset diversity |
| Barboza, Kimura, and Altman (2017) | North American firms | MDA n, LR o, ANN p; Bagging q, Boosting r, RF s, SVM i | 52–77; 71–87 | 1985–2013 | 11 | Bankruptcy | Limited analysis of new financial indicators; longitudinal changes in model performance; study focused on North American firms |
| Heo and Yang (2014) | Korean construction companies | AdaBoost t | 78.5 | 2008–2012 | 12 | Bankruptcy | Limited analysis of new financial indicators; exploration of other algorithms; study focused on Korean construction firms |

a Method: the approach or algorithm used for bankruptcy prediction. b ALT: Altman Z-score Model. c OHL: Ohlson O-score Model. d ZMJ: Zmijewski Model. e SHW: Shumway Model. f BSH: Bharath–Shumway Model. g CHS: Campbell, Hilscher, and Szilagyi Model. h PRM: Premachandra Model. i SVM: Support Vector Machines. j KNN: K-Nearest Neighbours. k MLP: Multi-Layer Perceptron. l CART: Classification and Regression Trees. m Bayes: Bayesian Classifiers. n MDA: Multiple Discriminant Analysis. o LR: Logistic Regression. p ANN: Artificial Neural Networks. q Bagging: Bootstrap Aggregating. r Boosting: ensemble technique for improving weak models. s RF: Random Forest. t AdaBoost: Adaptive Boosting.

2. Data and methods
2.1. Data acquisition

In the present study, we utilised the Taiwan bankruptcy prediction dataset from the University of California, Irvine machine learning repository, originally compiled from the Taiwan Economic Journal and covering the period from 1999 to 2009. Additionally, the study incorporated the Polish Companies Bankruptcy data to evaluate the effectiveness of the proposed hybrid model. The selection of these datasets was based on their comprehensive coverage of financial information, which provided a solid foundation for training and evaluating the hybrid model. For instance, in the case of the Taiwan data, a rigorous procedure was followed during the data-gathering phase to guarantee the accuracy and applicability of the dataset (Liang, Lu, Tsai, & Shih, 2016). Two fundamental standards were utilised in the process of collecting the data:

i. The selected companies were required to disclose their financial information for at least three years before the start of the financial crisis. This criterion ensured that the dataset contained sufficient temporal context so that the model could capture pre-crisis trends and patterns.
ii. Another important criterion was to consider similar companies within the same industry. This step was important for a nuanced analysis of the financial picture, allowing the model to identify industry-specific dynamics. The goal was to improve the model's ability to generalise insights across companies with comparable economic conditions.

Table 2
Descriptive statistics of some financial ratios in the Taiwan bankruptcy data.

| Variables | Count | Mean | Std | Min | 25% | 50% | 75% | Max |
|---|---|---|---|---|---|---|---|---|
| Roa(C) Before Interest And Depreciation Before ... | 6819.00 | 0.51 | 0.06 | 0.00 | 0.48 | 0.50 | 0.54 | 1.00 |
| Roa(A) Before Interest And % After Tax | 6819.00 | 0.56 | 0.07 | 0.00 | 0.54 | 0.56 | 0.59 | 1.00 |
| Roa(B) Before Interest And Depreciation After Tax | 6819.00 | 0.55 | 0.06 | 0.00 | 0.53 | 0.55 | 0.58 | 1.00 |
| Operating Gross Margin | 6819.00 | 0.61 | 0.02 | 0.00 | 0.60 | 0.61 | 0.61 | 1.00 |
| Realised Sales Gross Margin | 6819.00 | 0.61 | 0.02 | 0.00 | 0.60 | 0.61 | 0.61 | 1.00 |
| Operating Profit Rate | 6819.00 | 1.00 | 0.01 | 0.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| Pre-Tax Net Interest Rate | 6819.00 | 0.80 | 0.01 | 0.00 | 0.80 | 0.80 | 0.80 | 1.00 |
| After-Tax Net Interest Rate | 6819.00 | 0.81 | 0.01 | 0.00 | 0.81 | 0.81 | 0.81 | 1.00 |
| Non-Industry Income And Expenditure/Revenue | 6819.00 | 0.30 | 0.01 | 0.00 | 0.30 | 0.30 | 0.30 | 1.00 |
| Continuous Interest Rate (After Tax) | 6819.00 | 0.78 | 0.01 | 0.00 | 0.78 | 0.78 | 0.78 | 1.00 |
| Operating Expense Rate | 6819.00 | 1,995,347,312.80 | 3,237,683,890.52 | 0.00 | 0.00 | 0.00 | 4,145,000,000.00 | 9,990,000,000.00 |
| Research And Development Expense Rate | 6819.00 | 1,950,427,306.06 | 2,598,291,554.00 | 0.00 | 0.00 | 509,000,000.00 | 3,450,000,000.00 | 9,980,000,000.00 |
| Cash Flow Rate | 6819.00 | 0.47 | 0.02 | 0.00 | 0.46 | 0.47 | 0.47 | 1.00 |
| Interest-Bearing Debt Interest Rate | 6819.00 | 16,448,012.91 | 108,275,033.53 | 0.00 | 0.00 | 0.00 | 0.00 | 990,000,000.00 |
| Tax Rate (A) | 6819.00 | 0.12 | 0.14 | 0.00 | 0.00 | 0.07 | 0.21 | 1.00 |
| Net Value Per Share (B) | 6819.00 | 0.19 | 0.03 | 0.00 | 0.17 | 0.18 | 0.20 | 1.00 |
| Net Value Per Share (A) | 6819.00 | 0.19 | 0.03 | 0.00 | 0.17 | 0.18 | 0.20 | 1.00 |
| Net Value Per Share (C) | 6819.00 | 0.19 | 0.03 | 0.00 | 0.17 | 0.18 | 0.20 | 1.00 |
| Persistent Eps In The Last Four Seasons | 6819.00 | 0.23 | 0.03 | 0.00 | 0.21 | 0.22 | 0.24 | 1.00 |
| Cash Flow Per Share | 6819.00 | 0.32 | 0.02 | 0.00 | 0.32 | 0.32 | 0.33 | 1.00 |
| Revenue Per Share (Yuan ¥) | 6819.00 | 1,328,640.60 | 51,707,089.77 | 0.00 | 0.02 | 0.03 | 0.05 | 3,020,000,000.00 |
| Operating Profit Per Share (Yuan ¥) | 6819.00 | 0.11 | 0.03 | 0.00 | 0.10 | 0.10 | 0.12 | 1.00 |
| Per Share Net Profit Before Tax (Yuan ¥) | 6819.00 | 0.18 | 0.03 | 0.00 | 0.17 | 0.18 | 0.19 | 1.00 |
| Realised Sales Gross Profit Growth Rate | 6819.00 | 0.02 | 0.01 | 0.00 | 0.02 | 0.02 | 0.02 | 1.00 |
| Operating Profit Growth Rate | 6819.00 | 0.85 | 0.01 | 0.00 | 0.85 | 0.85 | 0.85 | 1.00 |
| After-Tax Net Profit Growth Rate | 6819.00 | 0.69 | 0.01 | 0.00 | 0.69 | 0.69 | 0.69 | 1.00 |
| Regular Net Profit Growth Rate | 6819.00 | 0.69 | 0.01 | 0.00 | 0.69 | 0.69 | 0.69 | 1.00 |
| Continuous Net Profit Growth Rate | 6819.00 | 0.22 | 0.01 | 0.00 | 0.22 | 0.22 | 0.22 | 1.00 |
| Total Asset Growth Rate | 6819.00 | 5,508,096,595.25 | 2,897,717,771.17 | 0.00 | 4,860,000,000.00 | 6,400,000,000.00 | 7,390,000,000.00 | 9,990,000,000.00 |
| Net Value Growth Rate | 6819.00 | 1,566,212.06 | 114,159,389.52 | 0.00 | 0.00 | 0.00 | 0.00 | 9,330,000,000.00 |
| Total Asset Return Growth Rate Ratio | 6819.00 | 0.26 | 0.01 | 0.00 | 0.26 | 0.26 | 0.26 | 1.00 |
| Cash Reinvestment % | 6819.00 | 0.38 | 0.02 | 0.00 | 0.37 | 0.38 | 0.39 | 1.00 |
| Current Ratio | 6819.00 | 403,284.95 | 33,302,155.83 | 0.00 | 0.01 | 0.01 | 0.02 | 2,750,000,000.00 |
| Quick Ratio | 6819.00 | 8,376,594.82 | 244,684,748.45 | 0.00 | 0.00 | 0.01 | 0.01 | 9,230,000,000.00 |
| Interest Expense Ratio | 6819.00 | 0.63 | 0.01 | 0.00 | 0.63 | 0.63 | 0.63 | 1.00 |
| Total Debt/Total Net Worth | 6819.00 | 4,416,336.71 | 168,406,905.28 | 0.00 | 0.00 | 0.01 | 0.01 | 9,940,000,000.00 |
| Debt Ratio % | 6819.00 | 0.11 | 0.05 | 0.00 | 0.07 | 0.11 | 0.15 | 1.00 |
| Net Worth/Assets | 6819.00 | 0.89 | 0.05 | 0.00 | 0.85 | 0.89 | 0.93 | 1.00 |
| Long-Term Fund Suitability Ratio (A) | 6819.00 | 0.01 | 0.03 | 0.00 | 0.01 | 0.01 | 0.01 | 1.00 |
| Borrowing Dependency | 6819.00 | 0.37 | 0.02 | 0.00 | 0.37 | 0.37 | 0.38 | 1.00 |
| Contingent Liabilities/Net Worth | 6819.00 | 0.01 | 0.01 | 0.00 | 0.01 | 0.01 | 0.01 | 1.00 |
| Operating Profit/Paid-In Capital | 6819.00 | 0.11 | 0.03 | 0.00 | 0.10 | 0.10 | 0.12 | 1.00 |
| Net Profit Before Tax/Paid-In Capital | 6819.00 | 0.18 | 0.03 | 0.00 | 0.17 | 0.18 | 0.19 | 1.00 |
| Inventory And Accounts Receivable/Net Value | 6819.00 | 0.40 | 0.01 | 0.00 | 0.40 | 0.40 | 0.40 | 1.00 |
| Total Asset Turnover | 6819.00 | 0.14 | 0.10 | 0.00 | 0.08 | 0.12 | 0.18 | 1.00 |
| Accounts Receivable Turnover | 6819.00 | 12,789,705.24 | 278,259,836.98 | 0.00 | 0.00 | 0.00 | 0.00 | 9,740,000,000.00 |
| Average Collection Days | 6819.00 | 9,826,220.86 | 256,358,895.71 | 0.00 | 0.00 | 0.01 | 0.01 | 9,730,000,000.00 |
| Inventory Turnover Rate (Times) | 6819.00 | 2,149,106,056.61 | 3,247,967,014.05 | 0.00 | 0.00 | 0.00 | 4,620,000,000.00 | 9,990,000,000.00 |
| Fixed Assets Turnover Frequency | 6819.00 | 1,008,595,981.82 | 2,477,557,316.92 | 0.00 | 0.00 | 0.00 | 0.00 | 9,990,000,000.00 |
| Net Worth Turnover Rate (Times) | 6819.00 | 0.04 | 0.04 | 0.00 | 0.02 | 0.03 | 0.04 | 1.00 |
| Revenue Per Person | 6819.00 | 2,325,854.27 | 136,632,654.39 | 0.00 | 0.01 | 0.02 | 0.04 | 8,810,000,000.00 |
| Operating Profit Per Person | 6819.00 | 0.40 | 0.03 | 0.00 | 0.39 | 0.40 | 0.40 | 1.00 |
| Allocation Rate Per Person | 6819.00 | 11,255,785.32 | 294,506,294.12 | 0.00 | 0.00 | 0.01 | 0.02 | 9,570,000,000.00 |
| Working Capital To Total Assets | 6819.00 | 0.81 | 0.06 | 0.00 | 0.77 | 0.81 | 0.85 | 1.00 |
| Quick Assets/Total Assets | 6819.00 | 0.40 | 0.20 | 0.00 | 0.24 | 0.39 | 0.54 | 1.00 |
| Current Assets/Total Assets | 6819.00 | 0.52 | 0.22 | 0.00 | 0.35 | 0.51 | 0.69 | 1.00 |
| Cash/Total Assets | 6819.00 | 0.12 | 0.14 | 0.00 | 0.03 | 0.07 | 0.16 | 1.00 |
| Quick Assets/Current Liability | 6819.00 | 3,592,902.20 | 171,620,908.61 | 0.00 | 0.01 | 0.01 | 0.01 | 8,820,000,000.00 |
| Cash/Current Liability | 6819.00 | 37,159,994.15 | 510,350,903.16 | 0.00 | 0.00 | 0.00 | 0.01 | 9,650,000,000.00 |
| Current Liability To Assets | 6819.00 | 0.09 | 0.05 | 0.00 | 0.05 | 0.08 | 0.12 | 1.00 |
| Operating Funds To Liability | 6819.00 | 0.35 | 0.04 | 0.00 | 0.34 | 0.35 | 0.36 | 1.00 |
| Inventory/Working Capital | 6819.00 | 0.28 | 0.01 | 0.00 | 0.28 | 0.28 | 0.28 | 1.00 |
| Inventory/Current Liability | 6819.00 | 55,806,804.53 | 582,051,554.62 | 0.00 | 0.00 | 0.01 | 0.01 | 9,910,000,000.00 |
| Current Liabilities/Liability | 6819.00 | 0.76 | 0.21 | 0.00 | 0.63 | 0.81 | 0.94 | 1.00 |
| Working Capital/Equity | 6819.00 | 0.74 | 0.01 | 0.00 | 0.73 | 0.74 | 0.74 | 1.00 |
| Current Liabilities/Equity | 6819.00 | 0.33 | 0.01 | 0.00 | 0.33 | 0.33 | 0.33 | 1.00 |
| Long-Term Liability To Current Assets | 6819.00 | 54,160,038.14 | 570,270,621.96 | 0.00 | 0.00 | 0.00 | 0.01 | 9,540,000,000.00 |
| Retained Earnings To Total Assets | 6819.00 | 0.93 | 0.03 | 0.00 | 0.93 | 0.94 | 0.94 | 1.00 |
| Total Income/Total Expense | 6819.00 | 0.00 | 0.01 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 |
| Total Expense/Assets | 6819.00 | 0.03 | 0.03 | 0.00 | 0.01 | 0.02 | 0.04 | 1.00 |
| Current Asset Turnover Rate | 6819.00 | 1,195,855,763.31 | 2,821,161,238.26 | 0.00 | 0.00 | 0.00 | 0.00 | 10,000,000,000.00 |
| Quick Asset Turnover Rate | 6819.00 | 2,163,735,272.03 | 3,374,944,402.17 | 0.00 | 0.00 | 0.00 | 4,900,000,000.00 | 10,000,000,000.00 |
| Working Capital Turnover Rate | 6819.00 | 0.59 | 0.01 | 0.00 | 0.59 | 0.59 | 0.59 | 1.00 |
| Cash Turnover Rate | 6819.00 | 2,471,976,967.44 | 2,938,623,226.68 | 0.00 | 0.00 | 1,080,000,000.00 | 4,510,000,000.00 | 10,000,000,000.00 |
| Cash Flow To Sales | 6819.00 | 0.67 | 0.01 | 0.00 | 0.67 | 0.67 | 0.67 | 1.00 |
| Fixed Assets To Assets | 6819.00 | 1,220,120.50 | 100,754,158.71 | 0.00 | 0.09 | 0.20 | 0.37 | 8,320,000,000.00 |
| Current Liability To Liability | 6819.00 | 0.76 | 0.21 | 0.00 | 0.63 | 0.81 | 0.94 | 1.00 |
| Current Liability To Equity | 6819.00 | 0.33 | 0.01 | 0.00 | 0.33 | 0.33 | 0.33 | 1.00 |
| Equity To Long-Term Liability | 6819.00 | 0.12 | 0.02 | 0.00 | 0.11 | 0.11 | 0.12 | 1.00 |

The dataset covered a wide range of industries, such as the manufacturing sector (which includes industrial and electronics enterprises), the service sector (which includes shipping, tourism, and retail companies), and other non-financial industry entities. The dataset comprises a substantial sample of 6819 observations, each characterised by 96 attributes. Within this dataset, 220 observations have been identified as instances of bankruptcy. In Table 2, we present comprehensive descriptive statistics for the chosen financial ratios in the dataset. The statistical metrics employed encompass fundamental measures such as mean, standard deviation, minimum, maximum, and percentiles at 25%, 50%, and 75%. These metrics offer a general overview of the distribution and central tendencies of the selected financial ratios, providing valuable insights into their variability and the overall profile of the Taiwan bankruptcy data under examination.

Three important factors in the current investigation necessitate domain adaptation. First, the dataset spans a sizable amount of time, from 1999 to 2009, and may include notable changes in industry dynamics, financial reporting standards, or economic situations. The training data (before the financial crisis) and possible test data (perhaps obtained after the financial crisis or in another economic environment) can become disconnected as a result of these temporal shifts, causing the model to perform poorly when applied to new data. In fact, the
In fact, the challenge of dataset shifts affecting the performance of supervised learning predictors has necessitated the development of a framework like DetectShift to quantify and address these shifts (Maia Polo, Izbicki, Lacerda, Ibieta-Jimenez, & Vicente, 2023). There are three main types of data shifts that can affect model performance: Covariate Shift, where the input features' distribution changes between the training and testing datasets while the output variable's distribution remains the same, potentially leading to biased predictions; Concept Shift, where the relationship between input features and the output variable changes due to factors like economic conditions or industry trends; and Prior Probability Shift, where the distribution of the target variable changes, affecting the model's predictive accuracy, particularly in cases of imbalanced data (Quiñonero-Candela, Sugiyama, Schwaighofer, & Lawrence, 2022).

Second, the diversity of the dataset across different industries highlights the necessity of understanding industry-specific dynamics. Each industry can experience unique shifts due to various factors such as technological advancements, regulatory changes, and evolving market conditions. These shifts might not be captured in the training set, leading to a model that performs well on historical data but poorly on new data reflecting current industry conditions. For example, a model trained on pre-2008 financial data might not account for post-crisis regulatory changes that significantly impact financial reporting and risk assessment. Similarly, a model trained on manufacturing data from an era of manual processes may struggle to predict outcomes in a modern, highly automated industry. The possibility of unanticipated shifts in industry conditions or trends underscores the importance of domain adaptability.
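Covariate shift of the kind described above can be screened for with a simple per-feature two-sample test. The sketch below is an illustration only, not part of the authors' pipeline: it computes the Kolmogorov-Smirnov statistic between a training-period feature sample and a test-period sample whose mean has drifted (the sample sizes and shift magnitude are arbitrary choices).

```python
import numpy as np

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest vertical gap
    between the empirical CDFs of samples a and b."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / len(a)
    cdf_b = np.searchsorted(b, grid, side="right") / len(b)
    return float(np.max(np.abs(cdf_a - cdf_b)))

rng = np.random.default_rng(0)
pre_crisis = rng.normal(0.0, 1.0, 1000)    # training-period feature sample
post_crisis = rng.normal(0.8, 1.0, 1000)   # test-period sample with drifted mean
stable = rng.normal(0.0, 1.0, 1000)        # control sample with no drift

shifted_ks = ks_statistic(pre_crisis, post_crisis)  # large: distributions differ
stable_ks = ks_statistic(pre_crisis, stable)        # small: same distribution
```

In practice each financial ratio would be screened this way (a library routine such as `scipy.stats.ks_2samp` also provides a p-value), and features with large statistics flagged as shift candidates.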
Domain adaptation techniques allow models to adjust to new industry environments by learning from both historical and current data. This involves identifying industry-specific features that remain relevant over time, weighting instances to prioritise more recent and relevant data, and developing representations that are robust to changes in industry dynamics. By incorporating these techniques, models can generalise well across a range of industry situations, maintaining their accuracy and reliability even as industry conditions evolve.

Finally, a common criticism in machine learning concerns biased training data, whose influence can degrade model performance, especially with respect to minority target labels.

The DAL approach proves invaluable in tackling temporal data shifts by strategically aligning features, weighting instances, and crafting domain-invariant representations. These techniques ensure that machine learning models adapt effectively to changes in the temporal distribution of data. By selecting and transforming features that remain stable across different time periods, assigning higher weights to instances that reflect the target domain's temporal characteristics, and learning representations insensitive to temporal variations, models become more resilient to temporal shifts. Additionally, harnessing transfer learning strategies such as pre-training on diverse temporal data and fine-tuning using domain adaptation methods enhances the model's ability to generalise and perform well across varying temporal contexts.

2.2. Handling outliers

Removing outliers from financial ratios before bankruptcy detection is essential to ensure the accuracy and reliability of the analysis. Outliers can skew statistical measures, distort trends, and mislead the interpretation of financial data, which can have significant implications, especially in critical decisions like bankruptcy prediction.
A recent study conducted by Nyitrai and Virág (2019) highlighted the necessity of financial indicators in predicting bankruptcy and the challenges posed by outliers in these indicators. The authors explored different approaches to handling outliers, specifically focusing on winsorisation and the use of CHAID-based (Chi-squared Automatic Interaction Detector) categorisation of financial ratios.

In this work, we adopted a hybrid Bayesian change point and Hampel identifier (BCP-HI) method (Pehlivan, 2024). This amalgamation scheme can potentially identify outliers more precisely than winsorisation. Winsorisation replaces extreme values with values from the tails of the distribution, which can mask subtle outliers, especially when dealing with multiple change points. The first component, BCP analysis, helps pinpoint these potential change points, allowing the HI portion to better target outliers within those segments. Additionally, the hybrid method utilises the data itself to identify outliers. By modelling the data with a normal distribution before and after potential change points, it can compute unique probabilities for each data point, leading to a more data-driven approach to outlier detection compared to winsorisation's fixed threshold approach.

2.2.1. The integration of the BCP-HI method

The BCP-HI outlier detection depicted in Algorithm 1 begins by preprocessing the financial data and determining initial parameters such as the window size (𝑤) and the number of change points (𝑐𝑝). The notation 𝑤 represents the size of the window used for outlier detection, while 𝑐𝑝 indicates the expected number of shifts in the data distribution. These parameters are crucial for the effectiveness of the algorithm in identifying outliers accurately.
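As a concrete illustration, the segment-wise detection and correction steps of Algorithm 1 can be sketched in Python. This is a minimal, hypothetical implementation rather than the authors' code: the Bayesian change-point step is abstracted away as a list of precomputed change-point indices, and the Hampel multiplier `k` and window size `w` are free parameters.

```python
import numpy as np

def bcp_hi_correct(data, change_points, k=3.0, w=5):
    """Hampel-identifier detection and median-filter correction applied per
    change-point segment (the loop of Algorithm 1); the Bayesian change-point
    step is assumed to have already produced `change_points`."""
    x = np.asarray(data, dtype=float).copy()
    bounds = [0] + sorted(change_points) + [len(x)]
    outliers = []
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        seg = x[lo:hi]
        med = np.median(seg)
        mad = np.median(np.abs(seg - med))
        # Hampel rule: flag x_j when |x_j - median(S_i)| > k * MAD(S_i)
        flags = np.abs(seg - med) > k * max(mad, 1e-12)
        outliers.extend(int(j) for j in lo + np.flatnonzero(flags))
    x[outliers] = np.nan                # replace identified outliers with NaN
    for j in outliers:                  # median filter of size w, ignoring NaNs
        x[j] = np.nanmedian(x[max(0, j - w // 2): j + w // 2 + 1])
    return x, outliers

series = [1, 1, 1, 100, 1, 1, 10, 10, 10, 10, -50, 10]
cleaned, flagged = bcp_hi_correct(series, change_points=[6])
```

On this toy series with a level shift at index 6, the two spikes (100 and -50) are flagged within their own segments and replaced by their local medians.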
Algorithm 1 BCP-HI Outlier Detection Algorithm
1: Input: Financial ratio data
2: Output: Refined financial ratio with outliers corrected
3: procedure BCP-HI(𝑑𝑎𝑡𝑎)
4:   Preprocess the input data
5:   Determine initial parameters: window size 𝑤, number of change points 𝑐𝑝
6:   Perform Bayesian Change Point (BCP) analysis to identify change points
7:   Divide data into subsegments based on change points
8:   for each subsegment do
9:     Calculate median and MAD for the subsegment
10:    Apply Hampel Identifier (HI) for outlier detection
11:    Replace identified outliers with NaN values
12:    Apply median filter of size 𝑤 to correct outliers
13:  end for
14:  return Refined time series data with outliers corrected
15: end procedure

Let the financial metric dataset over a given period be denoted as 𝑫 = {𝑥1, 𝑥2, … , 𝑥𝑛} with 𝑛 data points. The hybrid method first employs BCP analysis to detect significant shifts (𝑐𝑝𝑠) in the financial metric data. These change points represent instances where there is a notable change in the underlying distribution or behaviour of the financial metric. Consider the set of change points as 𝑐𝑝𝑠 = {𝑐𝑝1, 𝑐𝑝2, … , 𝑐𝑝𝑘}, where 𝑘 signifies the number of detected change points. The BCP analysis segments the financial data into subsegments with distinct distribution properties. Let 𝑆𝑖 represent the 𝑖th subsegment, where 𝑖 = 1, 2, … , 𝑘 + 1. Each subsegment 𝑆𝑖 is delineated by two adjacent change points 𝑐𝑝𝑖 and 𝑐𝑝𝑖+1, such that 𝑆𝑖 = {𝑥𝑐𝑝𝑖 , 𝑥𝑐𝑝𝑖+1, … , 𝑥𝑐𝑝𝑖+1−1}. This first approach incorporates Bayesian probabilistic modelling to estimate the likelihood of each data point belonging to a specific subsegment given the observed financial metric data, as defined in Eq.
(1):

𝑃(𝑆𝑖|𝑥𝑗) = [𝑃(𝑥𝑗|𝑆𝑖) ⋅ 𝑃(𝑆𝑖)] / [∑_{𝑚=1}^{𝑘+1} 𝑃(𝑥𝑗|𝑆𝑚) ⋅ 𝑃(𝑆𝑚)],  (1)

where 𝑃(𝑆𝑖|𝑥𝑗) represents the posterior probability of data point 𝑥𝑗 belonging to subsegment 𝑆𝑖, 𝑃(𝑥𝑗|𝑆𝑖) signifies the likelihood of observing data point 𝑥𝑗 under the distribution parameters of subsegment 𝑆𝑖, 𝑃(𝑆𝑖) denotes the prior probability of subsegment 𝑆𝑖, and ∑_{𝑚=1}^{𝑘+1} 𝑃(𝑥𝑗|𝑆𝑚) ⋅ 𝑃(𝑆𝑚) calculates the weighted sum of likelihoods across all subsegments, ensuring that probabilities sum to 1. This Bayesian methodology facilitates the probabilistic detection of significant changes (change points) in financial metric data and enables the segmentation of the data into subsegments with distinct distributional characteristics, thereby enhancing the analysis and processing of each subsegment independently.

Fig. 1. Outlier detection and correction in financial metrics using a hybrid BCP-HI scheme. The first column shows the original time series data for each financial metric before outlier detection. The second column highlights the outliers (in red circles) detected in the original data using the HI method. The third column shows the time series data after outliers have been removed.

Within each subsegment 𝑆𝑖, the algorithm calculates the median, denoted as median(𝑆𝑖). The Median Absolute Deviation (MAD) is then computed, which measures the dispersion of the data points within the subsegment. The MAD is defined as the median of the absolute deviations from the median of the subsegment, mathematically expressed as MAD(𝑆𝑖) = median(|𝑥𝑗 − median(𝑆𝑖)|) for all 𝑥𝑗 in 𝑆𝑖. After computing the median and MAD for each subsegment, the algorithm applies the HI for outlier detection. The HI flags a data point 𝑥𝑗 in subsegment 𝑆𝑖 as an outlier if its absolute deviation from the median exceeds a specified threshold. This threshold is typically set as a multiple of the MAD.
Specifically, a data point 𝑥𝑗 is considered an outlier if |𝑥𝑗 − median(𝑆𝑖)| > 𝑘 × MAD(𝑆𝑖), where 𝑘 is a constant multiplier that determines the sensitivity of the outlier detection. By applying this criterion, the algorithm effectively identifies outliers within each subsegment based on the robust statistical properties of the median and MAD.

Identified outliers in the financial ratios are replaced with NaN values. Let {𝑜1, 𝑜2, … , 𝑜𝑚} denote the indices of the identified outliers in the data 𝑫. For each outlier index 𝑜𝑖, we set 𝑥𝑜𝑖 = NaN.

To correct for these outliers, a median filter of size 𝑤 is applied to the time series. The median filter processes the time series by sliding a window of size 𝑤 across the data points. For each window position, the median value of the data points within the window is computed. If the window is centred at index 𝑗, the window includes data points {𝑥𝑗−⌊𝑤∕2⌋, 𝑥𝑗−⌊𝑤∕2⌋+1, … , 𝑥𝑗+⌊𝑤∕2⌋}. The median of these data points, excluding NaN values, is used to replace the NaN value at index 𝑗. Mathematically, for each outlier index 𝑜𝑖, the corrected value is given by:

𝑥𝑜𝑖 = median({𝑥𝑜𝑖−⌊𝑤∕2⌋, 𝑥𝑜𝑖−⌊𝑤∕2⌋+1, … , 𝑥𝑜𝑖+⌊𝑤∕2⌋} ⧵ {NaN})  (2)

This process ensures that outliers are replaced with more representative values based on the local neighbourhood of data points, effectively smoothing the financial ratios while preserving important trends and patterns.

The plots in Fig. 1 show the results of the hybrid outlier detection method applied to several financial metrics over the observed period. The 𝑦-axis represents the values of the financial metric, and the 𝑥-axis represents the metric indices. Note how the severe outliers displayed in the figure are identified and calibrated by leveraging the BCP-HI statistical technique.

Fig. 2. A graph of the correlation matrix describing the linear association between financial variables.
Each element in the triangular matrix shows the correlation coefficient between two variables.

2.3. Bivariate analysis

We propose a correlation matrix (in Fig. 2) to thoroughly examine the links between financial ratios. A quantitative indicator of the relationships between these financial ratios is the Pearson correlation coefficient (𝑟), which can be calculated using the following formula in Eq. (3) (Cohen et al., 2009):

𝑟 = ∑_{𝑖=1}^{𝑛}(𝑋𝑖 − 𝑋̄)(𝑌𝑖 − 𝑌̄) / √(∑_{𝑖=1}^{𝑛}(𝑋𝑖 − 𝑋̄)² ⋅ ∑_{𝑖=1}^{𝑛}(𝑌𝑖 − 𝑌̄)²).  (3)

The individual data points of the two financial ratios are represented by 𝑋𝑖 and 𝑌𝑖, their respective means are shown by 𝑋̄ and 𝑌̄, and the total number of data points is indicated by 𝑛. We observe from the matrix plot that most of the financial variables have values approaching 0. This closeness suggests that there is little correlation between any two of the chosen variables, confirming that there is no discernible multicollinearity between the independent financial ratios. These observed characteristics underscore the necessity of utilising all financial ratios to ascertain their collective importance in determining relevant features.

2.4. Genetic algorithm for feature selection

Genetic algorithms (GAs) have proven time and time again to be remarkably efficient at resolving a wide range of optimisation issues. Nolfi, Floreano, Miglino, Mondada, et al. (1994) documented their achievements in a variety of applications, including sophisticated robot motion optimisation, control system parameter fine-tuning, and robotic system path planning. In addition to their conventional uses, GAs are flexible in machine learning, especially when it comes to feature selection. This versatility makes the GA a valuable tool for systematic navigation in situations with complex combinations of features. The heuristic search algorithm uses principles inspired by natural selection and evolution to iteratively refine a subset of traits and gradually converge to an optimal set.
Adopting this approach effectively balances model complexity and prediction metrics such as F1-score, precision, recall, and AUC-ROC, while also improving classifier performance. In the present work, we investigate the GA used to determine the ideal subset of features in the financial dataset that maximises the model classifier's accuracy.

Let 𝐗 denote the feature matrix of a dataset with 𝑁 instances and 𝐿 features, and 𝐲 represent the corresponding target label vector. The following characteristics describe the GA:

(i) Population size (𝑁): Representing the number of individuals in each generation of the GA. The notation 𝑁 ∈ Z+ is a key factor in determining population diversity and the trade-off between exploration and exploitation.
(ii) Offspring production (𝜆): The number of offspring produced in each generation; 𝜆 determines the rate at which new genetic material is introduced into the population. Like 𝑁, 𝜆 is also a positive integer: 𝜆 ∈ Z+.
(iii) Crossover probability (𝑃𝑐): Reflecting the likelihood of mating occurring between two individuals. The symbol 𝑃𝑐 influences the exploration–exploitation balance by controlling the exchange of genetic material between parents. Mathematically, 𝑃𝑐 is a probability value between 0 and 1: 𝑃𝑐 ∈ [0, 1].
(iv) Mutation probability (𝑃𝑚): This parameter represents the likelihood that a bit in an individual's binary string will be flipped. Mutation introduces genetic diversity and prevents premature convergence. Mathematically, 𝑃𝑚 is a probability value between 0 and 1: 𝑃𝑚 ∈ [0, 1].
(v) Number of generations (𝐺): 𝐺 signifies the total iterations or epochs for which the genetic algorithm will run, determining how many times the evolutionary process (selection, crossover, and mutation) is applied. Mathematically, 𝐺 is a positive integer: 𝐺 ∈ Z+.

Each individual in the population is represented by a binary string of length 𝐿.
The binary string encodes the presence (1) or absence (0) of each feature in the subset. Mathematically, an individual 𝐼 is represented as

𝐼 = [𝑔1, 𝑔2, … , 𝑔𝐿].  (4)

Here, 𝑔𝑖 is the 𝑖th gene in the binary string, indicating the presence or absence of the 𝑖th feature. The binary encoding provides a concise and flexible representation of feature subsets. Each gene 𝑔𝑖 is a binary variable defined as 𝑔𝑖 ∈ {0, 1}. The initial population is then formed by randomly generating binary strings of length 𝐿, ensuring diversity in the initial set of individuals. The initialisation of an individual 𝐼 can be expressed as

𝐼𝑖 ∼ Bernoulli(0.5) for 𝑖 = 1, 2, … , 𝐿.  (5)

Algorithm 2 Genetic Algorithm for Feature Selection
1: Input: Dataset features 𝐗, target labels 𝐲, population size 𝑁, lambda 𝜆, crossover probability 𝑃𝑐, mutation probability 𝑃𝑚, number of generations 𝐺
2: Output: Selected features selected_features_ga
3: Initialise Population:
4:   Each individual 𝐼 is represented as a binary string of length 𝐿, where 𝐿 is the number of features.
5: Initialise Genetic Algorithm Set:
6:   Define fitness function fitness_function, crossover function, mutation function, and selection function
7: Initialise Population:
8:   Create a population of 𝑁 individuals
9: Evaluate Initial Population:
10:  Evaluate the fitness of each individual using the fitness function
11: for 𝑡 = 1 to 𝐺 do
12:   Apply Crossover and Mutation:
13:     Generate offspring using crossover and mutation operations with probabilities 𝑃𝑐 and 𝑃𝑚
14:   Evaluate Offspring:
15:     Evaluate the fitness of each offspring using the fitness function
16:   Select Individuals for the Next Generation:
17:     Use the selection function to choose individuals for the next generation based on their fitness
18: end for
19: Select Best Individual:
20:   Choose the individual with the highest fitness as the best individual
21: Extract Selected Features:
22:   Extract the indices of selected features from the best individual: selected_features_ga

The expression Bernoulli(𝑝) represents a Bernoulli distribution with probability 𝑝, and 𝐼𝑖 is the 𝑖th gene in the binary string. Keep in mind that genetic operations (namely, crossover and mutation) are fundamental processes in genetic algorithms that shape the evolution of the population over generations. In the context of feature selection, these operations manipulate the binary string representations of individuals to explore new solutions. Crossover involves combining genetic material from two parent individuals to create offspring. The crossover point is randomly selected along the binary string, and genetic material beyond that point is swapped between parents. For instance, let 𝐼1 and 𝐼2 be two parent individuals with binary string representations:

𝐼1 = [𝑔¹1, 𝑔¹2, … , 𝑔¹𝑖, … , 𝑔¹𝐿], 𝐼2 = [𝑔²1, 𝑔²2, … , 𝑔²𝑖, … , 𝑔²𝐿].  (6)

The crossover point 𝐶 is randomly selected, and offspring 𝑂1 and 𝑂2 are created:

𝑂1 = [𝑔¹1, 𝑔¹2, … , 𝑔¹𝐶, 𝑔²𝐶+1, 𝑔²𝐶+2, … , 𝑔²𝐿], 𝑂2 = [𝑔²1, 𝑔²2, … , 𝑔²𝐶, 𝑔¹𝐶+1, 𝑔¹𝐶+2, … , 𝑔¹𝐿].
(7)

The offspring thus exchange the parents' genetic material beyond the crossover point. After the crossover operation, the next step is the application of mutation. Mutation involves randomly changing the value of one or more genes in an individual. This introduces genetic diversity into the population and prevents premature convergence to suboptimal solutions. In this work, mutation is applied to each gene independently with a mutation probability 𝑃𝑚:

MutatedGene𝑖 = 1 − 𝑔𝑖 with probability 𝑃𝑚, and 𝑔𝑖 with probability 1 − 𝑃𝑚.

This mutated gene replaces the original gene in the binary string. If, for example, a mutation with probability 𝑃𝑚 occurs at position 𝑗 = 5 in 𝑂1, the binary string is updated as follows:

Original 𝑂1 = [𝑔¹1, 𝑔¹2, 𝑔¹3, 𝑔²4, 𝑔²5, … , 𝑔²𝐿], Mutated 𝑂1 = [𝑔¹1, 𝑔¹2, 𝑔¹3, 𝑔²4, 1 − 𝑔²5, … , 𝑔²𝐿]  (8)

Note that after the crossover and mutation operations, the population is updated with the newly created individuals. These individuals have genetic material inherited from their parents, with potential variations introduced through mutation. Now, to provide a quantitative measure of an individual's performance, we introduce a fitness function. The fitness function evaluates how well an individual performs the task at hand. In the context of feature selection, the fitness function measures the effectiveness of a subset of features. The GA aims to find the subset of features 𝑆 that maximises the accuracy of a machine learning model. The fitness function is typically defined as the accuracy achieved by the model trained on the dataset with the selected features:

Fitness(𝐼) = Accuracy(Model(𝑋train,𝑆, 𝑦train), 𝑋test,𝑆, 𝑦test)  (9)

Here, 𝑋train and 𝑦train are the training features and labels, and 𝑋test and 𝑦test are the test features and labels. Model is the machine learning model used (in this case, the Extra Trees classifier), and 𝑆 represents the features selected according to the binary string 𝐼.
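Putting initialisation (Eq. (5)), one-point crossover, bit-flip mutation, and fitness-driven selection together, a compact GA loop can be sketched as below. To keep the sketch self-contained and runnable, the model-accuracy fitness of Eq. (9) is replaced by a hypothetical toy score that rewards four designated "informative" features and penalises subset size; in the actual pipeline the score would instead come from training the Extra Trees classifier on the selected columns. All parameter values here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)
L, N, G, Pc, Pm = 12, 30, 40, 0.8, 0.05   # illustrative GA parameters

# Toy stand-in for Eq. (9): features 0-3 are "informative"; a small
# penalty on subset size discourages selecting everything.
informative = np.zeros(L)
informative[:4] = 1.0

def fitness(ind):
    return informative @ ind - 0.1 * ind.sum()

pop = rng.integers(0, 2, size=(N, L))      # Eq. (5): Bernoulli(0.5) init
for _ in range(G):
    fit = np.array([fitness(i) for i in pop])
    # tournament selection: keep the fitter of two random individuals
    a, b = rng.integers(0, N, N), rng.integers(0, N, N)
    parents = pop[np.where(fit[a] > fit[b], a, b)]
    children = parents.copy()
    for i in range(0, N - 1, 2):           # one-point crossover, Eq. (7)
        if rng.random() < Pc:
            c = rng.integers(1, L)
            children[i, c:], children[i + 1, c:] = (
                parents[i + 1, c:].copy(), parents[i, c:].copy())
    mask = rng.random((N, L)) < Pm         # bit-flip mutation, Eq. (8)
    pop = np.where(mask, 1 - children, children)

best = pop[np.argmax([fitness(i) for i in pop])]
selected = np.flatnonzero(best)            # indices of selected features
```

After a few dozen generations the best individual concentrates on the informative features while pruning most of the uninformative ones.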
The ultimate goal is to maximise the fitness function, thereby identifying the subset of features that results in the highest accuracy. The practical development and execution of the GA are depicted in Algorithm 2.

The visual representation in Fig. 3 provides a comprehensive view of the key metrics crucial for assessing financial stability and forecasting potential financial distress. Additionally, the lollipop chart not only showcases the significance of each ratio but also emphasises their hierarchical importance in the context of bankruptcy prediction. The relative importance of each feature is determined by averaging its importance across multiple GA runs and normalising these averages. Next, we discuss the various machine learning models used in the study.

Fig. 3. A lollipop chart illustrating the distribution of the top 50 financial ratios, highlighting their relative importance in predicting the likelihood of bankruptcy.

2.5. Machine learning models

2.5.1. Random Forest (RF)

The RF algorithm is an effective ensemble learning method that avoids over-fitting by combining random feature selection with bagging to manipulate complex data patterns. The resulting mathematical framework illustrates the essential ideas that govern how successful random forests are as machine learning techniques.

Let 𝒟 denote the original dataset housing 𝑁 samples. For every decision tree within the ensemble (comprising a total of 𝑇 trees), a bootstrap sample 𝒟𝑏 of size 𝑁 is created by iteratively selecting samples with replacement from 𝒟 such that 𝒟𝑏 = {(𝐱1, 𝑦1), (𝐱2, 𝑦2), … , (𝐱𝑁, 𝑦𝑁)}. Each decision tree undergoes training on its respective bootstrap sample, yielding 𝑇 independently trained trees. This algorithm introduces an additional level of randomness by including only a subset of features at each node when building the decision tree.
This deliberate selection of random features aims to decorrelate the trees, prevent excessive similarity, and allow us to capture different aspects of the data. Mathematically, at each node 𝑗 of a decision tree, a random subset of 𝑚 features is chosen from the complete feature set of size 𝑀, such that 𝑚 ≤ 𝑀. The purpose of this stochastic selection is to increase the tree's diversity, fortify the algorithm's resilience, and enhance its capacity for generalisation. Taking into account the ensemble of decision trees 𝑓1(𝐱), 𝑓2(𝐱), … , 𝑓𝑇(𝐱), the final prediction 𝐹(𝐱) for a new input 𝐱 is determined through a combination mechanism, typically involving voting for classification, as given in Eq. (10).

𝐹(𝐱) = mode(𝑓1(𝐱), 𝑓2(𝐱), … , 𝑓𝑇(𝐱)) (for classification)  (10)

2.5.2. Support Vector Machine (SVM)

The basic idea behind SVM is finding the best hyperplane in the feature space to separate different classes in the data effectively. This becomes especially important when dealing with binary classification issues, where the classes are usually designated as 0 and 1. Since its introduction by Vapnik (1982), SVM has been a major force in machine learning systems, outperforming many of its competitors in a short amount of time. Its ascendancy is attributed to the dual factors of simplicity and superior performance, as evidenced by studies such as Peng and Xu (2013). The widespread adoption of SVM is underscored by its successful application across diverse research domains.
Noteworthy fields where SVM has demonstrated its efficacy include finance, as exemplified by Luo, Yan, and Tian (2020) and Tay and Cao (2001); chemistry, as explored by Li, Liang, and Xu (2009); renewable energy prediction, with contributions from Zendehboudi, Baseer, and Saidur (2018); medicine, as demonstrated by Wang, Zheng, Yoon, and Ko (2018); text classification, a domain addressed by Tong and Koller (2001); and face recognition, with seminal work by Osuna, Freund, and Girosit (1997).

Given a set of training data points (𝐱1, 𝑦1), (𝐱2, 𝑦2), … , (𝐱𝑛, 𝑦𝑛), where 𝐱𝑖 is the feature vector for the 𝑖th data point and 𝑦𝑖 is the corresponding class label such that 𝑦𝑖 ∈ {−1, +1} (the 0/1 labels are conventionally remapped to −1/+1 for the SVM formulation), the decision function of an SVM is given by Eq. (11):

𝑓(𝐱) = 𝐰 ⋅ 𝐱 + 𝑏  (11)

Here, 𝐰 is the weight vector, 𝐱 is the input feature vector, and 𝑏 is the bias term. The goal of SVM is to find the optimal hyperplane that maximises the margin between the two classes. The margin is defined as the distance between the hyperplane and the nearest data point from each class. Mathematically, the margin (𝑀) is given by the formula in Eq. (12):

𝑀 = 2 / ‖𝐰‖,  (12)

where ‖𝐰‖ is the Euclidean norm of the weight vector. To ensure that the SVM correctly classifies the training data and maximises the margin, it must satisfy the following constraints:

i. For each positive training example (𝐱𝑖 with 𝑦𝑖 = +1): 𝐰 ⋅ 𝐱𝑖 + 𝑏 ≥ 1
ii. For each negative training example (𝐱𝑖 with 𝑦𝑖 = −1): 𝐰 ⋅ 𝐱𝑖 + 𝑏 ≤ −1

The above constraints can be combined into a single expression to get Eq. (13):

𝑦𝑖(𝐰 ⋅ 𝐱𝑖 + 𝑏) ≥ 1  (13)

This is the standard formulation of the linear SVM optimisation problem. Nonetheless, when confronted with intricate systems, such as the one under consideration, we enhance the foundational principles established in linear scenarios by incorporating a kernel function. Employing SVMs with kernel methods involves working in high-dimensional feature spaces, allowing for the construction of non-linear decision boundaries.
The decision function for an SVM with a kernel 𝐾 can be expressed as Eq. (14):

𝑓(𝐱) = ∑_{𝑖=1}^{𝑛} 𝛼𝑖𝑦𝑖𝐾(𝐱𝑖, 𝐱) + 𝑏  (14)

Here, the 𝛼𝑖 are the Lagrange multipliers obtained during the optimisation process, and 𝐾(𝐱𝑖, 𝐱) is the kernel function. The optimisation problem for an SVM with kernel methods is given by:

Minimise (1/2) ∑_{𝑖=1}^{𝑛} ∑_{𝑗=1}^{𝑛} 𝛼𝑖𝛼𝑗𝑦𝑖𝑦𝑗𝐾(𝐱𝑖, 𝐱𝑗) − ∑_{𝑖=1}^{𝑛} 𝛼𝑖

subject to the constraints:

∑_{𝑖=1}^{𝑛} 𝛼𝑖𝑦𝑖 = 0, 0 ≤ 𝛼𝑖 ≤ 𝐶 for 𝑖 = 1, 2, … , 𝑛,

where 𝐶 is the regularisation parameter that controls the trade-off between achieving a low training error and a large margin. The decision boundary is determined by the support vectors, which are the data points 𝐱𝑖 corresponding to non-zero Lagrange multipliers 𝛼𝑖. The kernel function 𝐾(𝐱𝑖, 𝐱𝑗) implicitly computes the dot product of the data points in a higher-dimensional space, allowing SVMs to capture complex, non-linear relationships in the data. Commonly used kernel functions include:

i. Linear kernel (𝐾(𝐱𝑖, 𝐱𝑗) = 𝐱𝑖 ⋅ 𝐱𝑗): Corresponds to the standard linear SVM described in Eq. (11).
ii. Polynomial kernel (𝐾(𝐱𝑖, 𝐱𝑗) = (𝐱𝑖 ⋅ 𝐱𝑗 + 𝑐)^𝑑): Introduces non-linearity through polynomial terms.
iii. Radial Basis Function or Gaussian kernel (𝐾(𝐱𝑖, 𝐱𝑗) = exp(−‖𝐱𝑖 − 𝐱𝑗‖² / 2𝜎²)) (Ding, Liu, Yang, & Cao, 2021): Provides a smooth, non-linear decision boundary.
iv. Sigmoid kernel (𝐾(𝐱𝑖, 𝐱𝑗) = tanh(𝛽𝐱𝑖 ⋅ 𝐱𝑗 + 𝜃)): Represents a hyperbolic tangent function, introducing non-linearities.

These expressions capture the essence of SVM with kernel methods, which leverage the mathematical concept of the kernel to implicitly operate in a high-dimensional space, enabling the modelling of non-linear relationships in the data.

2.5.3. k-Nearest Neighbours (k-NN)

The basic idea underlying the nearest neighbour algorithm is rather simple: instances are grouped based on the class of their nearest neighbours. It is frequently advantageous to take into account not just one but several neighbours in order to increase the robustness of this method.
Therefore, the commonly known approach is the k-Nearest Neighbours (k-NN) algorithm, where the class is determined based on the consensus of the k nearest neighbours. The algorithm requires the availability of training examples during runtime, meaning they must be stored in memory at the time of execution. Consequently, it can also be referred to as a memory-based algorithm (Cunningham & Delany, 2020). As the real learning or model construction is deferred until runtime when predictions are needed, this technique is considered a form of lazy learning, making it flexible and adaptive to varying data patterns encountered during runtime.

Analytically, we can express the k-NN algorithm by considering a given dataset 𝐷 with 𝑛 data points in a feature space, where each data point 𝑖 is represented by a feature vector 𝐱𝑖 and is associated with a class label 𝑦𝑖 (for classification). For a new data point 𝐱 that we want to classify or predict, the k-NN algorithm operates as follows:

i. Distance metric: Let 𝑑(𝐱𝑖, 𝐱) be a distance metric that measures the distance between data point 𝐱𝑖 and the query point 𝐱. Common distance metrics include the Euclidean distance, the Manhattan distance, or other suitable measures based on the problem at hand.
ii. Nearest neighbours: Identify the k nearest neighbours of the query point 𝐱 from the dataset 𝐷 based on the chosen distance metric. Let 𝑁(𝐱) represent the set of indices of these k nearest neighbours.
iii. k-NN classification: For classification tasks, assign the class label 𝑦 to the query point based on majority voting among the class labels of its k nearest neighbours, as given in Eq. (15):

𝑦 = argmax_𝑐 ∑_{𝑖∈𝑁(𝐱)} 𝛿(𝑦𝑖, 𝑐)  (15)

where 𝛿(𝑦𝑖, 𝑐) is the Kronecker delta function that equals 1 if 𝑦𝑖 = 𝑐 and 0 otherwise.

The k-NN algorithm essentially relies on the assumption that points nearby in the feature space are likely to have similar labels or target values.
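The classification rule of Eq. (15) can be sketched in a few lines (a minimal illustration using the Euclidean distance; the toy data are arbitrary):

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """Eq. (15): majority vote among the k nearest training points,
    using the Euclidean distance as d(x_i, x)."""
    d = np.linalg.norm(X_train - x, axis=1)        # distances to the query
    neighbours = np.argsort(d)[:k]                 # index set N(x)
    labels, counts = np.unique(y_train[neighbours], return_counts=True)
    return labels[np.argmax(counts)]               # argmax over classes c

# Two tiny clusters with labels 0 and 1
X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y = np.array([0, 0, 1, 1])
pred_low = knn_predict(X, y, np.array([0.05, 0.1]))   # near the class-0 cluster
pred_high = knn_predict(X, y, np.array([0.95, 1.0]))  # near the class-1 cluster
```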
The choice of the distance metric and the value of k are critical parameters that influence the algorithm's performance.

2.5.4. Logistic Regression (LR)

LR is a widely used statistical method in machine learning for binary classification tasks. This model predicts the probability that an instance belongs to a particular class, and it is particularly useful in scenarios where the dependent variable is categorical and binary, such as predicting whether an event will occur or not. Statistically, the LR model is expressed in Eq. (16):

𝑃(𝑌 = 1) = 1 / (1 + 𝑒^−(𝛽0 + 𝛽1𝑋1 + 𝛽2𝑋2 + ⋯ + 𝛽𝑛𝑋𝑛))  (16)

Here, 𝑃(𝑌 = 1) represents the probability of the positive class, 𝛽0 is the intercept, 𝛽1, 𝛽2, … , 𝛽𝑛 are the coefficients, and 𝑋1, 𝑋2, … , 𝑋𝑛 are the feature values. Keep in mind that each coefficient represents the change in the log-odds of the dependent variable for a one-unit change in the corresponding independent variable, holding other variables constant. Also, the intercept represents the log-odds of the event when all independent variables are zero.

This method is advantageous in machine learning for several reasons. Firstly, LR inherently produces probabilities, making the model interpretable and allowing decision-makers to understand the confidence level of predictions. Additionally, LR handles linear and non-linear relationships between features and the log-odds of the outcome, providing flexibility in capturing complex patterns in the data. Altman's Z-score, a popular bankruptcy prediction model, similarly combines multiple financial ratios into a single measure of financial health (Altman, 1968).

2.5.5. Gradient Boosting (GB)

The initial GB approach, often referred to as the GB Machine, was introduced by Friedman (1999, 2002).
This method acts as the cornerstone algorithm that lays the groundwork for subsequent advancements in boosting techniques such as XGBoost (Chen & Guestrin, 2016), LightGBM (Ke et al., 2017), and CatBoost (Prokhorenkova, Gusev, Vorobev, Dorogush, & Gulin, 2018). GB is a powerful machine learning ensemble technique that combines the predictions of multiple weak learners, usually decision trees, to create a robust and accurate model.

Given a training dataset (𝑥𝑖, 𝑦𝑖), where 𝑥𝑖 represents the input features and 𝑦𝑖 the corresponding target labels, the objective is to construct an ensemble of weak learners ℎ(𝑥). The final prediction is obtained by combining these weak learners in an additive manner, as presented in Eq. (17):

𝐹(𝑥) = ∑_{𝑚=1}^{𝑀} 𝛽𝑚ℎ𝑚(𝑥)  (17)

where 𝑀 denotes the number of weak learners, 𝛽𝑚 represents the weight assigned to each learner, and ℎ𝑚(𝑥) is an individual weak learner. The fundamental idea behind GB is to iteratively fit new models to the errors of the existing ensemble, thereby reducing the residual errors in predictions. In each iteration 𝑚, the model 𝐹(𝑥) is updated to get Eq. (18):

𝐹𝑚(𝑥) = 𝐹𝑚−1(𝑥) + 𝜆𝑚 ⋅ ℎ𝑚(𝑥)  (18)

where 𝐹𝑚(𝑥) is the composite model at iteration 𝑚, 𝐹𝑚−1(𝑥) is the model from the previous iteration, and 𝜆𝑚 is the learning rate that controls the contribution of each weak learner. At each iteration, the negative gradient of the loss function with respect to the current model 𝐹𝑚−1(𝑥) is calculated, denoted as −∂𝐿(𝐹𝑚−1(𝑥))/∂𝐹𝑚−1(𝑥). The weak learner ℎ𝑚(𝑥) is then trained to fit the negative gradient, minimising the local approximation of the loss. This iterative process continues until a predefined number of weak learners has been incorporated into the ensemble.

The effectiveness of GB in bankruptcy prediction has been demonstrated in various studies.
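The additive update of Eq. (18) can be illustrated with a minimal sketch (an illustration, not the authors' implementation): under squared loss the negative gradient is simply the current residual, so each new one-split stump is fitted to the residuals of the ensemble so far. The toy target and parameter values below are arbitrary.

```python
import numpy as np

def fit_stump(x, r):
    """Best single-split regression stump for residuals r under squared loss."""
    best = (np.inf, None, None, None)
    for t in np.unique(x)[:-1]:                 # last value leaves the right side empty
        left, right = r[x <= t], r[x > t]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    _, t, left_val, right_val = best
    return lambda q: np.where(q <= t, left_val, right_val)

def gradient_boost(x, y, M=50, lr=0.1):
    """Eq. (18): F_m = F_{m-1} + lr * h_m, each h_m fitted to the negative
    gradient of squared loss, i.e. the current residuals y - F."""
    F = np.full_like(y, y.mean())
    learners = []
    for _ in range(M):
        h = fit_stump(x, y - F)                 # fit a stump to the residuals
        F = F + lr * h(x)
        learners.append(h)
    return lambda q: y.mean() + lr * sum(h(q) for h in learners)

x = np.linspace(0.0, 1.0, 80)
y = (x > 0.5).astype(float)                     # a step function to learn
model = gradient_boost(x, y)
p_low = float(model(np.array([0.1]))[0])        # should approach 0
p_high = float(model(np.array([0.9]))[0])       # should approach 1
```

Because each stump captures the remaining step in the residuals, the residual shrinks geometrically with the learning rate, mirroring the iterative error-fitting described above.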
For instance, in a comprehensive analysis by Carmona, Climent, and Momparler (2019), GB algorithms were employed to develop predictive models for bankruptcy, showcasing superior performance compared to other machine learning techniques. The study emphasised the importance of ensemble methods, such as GB, in achieving high predictive accuracy and robustness in the context of bankruptcy prediction.

2.6. Bayesian hyperparameter tuning

The basis of Bayesian hyperparameter tuning is Bayesian optimisation, a probabilistic model-based scheme for maximising costly black-box functions. The goal of this work is to determine which combination of hyperparameters maximises an objective function (in this case, the performance metric for the machine learning models discussed in Section 2.5). The first step of this method is to define the objective function f(x) with hyperparameters denoted by x. For instance, in machine learning classification, the model's hyperparameters are represented by x, whilst the accuracy, F1 score, or any other assessment metric could be represented by f(x). We adopted the well-known Gaussian process (GP) to model the objective function. This is because a GP creates a probabilistic estimate with associated uncertainty by finding the distribution of likely values of the objective function at a particular location. It consists of a covariance function κ(x, x′) and a mean function m(x), as depicted in Eq. (19):

f(x) ∼ GP(m(x), κ(x, x′))    (19)

The notation κ(x, x′) models the correlation between various points in the input space, whereas m(x) records the objective function's average behaviour.

The next step is to iteratively select the next point for evaluation in the parameter space based on an acquisition function. This strategy aims to maintain a balance between exploration, which involves sampling in areas of high uncertainty, and exploitation, which involves sampling in areas where optimal solutions are expected to exist.
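One iteration of this select-evaluate loop can be sketched with scikit-learn's Gaussian process regressor and a simple upper-confidence-bound acquisition. The paper does not specify its acquisition function or implementation, so the toy objective, kernel, and UCB trade-off constant below are illustrative assumptions:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def objective(x):                # stand-in for f(x), e.g. cross-validated accuracy
    return -(x - 0.3) ** 2

# Hyperparameter configurations evaluated so far (values scaled to [0, 1]).
X_obs = np.array([[0.0], [0.5], [1.0]])
y_obs = objective(X_obs[:, 0])

# GP surrogate for f(x), as in Eq. (19).
gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.2)).fit(X_obs, y_obs)

# Acquisition: mean + kappa * std balances exploitation (high mean)
# against exploration (high predictive uncertainty).
grid = np.linspace(0, 1, 101).reshape(-1, 1)
mu, sigma = gp.predict(grid, return_std=True)
ucb = mu + 1.5 * sigma
x_next = grid[np.argmax(ucb)]    # next configuration to evaluate
print(float(x_next[0]))
```

Evaluating the objective at `x_next` and refitting the GP on the enlarged set of observations is exactly the posterior update described next.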
The goal is to systematically navigate the search space through an intelligent selection of evaluation points and effectively combine the search for new information with the use of existing knowledge to control the optimisation process. This careful balance allows the algorithm to efficiently explore unknown regions of the parameter space whilst exploiting regions where the objective function is likely to reach its optimal value.

After specifying the new hyperparameter configuration and evaluating the objective function, the GP model is updated using Bayesian inference. The updated GP posterior distribution is given by Eq. (20):

P(f(x) | D) = [P(D | f(x)) · P(f(x))] / P(D)    (20)

Here, D represents the data collected so far, P(D | f(x)) is the likelihood of the data under the GP model, P(f(x)) is the prior distribution of the GP, and P(D) is the marginal likelihood.

Table 3 provides a comprehensive summary of the optimal hyperparameters for the selected machine learning algorithms. For each model, details are provided on the specific hyperparameters tuned, the range or choices considered during the tuning process, and the best parameter values found. For example, the best parameters of the RF model include 64 estimators and a maximum depth of 6, while the SVM model achieved optimal performance with a C value of 1e6 and a gamma value of 0.00143 using an RBF kernel.

2.7. Domain adaptation learning

In our developed model, we addressed the challenge of imbalanced financial data by partitioning the information generated by the Genetic Algorithm (GA) into source and target domains. The source domain aimed to maintain a balanced representation of the majority class, while the target domain deliberately mirrored the imbalances present in the initial dataset. To establish a balanced subset for the source domain, we applied a clustering approach to the imbalanced training dataset, specifically focusing on the majority class.
The cluster centroids, representing characteristic samples of the dominant class, were retained. This clustering process utilised the imblearn Python package.5 The resulting balanced subset served as the foundation for building a model with improved representativeness, capturing various patterns and features from the initially imbalanced data. This crucial step of constructing a balanced source domain supported a more thorough understanding of the underlying patterns and eased subsequent domain adaptation, allowing the model to learn robust and widely applicable properties. It also reduced the possibility of bias towards particular cases.

Table 3
Best hyperparameters for different models.

Algorithm/Model name       Hyperparameter                              Best parameter
Random forest              n_estimators: (10, 100)                     n_estimators: 64
                           max_depth: (1, 20)                          max_depth: 6
                           min_samples_split: (2, 10)                  min_samples_split: 2
                           min_samples_leaf: (1, 10)                   min_samples_leaf: 1
Support vector machine     C: (1e−6, 1e+6, log-uniform)                C: 1e+6
                           gamma: (1e−6, 1e+1, log-uniform)            gamma: 0.00143
                           kernel: [linear, poly, rbf, sigmoid]        kernel: rbf
Logistic regression        C: (1e−6, 1e+6, log-uniform)                C: 1.7086
                           penalty: [l1, l2]                           penalty: l2
                           solver: [lbfgs, newton-cg, sag, saga]       solver: sag
                           max_iter: (100, 1000, 10000)                max_iter: 1000
                           tol: (1e−6, 1e−3, log-uniform)              tol: 4.4260e−5
                           fit_intercept: [True, False]                fit_intercept: False
                           class_weight: [None, balanced]              class_weight: None
Gradient boosting          n_estimators: (10, 100)                     n_estimators: 42
                           learning_rate: (1e−6, 1e+1, log-uniform)    learning_rate: 0.5144
                           max_depth: (1, 20)                          max_depth: 7
                           min_samples_split: (2, 10)                  min_samples_split: 4
                           min_samples_leaf: (1, 10)                   min_samples_leaf: 10
k-nearest neighbours       n_neighbours: (1, 20)                       n_neighbours: 1
                           weights: [uniform, distance]                weights: uniform
                           p: [1, 2]                                   p: 2
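The balanced source-domain construction described above can be sketched as follows. The paper uses the imblearn package's cluster-centroid undersampling; to keep the sketch self-contained, the same idea is reproduced here with scikit-learn's KMeans on hypothetical data (class sizes, dimensions, and the random seed are illustrative assumptions):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
# Imbalanced training set: 900 majority-class (0) vs 100 minority-class (1) samples.
X_maj = rng.normal(0.0, 1.0, size=(900, 4))
X_min = rng.normal(2.0, 1.0, size=(100, 4))

# Undersample the majority class to its cluster centroids, as imblearn's
# ClusterCentroids does internally, so both classes end up with 100 samples.
centroids = KMeans(n_clusters=100, n_init=10, random_state=0).fit(X_maj).cluster_centers_

X_source = np.vstack([centroids, X_min])
y_source = np.array([0] * 100 + [1] * 100)
print(X_source.shape, np.bincount(y_source))   # (200, 4) [100 100]
```

The retained centroids act as characteristic samples of the dominant class, so the source domain is balanced without discarding the majority class's overall structure.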
Conversely, the target domain was created from the unbalanced training dataset by deliberately manipulating the distribution to reflect imbalances, with a focus on the minority class. This deliberate imbalance aimed to simulate challenging circumstances for the model during adaptation. The method employed a random sampling procedure on minority-class instances, ensuring that the target domain retained the intrinsic complexity and biases of the original data.

Following the creation of the source and target domains, the challenge of class imbalance was addressed by introducing sample weights during model training. This proactive approach aimed to prevent the dominating majority class from overshadowing the learning process, promoting a more effective and balanced model. For the source domain, sample weights were calculated based on class frequencies, with less frequent classes receiving higher weights. In the target domain, sample weights were designed to address imbalances, assigning higher weights to the minority class. Both source and target domain sample weights underwent normalisation to maintain relative importance, ensuring their sum equalled 1. By integrating sample weights, the model prioritised minority-class instances during training, enhancing its adaptability and performance in scenarios with prevalent class imbalances in the target domain.

5 https://pypi.org/project/imblearn/

To ensure that the selected features are evenly distributed in both the source and target domains, we implement a quantile transformation as a means of standardising these features. According to studies such as Gallón, Loubes, and Maza (2013), Liu et al. (2019), Pan and Zhang (2018), and Peterson and Cavanaugh (2019), this approach is preferred over alternative standardisation methods like mean normalisation or z-score
scaling because of its robustness in handling outliers and non-Gaussian distributions.

Algorithm 3 Quantile Transformation Normalisation
Require: X: input dataset with n samples and m features
Ensure: normalised dataset X_norm
1: Initialise an empty array X_norm to store the normalised values
2: for i ← 1 to m do                    ⊳ Iterate over each feature
3:   Sort the values of feature i in ascending order
4:   Calculate the quantiles for each value based on its rank and the total number of samples
5:   Initialise an empty array X_norm_feature to store the normalised values of feature i
6:   for j ← 1 to n do                  ⊳ Iterate over each sample
7:     Calculate the rank of the jth sample in feature i
8:     Calculate the percentile of the jth sample based on its rank and the total number of samples
9:     Map the percentile to a standard normal distribution using the inverse cumulative distribution function (CDF) of the normal distribution
10:    Store the mapped value in X_norm_feature
11:  end for
12:  Append X_norm_feature to X_norm
13: end for
14: return X_norm

The pseudocode in Algorithm 3 outlines the process of quantile transformation normalisation, a method used to transform data so that it follows a standard normal distribution. It starts by iterating over each feature in the dataset and sorts the values of each feature in ascending order. It then calculates the quantiles for each value based on its rank and the total number of samples. For each sample, it calculates its percentile from its rank and the total number of samples, and maps this percentile to a standard normal distribution using the inverse cumulative distribution function (CDF) of the normal distribution. These mapped values are stored in new arrays for each feature, and the normalised feature arrays are appended to form the normalised dataset, which is returned as the
output. By using quantile transformation, feature distributions between domains are more robustly and efficiently aligned.

Algorithm 4 Domain Adaptation Pipeline
1: Load Datasets:
2:   Load the source and target datasets, X_source, y_source and X_target, y_target, respectively.
3:   Balance the source dataset and create an imbalanced target dataset.
4: Calculate Sample Weights:
5:   Calculate sample weights for the source and target domains:
6:   sample_weights_source ← 0.5 Σ_i (y_source == i)
7:   sample_weights_target_train ← 0.7 Σ_i (y_target_train == i)
8:   sample_weights_target_test ← 0.3 Σ_i (y_target_test == i)
9: Standardise Data:
10:  Standardise the features using quantile transformation:
11:  X_source_standardised ← QuantileTransform(X_source)
12:  X_target_train_standardised ← QuantileTransform(X_target_train)
13:  X_target_test_standardised ← QuantileTransform(X_target_test)
14: Bayesian Optimisation:
15:  Iterate over classifiers and perform Bayesian optimisation.
16:  Find optimal hyperparameters.
17: Evaluate Optimised Classifiers:
18:  For each optimised classifier:
19:  Fit the classifier on the source domain.
20:  Transform source features.
21:  Train a transfer model on the target domain.
22:  Make predictions on the target domain testing set.

With regard to DAL, Bayesian optimisation is used to optimise the hyperparameters described in Section 2.6 by defining a parameter space tailored to the model classifiers in Section 2.5. In order to optimise based on cross-validated accuracy, each classifier's hyperparameter space is methodically explored and exploited using Bayesian optimisation. This makes it possible to determine the ideal hyperparameters that improve each classifier's performance in the DAL setting. Once the classifiers are optimised, a model transfer is executed by fitting these fine-tuned classifiers to the source domain data.
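The standardisation, weighting, and transfer steps of Algorithm 4 can be condensed into a short sketch. The synthetic data, the choice of classifier, and the re-fitting of a fresh model on the target domain are simplifying assumptions for illustration, not the paper's exact configuration:

```python
import numpy as np
from sklearn.preprocessing import QuantileTransformer
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(3)
# Hypothetical balanced source domain and imbalanced target domain.
X_source = rng.normal(size=(200, 5)); y_source = np.repeat([0, 1], 100)
X_target = rng.normal(size=(150, 5)); y_target = np.r_[np.zeros(135), np.ones(15)].astype(int)

# Quantile transformation maps both domains towards a standard normal distribution.
qt = QuantileTransformer(output_distribution="normal", n_quantiles=100)
Xs = qt.fit_transform(X_source)
Xt = qt.transform(X_target)

# Inverse-frequency sample weights, normalised to sum to 1 per domain,
# so less frequent classes receive higher weights.
def weights(y):
    w = 1.0 / np.bincount(y)[y]
    return w / w.sum()

source_model = RandomForestClassifier(random_state=0).fit(
    Xs, y_source, sample_weight=weights(y_source))
# Transfer step: carry the tuned configuration over and re-fit on the
# weighted target data (a simplification of the paper's transfer model).
transfer_model = RandomForestClassifier(random_state=0).fit(
    Xt, y_target, sample_weight=weights(y_target))
print(weights(y_target).sum())
```

The printed weight sum confirms the normalisation constraint (weights summing to 1) stated for both domains.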
After that, the pertinent characteristics are extracted from the source domain and used to train a new model on the unbalanced target domain. The main focus here is on using the data from the source domain to adapt the model to the specifics of the target domain, which in turn improves performance on the target domain testing set. Algorithm 4 depicts a comprehensive pipeline for DAL. This proposed process involves several key steps aimed at leveraging knowledge from relevant source domains to adapt machine learning models to work effectively in the target domain.

3. Performance metrics

3.1. Confusion matrix

In assessing the performance of a classification model, such as our DAL classifier for bankruptcy prediction, the evaluation is based on the test data, and the results are presented using a confusion matrix. The confusion matrix, denoted M(φ) for a model classifier φ, is defined in Eq. (21):

M(φ) = Σ_{i,j=1}^{t} m_{ij}(φ),    (21)

where M(φ) is a t × t square matrix for a target domain test set with t target labels. Each entry m_{ij}(φ) of M(φ) represents the number of values belonging to a target label i but assigned to a different target label j by φ. Specific computations derived from Eq. (21) include:

a. Σ_{j=1}^{t} m_{ij}(φ), the sum of all values of target label i ∈ t.
b. Σ_{i=1}^{t} m_{ij}(φ), the sum of all values of target label j ∈ t.
c. Σ_{i=1}^{t} m_{ii}(φ), the diagonal entries, which count the validly classified target labels.

The introduction of Eq. (21) provides a fundamental framework for comprehending the performance metrics of our classification model. This formula is essential because it captures the essence of a confusion matrix, a vital instrument for evaluating the precision and effectiveness of the model. Each entry of this matrix signifies the instances where the model correctly or incorrectly assigned a label, forming the basis for various performance measures.
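The measures derived from the confusion matrix can be sketched in a few lines with scikit-learn; the labels and predictions below are hypothetical, with 1 as the positive class:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical test labels and model predictions.
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 1, 0])
y_pred = np.array([1, 1, 0, 1, 0, 0, 1, 0, 1, 0])

# For binary labels [0, 1], ravel() yields the 2x2 entries in order TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

sensitivity = tp / (tp + fn)      # identical to recall
specificity = tn / (tn + fp)
precision   = tp / (tp + fp)
accuracy    = (tp + tn) / (tp + tn + fp + fn)
print(tp, tn, fp, fn, accuracy)   # 4 4 1 1 0.8
```

These are exactly the row, column, and diagonal sums of Eq. (21) specialised to the 2 × 2 case.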
Given our binary classification focus, the confusion matrix becomes a 2 × 2 matrix, encompassing true positives (TPs), false positives (FPs), false negatives (FNs), and true negatives (TNs). These components denote the counts of correctly identified positive (no bankruptcy) and negative (bankruptcy) instances and their misclassifications, respectively. For clarity, in our binary classification scenario, "Positive" signifies the absence of bankruptcy (no bankruptcy), whilst "Negative" corresponds to the presence of bankruptcy. Table 4 summarises various classification measures derived from the confusion matrix, such as sensitivity, specificity, precision, recall, and accuracy, along with their respective computations (Hossin & Sulaiman, 2015).

3.2. Probability calibration

Probability calibration is a concept in machine learning that refers to the alignment of predicted probabilities with the true likelihood of the corresponding events. In classification tasks, machine learning models often output predicted probabilities that represent the model's confidence in its predictions. Ideally, these predicted probabilities should accurately reflect the actual probabilities of the events being predicted. One common method for probability calibration is Platt scaling (Böken, 2021; Niculescu-Mizil & Caruana, 2005), which involves fitting a logistic regression model to the predicted probabilities generated by the original model. This additional calibration step can help refine the predicted probabilities to be more accurate. Mathematically, let Λ(x) be the output of the machine learning model (before calibration) for a given instance x; Λ(x) is often a raw score or logit. The calibrated probability p_c is then obtained using the sigmoid (logistic) function defined in Eq.
(22):

p_c(x) = 1 / (1 + e^(α·Λ(x) + β))    (22)

where p_c(x) is the calibrated probability, Λ(x) is the output of the original model, and α and β are the parameters of the logistic regression model, which are learned during the calibration process. During calibration, we typically use a set of labelled data to train the function in Eq. (22). The labels are the true class labels (0 or 1), and the input to the logistic function is Λ(x). The logistic regression model is trained to minimise the negative log-likelihood of the true labels given the calibrated probabilities. Once the model is trained, the learned parameters α and β are used to calibrate new predicted probabilities.

The Brier score is a metric used to assess the calibration of probabilistic predictions. For binary classification, the Brier score is calculated as the mean squared difference between the predicted probabilities and the actual outcomes:

Brier score = (1/N) Σ_{i=1}^{N} (y_i − p_i)²    (23)

Table 4
Classification measures for DAL classifier performance evaluation.

Name         Description                                                                    Computation
Sensitivity  The ability of the classifier to correctly identify all positive scenarios.    TP/(TP + FN)
Specificity  The ability of the classifier to correctly reject all negative scenarios.      TN/(TN + FP)
Precision    The ratio of relevant scenarios correctly identified by the classifier.        TP/(TP + FP)
Recall       The ratio of all relevant scenarios correctly identified by the classifier.    TP/(TP + FN)
Accuracy     The overall ability of the classifier to make correct decisions, considering   (TP + TN)/(TP + TN + FP + FN)
             both positive and negative scenarios.

Fig. 4.
Comparative confusion matrices of bankruptcy prediction models in DAL: This figure depicts the performance differences between the classifiers (RF, SVM, LR, GB, k-NN, and SE) in recognising cases that are bankrupt and those that are not. Notable patterns include SVM and LR exhibiting good results, and RF and SE demonstrating improved sensitivity with larger true positives and reduced false positives. On the other hand, k-NN and GB show comparatively more false positives.

where N is the number of instances in the dataset, y_i is the true binary outcome (0 or 1) for instance i, and p_i is the predicted probability of the positive class for instance i. In the case of a multi-class classification problem, we can generalise Eq. (23) to get Eq. (24):

Brier score = (1/N) Σ_{i=1}^{N} Σ_{k=1}^{K} (y_{ik} − p_{ik})²    (24)

where K is the number of classes, y_{ik} is an indicator variable equal to 1 if the true class for instance i is k and 0 otherwise, and p_{ik} is the predicted probability of class k for instance i. This score ranges from 0 to 1, where a lower score indicates better calibration (better alignment of predicted probabilities with actual outcomes).

4. Results and discussion

The approach adopted in this work uses a 5-fold cross-validation strategy within the Bayesian optimisation process to ensure reliable and generalisable hyperparameter tuning and model evaluation. This method divides the dataset into five subsets, iteratively using four subsets for training and one for validation. This reduces the risk of overfitting and provides a comprehensive assessment of model performance across different data splits. Additionally, by employing balanced source and target datasets and applying sample weighting and resampling techniques such as ClusterCentroids, RandomOverSampler, and RandomUnderSampler, the validation process is further refined.
These steps effectively address class imbalance issues, ensuring that evaluation metrics such as accuracy, ROC AUC, and classification reports accurately reflect the models' capabilities in handling imbalanced data scenarios within the domain adaptation framework. The discussion that follows provides insights into the model's performance and its implications for stakeholders in the financial domain. For ease of comprehension, in our binary classification scenario, "0" represents the negative class, or the absence of the event being measured (non-bankrupt), whilst "1" denotes the positive class, or the presence of the event being measured (bankrupt).

In Fig. 4, multiple classifiers, including RF, SVM, LR, GB, k-NN, and the stacking ensemble (SE), exhibited varying confusion matrices. Take note that the SE algorithm amalgamates predictions from multiple base classifiers, in our case SVM, LR, GB, and k-NN. These predictions are then processed by a meta-learner, with RF serving as the meta-learner in this case. The meta-learner is trained on the predictions made by the base classifiers to produce the final prediction. The consistent trend across these matrices reveals that the models generally perform well in identifying true negatives (non-bankrupt companies) but show differences in their ability to correctly classify bankrupt instances. Notably, the RF and SE models produced higher true positives (355 and 351, respectively) and lower false positives (118 and 111, respectively), indicating a better ability to identify companies at financial risk. SVM and LR also demonstrated favourable results, while k-NN and GB showed relatively higher false positives, potentially leading to unnecessary concerns for non-bankrupt entities. Stakeholders in the financial sector should consider these nuances when selecting a model, with a focus on minimising false negatives to avoid overlooking actual bankruptcies.
The absence of evident class imbalance in the confusion matrices suggests that the classification models perform well in recognising both bankrupt and non-bankrupt cases, without a significant bias towards one class over the other. This balanced performance is crucial for accurate predictions and indicates the effectiveness of the models in handling the imbalanced target data.

Fig. 5. Comparative analysis of classification results for bankruptcy prediction models. This figure presents performance insights for RF, SVM, LR, GB, k-NN, and SE based on precision, recall, and F1-score metrics for both non-bankrupt (0) and bankrupt (1) classes.

Fig. 5 reveals that the RF and SE models achieved well-rounded performance across all metrics (precision, recall, and F1-score) for both bankrupt and non-bankrupt classifications. This suggests their effectiveness in accurately identifying companies in financial distress and those that are financially stable. While SVM and LR also performed well, they exhibited a trade-off between precision and recall. SVM prioritised identifying most bankrupt companies (high recall), even if it meant misclassifying some healthy ones (lower precision). Conversely, LR focused on avoiding false positives (high precision for non-bankrupt companies) at the expense of potentially missing some bankrupt firms (lower recall). GB followed a similar trend to LR, demonstrating high precision for non-bankrupt companies but lower recall for bankrupt ones. k-NN achieved a reasonable balance but fell slightly behind RF and SE in both precision and recall. For financial institutions, these results suggest that RF and SE might be preferable choices due to their balanced performance in identifying both bankrupt and non-bankrupt companies.
However, the optimal model selection should be tailored to specific priorities and operational needs. Financial stakeholders should carefully consider the potential consequences of misclassifications. A high tolerance for false positives (mistakenly classifying a healthy company as bankrupt) might favour models with high precision, such as LR. Conversely, situations where missing a truly bankrupt company (a false negative) is more concerning might call for models with high recall, such as SVM. Ultimately, the trade-off between precision and recall should be weighed based on risk tolerance and the potential impact of misclassifications.

The ROC curve results showcased in Fig. 6 for the various models indicate their ability to discriminate between bankrupt and non-bankrupt instances. The mean Area Under the Curve (AUC) values displayed on the ROC curve provide a summary of each model's overall performance across multiple evaluations. Higher AUC values generally suggest better model performance in terms of correctly ranking instances by their predicted probabilities. RF and SVM exhibit the highest mean AUC, indicating robust discriminative power. LR, GB, and the SE also perform well, with AUC values in the high 80s to low 90s. k-NN, while still respectable, shows a slightly lower mean AUC. These AUC results suggest that RF and SVM are particularly strong performers in distinguishing between bankrupt and non-bankrupt cases based on their predicted probabilities.

Fig. 6. ROC curve analysis of DAL models for bankruptcy prediction: The ROC curve results showcase the discrimination abilities of RF, SVM, LR, GB, k-NN, and SE. The AUC values quantify overall model performance, with RF and SVM demonstrating the highest AUC (93%).

The choice between these models may depend on other considerations, such as interpretability and computational efficiency. LR, GB, and the SE also demonstrate reliable discriminatory capabilities, making them suitable alternatives.
k-NN, while showing decent performance, has a slightly lower mean AUC and may require additional scrutiny.

In the graphical representation (Fig. 7), the decision boundary is a crucial aspect of understanding how machine learning models make predictions by separating different classes in the feature space. The decision boundary of RF (first graph from the left) is nonlinear and can adapt well to intricate patterns. In this graph, we can observe that RF creates a piecewise-constant decision boundary, as it combines multiple decision trees to form a consensus prediction. The SVM plot (second graph from the left) aims to find a hyperplane that maximally separates classes in the feature space. In the context of the high AUC results,

Fig. 7. Decision boundaries of DAL models for bankruptcy prediction: This figure illustrates the decision boundaries of RF (first from the left), SVM (second from the left), LR (third from the left), GB (fourth from the left), and k-NN (last from the left) based on their performance characteristics. RF and SVM are expected to have flexible, potentially nonlinear boundaries, while LR exhibits linear structures. GB combines simple decision boundaries, contributing to a complex overall boundary. k-NN's locally adaptive boundary is influenced by the data point distribution. Feature 1 refers to the most significant financial ratio, 'Return on Assets (ROA) Before Interest and % After Tax', while Feature 2 represents the next most important financial ratio for predicting bankruptcy, 'Net Income to Stockholder's Equity'.

Fig. 8. Illustration of calibration curves and Brier scores for binary classifiers. The Brier scores (RF: 0.12, SVM: 0.13, LR: 0.14, GB: 0.15, k-NN: 0.14) indicate well-calibrated probabilistic predictions. The calibration curves showcase the alignment of predicted probabilities with true positive-class probabilities.
SVM establishes a discriminative hyperplane that effectively separates bankrupt and non-bankrupt instances. Here, the decision boundary is nonlinear because the best-performing kernel function was the RBF kernel. LR models the log-odds of the probability of belonging to a particular class. The decision boundary for LR is a linear function of the input features; since the relationship between the features and the log-odds is assumed to be linear, the decision boundary is a hyperplane. GB builds an ensemble of weak learners, typically decision trees, sequentially, with each tree aiming to correct the errors of the previous ones. Its decision boundary is a combination of simpler decision boundaries, leading to a complex and adaptive overall boundary. k-NN classifies instances based on the majority class among their k nearest neighbours. Its decision boundary is flexible and nonlinear, adapting to the local density of instances in the feature space.

A well-calibrated model should produce predicted probabilities that are indicative of the true probability of belonging to the positive class. One way to assess calibration is by examining a calibration curve and calculating the Brier score, as illustrated in Fig. 8. The Brier scores for each model are as follows: RF with a Brier score of 0.12, SVM with 0.13, LR with 0.14, GB with 0.15, and k-NN with 0.14. These scores are relatively low, suggesting that the models provide well-calibrated probabilistic predictions. To gain further insights into the behaviour of each classifier, we analyse the distribution of samples across predicted probability bins. For example, dividing the predicted probabilities into bins (e.g., 0−0.1, 0.1−0.2, …, 0.9−1.0) and counting the number of samples in each bin reveals how concentrated or dispersed the predicted probabilities are.
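This binning and calibration check can be sketched with scikit-learn. The predictions below are synthetic and deliberately well calibrated (each outcome is drawn with exactly its stated probability), so the curve should track the diagonal and the Brier score should sit near its calibrated optimum; thresholds in the final print are illustrative:

```python
import numpy as np
from sklearn.calibration import calibration_curve
from sklearn.metrics import brier_score_loss

rng = np.random.default_rng(4)
# Hypothetical, perfectly calibrated predictions: each label y_i is drawn
# to be 1 with probability exactly p_i.
p = rng.uniform(size=2000)
y = (rng.uniform(size=2000) < p).astype(int)

# Brier score (Eq. (23)) and a 10-bin calibration curve.
brier = brier_score_loss(y, p)
frac_pos, mean_pred = calibration_curve(y, p, n_bins=10)

# Counts per predicted-probability bin (0-0.1, 0.1-0.2, ..., 0.9-1.0).
counts, _ = np.histogram(p, bins=np.linspace(0, 1, 11))
print(brier < 0.25, np.max(np.abs(frac_pos - mean_pred)) < 0.15)
```

For calibrated uniform probabilities the expected Brier score is E[p(1−p)] ≈ 0.167, and the per-bin fraction of positives stays close to the bin's mean predicted probability, which is what the diagonal calibration curve expresses.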
In general, a well-calibrated model exhibits a calibration curve that closely follows the diagonal line (y = x), and this is exactly what the calibration curves exhibit.

The histograms shown in Fig. 9 visually illustrate the central tendencies and spatial distribution of predicted probabilities for each model (RF, SVM, LR, GB, and k-NN) across all instances. In general, a well-calibrated model's histogram of mean predicted probabilities would ideally exhibit a balanced distribution. Specifically, a peak or concentration around 0.5 indicates that the model is uncertain about the class assignment for many instances, while values closer to 0 or 1 signify higher confidence in the predicted class. Models that are poorly calibrated may exhibit skewed histograms, indicating a mismatch between predicted probabilities and the true likelihood of positive outcomes. This aligns with the concept of calibration discussed by Kull, Filho, and Flach (2017). RF (first from the left) and SVM (second from the left) are both known for their discriminative power; we therefore expect to observe histograms with a clear peak or concentration towards the extremes (0 or 1). This suggests that these models are confident in their predictions and have effectively separated instances into distinct classes. In the case of LR, the model exhibits a smoother and more spread-out histogram, reflecting its inherent probabilistic nature. The mean predicted probability histogram for GB demonstrates a more refined distribution compared to models like RF. Hence, we expect to observe a histogram with peaks or concentrations that reflect the model's ability to iteratively improve its predictions. Peaks at 0 or 1 in the histogram are indicative of instances where boosting rounds consistently strengthen a specific class assignment, signifying a higher level of confidence in the predictions.
k-NN, which is based on local patterns in the data, produces a histogram with a more varied distribution. Instances where the nearest neighbours are consistently of the same class result in peaks at 0 or 1, indicating higher confidence, while instances with mixed neighbours might lead to a peak around 0.5.

Tables 5 and 6 present a comprehensive comparison of traditional machine learning approaches and DAL techniques for predicting bankruptcy for Taiwanese and Polish companies, respectively. The evaluation metrics used provide information about the performance of the models under different conditions. In Table 5, accuracy is generally higher in traditional learning for most classifiers. For example, the RF classifier achieves an accuracy of 0.95 in traditional learning, compared to 0.82 in DAL. Similar trends are observed for SVM, LR, and GB. Precision for Class 0 remains consistently high in both learning paradigms, with traditional learning showing a slight edge. For example, SVM has a precision of 0.96 in traditional learning versus 0.95 in DAL. However, DAL occasionally shows higher precision for Class 1, such as with LR (0 vs. 0.77). Recall for Class 1 is notably poor in traditional learning, as seen with SVM (0) compared to DAL (0.96). This indicates that traditional models struggle to identify Class 1 instances. Domain adaptation techniques significantly improve recall for both classes across all classifiers. The F1 score for Class 1 in DAL is considerably higher than in traditional learning. To illustrate, SVM's F1 score is 0.36 in traditional learning compared to 0.82 in DAL, highlighting the improved balance between precision and recall. The DAL method demonstrates competitive AUC-ROC values and improved calibration (Brier scores), suggesting better reliability and discrimination in predictions under shifting data distributions. The accuracy remains higher in traditional learning for the Polish bankruptcy data as well, with SVM
Fig. 9. Histograms depicting the central tendencies and spatial distribution of predicted probabilities for RF, SVM, LR, GB, and k-NN models across all instances. Well-calibrated models exhibit a balanced distribution around 0.5, indicating uncertainty in class assignment, while values closer to 0 or 1 signify higher