Neural Computing and Applications (2021) 33:11661–11672
https://doi.org/10.1007/s00521-021-05881-3

ORIGINAL ARTICLE

An advance ensemble classification for object recognition

Ebenezer Owusu¹ · Isaac Wiafe¹

Received: 24 December 2019 / Accepted: 22 February 2021 / Published online: 11 March 2021
© The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2021

Abstract
The quest to improve performance accuracy and prediction speed in machine learning algorithms cannot be overemphasized, as the need for machines to outperform humans continues to grow. Accordingly, several studies have proposed methods to improve prediction performance and speed, particularly for spatio-temporal analysis. This study proposes a novel classifier that leverages ensemble techniques to improve prediction performance and speed. The proposed classifier, Ada-AdaSVM, uses an AdaBoost feature selection algorithm to select a small set of features from the input datasets for a joint support vector machine (SVM)–AdaBoost classifier. The proposition is evaluated against a selection of existing classifiers (SVM, AdaSVM and AdaBoost) using the JAFFE, Yale, Taiwanese Facial Expression Image Database (TFEID) and CK+48 datasets, with Haar features as the preferred method for feature extraction. The findings indicate that Ada-AdaSVM outperforms the SVM, AdaSVM and AdaBoost classifiers in terms of speed and accuracy.

Keywords: SVM · AdaBoost · AdaSVM · Ada-AdaSVM

1 Introduction

Existing datasets that are used for model training are characterized by class imbalances (with some target classes containing fewer examples), noise, under-sampling and over-sampling. To address these challenges, researchers continue to propose newer algorithms that outperform existing ones. For instance, recent studies have demonstrated that distributed bandit-based algorithms reduce network data and are also capable of finding clusters faster, with higher accuracies, in recommender systems [1]. Others have demonstrated that performance accuracy can be improved on imbalanced datasets by applying the Synthetic Minority Over-sampling Technique (SMOTE) with boosting procedures [2], particularly by transforming the selection problem in SMOTE into a multi-objective optimization problem [3]. Similarly, others [4–6] have described procedures that can be adopted to improve prediction accuracy. It has been demonstrated that ensembles provide a promising solution for training more accurate classification models. This is because, in ensembles, multiple classifiers are used to train the model, which allows each classifier to make use of the others' strengths, producing a robust model that is more accurate [4]. Currently, the two main methods of ensemble construction are based on boosting and bagging. Although some studies have advocated bagging as the preferred choice [7], others have argued that in cases where datasets have less noise, boosting outperforms bagging [8]. For instance, adaptive boosting (AdaBoost) generalizes well because its margins can be enlarged, and it thus performs better. AdaBoost creates a collection of classifiers by maintaining a set of weights over the training data and adaptively adjusting these weights during each boosting iteration.

Amidst all this success, the need to achieve higher levels of classification accuracy remains. This is because the fundamental trade-off between accuracy and classification speed serves as an impediment in classification model training. As such, domains that benefit from existing machine learning algorithms continue to propose methods that seek to improve prediction accuracy. A domain that benefits greatly from classification algorithms and improved modeling performance is object recognition.
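The weight-maintenance scheme just described can be made concrete with a short sketch. The following is a minimal, self-contained illustration of AdaBoost with threshold-stump weak learners on 1-D data, not the authors' implementation; the function names (`adaboost`, `predict`) and the toy data are assumptions for illustration only.

```python
import numpy as np

def adaboost(X, y, n_rounds=10):
    """Minimal AdaBoost sketch: maintain a weight D per training example
    and adaptively re-weight after each boosting round."""
    n = len(X)
    D = np.full(n, 1.0 / n)          # start with uniform weights
    ensemble = []                    # list of (alpha, threshold, polarity)
    for _ in range(n_rounds):
        # exhaustively pick the threshold stump with lowest weighted error
        best = None
        for thr in np.unique(X):
            for pol in (1, -1):
                pred = np.where(pol * (X - thr) >= 0, 1, -1)
                err = D[pred != y].sum()
                if best is None or err < best[0]:
                    best = (err, thr, pol, pred)
        err, thr, pol, pred = best
        err = max(err, 1e-10)        # guard against division by zero
        alpha = 0.5 * np.log((1 - err) / err)
        # re-weighting step: misclassified examples gain weight
        D *= np.exp(-alpha * y * pred)
        D /= D.sum()
        ensemble.append((alpha, thr, pol))
    return ensemble

def predict(ensemble, X):
    """Weighted vote of all weak stumps."""
    score = sum(a * np.where(p * (X - t) >= 0, 1, -1)
                for a, t, p in ensemble)
    return np.sign(score)
```

On a separable toy set such as `X = [0, 1, 2, 3]`, `y = [-1, -1, 1, 1]`, the ensemble reproduces the labels after a few rounds; the point of the sketch is the two weight operations (`D *= exp(...)` then renormalization), which are the adaptive adjustment described above.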
Corresponding author: Isaac Wiafe, iwiafe@ug.edu.gh
¹ Department of Computer Science, University of Ghana, Accra, Ghana

Object recognition is relevant and has potential in other areas of study, including facial recognition and expression analysis, fingerprint recognition and retina recognition. However, the domain inherits generic machine learning model training challenges. As such, the ability to classify or detect objects more accurately with less computational complexity continues to be one of the main challenges in the domain. This challenge is more visible in image recognition because image datasets are characterized by several issues, including pigmentation, aspect size and resolution, that compound the classification problem. In this regard, this study explores the strengths of using ensemble techniques with the support vector machine (SVM) and AdaBoost to improve classification accuracy without compromising classification speed. Our motivation for using SVM and AdaBoost is derived from studies conducted by Owusu et al. [9], which demonstrated that both model performance and speed can be improved by using AdaBoost feature extraction and SVM to train a classifier. In this study, it is hypothesized that model performance and training speed can be improved further by using AdaBoost for feature extraction and a fusion of a multiclass support vector machine and AdaBoost as a hybrid classifier.

In the next section, a discussion of relevant literature on algorithms for object recognition, classification and ensembles is presented. This is followed by a description of the datasets and the pre-processing stages. The proposed technique (Ada-AdaSVM) is then discussed. The model evaluation and performance measures, discussions and conclusions are then presented.

2 Related work

2.1 Algorithms for object recognition and classification

Machine learning methods and algorithms have become more widespread and more successful in recent times because computing hardware costs and processing times have reduced exponentially. Therefore, training complex algorithms is now relatively easier and faster. As mentioned earlier, the relevance of object recognition and image classification cannot be overemphasized. Accordingly, several studies have proposed methods that have proven effective, with higher accuracies, for classifying or recognizing images. Methods that leverage deep learning algorithms, Bayesian networks, decision trees, k-nearest neighbors and support vector machines, among others, have all been used in image classification and recognition research. Deep learning is effective in image classification because deep models are capable of extracting the appropriate features and discriminating at the same time [10]. They are broadly classified into four main categories: Restricted Boltzmann Machines (RBMs), Convolutional Neural Networks (CNNs), Autoencoders and Sparse Coding. Each method is adopted or applied based on needs or expectations. They require less statistical training and can detect complex nonlinear relationships between dependent and independent variables. Yet, they produce multiple solutions associated with local minima, and hence are not considered robust over different samples [11], nor are they general-purpose algorithms, since they require huge amounts of data to train. They operate as a "black box" and thus carry a greater computational burden and are also prone to overfitting [12]. Deep learning algorithms are also outperformed by tree ensembles on even typical machine learning problems. Deep learning methods have improved over the years [13], although their theoretical underpinnings are not well understood. Also, due to the complexity involved in training deep learning models, they are considered inappropriate for applications that run in "real time". Nonetheless, they are considered among the most successful spatial techniques available [9].

2.2 SVM and AdaBoost techniques

Adaptive boosting (AdaBoost) is a spatial learning algorithm that selects a restricted number of weak classifiers from a large weak-classifier hypothesis space to construct a strong classifier. The boosting process attempts to improve the model by focusing on where it is not learning properly. It is effective for reducing bias error. However, it has been argued that although it performs better than bagging, it can overfit [14]; yet some studies have proven otherwise [15]. In boosting, parameters are tuned, and the algorithms are restricted to reducing the multiclass classification problem to multiple two-class problems [16–18].

SVMs, on the other hand, provide good out-of-sample generalization. They can be made robust by choosing appropriate generalization grades in the presence of biases among training samples. SVMs can be considered maximal-margin hyperplane classification techniques that depend on the outcomes of statistical learning theories to certify good generalization performance [19]. Yet AdaBoost is considered better at generalization, and this makes it preferable for many classification problems, although it is sensitive to noisy data and outliers. Some studies have argued that SVM is superior in recognition accuracy when compared to AdaBoost; nevertheless, the latter is faster in terms of model training speed [20]. Generally, there is a trade-off between performance (in terms of accuracy) and model complexity (in terms of training speed). To address this, different classification techniques are integrated to enhance performance and possibly training speed. As mentioned earlier, recent studies conducted by Owusu et al. [9] demonstrated that SVM and AdaBoost can be ensembled to improve model performance and speed.
2.3 Research contribution

As discussed in the previous section, AdaBoost performs better in terms of model training speed, yet worse in accuracy, when compared to SVM. Thus, this study seeks to (1) use AdaBoost as an initial feature selection algorithm and (2) amalgamate AdaBoost and SVM (AdaSVM) to form a classifier. It is therefore expected that the resultant classifier will perform better in terms of both speed and accuracy.

2.4 Ensemble techniques

Research on the application of ensembles in machine learning and artificial intelligence continues to advance as the need to discover newer approaches with competitive levels of accuracy and speed, and lower computational complexity, becomes prominent. Several studies have focused on variations of SVM and AdaBoost for model performance. Some researchers have argued that, due to the complexities in SVM, an AdaBoost–SVM ensemble may not be viable [21]. Nonetheless, studies conducted by Owusu et al. [20] and Li et al. [21] have experimented with the possibilities available when SVM is ensembled with AdaBoost. Studies have concluded that AdaBoostSVM performs better when SVMs are used as component classifiers [22] and that it can improve classification performance on imbalanced data.

A key objective in the use of ensembles is their ability to balance bias (underfitting) and variance (overfitting). Ensembles employ the power of diversity to enhance training performance. However, evaluating the resultant models from ensembles is more challenging and demands more computation than for single models. That notwithstanding, the computational effort can be argued to be worthwhile, as ensembles perform better than single models. Fast algorithms (such as decision trees) benefit more from ensembling than slower algorithms. Ensembles perform better in situations where there is significant diversity between the individual base models, and they are effective in capturing both linear and nonlinear patterns. Accordingly, this study adopts an ensemble technique to train an object recognition model. The next section is a discussion of the proposed model.

3 The proposed method

The proposed procedure outperforms existing algorithms. It departs from a simple integration of data pre-processing with an unchanged bootstrap sampling technique. Compared to standard bootstrap sampling, the probability of drawing different types of examples is changed: the sampling is focused on the minority class, specifically on examples located in challenging sub-regions of the minority class. This probability depends on the neighborhood class distribution of the sample, as suggested.

The first part of the proposed method is an AdaBoost-based feature selection tool that selects a series of uncomplicated classifiers to form a strong classifier. This procedure speeds up detection and avoids misclassification. At each stage, feature extraction filters such as Gabor [20] and Haar [23] are treated as weak classifiers. This ensures that the AdaBoost feature selection algorithm selects the finest of the classifiers and boosts the weights on the examples, to weigh the errors further. Afterwards, a subsequent filter is selected to provide the most appropriate result on the errors of the preceding filter. At each step, the selected filter is uncorrelated with the output of the preceding filter. After a cycle of training, the samples are re-weighted, and the weight of the misclassified samples is accentuated. The diverse weighted weak learners are merged to produce a strong feature selection hypothesis (see Fig. 1).

The second part of the technique is the classification. This involves the fusion of the multiclass SVM and AdaBoost to form a hybrid classifier. For this optimization problem, an SVM with the radial basis function kernel (RBF SVM) is used as the weak classifier. This weak SVM classifier is trained to produce the optimum Gaussian values for the scale (kernel width) parameter and the regularization parameter. This method was adopted from [8, 18], and the decision hyperplane k is represented by the equation:

k = \frac{\sum_{\forall i \in \phi_{(+)}} D(i)\cdot\phi_{(+)}(l,v,z)}{\left|\sum_{\forall i \in \phi_{(+)}} D(i)\cdot\phi_{(+)}(l,v,z)\right|} + \frac{\sum_{\forall i \in \phi_{(-)}} D(i)\cdot\phi_{(-)}(l,v,z)}{\left|\sum_{\forall i \in \phi_{(-)}} D(i)\cdot\phi_{(-)}(l,v,z)\right|}   (1)

where i ∈ (1, 2, …, N) indexes the extracted features of the Gabor or Haar filters. An image I is represented as U_i = {(x_n, y_n)}_{n=1}^{N}, and z, l, v represent the parameters of the image. The positive set φ_(+) and the negative set φ_(−) are denoted by φ_(+) = {(x_n, y_n)}_{n=1}^{N_J}
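The two-stage design described above (AdaBoost for feature selection, then an SVM-based classifier on the reduced feature set) can be approximated with off-the-shelf components. The sketch below is an assumption-laden stand-in, not the authors' Ada-AdaSVM: the digits dataset replaces the facial-expression images, raw pixels replace Haar/Gabor responses, and keeping the top 24 features by AdaBoost importance is an arbitrary illustrative choice.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.ensemble import AdaBoostClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Digits stand in for face images; pixel values stand in for filter responses.
X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

pipe = Pipeline([
    # Stage 1: AdaBoost ranks features; keep only the most discriminative.
    # threshold=-np.inf disables the importance cutoff so exactly
    # max_features features are retained.
    ("select", SelectFromModel(
        AdaBoostClassifier(n_estimators=50, random_state=0),
        max_features=24, threshold=-np.inf)),
    # Stage 2: multiclass RBF SVM trained on the reduced feature set.
    ("svm", SVC(kernel="rbf", gamma="scale")),
])
pipe.fit(X_tr, y_tr)
print(round(pipe.score(X_te, y_te), 3))
```

The design rationale mirrors the paper's: the boosting stage prunes the feature space before the comparatively expensive SVM is trained, trading a cheap pre-pass for faster classifier training on far fewer inputs.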