UNIVERSITY OF GHANA
COLLEGE OF BASIC AND APPLIED SCIENCES

DETECTING ANGER IN PERSUASIVE SPACES: AN EVALUATION OF FACIAL EXPRESSION ALGORITHMS

BY
JACQUELINE ASOR KUMI (10485756)

THIS THESIS IS SUBMITTED TO THE UNIVERSITY OF GHANA, LEGON IN PARTIAL FULFILLMENT OF THE REQUIREMENT FOR THE AWARD OF MASTER OF PHILOSOPHY IN COMPUTER SCIENCE DEGREE

DEPARTMENT OF COMPUTER SCIENCE
JULY 2020

Declaration

I hereby declare that this dissertation is entirely my own work unless otherwise indicated. No part of this dissertation has been presented for the award of any degree in this or any other university.

DR ISAAC WIAFE (SUPERVISOR)          DATE: October 5, 2020
DR EBENEZER OWUSU (CO-SUPERVISOR)    DATE: October 5, 2020
KUMI JACQUELINE ASOR (STUDENT)       DATE: October 5, 2020

Abstract

Darwin's influential work on the expression of the emotions in man and animals served as the starting point for emotion research. Based on his work, the basic emotions were theorised, from which several other emotions have been conceptualised. These emotions are recognised through both verbal and non-verbal forms of communication. Facial expression is the leading and most significant measure for the recognition of emotions, as 55% of what we communicate is conveyed through our facial expressions. Therefore, facial expression analysis has been applied to detect emotions in diverse fields, such as lie detection and pain analysis in medicine, resulting in a plethora of algorithms and techniques. Among the negative emotions, anger is said to be the most frequently experienced yet the most unsatisfactorily handled emotion in both personal and social relations. It is also said to be the emotion that considerably affects the mental state of an individual. Further, its intensities, such as temper, hostility, annoyance, tantrum, agitation and rage, foster harm to the individual and the surrounding environment and have disruptive interpersonal and intrapersonal consequences. Currently, anger recognition is performed as part of multiclass emotion classification, or is done using physiological signals or speech data. To the best of our knowledge, there have not been any studies on how to detect only anger using facial expressions. Even with the existing approaches, some issues have been identified, such as the overlap among the emotions anger, fear and disgust, and the difficulty some facial expression algorithms have in performing multiclass recognition of emotions. For this reason, we argue that it is key that anger is recognised accurately. The detection of anger would provide useful information about the intensity of peoples' anger so that it can be managed or controlled, as unregulated anger sometimes results in aggression or violence.

As such, we want to determine how these facial expression algorithms perform when used for binary classification, specifically anger recognition. Therefore, we propose a framework of three models: SVM (a machine learning algorithm), CNN (a deep learning algorithm) and a novel ensemble learning algorithm, with PCA as our dimensionality reduction function. The performance of our models was evaluated on the JAFFE, CK+ and KDEF datasets.
It was observed that our proposed models outperformed the state-of-the-art methods, notably our novel ensemble learning model, which attained an accuracy of 100% on the JAFFE dataset. We therefore conclude that the proposed methods are effective for the recognition of anger using facial expressions. Future work will evaluate the performance of these algorithms on a purpose-built database of African faces, and will employ them to detect anger in a persuasive space and persuade the individual from anger towards another emotion, for example happiness.

Keywords: anger, persuasive spaces, anger recognition, facial expression, facial expression recognition, facial expression algorithms, deep learning, machine learning, ensemble learning algorithm, SVM, CNN

Dedication

I dedicate this dissertation to my father and all who served as a motivating and driving force for the successful completion of this work.

Acknowledgements

First and foremost, I would like to express my gratitude to my father, who consistently motivated me and made provisions available throughout this course. God bless you. I am also grateful to all who assisted me in one way or the other towards the fruitful completion of this course.

Table of Contents

Declaration
Abstract
Dedication
Acknowledgements
Table of Contents
List of figures
List of tables
List of abbreviations
Chapter 1: Introduction
  1.1 Motivation and overview
  1.2 Current approach
  1.3 Challenges
  1.4 Our approach and expected contribution
  1.5 Aims and objectives
  1.6 Structure of the thesis
Chapter 2: Literature review
  2.1 Introduction
  2.2 Psychological background of emotions
  2.2.1 Models of emotions
  2.3 Modalities of emotion recognition
  2.4 Facial expression modality
  2.5 Measurements of facial expressions
  2.6 Typical Facial Expression Recognition (FER) System
  2.6.1 Machine and deep learning FER
  2.6.2 Ensemble learning algorithms
  2.7 Databases for facial expression recognition
  2.8 Limitation of current work and contribution
  2.9 Chapter summary
Chapter 3: Methodology
  3.0 Introduction
  3.1 Workflow
  3.1.1 Image acquisition
  3.1.1.1 JAFFE database
  3.1.1.2 CK+ database
  3.1.1.3 KDEF database
  3.1.2 Pre-processing
  3.1.2.1 Face detection
  3.1.2.2 Image enhancement
  3.1.2.2.1 Median blur
  3.1.2.2.2 Histogram equalisation
  3.1.3 Feature extraction
  3.1.3.1 Local Binary Pattern
  3.1.3.2 Histogram of Oriented Gradients
  3.1.4 Feature selection
  3.1.5 Classification models
  3.1.5.1 Support Vector Machine
  3.1.5.2 Convolutional Neural Network (CNN)
  3.1.5.3 Ensemble method
  3.1.6 Elements involved
  3.1.6.1 Programming language
  3.1.6.2 Packages and development environments
  3.1.6.2.1 Anaconda
  3.1.6.2.2 Open Source Computer Vision Library (OpenCV)
  3.1.6.2.3 TensorFlow
  3.1.6.2.4 Keras
  3.1.6.2.5 Scikit-learn
  3.1.7 Development
  3.1.8 Chapter summary
Chapter 4: Experimental setup
  4.1 Introduction
  4.2 Hardware specification
  4.3 Pre-processing stage
  4.3.1 Database pre-processing
  4.3.2 Image pre-processing
  4.3.2.1 Grayscaling and resizing of images
  4.3.2.2 Face detection and cropping
  4.3.2.3 Image enhancement
  4.4 Feature extraction
  4.4.1 LBP feature extraction
  4.4.2 HOG feature extraction
  4.4.3 Hybrid feature extraction
  4.5 Feature selection
  4.6 Classification
  4.6.1 Model selection
  4.6.2 Hyperparameter tuning
  4.6.3 Settings and protocols
  4.6.4 Data augmentation
  4.7 Chapter summary
Chapter 5: Experimental results and discussion
  5.1 Introduction
  5.2 Evaluation metrics
  5.3 Experiment I (training without data augmentation)
  5.4 Experiment II
  5.5 Discussion
  5.5.1 Introduction
  5.5.2 Performance of the models
  5.5.3 Comparison of the state-of-the-art
  5.6 Limitations
  5.7 Chapter summary
Chapter 6: Conclusion
Bibliography
Appendix

List of figures

Figure 2.1: Modalities of emotion recognition
Figure 2.2: The basic emotions from the JAFFE database
Figure 2.5: Measurements of facial expressions
Figure 2.3: Processes of facial expression recognition
Figure 3.1: Sample images of the JAFFE database
Figure 3.2: Sample CK+ images
Figure 3.3: KDEF images captured from different angles
Figure 3.4: The rectangle features
Figure 3.5: Calculation of area using the integral image
Figure 3.6: Operation of the LBP operator
Figure 3.7: Three neighbour sets for different (P, R) used to construct a circularly symmetric LBP
Figure 3.8: Operation of the LBP operator
Figure 3.9: A HOG feature extraction process
Figure 3.10: Steps involved in CNN FER
Figure 3.11: The workflow of our research work
Figure 4.1: Example of a grayscaled and resized CK+ angry image
Figure 4.2: Viola-Jones face detection (left) and a detected and cropped face (right)
Figure 4.3: A denoised JAFFE image using median blur
Figure 4.4: A CLAHE-enhanced JAFFE image
Figure 4.5: Transformations applied to a JAFFE image
Figure 5.1: Confusion matrices for the SVM, CNN and ensemble learning models on the JAFFE dataset – Experiment I
Figure 5.2: Confusion matrices for the SVM, CNN and ensemble learning models on the CK+ dataset – Experiment I
Figure 5.3: Confusion matrices for the SVM, CNN and ensemble learning models on the KDEF dataset – Experiment I
Figure 5.4: Confusion matrices for the SVM, CNN and ensemble learning models on the JAFFE dataset – Experiment II
Figure 5.5: Confusion matrices for the SVM, CNN and ensemble learning models on the CK+ dataset – Experiment II
Figure 5.6: Confusion matrices for the SVM, CNN and ensemble learning models on the KDEF dataset – Experiment II
Figure 5.7: ROC curves on the JAFFE dataset
Figure 5.8: ROC curve on the CK+ dataset
Figure 5.9: ROC curve on the KDEF dataset

List of tables

Table 2.1: Descriptions of Action Units, FACS descriptions and their associated facial muscles
Table 2.2: Summary of algorithms utilised by researchers for facial expression recognition
Table 3.1: Summary of the packages and development environments
Table 4.1: Summary of the hardware specification
Table 4.2: Summary of the datasets
Table 4.3: Summary of the datasets after data augmentation
Table 5.1: Performance (accuracy, recall, precision and F1-score) of the SVM, CNN and ensemble learning models on the JAFFE dataset
Table 5.2: Performance (accuracy, recall, precision and F1-score) of the SVM, CNN and ensemble learning models on the CK+ dataset
Table 5.3: Performance (accuracy, precision, recall and F1-score) of the CNN, SVM and ensemble learning models on the KDEF dataset
Table 5.4: Performance (accuracy) of the CNN, SVM and ensemble learning models on the JAFFE dataset
Table 5.5: Comparison of approaches on the JAFFE dataset
Table 5.6: Comparison of the proposed method with the state-of-the-art
Table 5.7: Comparison of different techniques on the JAFFE+ dataset
List of abbreviations

AAM -- Active Appearance Models
Adadelta -- Adaptive Delta
Adagrad -- Adaptive Gradient
Adam -- Adaptive Moment Estimation
ANN -- Artificial Neural Network
AUC -- Area Under ROC Curve
BU-3DFE -- Binghamton University 3D Facial Expression
BVP -- Blood Pressure Rate
CK -- Cohn Kanade
CK+ -- Extended Cohn Kanade
CLAHE -- Contrast Limited Adaptive Histogram Equalisation
CNN -- Convolutional Neural Network
CPU -- Central Processing Unit
DCNN -- Deep Convolutional Neural Network
DCT -- Discrete Cosine Transform
DGLTP -- Directional Gradient Local Ternary Pattern
DNN -- Deep Neural Network
ECG -- Electrocardiography
EEG -- Electroencephalography
ELM -- Extreme Learning Machine
EMG -- Electromyography
EOG -- Electrooculogram
FER -- Facial Expression Recognition
fMRI -- Functional Magnetic Resonance Imaging
Google Colab -- Google Colaboratory
HOG -- Histogram of Oriented Gradients
IFED -- Indian Facial Expression Image Database
JAFFE -- Japanese Female Facial Expression
KDEF -- Karolinska Directed Emotional Faces
KNN -- K-Nearest Neighbour
LBP -- Local Binary Pattern
LDA -- Linear Discriminant Analysis
LD-MGAD -- Local Descriptor with Modified Gray value Accumulation Value
LDN -- Local Directional Number
LDP -- Local Directional Pattern
LG -- Logistic Regression
LPQ -- Local Phase Quantization
LSTM -- Long Short-term Memory
MEG -- Magnetoencephalography
MLP -- Multilayer Perceptron
MOGA -- Multiobjective Genetic Algorithm
MRI -- Magnetic Resonance Imaging
MUFE -- Mevlana University Facial Expression
Nadam -- Nesterov-accelerated Adaptive Moment Estimation
NB -- Naïve Bayes
NIRS -- Near-Infrared Spectroscopy
NSGA -- Nondominated Sorting Genetic Algorithm
PCA -- Principal Component Analysis
PCA-LDA -- Principal Component Analysis - Linear Discriminant Analysis
PET -- Positron Emission Tomography
PSO-KNN -- Particle Swarm Optimization based K-Nearest Neighbour
RaFD -- Radboud Faces Database
RAM -- Random Access Memory
RBF -- Radial Basis Function
RF -- Random Forest
RMSprop -- Root Mean Square Propagation
RNN -- Recurrent Neural Network
ROC -- Receiver Operating Characteristic
SFEW -- Static Faces in the Wild
SFFS -- Sequential Feed Feature Selection
SIFT -- Scale Invariant Feature Transform
SOM -- Self-Organising Maps
SRC -- Sparse Representation based Classifier
SVM -- Support Vector Machine
TFEID -- Taiwanese Facial Expression Database
TPU -- Tensor Processing Unit

Chapter 1
Introduction

1.1 Motivation and overview

The interest in human emotions has spanned many centuries. Darwin's influential work on the expression of the emotions in man and animals served as the starting point for emotion research (Darwin, 1872; Petrushin, 2000). Subsequently, there have been significant contributions from multidisciplinary fields such as psychology, computer science, medicine and sociology, among others (Mitsuyoshi & Ren, 2013). Emotion is a complex feeling stimulated by internal and external stimuli; it influences behaviour and mental processes and results in physical and physiological changes (Domínguez-Jiménez, Campo-Landines, Martínez-Santos, Delahoz, & Contreras-Ortiz, 2020). Variants of emotions have been proposed based on discrete theories by different psychologists, although they vary in number and type (Ortony & Turner, 1990).
However, the most employed forms of emotion for emotion recognition are the basic emotions of surprise, anger, happiness, fear, sadness and disgust propounded by Ekman (1999). Emotions can be distinguished by modalities such as facial expressions, body postures or movement, and physiological signals such as electroencephalography (EEG), electromyography (EMG) and electrocardiography (ECG) (Feidakis, 2016; Mitsuyoshi & Ren, 2013). Nevertheless, facial expression is the most utilised measure: it is non-invasive, inexpensive owing to its lack of hardware requirements, and gives easy and accurate detection of emotions, as it serves as the medium through which 55% of human communication is conveyed (Gonzalez-sanchez, Baydogan, Chavez-echeagaray, Robert, & Burleson, 2017; Mehrabian, 1968). Facial expression is a key factor in human communication, naturally revealing an individual's thoughts and emotions during conversation (Jameel, Singhal, & Bansal, 2016). Therefore, it can be concluded that the face is an important feature of the body, as it conveys an individual's personality, emotions, thoughts and ideas even before they have been verbalised, playing a significant role in human communication and social interaction (Dhall & Sethi, 2014; Mitsuyoshi & Ren, 2013). The components of the face that help in the expression of emotions include the eyes, eyebrows, mouth, forehead, lips, cheeks, chin and nose. For example, an angry face is characterised by brow lowering, raising of the upper lid, and tightening of the lid and lip.

As such, the invention of human-centred interfaces by next-generation computing, namely persuasive computing, has brought immense benefits which project the human user to the foreground. This next-generation computing readily responds to human communication, as these interfaces can perceive and understand human emotions and intentions, which are communicated by social and affective signals (Pantic, Pentland, Nijholt, & Huang, 2007). These human-computer interaction interfaces seek to reshape the behaviour and intentions of individuals as well as to improve their health; hence the proposal to construct persuasive spaces, or to use persuasive technology, to change an individual's behaviour or emotion to a predetermined one (Stibe & Wiafe, 2018). Inspired by this vision, the fields of human-computer interaction, computer vision and pattern recognition have witnessed colossal transformation, including the automated analysis of facial expressions or facial behaviour with machine learning and deep learning algorithms.

Among the negative emotions, anger is said to be the most frequently experienced yet the most unsatisfactorily handled emotion in both personal and social relations (Moritz, 2006). It is also said to be the emotion that considerably affects the mental state of an individual (Kudiri, Said, & Nayan, 2013). Anger is revealed to be a strong emotion influenced by several elements such as biological, psychological, physiological and environmental factors, which include family, society and culture, as well as other emotions like fear, which serves as a springboard for anger (Shahsavarani, Noohi, Jafari, Kalkhoran, & Hatefi, 2015; Zhan et al., 2018). Even though anger is noted as a solely negative emotion, it is a natural emotion which, when expressed rather than repressed, can enhance an individual's health.
Anger can serve as a self-defence mechanism under stressful conditions and can facilitate behaviour, as it possesses motivating properties that drive an individual towards goal-centred action (Moritz, 2006; Shahsavarani et al., 2015). Although anger is a natural emotion that can be expressed positively, its intensities, such as temper, hostility, annoyance, tantrum, agitation and rage, can foster harm to the individual and the surrounding environment as well as bring about disruptive interpersonal and intrapersonal consequences (Kassinove, Sukhodolsky, Eckhardt, & Tsytsarev, 1997). On the other hand, when anger is repressed, it causes an individual to be meek and mild, lacking strength and initiative, passive in all circumstances and appearing lifeless. For this reason, there is a need to recognise anger, as it will provide useful information about the intensity of peoples' anger so that it can be regulated or managed, since unregulated anger sometimes results in aggression or violence (Moritz, 2006; Shahsavarani et al., 2015).

1.2 Current approach

Presently, anger recognition is performed using one of the following approaches: either physiological signals or audio/speech data are used (Chang, Lin, & Zheng, 2012; Chhabra, Vyas, Chatterjee, & Vob, 2017; Deng, Eyben, Schuller, & Burkhardt, 2018), or facial expressions are used for recognising the general set or a subset of emotions. To the best of our knowledge, there have not been any studies on how to detect only anger using facial expressions. Therefore, our work focuses on recognising anger using facial expressions, since it is the leading and most significant measure among the modalities for anger recognition.

1.3 Challenges

Although the facial expression algorithms used for the recognition of the basic emotions, or a subset of them, have produced excellent results, some problems persist which need to be addressed.

1. Generally, there are some limitations with the multiclass classification of emotions, such as an overlap among the facial expressions of disgust, anger and fear. This results from the slight distinction among them, which gives an untrue representation of the emotions when classified (Pell & Richards, 2011).

2. Further, it is postulated that a majority of the facial expression algorithms have difficulty in performing multiclass classification, evidenced by their training time, computational time and insufficient memory space (Kiran & Kushal, 2016; Shah, Sharif, Yasmin, & Fernandes, 2017).

1.4 Our approach and expected contribution

The current systems detect facial expressions in general or a subset of emotions. To the best of our knowledge, there have not been any studies on how to detect only anger. We argue that anger detection needs to be done accurately, giving a true representation of the emotion, as the detection of anger could provide useful information about its intensity and help to manage or control it; unregulated anger sometimes results in aggression or violence (Moritz, 2006; Shahsavarani et al., 2015). To test and validate our arguments, our proposed framework employs machine learning and deep learning algorithms as well as a novel ensemble learning algorithm (see Chapters 3, 4 and 5 for details). Our proposed models, in general, outperformed the state-of-the-art methods.
1.5 Aims and objectives

The aim of this study is to conduct anger recognition solely using facial expressions and compare the outcome to state-of-the-art experiments to determine whether the former obtains higher accuracy.

Objectives:
1. To research emotions and determine the most significant measure for recognising them.
2. To review the literature on facial expression algorithms.
3. To investigate and understand the algorithms used for facial expression recognition (FER).
4. To integrate state-of-the-art machine learning and deep learning techniques into frameworks for anger recognition.
5. To discuss the performance of the various facial expression algorithms and databases utilised for anger recognition.
6. To compare our results with the outcomes of general FER experiments.

1.6 Structure of the thesis

The thesis is organised as follows. Chapter 1 gives a succinct overview of the background of the study, the problem statement, aims and objectives, and the expected contribution. Chapter 2 presents a review of the different algorithms at the three stages of facial expression recognition and provides insight into the background of emotion research as well as some psychological theories of emotion. Chapter 3 describes the experimental methodology, involving an exploratory discussion of our different methodologies and datasets. Chapter 4 details the implementation process: the experimental setup, pre-processing, feature extraction, feature selection and classification. Chapter 5 investigates the results, presents the metrics used to adjudicate the performance of the algorithms, and highlights what was done, what was achieved or found, the implications of the results, strengths and limitations, as well as recommendations for future research. Our conclusions, along with a summary of the work, are drawn in the final chapter of the thesis.

Chapter 2
Literature review

2.1 Introduction

Darwin's (1872) work on the expression of emotions in humans and animals in the nineteenth century served as the premise for research on emotions. In his work, Darwin indicated that humans and animals exhibit emotions with similar behaviour (Petrushin, 2000). Since then, there has been significant progress in emotion research, and the past two decades have witnessed contributions from multidisciplinary fields such as psychology, medicine, sociology, neuroscience, endocrinology and computer science, with a colossal number of algorithms for automatic facial expression recognition being developed (Mitsuyoshi & Ren, 2013). Therefore, in this chapter, we review and discuss relevant literature on the psychological background of emotion research, facial expression recognition and its resulting algorithms.

2.2 Psychological background of emotions

2.2.1 Models of emotions

Emotions can be described as things we feel, arising from neural activity along the pathways of the amygdala, the emotion centre of the brain. Emotion can also be described as a complex experience involving related feelings which tends to move one beyond his or her individuality (Moritz, 2006; Shahsavarani, Noohi, Jafari, Kalkhoran, & Hatefi, 2015). Emotions come with physical and physiological changes which regulate our behaviour in reaction to internal and external stimuli (Domínguez-Jiménez et al., 2020).
Emotion is a salient characteristic of humans. It plays a useful role in human communication as well as in the growth and regulation of interpersonal relationships (Ekman, 1999; Kirange & Deshmukh, 2012; Mangalagowri & Raj, 2017). It also affects thoughts, actions and decision making (Izard, 2007).

2.3 Modalities of emotion recognition

In recognising emotions, several sources of emotional information have been proposed. These sources serve as the primary data from which emotions can be inferred. They can be broadly classified into three groups, namely biological indicators, behavioural indicators and physiological signals (see Figure 2.1) (Feidakis, 2016; Mitsuyoshi & Ren, 2013). The biological indicators comprise facial expressions and body postures or gestures. The physiological signals are measurements based on recordings of electrical signals produced by the heart, skin, muscles and brain. They include electroencephalography (EEG), electromyography (EMG), electrocardiography (ECG), respiration rate, skin conductance, electrooculogram (EOG), blood pressure rate, Positron Emission Tomography (PET), Magnetic Resonance Imaging (MRI), Magnetoencephalography (MEG), Functional Magnetic Resonance Imaging (fMRI) and Near-Infrared Spectroscopy (NIRS). Further, speech signals and text represent the behavioural indicators for emotion recognition.

Figure 2.1: modalities of emotion recognition (Mitsuyoshi & Ren, 2013).

Therefore, when performing emotion recognition, a unimodal, bimodal or multimodal source of emotional information can be utilised. The unimodal approach uses a single source of emotional information, whilst the bimodal and multimodal approaches use two or more sources. However, in the literature, bimodal emotion recognition can also be referred to as multimodal emotion recognition (Busso, Deng, Yildirim, & Bulut, 2004). Drawing on the application of the unimodal and multimodal approaches for emotion detection and classification, several studies, for instance (Lalitha, Geyasruti, Narayanan, & Shravani, 2015; Rajvanshi, 2018; L. Sun, Zou, Fu, Chen, & Wang, 2019; Tzirakis, Trigeorgis, Nicolaou, Schuller, & Zafeiriou, 2017; S. Zhang, Zhang, Huang, & Gao, 2018), utilised speech signals for the classification of emotions. Researchers (Kaur, Singh, & Roy, 2018; Patil & Behele, 2018) employed electroencephalography (EEG), whilst Ahmed, Bari, & Gavrilova (2020) employed body gestures or movements, and physiological signals were deployed by (Chakladar & Chakraborty, 2018; H. Huang, Hu, Wang, & Wu, 2020). In addition, for multimodal emotion recognition, researchers (Chaparro et al., 2018; X. Huang et al., 2015; Yongrui Huang, Yang, Liao, & Pan, 2017; Matlovic, Gaspar, Moro, Simko, & Bielikova, 2016; T. Zhang, Zheng, Cui, Zong, & Li, 2019) adopted facial expressions and electroencephalogram signals. Kudiri, Said, & Nayan (2016) used facial expressions and speech for emotion detection and classification. Keshari & Palaniswamy (2019) utilised facial expressions and body gestures, and Zheng, Liu, Lu, Lu, & Cichocki (2019) performed emotion recognition by fusing facial expressions and speech signals.
Further, Nguyen, Nguyen, Sridharan, Dean, & Fookes (2018) performed multimodal emotion recognition comprising the measures facial expression, pose, body movements and voice.

2.4 Facial expression modality

Research on facial expression dates back to ancient times, making facial expressions a recognised and important modality among the non-verbal forms of communication; indeed, as can be inferred from the literature above, facial expressions are mostly combined with other modalities when performing emotion recognition (Zheng et al., 2019). Darwin's work on the universality of facial expressions of emotions across different cultures and tribes served as a foundation for the empirical study of facial expressions (Darwin, 1872), and facial expressions have since become the only measure with developed frameworks, having been researched thoroughly in the past few decades (Keshari & Palaniswamy, 2019). Additionally, among the indicators for emotion recognition, facial expressions are argued to be the most significant and leading measure, as facial expressions convey 55% of what humans communicate, with only 7% and 38% conveyed through language and speech respectively (Mehrabian, 1968; Pantic et al., 2007). Emotions can be easily and accurately detected from the face (Abhang, Gawali, & Mehrotra, 2016; Happy, Patnaik, Routray, & Guha, 2017).

Furthermore, the use of facial expressions for emotion recognition has several advantages, such as its non-invasiveness and relative cheapness: it does not involve any physical contact with the user through sensors, as in the case of collecting EEG signals, and it has no requirement for expensive hardware (Gonzalez-sanchez et al., 2017). Facial expressions are useful in deciphering an individual's thoughts or state of mind during a conversation (Jameel et al., 2016). They also serve as among the most genuine indicators that lend information on age, truthfulness, temperament, personality and the emotional state of a person (Apte, Basavaraj, & Nithin, 2016; Pantic & Bartlett, 2007). Hence, it can be concluded that the face is an important feature of the body, as it conveys an individual's personality, emotions, thoughts and ideas even before they have been verbalised, playing a significant role in human communication and social interaction (Dhall & Sethi, 2014; Mitsuyoshi & Ren, 2013).

Further, Darwin's research established the foundation for the conceptualisation of emotions and thus received attention among various psychologists. Ekman (1970) validated Darwin's theory of the universality of emotions irrespective of tribe and culture when he proposed the discrete theory of emotion, namely the basic emotions. From that, several psychologists have theorised variants of emotions based on these discrete theories, for example Plutchik's and Russell's models (Ortony & Turner, 1990; Plutchik, 1987; Russell & Pratt, 1980). These conceptualised emotions vary in type and number, even though they are all borne out of Darwin and Ekman's universality of emotions. Nonetheless, the most employed emotions for emotion research based on these discrete theories are the basic emotions, which are modelled by six classes: happiness, disgust, fear, surprise, sadness and anger (Ekman, 1999). The basic emotions are considered universal across different cultures and peoples and are used in describing the affective states of individuals (Ekman, 1970; Haq & Jackson, 2010).
Each basic emotion is characterised by a unique facial expression (Ekman, 1977) (refer to section 2.6 for details).

Conventionally, emotions are usually classified based on positivity and negativity (An, Ji, Marks, & Zhang, 2017). However, there are other classifications, such as the two-dimensional (2D) model proposed by Russell & Pratt (1980) and the eight primary emotions of Plutchik (1987). The 2D model is based on valence and arousal, or pleasantness and unpleasantness; the eight primary emotions, on the other hand, represent the positivity and negativity of emotions grouped in pairs, such as joy and sadness. Among the negative emotions, anger is revealed to be the most frequently experienced yet the most unsatisfactorily handled emotion in both personal and social relations (Moritz, 2006). It is also said to be the emotion that considerably affects the mental state of an individual (Kudiri et al., 2013).

Figure 2.2 displays the basic emotions from the JAFFE database (Lyons, Akamatsu, Kamachi, & Gyoba, 1998). Left-to-right from top row: anger, disgust, fear, happiness, neutral and sadness.

2.5 Measurements of facial expressions

The facial expression serves as the representation of signals which send forth messages (emotions) such as disgust, anger, happiness, surprise, fear and sadness (Ekman, 1977). In detecting emotions from facial expressions, several methods have been proposed over the years. Duchenne de Boulogne (1862) employed electrical stimulation to identify the various muscle motions. This approach helped identify the combinations of muscle motions that express an emotion. Later, photography was used to aid in deciphering emotions from facial expressions (Darwin, 1872). Ekman (1977) introduced the Facial Affect Scoring Technique (FAST), harnessing the discoveries of Duchenne and Darwin. FAST's method of measuring emotions was not able to depict the different facial appearances for the basic emotions; nonetheless, it could correctly distinguish between pleasant and unpleasant facial expressions. Ekman (1977) further proposed the Facial Action Coding System (FACS), which is based on the study of the anatomical basis of facial expression, thereby coining a definition for facial expression as the movement of one or more muscles of the face which conveys the emotions of an individual (Ekman, 1970). FACS could distinguish all visible facial behaviour, utilising a set of action units (AUs) to describe every facial muscle activity. An AU represents a certain component of facial muscle movement (see Table 2.1), and emotions are described by sets of AUs. The facial muscles are displayed in Figure 2.2. The components of the face that help in the expression of emotions include the eyes, eyebrows, mouth, forehead, lips, cheeks, chin and nose. For example, an angry face is characterised by brow lowering, raising of the upper lid and tightening of the lid and lip, which correspond to the action units AUs 4, 5, 7 and 23 and the muscles depressor glabellae, depressor supercilii, corrugator supercilii, levator palpebrae superioris, orbicularis oculi and orbicularis oris respectively. Yet, there were drawbacks with the use of FACS, which included the following:
1. Early researchers who employed the Facial Action Coding System had to manually code the action units to unravel the basic emotion, making the process both labour- and cost-intensive (Ekman, 1977). Also, learning to code proficiently in FACS could take 100 hours of training, plus an extra 2 hours to code each image sequence (Littlewort, Bartlett, & Lee, 2007).

2. Irrelevant information is exposed by the FACS codes, which sets back data-driven facial expression recognition methods. It is not possible to build a training database with a sufficient number of facial expressions covering the roughly 7000 existing AU combinations, and this leads to poor generalisation performance (Fasel, Monay, & Gatica-Perez, 2004).

Advances in technology have contributed immensely to the analysis of emotions, begetting automated facial expression recognition. This improvement in technology helps extend the use of FACS beyond the behavioural research disciplines and helps in assessing the specific muscle movements associated with facial expressions in a faster and more reliable way (Frank, 2001).

Figure 2.2: description of facial muscles.

Table 2.1: Descriptions of Action Units, FACS descriptions and their associated facial muscles (Ekman & Friesen, 1976).

AU number  FACS description          Associated facial muscle
1          Inner Brow Raiser         Frontalis, Pars Medialis
2          Outer Brow Raiser         Frontalis, Pars Lateralis
4          Brow Lowerer              Depressor Glabellae; Depressor Supercilii; Corrugator
5          Upper Lid Raiser          Levator Palpebrae Superioris
6          Cheek Raiser              Orbicularis Oculi, Pars Orbitalis
7          Lid Tightener             Orbicularis Oculi, Pars Palpebralis
9          Nose Wrinkler             Levator Labii Superioris Alaeque Nasi
10         Upper Lip Raiser          Levator Labii Superioris, Caput Infraorbitalis
11         Nasolabial Fold Deepener  Zygomaticus Minor
12         Lip Corner Puller         Zygomaticus Major
13         Cheek Puffer              Caninus
14         Dimpler                   Buccinator
15         Lip Corner Depressor      Triangularis
16         Lower Lip Depressor       Depressor Labii
17         Chin Raiser               Mentalis
18         Lip Puckerer              Incisivii Labii Superioris; Incisivii Labii Inferioris
20         Lip Stretcher             Risorius
22         Lip Funneler              Orbicularis Oris
23         Lip Tightener             Orbicularis Oris
24         Lip Pressor               Orbicularis Oris
25         Lips Part                 Depressor Labii or Relaxation of Mentalis
26         Jaw Drop                  Masseter; Temporalis and Internal Pterygoid Relaxed
27         Mouth Stretch             Pterygoids; Digastric
28         Lip Suck                  Orbicularis Oris

2.6 Typical Facial Expression Recognition (FER) System

The general framework of facial expression classification or recognition involves the following stages (refer to Figure 2.3): image acquisition, pre-processing, feature extraction, feature selection and classification (Valero, 2016). The stages are briefly described below.

The first step in FER is the acquisition of either static or dynamic images. These images are in the form of two-dimensional or three-dimensional spontaneous or posed images, captured either under controlled settings or “in the wild” conditions. The most utilised databases are the posed two-dimensional (2D) databases, which include the Japanese Female Facial Expression (JAFFE), Extended Cohn Kanade (CK+) and Cohn Kanade (CK) databases (I. M. Revina & Emmanuel, 2018), to mention a few, although these databases are still challenged with constraints of head pose and rotation variations (Pantic & Rothkrantz, 2000).
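As a concrete illustration of the image acquisition stage, the short Python sketch below loads a posed 2D database from disk. It assumes a hypothetical folder-per-expression directory layout (e.g. dataset/anger/), which is not the layout prescribed by JAFFE, CK+ or KDEF themselves.

```python
# Minimal sketch of image acquisition, assuming a hypothetical
# folder-per-expression layout (e.g. dataset/anger/*.png); real databases
# such as JAFFE or CK+ ship with their own naming conventions.
import os
import cv2  # OpenCV, one of the packages listed in section 3.1.6.2.2

def load_expression_images(root_dir):
    """Return a list of (grayscale image, expression label) pairs."""
    samples = []
    for label in sorted(os.listdir(root_dir)):
        class_dir = os.path.join(root_dir, label)
        if not os.path.isdir(class_dir):
            continue
        for name in sorted(os.listdir(class_dir)):
            image = cv2.imread(os.path.join(class_dir, name),
                               cv2.IMREAD_GRAYSCALE)
            if image is not None:  # skip unreadable or non-image files
                samples.append((image, label))
    return samples

# Example usage: images, labels = zip(*load_expression_images("dataset"))
```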
Pre-processing is performed to address the challenges associated with the 2D images as well as other issues such as image segmentation, deformation, and illumination variation as well as the removal of background noise from the images (Yunxin Huang, Chen, Lv, & Wang, 2019). 17 University of Ghana http://ugspace.ug.edu.gh The next crucial stage in pre-processing is face detection. Face detection is the first step in any face image analysis or facial expression classification. It is useful in determining the existence of face in an image as well as aligning the face for the efficient extracting of the relevant features (Bhardwaj & Dixit, 2016; Martinez & Valster, 2016). Face detection can be grouped into four categories namely: knowledge-based, appearance-based, template-matching and feature invariant methods (M. H. Yang, Kriegman, & Ahuja, 2002). The knowledge-based method detects faces based on rules from human knowledge of facial images. It is mostly used for face localisation. Feature-invariant methods detect images based on learning from the structural features on the faces which does not change irrespective of illumination or pose variation. Furthermore, for template-matching face detection, manually pre-determined by experts patterns or models or templates of either whole or part of the facial image are stored and a correlation is found between the input images and the stored facial patterns whilst for appearance-based method detects face from the comparing facial image to a set of training image data. The knowledge-based method and feature-invariant method are mostly used for face localisation whilst the appearance-based method is mainly employed for both face detection (M. H. Yang et al., 2002). After face detection, feature extraction is performed. Feature extraction is considered the most important step in facial expression classification. It helps in representing the facial image effectively by extracting the subtle changes of a facial image into a feature vector (Abouyahya, El Fkihi, Thami, & Aboutajdine, 2016; Bhardwaj & Dixit, 2016). Generally, feature extraction is categorised into geometric features, appearance features and hybrid feature method. Geometric features extract features using facial shape location. Appearance features extract features based on the pixel intensity information or texture (Yu & Liu, 2015). Hybrid feature method fuses geometric and appearance methods. Further, the hybrid method can be classified into decision-level and feature-level. The decision-level hybrid method utilises voting classifier 18 University of Ghana http://ugspace.ug.edu.gh to ensemble the decision of all the feature sets whilst the feature-level method concatenate all the feature sets into one feature vector (X. Huang, 2014). Each type of feature extraction method has its associated advantages and limitations. Geometric methods have low computational cost; however, they are sensitive to noise and image spatial transform whilst the appearance feature methods are stable and accurate to image spatial transform, but they are computation expensive in comparison to geometric feature methods. With the introduction of deep learning algorithms, these algorithms serve as their feature extractors or descriptors. 
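Before turning to specific feature descriptors, the sketch below makes the pre-processing and appearance-based face detection steps described above concrete. It chains OpenCV's Haar-cascade (Viola-Jones) detector with the median blur and CLAHE enhancement used later in this thesis (Chapters 3 and 4); the 48 x 48 crop size and the cascade parameters are illustrative assumptions, not values taken from the experiments.

```python
# Sketch of a typical pre-processing chain: grayscale conversion, Viola-Jones
# face detection, cropping, median blur and CLAHE. The 48x48 crop size and the
# detector parameters are illustrative assumptions.
import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def preprocess_face(image_bgr, size=(48, 48)):
    """Return an enhanced, cropped face patch, or None if no face is found."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    x, y, w, h = faces[0]                      # keep the first detected face
    face = cv2.resize(gray[y:y + h, x:x + w], size)
    face = cv2.medianBlur(face, 3)             # suppress salt-and-pepper noise
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    return clahe.apply(face)                   # contrast-limited equalisation
```

Keeping only the first detection is a simplification that is usually adequate for single-face, posed databases such as JAFFE, CK+ and KDEF.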
Notable feature extraction methods include the Histogram of Oriented Gradients (HOG), Gabor filters, the Local Directional Pattern (LDP), the Scale Invariant Feature Transform (SIFT), Linear Discriminant Analysis (LDA), the Discrete Cosine Transform (DCT), the Local Binary Pattern (LBP) and Active Appearance Models (AAM).

Due to the high dimensionality of the extracted features, feature selection is performed to discard irrelevant features, retaining the important feature vectors for accurate and acceptable classification. Feature selection is useful for reducing computational cost and memory usage, improving data quality and hence predictive accuracy, and increasing the speed of the algorithm (Khalid, Khalil, & Nasreen, 2014; Ladha & Deepa, 2011). Feature selection techniques worth mentioning are AdaBoost, Linear Discriminant Analysis, Independent Component Analysis, Whitened Principal Component Analysis, Laplacian Eigenmaps, Local Linear Embedding and Principal Component Analysis (PCA).

Classification is the final stage of a facial expression classification experiment. Classification takes one of two forms: directly classifying into the various affective states, or classifying into affective states after the detection of particular action units (Rizwan, 2013). The employed classifier categorises the facial expressions into emotions such as sadness, anger, joy, fear, happiness, smiling and disgust (I. M. Revina & Emmanuel, 2018). The algorithms utilised for the recognition of facial expressions are grouped into machine learning and deep learning algorithms. K-Nearest Neighbours, Naïve Bayes, Random Forest, Hidden Markov Models, the Extreme Learning Machine (ELM), Self-Organising Maps (SOM), the Sparse Representation based Classifier (SRC), Recurrent Neural Networks (RNN), Deep Neural Networks (DNN), Long Short-Term Memory (LSTM) and Convolutional Neural Networks (CNN) are some variants of classification algorithms.

Figure 2.3: processes of facial expression recognition (FER).

Automatic facial expression recognition has become a hot research topic over the past two decades, with a plethora of algorithms being utilised. Thus, we examine the various algorithms utilised at the different phases of facial expression recognition experiments.

2.6.1 Machine and deep learning FER

Zhong, Chen, & Liu (2014) proposed a novel method, the Extended Nearest Neighbour, for the classification of the facial expressions of the Japanese Female Facial Expression database. A Gabor filter was deployed for feature extraction and Principal Component Analysis (PCA) for feature dimensionality reduction. The proposed method resulted in an accuracy of 93.01%, whereas Gabor with Support Vector Machine (SVM) and Gabor with a Neural Network attained accuracies of 91% and 90.01% respectively. Subsequently, Abdulrahman & Eleyan (2015) utilised SVM and K-Nearest Neighbour (KNN) as classifiers to investigate the performance of the feature extractors Principal Component Analysis (PCA) and Local Binary Pattern (LBP) on the JAFFE and Mevlana University Facial Expression (MUFE) databases. SVM and JAFFE outperformed KNN and MUFE with regards to classifier and database respectively. Similarly, SVM outperformed KNN in a comparative analysis of their performance on the Extended Cohn Kanade (CK+) and Binghamton University 3D Facial Expression (BU-3DFE) databases.
It was observed there was difficulty in classifying, as anger, fear and disgust produced similar results (Saeed, Al-Hamadi, Niese, & Elzobi, 2014). Likewise, Michel & El Kaliouby (2015) presented SVM for the classification of images of Cohn Kanade (CK) database. The presented algorithm attained an accuracy of 87.9%. Vo & Le (2016) proposed a fusion of CNN and SVM on Cohn Kanade (CK) database, achieving an accuracy of 96.04%. CNN served as the feature extraction method and SVM as the classification algorithm. Additionally, Mayya, Pai, & Manohara Pai (2016) performed facial expression recognition utilising Deep Convolution Neural Network (DCNN) and SVM for feature extraction and classification respectively on JAFFE and CK+ datasets. The proposed method attained an accuracy of 98.12%. In the work by Y. D. Zhang et al. (2016), biorthogonal wavelet entropy and fuzzy multiscale SVM were employed for facial expression classification to extract multiscale features as well as solve issues of noise and outliers. The proposed method achieved an accuracy of 96.77%. Furthermore, Kiran & Kushal (2016) presented Support Vector Machine for a multiclass facial expression classification on Japanese Female Facial Expression Database (JAFFE), Indian Facial Expression Image Database (IFED) and Taiwanese facial Expression Database (TFEID). The features were extracted using Bidirectional Local Binary Pattern resulting in recognition accuracies of 94.77%, 94.77% and 90.41% respectively. Also, the study indicated, anger, disgust and fear 21 University of Ghana http://ugspace.ug.edu.gh had the same accuracy value of 88.88%. Additionally, Z. Wang, Jiang, Jiang, & Zhou (2016) employed SVM with a radial basis function kernel to distinguish the facial expressions of JAFFE database into 7 different facial expressions after learning the sparse representations of the facial images with K-SVD. More, a multi-class SVM with radial basis kernel was deployed in classifying facial expressions of CK+ after the pre-processing and extraction of the features with Viola Jones and Edge-Histogram Oriented Gradient (E-HOG) separately attaining accuracy of 96.4% (Candra, Yuwono, Chai, Nguyen, & Su, 2016). Furthermore, from a comparative study of their performance on the classification of facial expressions into the basic emotions, weighted feature gaussian kernel function SVM (WF-SVM) outperformed SVM with a gaussian kernel function with an average precision value of 93% to 83% (Wei & Jia, 2016). Besides, Borui, Liu, & Xie (2017) grouped the images of JAFFE database into the basic emotions using a multi-class SVM and Local Binary Pattern (LBP), Local Phase Quantization (LPQ) based on Gabor wavelet and Principal Component Analysis plus Linear Discriminant Analysis for extraction and selection of features respectively. The proposed method achieved an accuracy of 98.57 having anger and fear with the same accuracy value of 100. More, a comparative study of the performance of facial expression classification algorithms namely: Support Vector Machine (SVM), K-Nearest Neighbour (KNN) and Random Forests were conducted using extended Cohn-Kanade dataset (CK+). SVM surpassed KNN and Random Forest with accuracies of 80%,75.15% and 76.97% respectively for considerably small amount of dataset. However, for large dataset, KNN and Random Forest outperformed SVM with accuracies of 98.85%, 98.85% and 90% respectively. Further, the results indicated some misclassification of anger and disgust facial expressions (Nugrahaeni & Mutijarsa, 2017). 
Further, Rashid et al. (2017) employed KNN and SVM with radial basis function to classify facial expressions of the JAFFFE dataset, having Viola Jones algorithm and cross-correlation as the pre-processing and feature extraction method accordingly. The results indicated KNN 22 University of Ghana http://ugspace.ug.edu.gh was an optimal algorithm for the experiment with an overall accuracy of 92.48%. Again, anger, disgust and fear had matrix values of 93.33, 93.10 and 93.75. Verma & Khunteta (2017) on the other hand, employed Gabor filter and ANN to categorize the facial expression of JAFFE database, achieving an accuracy of 85.7%. Likewise, Qayyum, Majid, Anwar, & Khan (2017) deployed Artificial Neural Network (ANN) to perform a facial expression classification of the databases: JAFFE, CK+ and MS-Kinect databases. Stationary wavelet transforms and Discrete Cosine Transform (DCT) were used for feature extraction and feature selection respectively. JAFFE surpassed the other databases with an accuracy of 98.83% from the outcome of the experiment. Breuer & Kimmel (2017) conducted facial expression classification and performance analysis on Extended Cohn Kanade, FER2013 and NovaEmotions databases using convolutional neural network (CNN). CNN on the Extended Cohn Kanade (CK+) database outperformed the other classifiers such as Gabor with SVM, LBPSVM with an accuracy of 98.62%, 89.8% and 95.1% respectively. Likewise, Lopes, de Aguiar, De Souza, & Oliveira-Santos (2017) compared the performance of six facial expressions to seven facial expressions using Extended Cohn Kanade (CK+), JAFFE and Binghamton University 3D Facial Expression (BU-3DFE) databases using CNN. The result indicated CNN performs best on CK+ when classifying six facial expressions with an accuracy of 98.92%. Also, Alizadeh & Fazel (2017) developed a convolutional neural network (CNN) for a facial expression recognition task and classified the facial expressions into anger, happiness, fear, neutral, surprise, sad and disgust using facial FER-2013 dataset. Likewise, Revina & Emmanuel (2018) classified facial expressions of the JAFFE database utilizing Particle Swarm Optimization based K-Nearest Neighbour (PSO-KNN). The features were extracted using Local Descriptor with Modified Gray value Accumulation Value (LD-MGAD). The proposed model attained an accuracy of 97.1%, with anger and fear obtaining similar values as anger and sad facial expressions misclassified as fear. M. I. Revina & Emmanuel (2018) presented SVM for the 23 University of Ghana http://ugspace.ug.edu.gh classification of facial expressions of JAFFE and CK+ databases, achieving an accuracy of 88.63%. Local Directional Number (LDN) Pattern and Directional Gradient Local Ternary Pattern (DGLTP) were adopted for the feature extraction process. Further, Zarbakhsh & Demirel (2018) investigated the use of 3D images for facial expression detection to find an optimum low-dimensional feature sub-space for 3D facial expressional detection on Binghamton University known as BU- 3DFE dataset. Support Vector Machine (SVM) and Fuzzy SVM (FSVM) were utilized for the classification along with sequential feed feature selection (SFFS) and conventional t-test for the feature selection process. The results indicated an average accuracy of 87.67% for SFFS and FSVM and a matrix value of 85 for both anger and disgust. 
Likewise, SVM was adopted in classifying the facial expressions along with performing a comparative assessment of MMI, extended Cohn-Kanade (CK+) and static face in the wild (SFEW) databases. Weber local descriptor, a dual-fusion feature extraction method as well as discrete cosine transform (DCT) were utilised for both feature extraction and selection accordingly. The results confirmed CK+ as the excelling database from a comparative assessment of the databases. Also, it was difficult in classifying anger due to its misclassification as disgust or neutral. A deep convolutional neural network inspired by XCEPTION was proposed by Raksarikorn & Kangkachit (2018) to classify seven facial expressions using FER-2013 dataset. The suggested model outperformed the XCEPTION attaining an accuracy of 71.69%, 72.91% and 70% accordingly. In addition, Kumar, Kumar, & Sanyal (2018) deployed convolutional neural network (CNN) for training and classification of facial expressions of FERC-2013 and Extended Cohn Kanade databases (CK+) into seven emotions namely anger, neutral, sad, happy, disgust, surprised and fear, achieving an accuracy of 90+%. Similarly, Mohammadpour, Khaliliardali, Hashemi, & Alyannezhadi (2018) adopted CNN to group facial expressions of CK+, JAFFE and BU-3DFE databases. The proposed method achieved an accuracy of 97.01% having CK+ as the excelling database. Li (2018) used 24 University of Ghana http://ugspace.ug.edu.gh convolutional neural network (CNN) to classify facial expressions of JAFFE, CK+ and FER- 2013 databases. JAFFE database achieved the topmost accuracy of 97.65. The results also showed anger and fear having closely related values. Also, CNN was utilized in classifying the images of the databases: JAFFE and CK+. The algorithm performed better on JAFFE than on CK+ database; having anger, fear and disgust have similar values (Farajzadeh & Hashemzadeh, 2018). Similarly, conditional neural network enhanced random forest (CoNERF) was utilized in classifying facial expressions of CK+, JAFFE, BU-3DFE and Labelled faces in the wild (LFW) databases. The proposed algorithm on JAFFE and CK+ obtained an accuracy of 99.02%, having anger and disgust with closely related values (Y. Liu et al., 2018). In addition, artificial neural network specifically multilayer perceptron with back propagation was utilized in categorization facial expressions of JAFFE, CK+ and Radboud Faces Database (RaFD) databases. Accuracies of 94.81%, 99.51% and 99.15% were obtained for the databases respectively (Islam, Mahmud, Hossain, Mia, & Goala, 2019). More, SVM excelled than KNN and multilayer perceptron (MLP) in a classification of facial expressions: normal, happy, angry, contempt, surprise, sad, fear and disgust of the Cohn Kanade database with accuracies of 93.53%, 82.97% and 79.79% separately. Histogram of oriented gradients was utilized for the feature extraction and PCA for the feature selection (Dino & Abdulrazzaq, 2019). Fan & Tjahjadi (2019) used combination of handcrafted and convolutional features with the classification algorithm SVM for facial expression recognition, achieving an accuracy of 92.15%. Also, Bellamkonda & Gopalan (2019) detected and classified facial expressions of the following databases: JAFFE, Cohn Kanade, MMI and Karolinska Directed Emotional Faces (KDEF) employing SVM with either local binary classifier or Gabor wavelet as the feature extraction algorithm. From the outcome, SVM plus Gabor wavelet gave a surpassing accuracy of 98.83% on the KDEF database. 
25 University of Ghana http://ugspace.ug.edu.gh Furthermore, Dubey & Dixit (2019) classified the facial expressions of JAFFE, CK+ and FER- 2013 using CNN. The proposed algorithm achieved a classification accuracy of 97.90% on the images of CK+ and JAFFE into the various expressions. A deep convolutional neural network (DNN) was proposed to group the images of CK+ and JAFFE into the various expressions. Precision, recall, ROC and accuracy were utilized in evaluating the experiment; the algorithm performed excellently on the JAFFE database with an accuracy of 95.23% to 93.24% on CK+ dataset (D. K. Jain, Shamsolmoali, & Sehdev, 2019). In a quest to improve the performance of end-to-end frameworks for facial expression recognition for deep learning methods, Minaee & Abdolrashidi (2019) proposed attentional convolutional neural network using databases: JAFFE, FER-2013, CK+ and Facial Expression Research Group Database (FERG) and CK+. The proposed algorithm performs best on CK+ with an accuracy of 98%. Notwithstanding the advantages of the convolutional neural network, Sharma & Jain (2019) identified CNN to have a drawback of handling spatial information. Hence, the researchers proposed a bidirectional Long Short-Term Memory (LSTM) to categorize the facial expressions of the Cohn Kanade database as LSTM has an advantage of memory efficiency. 2.6.2 Ensemble learning algorithms T H H Zavaschi & Koerich (2011) utilised fused feature of Gabor and LBP and an ensemble of base classifiers SVM and used a multiobjective genetic algorithm (MOGA) as the pareto- optimal for selecting the best of classifiers. The proposed method achieved an accuracy of 96.2%. Also, Pons & Masip (2018) proposed an ensemble of CNN committee classifiers for facial expression recognition achieving an accuracy of 39.3%. More, D. H. Nguyen et al. (2019) performed FER using an ensemble of multilayer CNN obtaining an accuracy of 74.09%. Likewise, W. Sun, Zhao, & Jin (2019) proposed an ensemble of CNN for FER attained an accuracy of 96.15%. Xu, Pang, & Jiang (2019) conducted FER using a fusion of geometric features (HOG and DHOG) and achieved an accuracy of 96.4%. 26 University of Ghana http://ugspace.ug.edu.gh Table 2.2: summarises algorithms utilised by researchers for facial expression recognition. 
| Work | Feature extraction | Feature selection | Algorithm | Database | Performance metric used |
|---|---|---|---|---|---|
| (Zhong et al., 2014) | Gabor | PCA | Extended Nearest Neighbour | JAFFE | Accuracy |
| (Abdulrahman & Eleyan, 2015) | PCA/LBP | — | SVM/KNN | JAFFE/MUFE | Confusion matrix/accuracy (77% and 87%) |
| (Saeed et al., 2014) | Geometry-based | — | SVM/KNN | CK+ | Confusion matrix |
| (Nugrahaeni & Mutijarsa, 2017) | — | — | SVM/KNN/Random Forest | CK+ | Accuracy |
| (Rashid et al., 2017) | Cross-correlation/MBWM | — | SVM/KNN | JAFFE | Confusion matrix |
| (Wang et al., 2016) | K-SVD | — | SVM with radial basis function kernel | JAFFE | Accuracy (97.138%) |
| (Candra et al., 2016) | E-HOG | — | SVM with radial basis function kernel | CK+ | Accuracy (96.4%)/confusion matrix |
| (Wei & Jia, 2016) | — | — | Weighted feature gaussian kernel SVM | CK+ | Precision (93%) |
| (Kiran & Kushal, 2016) | Bidirectional LBP | — | Multi-class SVM | JAFFE/TFEID/IFED | Accuracy (97.10%) |
| (Mahmood, Hussain, Iqbal, & Elkilani, 2019) | Weber local descriptor | DCT | SVM | MMI/CK+/SFEW | Confusion matrix/accuracy (98.62%) |
| (Qayyum et al., 2017) | Stationary wavelet transform | DCT | Artificial neural network | JAFFE/CK+/MS-Kinect | Confusion matrix |
| (Dino & Abdulrazzaq, 2019) | Histogram of oriented gradients | PCA | SVM/KNN/MLP | CK+ | Accuracy (93.53%) |
| (Islam et al., 2019) | Gabor | PCA | ANN (Multilayer Perceptron) | JAFFE/CK+/RaFD | Confusion matrix/accuracy (99.51%) |
| (Raksarikorn & Kangkachit, 2018) | — | — | CNN | FER-2013 | Accuracy/precision/recall |
| (Kumar et al., 2018) | CNN | — | CNN | FER-2013/CK+ | Accuracy |
| (Minaee & Abdolrashidi, 2019) | Attentional CNN | — | Attentional CNN | JAFFE/CK+/FERG/FER-2013 | Accuracy |
| (Vo & Le, 2016) | CNN | — | SVM | CK | Accuracy (96.04%) |
| (Sharma & Jain, 2019) | LSTM | — | Bidirectional LSTM | CK | — |
| (Breuer & Kimmel, 2017a) | CNN | — | CNN | CK+/FER-2013/NovaEmotions | Accuracy |
| (Lopes et al., 2017) | CNN | — | CNN | CK+/JAFFE/BU-3DFE | Accuracy |
| (Dubey & Dixit, 2019) | CNN | — | CNN | JAFFE/CK+/FER-2013 | Accuracy |
| (Verma & Khunteta, 2017) | Gabor filter | — | ANN | JAFFE | Accuracy |
| (Bellamkonda & Gopalan, 2019) | Gabor/LBP | — | SVM | JAFFE/MMI/KDEF | Accuracy (98.83%) |
| (Alizadeh & Fazel, 2017) | — | — | CNN | FER-2013/Kaggle | — |
| (Zarbakhsh & Demirel, 2018) | — | SFFS/conventional t-test | Fuzzy SVM | BU-3DFE | Confusion matrix/accuracy (87.67%) |
| (Revina & Emmanuel, 2018) | LD-MGAD | — | PSO-KNN | JAFFE | Accuracy/confusion matrix |
| (Mohammadpour et al., 2018) | CNN | — | CNN | JAFFE/CK+/BU-3DFE | Accuracy/confusion matrix |
| (Borui et al., 2017) | LBP+LPQ+Gabor | PCA-LDA | — | JAFFE | Accuracy |
| (Z. Li, 2018) | CNN | — | CNN | JAFFE/CK+/FER-2013 | Accuracy/confusion matrix |
| (Y. Liu et al., 2018) | CNN | — | CoNERF | JAFFE/CK+/LFW | Confusion matrix/accuracy |
| (Xu et al., 2019) | HOG+DHOG | PCA | SVM | JAFFE | Accuracy (96.4%) |
| (W. Sun et al., 2019) | — | — | Ensemble of CNN | CK+ | Accuracy (96.15%) |
| (T. H. H. Zavaschi & Koerich, 2011) | Gabor+LBP | — | Ensemble of SVM | CK+/JAFFE | Accuracy (96.2%) |
| (Farajzadeh & Hashemzadeh, 2018) | CNN | — | CNN | CK+/JAFFE | Confusion matrix/ROC/precision |
| (M. I. Revina & Emmanuel, 2018) | LDN/DGLTP | — | SVM | JAFFE/CK+ | Accuracy (88%) |
| (Michel & El Kaliouby, 2015) | — | — | SVM | CK | Accuracy |
| (D. K. Jain et al., 2019) | CNN | — | DNN | CK+/JAFFE | Accuracy/precision/recall/ROC |
| (Abouyahya & Fkihi, 2018) | — | — | KNN+DTW | CK+ | Precision/recall |
| (Y. D. Zhang et al., 2016) | Biorthogonal wavelet entropy | — | Fuzzy multiclass SVM | Captured images | Accuracy (96.77%)/confusion matrix |
| (Mayya et al., 2016) | DCNN | — | SVM | JAFFE/CK+ | Accuracy/confusion matrix (98.12%) |
| (Fan & Tjahjadi, 2019) | CNN features plus shape and appearance features | — | SVM | CK+ | Accuracy (92.5%) |
2.7 Databases for facial expression recognition
There exist standardised databases for facial expression recognition. These databases are employed to evaluate the performance of facial expression algorithms so that meaningful comparisons can be made. The databases differ in terms of the following (Rizwan, 2013):
1. The uniqueness of subjects: characteristics such as face, shape, colour, number of subjects, age, ethnicity and skin colour distinguish one database from another. For instance, some databases contain an equal number of male and female subjects whilst others used only female subjects.
2. Posed versus spontaneous expressions: the databases are composed either by asking the subjects to perform a series of expressions or by capturing expressions naturally. Most of the 2D databases are posed, such as JAFFE, with a few exceptions such as MMI, which contains spontaneous smile expressions in addition to posed facial expressions. Currently, FER is moving towards the use of spontaneous databases for facial expression analysis.
3. Face or head orientation: the databases capture their images from different angles, and this influences the performance of the facial expression algorithms. For example, the KDEF database has its expressions captured from five angles: full right profile, full left profile, half right profile, half left profile and straight (Goeleven, De Raedt, Leyman, & Verschuere, 2008).
2.8 Limitation of current work and contribution
We have identified shortcomings in the FER process. Current systems detect facial expressions in general or a subset of emotions. For instance, an algorithm categorises the basic emotions, namely disgust, anger, sadness, fear, surprise and happiness plus neutral, or a subset of them such as anger, disgust and fear. To the best of our knowledge, there have not been any studies on how to detect anger only using facial expressions. Moreover, the multiclass classification of emotions has some drawbacks, such as the overlap among the facial expressions of disgust, anger and fear due to the slight distinction among them, giving an untrue representation of the emotions (Pell & Richards, 2011). Further, in confirmation of Pell's claim, concerns have been raised about the misclassification between angry and disgust facial expressions observed in experimental results (Apte et al., 2016; Kwong, Garcia, Abu, & Reyes, 2019; Nugrahaeni & Mutijarsa, 2017; Y. D. Zhang et al., 2016; Kiran & Kushal, 2016; Talele, Shirsat, Uplenchwar, & Tuckley, 2017). It is postulated that a majority of facial expression algorithms have difficulty performing multi-class classification, which manifests in long training and computational times and insufficient memory space (Kiran & Kushal, 2016; Shah et al., 2017). Also, despite the rapidly growing literature on anger detection, what is known is largely based on the utilisation of either physiological signals or audio or speech data (Chang et al., 2012; Chhabra et al., 2017; Deng et al., 2018). Not much is known about detecting only anger using facial expressions.
We argue that anger detection needs to be done accurately, giving a true representation of the emotion. As the detection of anger will provide useful information about peoples’ intensity of anger to manage or control it, as unregulated anger sometimes results in aggression or violence (Moritz, 2006; Shahsavarani et al., 2015). Therefore, our contribution in this respect in these ways: 1. We have illustrated that only anger can be detected with facial expressions and facial expression algorithms (for reference read chapter 5). 2. We propose a novel ensemble learning algorithm for our anger recognition. For reference see chapter 3. 3. We have shown that our results can achieve high accuracies which exceed the state-of- the-art results. For reference read chapters 4 and 5. 2.9 Chapter summary In this chapter, we have gained a comprehensive insight into our research, identified the limitations, and this helped us to effectively organise our work to achieve our objectives. The next chapter will detail how we plan to undertake our research work specifically explaining the methods we intend to deploy. 35 University of Ghana http://ugspace.ug.edu.gh 36 University of Ghana http://ugspace.ug.edu.gh Chapter 3 Methodology 3.0 Introduction This chapter presents all the details related to the methods utilised during the different phases of facial expression recognition as well as the databases used during this research work. Also, it justifies the selected algorithms and databases. 3.1 Workflow This section details the methods deployed, which is depicted in figure 3.11 as well as the stages by which the research work was realised. The overall work was undertaken in the following phases: image acquisition, pre-processing, feature extraction, feature selection and classification. 3.1.1 Image acquisition In conducting research involving machine learning and deep learning algorithms, there is a dire need for datasets since these algorithms are driven by data. As such, careful diligence needs to be observed in the selection of the datasets for facial expression recognition, as an inappropriate selection such as datasets with noisy backgrounds might result in increasing the difficulty of the project as well as affect the overall recognition accuracy. There are several facial expression databases; some can be easily downloaded whilst others require permission to be obtained before they can be downloaded. The permission is obtained by the mere filing of a form for formality sake and to prove the usage of the dataset strictly for academic purposes. The databases differ in terms of the following characteristics: dimensions namely two-dimensional and three-dimensional, image quality, posed and spontaneous, static and image sequences databases. The 2-dimensional databases are the most utilized due to their 37 University of Ghana http://ugspace.ug.edu.gh availability publicly although there are challenged by with constraints of head pose and rotation variations (Pantic & Rothkrantz, 2000). On the other the 3-dimensional database is robust to these constraints due to the use of the 3D face scanner, are computationally expensive, requiring a lot of resources. Hence, these reasons account for the wide adoption of 2D facial expressions. 
Therefore, for research work, after considerable weighing and evaluation of the pros and cons of both the two-dimensional and three-dimensional databases, the Japanese Female Facial Expression (JAFFE), Karolinska Directed Emotional Faces (KDEF) and Extended Cohn Kanade database (CK+) were selected. These 2D databases offer frontal, posed, noise-free background with labelled and validated images. A description of these databases are as follows. Nevertheless, much as we are aware these databases are not of African people, which is what we desire to make our work relevant to our people, they do serve a useful purpose to enable us quickly test our method against established methods. Therefore, our future work will focus on creating a database of African people. 3.1.1.1 JAFFE database The Japanese Female Facial Expression database contains 213 photographed images from ten Japanese females who displayed the basic emotion: anger, disgust, fear, surprise, sadness, happiness and neutral (Lyons et al., 1998). Each of the ten Japanese females posed 3 or 4 times per expression. The images are in grayscale and tiff format with a resolution of 256*256 pixels. The JAFFE database is devoid of occlusion and illumination variation as the ten females were asked to tie their hair and show the real face. Also, adequate lightning was provided during capturing of the facial expression as the images were captured in a controlled environment. 38 University of Ghana http://ugspace.ug.edu.gh Figure 3.1 shows sample images of the JAFFE database. 3.1.1.2 CK+ database The Extended Cohn Kanade is an extension of the Cohn Kanade database (Kanade, Cohn, & Tian, 2000); an addition of spontaneous smile facial expressions whilst the remaining are posed facial expressions. It contains 529 image sequences from 123 subjects, within the ages of 18 to 50 years. Females form 69% and males 31%, having 81%, 13% and 6% as Euro-American, Afro-American and others accordingly. The image sequences differ in duration that is 6 to 10 seconds per frame and was videoed from the onset which is a neutral emotion to the formation of the peak emotion. The images sequences are well labelled and validated in png format, with a resolution of 640*490 pixels (Lucey et al., 2010). 39 University of Ghana http://ugspace.ug.edu.gh Figure 3.2: displays a sample CK+ images. Left-to-right from top row: anger, disgust, happiness, surprise, angry, contempt, sadness. 3.1.1.3 KDEF database Karolinska Directed Emotional Faces (KDEF) database (Goeleven et al., 2008) is a posed facial expression database made up of 4900 images from 70 different subjects, 35 males and 35 females within the ages of 20 to 30 years. The 70 subjects displayed the 7 different emotional expressions, each expression photographed twice and shot at 5 different angles. The subjects are without visible make-up, spectacles, ornaments, beards and moustaches (Goeleven et al., 2008). Figure 3.3: shows KDEF images captured from different angles 3.1.2 Pre-processing Facial pre-processing is the first step in facial expression recognition. Pre-processing of facial expressions is performed to discard irrelevant information and improve the recognition accuracy of the important extracted features. It involves processes such as face detection and other image modification methods such as smoothening and normalisation. 40 University of Ghana http://ugspace.ug.edu.gh 3.1.2.1 Face detection Face detection is the first step in any face image analysis or facial expression classification. 
It is useful in determining the existence of a face in an image as well as aligning the face so that the relevant features can be extracted efficiently (Bhardwaj & Dixit, 2016; Martinez & Valster, 2016). In this work, the face detection method proposed by Viola and Jones was employed. The Viola-Jones algorithm is a widely adopted face detector due to its robustness and computational simplicity; it works in the following phases: Haar-like features, the integral image, AdaBoost and a cascade of classifiers (Viola & Jones, 2004).
The algorithm classifies images based on the values of simple features. Features are preferred over raw pixels because they can be evaluated much faster. The features are classified into three groups: two-rectangle, three-rectangle and four-rectangle features. The two-rectangle feature is the difference between the sums of pixels of two rectangular regions. The three-rectangle feature quantifies the difference between the sum of pixels of the two exterior rectangles and the sum of pixels within the central rectangle, and the four-rectangle feature computes the difference between diagonal pairs of rectangles (Viola & Jones, 2004). The Haar-like features are displayed in figure 3.4.
However, the total number of features can be extremely large, hence the proposal of the integral image for faster evaluation. The integral image is an image representation from which the sum of values in any rectangular area can be obtained. It is defined in equation 3.1:

k(x, y) = \sum_{x' \le x,\; y' \le y} b(x', y')   (3.1)

where k(x, y) and b(x, y) are the integral image and the actual image respectively (Viola & Jones, 2004). The integral image sums the pixel values of the rectangle from the origin to the point (x, y). Equation 3.2 calculates the sum S of values in a rectangular area bounded by (x_1, y_1) and (x_2, y_2), with Area_1 < Area_2. This is displayed in figure 3.5.

S = k(x_2, y_2) - k(x_1, y_2) - k(x_2, y_1) + k(x_1, y_1)   (3.2)

Figure 3.4 shows the rectangle features. A and B are the two-rectangle features, and C and D are the three-rectangle and four-rectangle features respectively (Viola & Jones, 2004). Figure 3.5 displays the calculation of an area using the integral image (Valero, 2016).
Although the calculation of values in a rectangular area has been simplified, a considerable amount of processing power is still required to work on the whole image. Therefore, AdaBoost is used to select the most relevant features from the face, which are then passed to a cascade of classifiers for faster face detection. In summary, the method extracts important characteristics from the face in the form of Haar-like features, which are used to train a machine learning algorithm that detects faces in real time.
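To make this step concrete, the following is a minimal sketch of applying a pretrained Viola-Jones (Haar cascade) detector through OpenCV, the library used in this work; the cascade file, the scaleFactor and minNeighbors values and the 128*128 output size are illustrative assumptions rather than the exact settings of our experiments.

```python
import cv2

# OpenCV ships a pretrained frontal-face Haar cascade; this is how it is
# normally located when OpenCV is installed via the opencv-python package.
cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
face_cascade = cv2.CascadeClassifier(cascade_path)

def detect_and_crop(image_path, size=(128, 128)):
    """Detect the largest face in an image, crop it and resize it to `size`."""
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    # Keep the largest bounding box in case several candidate faces are proposed.
    x, y, w, h = max(faces, key=lambda box: box[2] * box[3])
    return cv2.resize(gray[y:y + h, x:x + w], size)
```

Keeping only the largest detected bounding box is one simple way of handling images in which the detector proposes more than one candidate face.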
3.1.2.2 Image enhancement
Image enhancement is an important part of image pre-processing, as all images contain some level of noise. The main aim of image enhancement is to remove this noise and thereby improve image quality. It supports accurate and efficient feature extraction by helping to determine the correct features to extract for classification, which in turn improves the learning ability of the classifiers (Tan & Jiang, 2019). The images were enhanced using two methods: median blur and Contrast Limited Adaptive Histogram Equalisation.
3.1.2.2.1 Median blur
Median blur is a nonlinear technique used to remove noise from images. It is characterised by its low computational cost and simplicity, and it preserves the edges of an image. It operates as a filter sliding pixel by pixel over an image, replacing the centre pixel with the median of the gray levels in a window (Niu, Zhao, & Ni, 2017). A two-dimensional median filter is defined as:

\hat{I}_{j,k} = \mathrm{median}\{\, I_{j+u,\; k+v} \,\}   (3.3)

for an H * W image with (j, k) \in \{1, 2, \ldots, H\} \times \{1, 2, \ldots, W\} and u, v \in \{-(w-1)/2, \ldots, (w-1)/2\}, where w is the window size.
3.1.2.2.2 Histogram equalisation
Histogram equalisation is a pre-processing technique performed to enhance the contrast of an image after noise removal so that features can be extracted accurately and clearly. It works by transforming the intensity values of an image and can be described by the equation:

I_k = T(r_k) = \sum_{j=1}^{k} P_r(r_j) = \sum_{j=1}^{k} \frac{n_j}{n}, \quad k = 1, 2, \ldots, L   (3.4)

where I_k is the intensity value in the processed image corresponding to r_k in the input image, P_r(r_j) is the probability of the input intensity level r_j, n_j is the number of pixels with intensity r_j, n is the total number of pixels and L is the number of intensity levels.
However, histogram equalisation has a drawback: because it operates on the whole face, it can over-contrast the image, push pixels into the same range and thereby intensify the noise level. Therefore, in this work we adopted Contrast Limited Adaptive Histogram Equalisation (CLAHE), an extension of histogram equalisation, for our image enhancement. CLAHE divides the facial region into tiles and applies histogram equalisation to each tile separately (Zuiderveld, 1994).
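As an illustration of these two enhancement steps, the sketch below chains OpenCV's median filter and CLAHE implementations; the 3*3 kernel, clipLimit and tileGridSize values are illustrative assumptions, not the exact settings reported later in the experiments.

```python
import cv2

def enhance(gray_face):
    """Denoise with a median filter, then boost local contrast with CLAHE."""
    # Median blur replaces each pixel with the median of its neighbourhood,
    # removing salt-and-pepper noise while preserving edges (kernel size assumed).
    denoised = cv2.medianBlur(gray_face, 3)
    # CLAHE equalises the histogram tile by tile rather than over the whole face;
    # clipLimit and tileGridSize here are illustrative values.
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    return clahe.apply(denoised)
```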
3.1.3 Feature extraction
After the pre-processing stage, the next step is to extract the features. Feature extraction is widely considered the most important step in facial expression classification, as the choice of features is critical. It represents the facial image effectively by encoding the subtle changes of a facial image into a feature vector (Abouyahya et al., 2016; Bhardwaj & Dixit, 2016). In this work, two types of feature extraction methods are utilised: an appearance method (LBP) and a hybrid method (the fusion of HOG and LBP). Both methods are robust to illumination variation, which addresses one of the known issues of two-dimensional databases.
3.1.3.1 Local Binary Pattern
The Local Binary Pattern (LBP) is a commonly used appearance feature extraction method due to its numerous advantages. LBP has been widely adopted because it is easy to implement, invariant to rotation, robust to grayscale transformations caused by illumination variation, able to overcome problems of disequilibrium displacement, discriminative, modest in its data requirements and economical with computational resources whilst retaining facial information (Ojala, Pietikäinen, & Mäenpää, 2002). LBP, originally proposed by Ojala, Pietikäinen, & Harwood (1996), is a 2D texture operator distinct from traditional statistical and structural models of texture analysis. It comprises two components, pattern and contrast, where contrast is the amount of texture. The LBP operator thresholds each pixel in a 3*3 neighbourhood against the centre pixel, producing binary codes of 0 or 1 (see figure 3.6).
Nonetheless, the 3*3 neighbourhood performs poorly in encoding textures with large appearance changes and is unable to capture non-local macrotexture (L. Liu, Fieguth, Guo, Wang, & Pietikäinen, 2017). Therefore, the LBP operator has been extended with diverse neighbourhood sizes to handle different scales. The extended LBP uses circular neighbourhoods and bilinear interpolation of pixel values, which allows any number of points P and radius R in the neighbourhood (Ojala et al., 2002). Figure 3.7 displays the various neighbourhood sizes. In equation 3.5, a grayscale image I(x, y) is considered: let k_c denote the intensity of an arbitrary pixel (x_c, y_c), and let k_p denote the gray value of a sampling point in an evenly spaced circular neighbourhood of P sampling points and radius R around that pixel (X. Huang, 2014). The LBP operator is defined as:

h_{P,R}(x_c, y_c) = \sum_{p=0}^{P-1} L\big( (k_p - k_c) \ge 0 \big) \, 2^p   (3.5)

where L(·) equals 1 when its argument is true and 0 otherwise.
Figure 3.6: operation of the LBP operator. Figure 3.7: three neighbour sets for different (P, R) used to construct a circularly symmetric LBP (Ojala et al., 2002). Figure 3.8: operation of the LBP operator (Farajzadeh & Hashemzadeh, 2018).
3.1.3.2 Histogram of Oriented Gradients
HOG was first proposed by Dalal & Triggs (2005) for target detection. Due to its impressive performance, it has been widely adopted by researchers for facial expression recognition (Xu et al., 2019). HOG employs local gradients to describe the shape of an image (Farajzadeh & Hashemzadeh, 2018). The HOG operator works by dividing the image into small connected regions called cells and computing the gradient direction histogram for every single cell. A HOG descriptor is then formed from the combination of the gradient direction histograms (Dalal & Triggs, 2005). HOG is characterised by two parameters: the cell size and the number of orientation bins. The cell size, which specifies the size per column and per row, is used in computing the histogram, whilst the number of orientation bins determines how the gradient angles are divided (Nassih, Amine, Ngadi, & Hmina, 2019). Mathematically, HOG feature extraction is defined as follows (Xu et al., 2019):
1. Using a one-dimensional differential template [-1, 0, 1], the gradient values at a pixel (a, b) are calculated as:

R_x(a, b) = I(a + 1, b) - I(a - 1, b)   (3.6)
R_y(a, b) = I(a, b + 1) - I(a, b - 1)   (3.7)

where R_x(a, b) and R_y(a, b) represent the gradients of the pixel in the two image directions.
2. The gradient amplitude and direction values are calculated as:

A(a, b) = \sqrt{ R_x(a, b)^2 + R_y(a, b)^2 }, \qquad \theta(a, b) = \arctan\!\left( \frac{R_y(a, b)}{R_x(a, b)} \right)   (3.8)

Figure 3.9 displays a HOG feature extraction process (Farajzadeh & Hashemzadeh, 2018).
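As a concrete illustration of the two descriptors, the sketch below computes a uniform LBP histogram and a HOG vector with scikit-image; scikit-image is used here purely for illustration (it is not among the packages listed later in this chapter), and the default parameters simply mirror the values reported in chapter 4: with a 128*128 image, 16*16 cells, 8 bins and, as assumed here, one cell per block, the HOG vector has 512 entries, while the uniform LBP histogram with P = 24 has P + 2 = 26 bins.

```python
import numpy as np
from skimage.feature import local_binary_pattern, hog

def lbp_histogram(gray_face, points=24, radius=8):
    """Uniform LBP histogram: P + 2 bins (26 bins for P = 24)."""
    codes = local_binary_pattern(gray_face, P=points, R=radius, method="uniform")
    hist, _ = np.histogram(codes, bins=points + 2, range=(0, points + 2), density=True)
    return hist

def hog_features(gray_face, cell=16, bins=8):
    """HOG descriptor over cell*cell-pixel cells with `bins` orientation bins."""
    return hog(gray_face, orientations=bins, pixels_per_cell=(cell, cell),
               cells_per_block=(1, 1), feature_vector=True)
```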
3.1.4 Feature selection
Due to the high dimensionality of the extracted features, feature selection is performed to discard irrelevant features and retain the important feature vectors needed for accurate and acceptable classification. Feature selection is useful in reducing computational cost and memory usage, improving data quality and hence predictive accuracy, and increasing the speed of the algorithm (Khalid et al., 2014; Ladha & Deepa, 2011). Notable techniques for feature selection are AdaBoost, Linear Discriminant Analysis, Independent Component Analysis, Whitened Principal Component Analysis, Laplacian Eigenmaps, Local Linear Embedding and Principal Component Analysis (PCA). Nonetheless, in comparison to the other methods, PCA has received the dominant share of research attention over the years.
PCA is a multivariate statistical technique applicable to image compression and FER. It derives a small set of important features from patterns in the high-dimensional data after analysing the similarities and differences of the feature vectors (Abdi & Williams, 2010; Turk & Pentland, 1991). In compressing the data as well as providing a description of it, these selected features are represented as a set of new orthogonal variables called principal components, which help recognise facial expressions effectively. PCA produces excellent recognition rates because discarding the redundant features reduces sensitivity to noise. Further, it is comparatively invariant to changes in facial expression and has low memory, storage and computational demands due to the reduced complexity of the features (Calder, Burton, Miller, Young, & Akamatsu, 2001; Karamizadeh, Abdullah, Manaf, Zamani, & Hooman, 2013).
Computing PCA requires a series of steps to obtain the final set of dimensions. Consider a data set of q observations of m-dimensional vectors, y = [y_1, y_2, y_3, \ldots, y_q]. The data are first centred by subtracting the mean \mu_y from each observation of the dataset y. Afterwards, the covariance matrix S_y is calculated to obtain the eigenvectors and eigenvalues of the data. The eigenvectors associated with the largest eigenvalues correspond to the directions of greatest variance in the dataset, and a subset of these eigenvectors is selected as the new features (Jan, 2017). The functions are defined as follows:

\mu_y = \frac{1}{q} \sum_{i=1}^{q} y_i   (3.9)

S_y = \sum_{i=1}^{q} (y_i - \mu_y)(y_i - \mu_y)^T   (3.10)
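To make the dimensionality-reduction step concrete, the following is a minimal sketch using scikit-learn's PCA; the feature matrix is a random placeholder standing in for the extracted descriptors, and the 0.98 retained-variance setting mirrors the proportion reported later in chapter 4, so all values are illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA

# Placeholder feature matrix standing in for the extracted vectors:
# 100 samples with 538 features (e.g. 512 HOG values plus 26 LBP bins).
rng = np.random.default_rng(0)
features = rng.normal(size=(100, 538))

# n_components=0.98 keeps enough principal components to explain 98% of the
# variance; in practice the projection is fitted on training data only and
# then reused unchanged on the test data.
pca = PCA(n_components=0.98)
reduced = pca.fit_transform(features)
print(reduced.shape)  # (100, k) with k <= 538
```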
3.1.5 Classification models
There are numerous machine learning and deep learning techniques employed for various applications. For our experiment, both machine learning and deep learning techniques were selected based on their performance and popularity. A brief description of the classification algorithms follows.
3.1.5.1 Support Vector Machine
SVM is among the most widely exploited machine learning algorithms for facial expression recognition due to its good classification accuracy; it may even attain better classification accuracy than neural networks (Bhardwaj & Dixit, 2016). SVM has good generalisation ability, especially when the labels are properly defined, processes high-dimensional feature data efficiently and is highly flexible with respect to data size, making it a dynamic algorithm for facial expression recognition (Ekundayo & Viriri, 2019; Jakkula, 2011; Michel & El Kaliouby, 2015). SVM belongs to the family of linear classifiers and is used for both regression and classification. It forms a decision function, or hyperplane, from the given input vectors, maximising the margin between the classes.
SVM views classification as a quadratic optimisation problem: it classifies the data using a set of support vectors, reducing the structural risk and the average error between the inputs and their target vectors (Vapnik, Golowich, & Smola, 1997). One-against-one and one-against-all are the two approaches available for multi-class SVM classification, and kernel functions such as the Radial Basis Function (RBF), linear, sigmoid and polynomial kernels can be used. Given a training set of instance-label pairs (x_i, y_i), i = 1, 2, \ldots, l, where x_i \in R^n and y \in \{1, -1\}^l, the SVM optimisation problem is defined as:

\min_{\omega, b, \varepsilon} \; \frac{1}{2} \omega^T \omega + C \sum_{i=1}^{l} \varepsilon_i   (3.11)
subject to  y_i ( \omega^T \phi(x_i) + b ) \ge 1 - \varepsilon_i, \quad \varepsilon_i \ge 0

According to equation 3.11, the input vectors x_i are mapped to a higher- or infinite-dimensional space by the function \phi, and the penalty on the error terms is defined by C > 0. A linear separating hyperplane is then constructed in that space (Nugrahaeni & Mutijarsa, 2017). The kernel functions are defined as follows:

Linear kernel:      K(x_i, x_j) = x_i^T x_j   (3.12)
Sigmoid kernel:     K(x_i, x_j) = \tanh( \gamma x_i^T x_j + r )   (3.13)
Polynomial kernel:  K(x_i, x_j) = ( \gamma x_i^T x_j + r )^d, \; \gamma > 0   (3.14)
RBF kernel:         K(x_i, x_j) = \exp( -\gamma \| x_i - x_j \|^2 ), \; \gamma > 0   (3.15)
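The following is a minimal sketch of training such a classifier with scikit-learn's SVC, one of the packages adopted later in this chapter; the RBF kernel choice, the C and gamma values and the random placeholder data (standing in for the PCA-reduced feature vectors) are illustrative assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Placeholder data: 200 PCA-reduced feature vectors with binary labels
# (1 = angry, 2 = not-angry, mirroring the labelling used in chapter 4).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 64))
y = rng.choice([1, 2], size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# RBF-kernel SVM; C and gamma are illustrative and would normally be tuned.
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```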
3.1.5.2 Convolutional Neural Network (CNN)
The CNN, originally proposed by Lecun, Bottou, Bengio, & Haffner (1998), is an 'end-to-end' multi-layered algorithm and an advancement of the artificial neural network (ANN) (Yunxin Huang et al., 2019). It is popularly employed for image recognition and other computer vision tasks as it requires little or no manual feature engineering. The CNN model operates on a three-dimensional volume measured by height, width and depth. It consists of a feature detection (convolutional) layer, a feature pooling layer and a classification layer, although the detection and pooling layers may be repeated one or more times. The convolutional layer uses a learnable kernel to compute the convolution over a set of neurons; the resulting feature maps, produced from the dot product of the kernel and the input neurons, are passed to the next layer for further computation. Furthermore, the convolutional layer is characterised by local connectivity, which learns the relationships among neighbouring pixels, shift-invariance to the location of the object, and weight sharing within the same feature map (S. Li & Deng, 2018). The convolution operator is defined as follows:

Y = \sum_{n=1}^{N} W_n * X_n + b   (3.16)

where X \in R^{N \times H \times W} is the input with N channels, height H and width W pixels, W \in R^{N \times K \times K} is the convolution filter with kernel size K * K, W_n is the convolution kernel for channel n and Y is the resulting feature map. A convolution over a two-dimensional image M with kernel K is therefore defined as (Breuer & Kimmel, 2017):

y(j, k) = (M * K)(j, k) = \sum_{m} \sum_{n} M(j - m, k - n) \, K(m, n)   (3.17)

There are two types of pooling layer: average and max pooling. The pooling layer makes the network cost-effective by reducing the spatial size of the feature maps. In the fully connected layer, the two-dimensional activation maps from the preceding layer are flattened into a one-dimensional vector for further feature representation and classification (S. Li & Deng, 2018).

y_j = f\left( \sum_{i=1}^{n} z_i \cdot \omega_{i,j} + b_j \right)   (3.18)

The fully connected layer is represented in equation 3.18 (Jan, 2017). It computes the dot product between the weights and the input neurons: y_j is the output neuron obtained by applying the activation function f to the weighted sum of the inputs z_i from the previous layer, where each input is multiplied by its weight \omega_{i,j} and the bias b_j is added.
CNN has been a good option for classification since its inception due to its embedded feature extraction and selection and its computational efficiency, and it gives better accuracy than other neural network-based classifiers (Revina & Emmanuel, 2018). Therefore, CNN was employed for our research. Our architecture is made up of three convolutional layers and two dense layers.
Figure 3.10: steps involved in CNN FER (Yunxin Huang et al., 2019)
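As an illustration of this architecture, the sketch below builds a small Keras model with three convolutional layers and two dense layers, matching the layer counts stated above; the filter sizes, pooling, optimiser and other training settings are illustrative assumptions rather than the exact configuration used in our experiments.

```python
from tensorflow.keras import layers, models

def build_cnn(input_shape=(128, 128, 1), num_classes=2):
    """Small CNN: three convolution/pooling stages followed by two dense layers."""
    model = models.Sequential([
        layers.Conv2D(32, (3, 3), activation="relu", input_shape=input_shape),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(128, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),              # first dense layer
        layers.Dense(num_classes, activation="softmax"),   # second (output) dense layer
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_cnn()
model.summary()
```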
3.1.5.3 Ensemble method
In predicting an outcome, an ensemble method builds numerous models by employing either multiple different algorithms or multiple training datasets (Kotu & Deshpande, 2015). The independent base models are then combined, using a technique such as averaging, to produce a single result. Ensemble methods are normally employed in supervised machine learning tasks. They usually produce better models than any single base model because the generalisation error is minimised: the error of one model is likely to be balanced out by the other base models. Also, averaging multiple different models with minimal bias leads to higher prediction performance than a single model (Valero, 2016). Ensemble methods also mitigate overfitting and have proven to be computationally cost-effective (Dev & Eden, 2019; Sagi & Rokach, 2018). The general framework for building an ensemble model is as follows. Given a dataset K = \{(j_i, m_i)\} of i samples with k features, where |K| = i, j_i \in R^k and m_i \in R, the ensemble learning model \varphi uses an aggregation function F to combine B base models \{g_1, g_2, \ldots, g_B\}, producing an output defined as \hat{m}_i = \varphi(j_i) = F(g_1(j_i), g_2(j_i), \ldots, g_B(j_i)), where \hat{m}_i \in Z for classification problems and \hat{m}_i \in R for regression problems (Dev & Eden, 2019; Sagi & Rokach, 2018).
For our work, the following base models will be used: SVM, KNN, Naïve Bayes, Logistic Regression and Random Forest. The individual results will be combined and averaged to produce a better model for the recognition of anger. These base models were selected for the following reasons. SVM performs excellently on binary classification (J. & Watkins, 1999). Naïve Bayes converges faster on small datasets than other models, especially when its conditional independence assumption holds (Rish, Hellerstein, & Jayram, 2001). Random Forest is non-parametric, efficient, has high prediction accuracy for many types of data, works well with small sample sizes, high-dimensional spaces and complex data structures, and is less prone to overfitting than many classifiers (Agarwal, Baechle, Behara, & Rao, 2016; Breiman, 2001). KNN is easy to implement and, like SVM, finds relationships between data points (Winters-Miner et al., 2015), and Logistic Regression has the advantages of saving memory and performing better than other approaches in two-class problems (Yun, Kim, Chi, & Yoon, 2007).
3.1.6 Elements involved
The elements utilised in implementing the three models, namely the SVM model, the CNN model and the ensemble model, are described below.
3.1.6.1 Programming language
Choosing a machine learning programming language for this research was not trivial, as many such languages exist. Therefore, we evaluated the pros and cons of the most widely used machine learning programming languages and selected one of them. These popular languages include Python, C/C++, Go, R and Matrix Laboratory (MATLAB) (Gao et al., 2020).
MATLAB is an interactive, easy-to-use, fast programming language for scientific computing. It can be employed for tasks such as data analysis, algorithm development, matrix manipulation and general problem-solving. MATLAB has good performance, easy-to-use graphics, concise syntax and allows easy language extension; however, a licence is required to use the product and some of its libraries. Go is an open-source programming language developed by Google, with syntax similar to C, used for building simple, efficient and reliable software. Its syntax is concise and expressive and it enables the flexible, modular construction of programs; however, its machine learning libraries are not numerous. R is an open-source programming language for statistical computing. It is highly graphical, producing high-quality images, yet it has a steep learning curve and is limited when analysing big data as it stores its data in system memory (RAM). C/C++ is a powerful and efficient general-purpose programming language used across multiple platforms; however, developing and implementing machine learning algorithms in C/C++ is difficult, as the language itself is challenging to learn (Gao et al., 2020; Valero, 2016).
After careful analysis and evaluation, Python was selected as the programming language for this research: it is easy to use and learn, requires no licence, is easily portable, has a well-defined exception-based error model, offers good performance and efficiency, and provides documentation and community support for resolving problems within the shortest possible time (Gao et al., 2020; Oliphant, 2007; Pérez, Granger, & Hunter, 2011).
3.1.6.2 Packages and development environments
Following the selection of Python as the programming language, the packages and development environments below were employed for this research. Familiarity, flexibility, simplicity, the availability of detailed documentation for easy implementation, and community support were the reasons for their selection, as the success of a machine learning or deep learning project depends on the frameworks and libraries available to developers (Valero, 2016). A brief description of them is as follows.
3.1.6.2.1 Anaconda Anaconda is a complete, open-source package manager, environment manager, Python and R programming languages distribution for scientific computing and data science. It is easy to download and install and functions on cross-platforms. It simplifies package management as well as deployment. Anaconda provides a graphical user interface (GUI) which includes a link to all the applications which can be installed with just a mouse click. The applications included in the Anaconda package include JupyterLab, JupyterNotebook, Spyder, Orange, Glue, Visual Studio Code and RStudio. It simplifies installing of libraries and dependencies as it comes with over 250 automatically installed packages and over 7500 open-source libraries which can be installed using either pip or conda. In addition, multiple virtual environments can be created using Anaconda. For example, a Python 2.7 can be installed instead of the default python. Also, Anaconda provides detailed documentation as well as community support for additional help (Watkins, 2018). 3.1.6.2.2 Open Source Computer Vision Library (OpenCV) OpenCV is an open-source python library built for image and video analysis such as face detection and recognition, identifying objects, classification of objects in video etc with more than 2500 optimised computer vision and machine learning algorithms. It has the interfaces: Python, C++, MATLAB and Java interfaces and functions on a cross-platform. We utilised OpenCV mainly for our pre-processing stage as it contains functions such as the Viola Jones algorithm for face detection and image smoothening functions like median blur () and clahe () for histogram equalisation. This greatly reduces the efforts required for the pre-processing stage (Culjak, Abram, Pribanic, Dzapo, & Cifrek, 2012). 56 University of Ghana http://ugspace.ug.edu.gh 3.1.6.2.3 Tensorflow Tensorflow is a free, open-source library used for numeric computation. Tensorflow operates using dataflow graphs and provides an end-to-end implementation and training of machine learning models particularly neural networks. Tensorflow allows for its deployment across diverse platforms such as GPU, CPU and TPU. The higher layers provide an application programming interface (API), commonly used in deep learning models (Culjak et al., 2012). 3.1.6.2.4 Keras Keras is a deep learning API written in Python for developing and training of deep learning models. It is integrated into Tensorflow and was developed for faster experimentation of deep learning models. Keras has user-friendly, highly productive interface and modulable and composable models. For this research, Keras was utilised for our data augmentation as well as the training of our deep learning models. 3.1.6.2.5 Scikit-learn It is an open-source python library developed for the training of machine learning models, dimensionality reduction, model selection and feature extraction and normalisation. It is useful for the implementation of our SVM and ensemble model and evaluating our models using performance measures such as confusion matrix. Table 3.1: summary of the package development and environments. Operating system Windows Language Python 3.7.4 Editor Spyder (Anaconda) Environments OpenCV, Keras, Tensorflow, Scikitlearn 57 University of Ghana http://ugspace.ug.edu.gh 3.1.7 Development For the development of our research work, a modular design method was used. 
The modular design allows the development of our anger recognition to be performed in modules or components, having each module performing a specific function. This allows changes to be made easily without or minimally affecting the other components (Valero, 2016). Thus, our work was divided into the pre-processing, feature extraction and classification stages. The pre- processing stage involves the grayscaling and resizing the image, face detection, image smoothening and normalisation. The feature extraction module involves the extracting of the features and labels. Then the classification stage receives the features and labels and creates the ensemble, CNN and SVM models, the classification of the facial expressions and the likelihood of each expression. 58 University of Ghana http://ugspace.ug.edu.gh Figure 3.11: the workflow of our research work (Nassih et al., 2019). 59 University of Ghana http://ugspace.ug.edu.gh 3.1.8 Chapter summary This chapter detailed how our research work is going to be undertaken particularly giving background and justification for the chosen methods. Moving on, we will discuss the implementation of these methods in the next chapter. 60 University of Ghana http://ugspace.ug.edu.gh Chapter 4 Experimental setup 4.1 Introduction This chapter details the implementation of our research work. We describe how the features are extracted, fused, and reduced using feature dimensionality reduction methods, and classified using machine learning and deep learning algorithms. Before proceeding to the details of the proposed experiment, a summary of the experiment to be conducted is described as follows. The datasets JAFFE, CK+ and KDEF will be utilised for the experiment. Two experiments will be conducted, outputting 3 models. Experiment I involve using the original datasets whilst experiment II involve the application of data augmentation technique to balance our dataset. 4.2 Hardware specification In performing a machine learning or deep learning project, selecting the hardware components is a key factor as the project is highly dependent on this component. The CPU hardware specification employed in experimenting is summarized in table 4.1. More for the TPU hardware specification, we employed Google Colaboratory (Google Colab). Google Colab is a free cloud service with colab notebooks which has been built on top of Jupyter notebooks and Ubuntu 18.04. Colab notebooks leverage on the power of Google’s hardware, executing codes in the cloud. It allows for ‘end-to-end ‘processes involved in facial expression recognition, from pre-processing to evaluating of models. It is particularly helpful in training deep neural networks as it provides Tensor Processing Unit (TPU), an integrated circuit particularly for neural network learning developed by Google (Feldman, 2018); since training neural networks 61 University of Ghana http://ugspace.ug.edu.gh can be a protracted process depending on the model complexity and the resources available. TPU provides access to Ram of 12.72GB and hard disk of 107.77GB (Valero, 2016). Table 4.1: a summary of the hardware specification. System model HP Pavilion x360 m3 Convertible Processor Intel(R) Core (TM) i3-7100U CPU @ 2.40GHz, 2400 MHz, 2 Core(s), 4 Logical Processor(s) Memory (RAM) 6.00 GB System type x64-based PC 4.3 Pre-processing stage It outlines the steps taken in transforming our datasets due to the different features such as colour, size, number of emotions and resolution to get a unified input for the next stages. 
4.3.1 Database pre-processing
Three widely used databases, JAFFE, CK+ and KDEF, were utilised to test the performance of the proposed models. To obtain a uniform standard input image, the databases were slightly modified so that only frontal, posed images were selected for the experiment. The CK+ database captures its images as sequences that transition from the neutral emotion to the peak emotion (the desired expression at the end of the sequence). The KDEF database photographs its subjects from five angles: half left, half right, frontal, full left and full right (Goeleven et al., 2008). Accordingly, only peak images were selected from CK+ and only frontal images from KDEF for this experiment. Furthermore, the KDEF and JAFFE databases contain images expressing 7 different emotions whereas CK+ has 8. To obtain a unified set of emotion classes, the contempt emotion in the CK+ database was excluded. Overall, 213, 329 and 980 images were selected from the JAFFE, CK+ and KDEF datasets respectively. After this modification the datasets were prepared manually: they were categorised into the emotions "angry" and "not-angry" with emotion labels 1 and 2 respectively. The "not-angry" class consists of all emotions apart from angry, that is, the combination of happy, sadness, fear, neutral, disgust and surprise. A summary of the databases is listed in table 4.2.

Table 4.2: summary of the datasets.

4.3.2 Image pre-processing
The following procedures were implemented: grayscaling and resizing, face detection and cropping, and image enhancement (Dagher, Dahdah, & Al Shakik, 2019).

4.3.2.1 Grayscaling and resizing of images
The databases come in different colour formats. The JAFFE images are originally grayscale, the CK+ images are either coloured or grayscale, and the KDEF images are coloured. Therefore, each image was first tested to determine its colour format, and all images were then converted to grayscale to obtain a unified input (Dagher et al., 2019). Next, to provide uniform image sizes, all the images were resized from their original size to 128*128. Reducing the image size helps to shorten training time (Goeleven et al., 2008). No loss of resolution was observed after grayscaling and resizing (Dagher et al., 2019).

Figure 4.1: Example of a grayscaled and resized CK+ angry image (from a size of 640*480 to 128*128).

4.3.2.2 Face detection and cropping
After grayscaling and resizing, face detection is performed. The Viola-Jones algorithm implemented in the OpenCV library, which simplifies face detection, is used for this task. It is employed because it detects faces reliably and at a fast rate. The algorithm loops through the images one after the other, finding shapes that resemble a face and scanning a sub-window around them (Viola & Jones, 2004). The Viola-Jones detector aligns and creates a bounding box around the detected face in each image. The rectangular area of the box containing the detected face is then cropped and saved for the subsequent pre-processing steps (Rani & Garg, 2014). A minimal sketch of these steps is given below.

Figure 4.2: Viola-Jones face detection on the left and a detected and cropped face on the right.
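To make the pre-processing pipeline concrete, the following is a minimal sketch of the grayscaling, face detection, cropping and resizing steps using OpenCV. The file path, the Haar cascade file name, the detector parameters and the helper name preprocess_face are illustrative assumptions, and the ordering here (detect on the grayscale image, then resize the cropped face to 128*128) is a simplification rather than the exact code used in this work.

```python
import cv2

# Haar cascade shipped with OpenCV for the Viola-Jones frontal face detector
# (the cascade file name is an assumption; any frontal-face cascade can be used).
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def preprocess_face(image_path, size=(128, 128)):
    """Grayscale, detect the face with Viola-Jones, crop it and resize to 128x128."""
    img = cv2.imread(image_path)                  # read the raw dataset image
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # unify the colour format
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None                               # no face found in this image
    x, y, w, h = faces[0]                         # bounding box of the detected face
    face = gray[y:y + h, x:x + w]                 # crop the rectangular face region
    return cv2.resize(face, size)                 # uniform 128*128 input

# Example usage (the path is hypothetical):
# face = preprocess_face("KDEF/AF01ANS.JPG")
```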
4.3.2.3 Image enhancement
In this phase, the median filter was employed to denoise the images. The median filter is a non-linear filter which removes noise such as salt-and-pepper and impulse noise while preserving the edges of the image. It works as a sliding window, replacing the gray level of each pixel with the median gray level in a neighbourhood of pixels (Rani & Garg, 2014; Tan & Jiang, 2019). Figure 4.3 displays a denoised JAFFE image. Next, Contrast Limited Adaptive Histogram Equalisation (CLAHE) was utilised to improve the pixel intensities of an image, enhancing the facial features for easier extraction. CLAHE is performed after denoising so as not to amplify the noise in the images. It operates by dividing an image into non-overlapping contextual tiles and performing histogram equalisation on each tile; the adjoining tiles are then combined using bilinear interpolation (Zhao, Georganas, & Petriu, 2010). In implementing the median filter and CLAHE for our experiment, the OpenCV functions medianBlur() and createCLAHE() were utilised.

Figure 4.3: a denoised JAFFE image using median blur. Figure 4.4 displays a CLAHE-enhanced JAFFE image.

4.4 Feature extraction
4.4.1 LBP feature extraction
The LBP method is utilised for feature extraction. As LBP depends on a neighbourhood radius and a number of sampling points, a series of experiments was performed to determine their optimal values. For our experiment, 24 neighbourhood points and a radius of 8 were selected, which forms a circularly symmetric neighbour set. The LBP operator is then applied to all the images, and its computed histogram is a texture-based image descriptor with a total feature vector of dimension 26 (Hossain, 2018).

4.4.2 HOG feature extraction
The HOG method was applied to all the images. The optimal parameters found after a series of tests were a cell size of 16×16 pixels and 8 orientation bins. Thus, for an image size of 128×128 pixels, the total feature size obtained was:

(128 × 128) / (16 × 16) × 8 = 64 × 8 = 512

4.4.3 Hybrid feature extraction
We utilised a hybrid feature extraction method, specifically feature-level fusion, for the ensemble learning model. Feature-level fusion concatenates the HOG and LBP feature vectors. For the CNN algorithm, feature extraction is performed by the convolution and pooling layers.

4.5 Feature selection
PCA was used for dimensionality reduction of both the LBP feature vector and the fused feature vector. This is important because the face contains similarities across facial expressions, which produce correlated, irrelevant features; these need to be discarded while retaining only the relevant ones. Therefore, PCA was set to retain 98% of the variance between features, discarding noisy and irrelevant components (Jan, 2017). A sketch of the enhancement and feature extraction steps follows.
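The enhancement, descriptor and dimensionality-reduction steps described in sections 4.3.2.3 to 4.5 can be sketched as follows. This is a minimal illustration under stated assumptions: the median-filter kernel size, the CLAHE clip limit and tile grid, the block-normalisation settings of the scikit-image hog function and the helper names are assumptions, not the exact configuration used in this work.

```python
import cv2
import numpy as np
from skimage.feature import local_binary_pattern, hog
from sklearn.decomposition import PCA

def enhance(face):
    """Median-filter denoising followed by CLAHE contrast enhancement."""
    denoised = cv2.medianBlur(face, 3)                       # 3x3 median filter (kernel size assumed)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    return clahe.apply(denoised)

def lbp_histogram(face, points=24, radius=8):
    """Uniform LBP with 24 points and radius 8 -> 26-bin histogram descriptor."""
    lbp = local_binary_pattern(face, points, radius, method="uniform")
    hist, _ = np.histogram(lbp.ravel(), bins=np.arange(points + 3), density=True)
    return hist                                              # feature vector of dimension 26

def hog_features(face):
    """HOG with 16x16 cells and 8 orientation bins -> 512-dimensional descriptor for 128x128 input."""
    return hog(face, orientations=8, pixels_per_cell=(16, 16),
               cells_per_block=(1, 1), feature_vector=True)

def fused_features(faces):
    """Feature-level fusion: concatenate LBP and HOG, then keep 98% of the variance with PCA."""
    fused = np.array([np.concatenate([lbp_histogram(f), hog_features(f)]) for f in faces])
    pca = PCA(n_components=0.98)                             # retain 98% of the variance
    return pca.fit_transform(fused)
```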
4.6 Classification
This stage centres on training the various models: setting and varying hyperparameters, applying data augmentation techniques to improve the models, using the different datasets, and studying how the models perform through experimentation.

4.6.1 Model selection
Model selection involves selecting the architecture of each algorithm and its hyperparameters. It is a difficult task because it requires tuning the hyperparameters and utilising the different datasets to obtain the best-performing models. The following subsections describe the elements that were considered during training and testing, as learning time and classification accuracy depend on the hyperparameters and the model architecture (Valero, 2016).

4.6.2 Hyperparameter tuning
To obtain an accurate and precise model, hyperparameter tuning must be performed. Hyperparameters are the parameters that define a model's architecture, and hyperparameter tuning is the process of adjusting them to improve performance; it helps determine the best model and architecture for an algorithm as well as the right balance between bias and variance. The hyperparameters that were tuned are described as follows (a sketch of how they appear in code is given after this list):
1. Number of convolutional layers: the choice of convolution layers when building a CNN model is a key factor in preventing problems such as overfitting and vanishing or exploding gradients.
2. Number of hidden layers: this depends on the amount of data used for training and should be chosen carefully to balance bias and variance.
3. Activation functions: commonly used activation functions for CNNs are ReLU, sigmoid, Tanh and LeakyReLU; sigmoid and Tanh are typically used for shallow networks. The activation function is an important hyperparameter as it controls the firing of neurons.
4. Learning rate: a key factor in whether an algorithm converges to a satisfactory solution, as it influences the number of iterations required. It should be tried in powers of 10 to determine the optimal value (Ramesh, 2018).
5. Dropout: a regularisation technique against overfitting that helps find an optimal bias-variance trade-off.
6. Number of epochs: determines the number of iterations, how long training lasts, how well a model fits the training data and how well it generalises.
7. Optimiser: the optimiser minimises the error during training, speeding up convergence and updating the internal parameters. Commonly used optimisers include Adaptive Momentum (Adam), Root Mean Square Propagation (RMSprop), Adaptive Gradient (Adagrad), Adaptive Delta (Adadelta) and Nesterov-accelerated Adam (Nadam) (Prilianti, Brotosudarmo, Anam, & Suryanto, 2019).
8. Number of fully connected layers: controls the quality of the learned representation and the activation maps.
9. Kernel: the kernels for SVM are sigmoid, polynomial, radial basis function (RBF) and linear, with RBF being the most used. The chosen kernel determines the performance of the algorithm and how well the classes are separated.
10. Number of trees (n_estimators): a random forest is a grouping of trees, so the number of trees must be decided as computational efficiency depends on it.
11. Number of features considered for splitting a node (max_features): the maximum number of features available to each tree in a random forest.
12. Distance metric: helps to find the closest or most similar training points.
13. Number of neighbours (k): an important factor in determining the prediction of the KNN model.
To simplify hyperparameter tuning while obtaining the optimal models, we utilised the hyperas module with Keras for the CNN models and the grid search with cross-validation (GridSearchCV) module in the scikit-learn library for both the ensemble and SVM models.
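To illustrate where these hyperparameters appear in practice, the following is a minimal Keras sketch of a binary (angry / not-angry) CNN of the general kind described above. The specific number of layers, filter counts, dropout rate, optimiser and epoch count shown here are illustrative assumptions; they are not the tuned values reported in this work.

```python
from tensorflow.keras import layers, models

def build_cnn(input_shape=(128, 128, 1)):
    """Small CNN for angry / not-angry classification (hyperparameter values are placeholders)."""
    model = models.Sequential([
        layers.Conv2D(32, (3, 3), activation="relu", input_shape=input_shape),  # convolutional layer
        layers.MaxPooling2D((2, 2)),                                            # pooling layer
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),   # fully connected (hidden) layer
        layers.Dropout(0.5),                    # dropout for regularisation
        layers.Dense(1, activation="sigmoid"),  # binary output: angry vs not-angry
    ])
    # The optimiser and learning rate are themselves tunable hyperparameters.
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model

# model = build_cnn()
# model.fit(x_train, y_train, epochs=30, batch_size=32, validation_split=0.1)
```

Similarly, grid search with cross-validation over an SVM can be sketched as below; the parameter grid is hypothetical and not the grid used in this work.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

def tune_svm(X_train, y_train):
    """GridSearchCV over a hypothetical SVM parameter grid (kernel, C, gamma)."""
    param_grid = {"kernel": ["rbf", "linear", "poly", "sigmoid"],
                  "C": [0.1, 1, 10, 100],
                  "gamma": ["scale", 0.01, 0.001]}
    search = GridSearchCV(SVC(), param_grid, cv=5)
    search.fit(X_train, y_train)          # X_train: feature vectors, y_train: labels
    return search.best_estimator_, search.best_params_
```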
4.6.3 Settings and protocols
The dataset was split using scikit-learn's train_test_split into an 80-20 ratio: 80% for training the model and 20% for testing. The images were chosen randomly, so the procedure was repeated 20 times and the average accuracies were calculated (V. Jain, Lamba, Singh, Namboothiri, & Dhall, 2019). The models were trained one after the other on each dataset, using the same architecture and hyperparameters (Minaee & Abdolrashidi, 2019). For the SVM, the one-vs-one approach was used, so that classification is performed between the two labels. Further, decision-level fusion was utilised to combine the predictions of the ensemble base models. A sketch of these protocols and of the augmentation settings is given at the end of this chapter.

4.6.4 Data augmentation
Data augmentation is a useful technique for increasing the number of images, and it helps resolve overfitting (B. Yang, Cao, Ni, & Zhang, 2017). It was applied to the original datasets to enlarge them (see table 4.3). We used rotation, flipping and zooming: rotation turned the image uniformly about its centre by up to 20 degrees clockwise; the image was flipped horizontally and vertically to create additional samples; and zooming randomly extracted sections of images and enlarged them (see figure 4.5).

4.7 Chapter summary
We provided details on how our work was implemented, which is important for reproducing or advancing this work. The next chapter discusses the outcome of our experiments.

Figure 4.5: transformations applied to a JAFFE image (Valero, 2016).
Table 4.3: summary of the datasets after data augmentation.
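The following is a minimal sketch of the protocol in section 4.6.3: an 80-20 split with scikit-learn and a voting ensemble of the five base learners named in this work (KNN, SVM, NB, RF and LR) as one way to realise decision-level fusion. The particular voting scheme, base-learner settings and helper name are assumptions for illustration, not the exact configuration used here.

```python
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

def run_protocol(X, y):
    """80-20 split and a voting ensemble of five base learners (decision-level fusion)."""
    # X: fused (LBP+HOG, PCA-reduced) feature vectors; y: labels 1 (angry) / 2 (not angry).
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
    ensemble = VotingClassifier(
        estimators=[("knn", KNeighborsClassifier(n_neighbors=5)),
                    ("svm", SVC(kernel="rbf", probability=True)),
                    ("nb", GaussianNB()),
                    ("rf", RandomForestClassifier(n_estimators=100)),
                    ("lr", LogisticRegression(max_iter=1000))],
        voting="hard")                      # each base model casts one vote
    ensemble.fit(X_train, y_train)
    return ensemble.score(X_test, y_test)   # accuracy on the held-out 20%
```

Likewise, the rotation, flipping and zooming of section 4.6.4 can be expressed with Keras' ImageDataGenerator; the exact generator arguments used in this work are not reported, so the values below are indicative only.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

augmenter = ImageDataGenerator(rotation_range=20,      # rotate by up to 20 degrees
                               horizontal_flip=True,   # mirror left-right
                               vertical_flip=True,     # mirror top-bottom
                               zoom_range=0.2)         # randomly zoom into sections
# flow() yields augmented batches; x_train has shape (n, 128, 128, 1)
# batches = augmenter.flow(x_train, y_train, batch_size=32)
```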
Chapter 5 Experimental results and discussion

5.1 Introduction
In this section, the performance of the proposed methods and the obtained results are discussed. Labels are represented as 1 for Angry and 2 for Not Angry.

5.2 Evaluation metrics
The CK+, JAFFE and KDEF datasets were used to evaluate the proposed methods, and the evaluation metrics confusion matrix, precision, recall, F1-score and ROC curve were utilised for the experiment (Nisbet, Miner, & Yale, 2018). A confusion matrix is a table that shows the association between the predicted label and the true label in a classification problem, thus describing the performance of a classifier. The following error metrics are used in calculating the other metrics:
True positive (TP): the number of positive labels correctly predicted as positive.
True negative (TN): the number of negative labels correctly predicted as negative.
False positive (FP): the number of labels falsely predicted as positive, that is, predicted positive when they are actually negative.
False negative (FN): the number of labels falsely predicted as negative, that is, predicted negative when they are actually positive.
Metrics such as accuracy, recall, precision and F1-score are computed from the counts of a binary confusion matrix (Nisbet et al., 2018). Thus, the metrics are defined as:
Accuracy = (TP + TN) / (TP + TN + FP + FN)  (5.1)
Precision (P) = TP / (TP + FP)  (5.2)
Recall (R) = TP / (TP + FN)  (5.3)
F1-score = 2 × [(P × R) / (P + R)]  (5.4)
For the ROC curve (Shu et al., 2018):
False-positive rate (FPR) = FP / (TN + FP)  (5.5)
True-positive rate (TPR) = TP / (TP + FN)  (5.6)
Due to the severely imbalanced nature of the datasets (see table 4.2), accuracy, precision, recall, F1-score and the confusion matrix were all used in evaluating the models, as accuracy alone would give a misleading picture of model performance (Koehrsen Will, 2018). For the second experiment, accuracy, confusion matrices and ROC curves were employed, since the datasets had been balanced using the data augmentation technique. The macro average is reported as the precision value; the macro average is the sum of the precisions of the two classes divided by 2.

5.3 Experiment I (training without data augmentation)
The experimental results of the various classifiers are summarised in tables 5.1 to 5.3. It is interesting to note that the recognition accuracies and the precision and recall values generally vary with the combination of descriptor and classifier, although some patterns are observed. The precision and recall values obtained were generally low, which can be attributed to the imbalanced nature of the datasets, as these two measures report only on the relevant cases in a dataset. In Table 5.1 below, we see the results of the SVM, CNN and ensemble learning models on the JAFFE dataset. In terms of accuracy, our ensemble learning model achieved 98% whilst SVM and CNN attained 90% and 88% respectively; the ensemble also performed better in terms of F1-score, precision and recall. These results can be attributed to the fusion of descriptors and classifiers in the ensemble model, SVM's ability to train on modest amounts of data, and CNN's lower performance, which is explained by its requirement for large training sets. Further, the ensemble learning predictor (a combination of KNN, SVM, Naïve Bayes (NB), Random Forest (RF) and Logistic Regression (LR) with HOG and LBP descriptors) achieved the greatest accuracy of 98%, which is 8% and 10% higher than SVM and CNN respectively.
Table 5.1 shows the performance (accuracy, recall, precision and F1-score) of the JAFFE dataset on the SVM, CNN and ensemble learning models. The overall values of the performance metrics (accuracy, precision, recall and F1-score) on the CK+ dataset (Table 5.2) are significantly higher than on the other datasets. SVM attained the highest accuracy of 97%, with the CNN and ensemble learning models at 94% and 93% respectively. SVM manages a good accuracy across both tables, probably due to its flexibility to perform with any amount of training data.
74 University of Ghana http://ugspace.ug.edu.gh Table 5.2 displays the performance (accuracy, recall, precision, and F1-score) of the CK+ dataset for the models: SVM, CNN and ensemble learning. According to Table 5.3, CNN achieves an accuracy of 93%, Ensemble learning had 92% and SVM with an accuracy of 89%. CNN model attained the leading accuracy of 93%. It is not surprising CNN attains the highest accuracy on the KDEF dataset as among the three utilised datasets for our experiment, KDEF contains the highest number of images (980 images). And as CNN requires a large amount of training data; thus, explains the accuracy achieved on the CNN model. Generally, the precision and recall values were low as they give the correct value of the relevant data points that are angry. Table 5.3: KDEF dataset performance (accuracy, precision, recall and f1-score) for CNN, SVM and ensemble learning models. 75 University of Ghana http://ugspace.ug.edu.gh The confusion matrices were utilised in further analysing the performance of our models (see figures 5.1 to 5.3). The confusion matrix is visualised by having the true label on the vertical axis and the expression recognised by the classifiers on the horizontal axis. The intensity of confusion of each expression with its counterparts is indicated in each row of the matrices (Abd El Meguid & Levine, 2014). Also, the grayscale levels across the figure present the inter- expression similarity across the two expressions (Ali, Iqbal, & Choi, 2016). Figure 5.1: Confusion matrices for the models: SVM, CNN and Ensemble learning for the JAFFE dataset – Experiment I. 76 University of Ghana http://ugspace.ug.edu.gh Figure 5.2: Confusion matrices for the models: SVM, CNN and Ensemble learning for the CK+ dataset – Experiment I. 77 University of Ghana http://ugspace.ug.edu.gh Figure 5.3: Confusion matrices for the models: SVM, CNN and Ensemble learning for the KDEF dataset – Experiment I. 5.4 Experiment II There was an improvement in the results of the evaluation metrics for the various models, after the application of the data augmentation technique as illustrated in tables 5.4 since the balancing of the two classes of the datasets. Table 5.4 contains the results of the second experiment on the JAFFE, CK+ and KDEF datasets. Generally, there has been an improvement in the performance measures used in evaluating our models. The ensemble learning model is the best performing algorithm on the JAFFE dataset. 78 University of Ghana http://ugspace.ug.edu.gh The best performing model increased by 2%, having the best performing SVM and CNN models with accuracies 97% and 97% on JAFFE and CK+ datasets, respectively. More, the SVM and CNN models attained the same accuracy on the KDEF dataset. Table 5.4: JAFFE dataset performance (accuracy) on the CNN, SVM and ensemble learning models. Overall, the best performing model was the ensemble learning model on the JAFFE dataset (see figure 5.4). All the models generally had some confusions except for the ensemble model in figure 5.4. The highest confusion occurred on (figure 5.3). From experiment I, the results indicate our true positives (angry) were just a considerable small amount whilst the true negatives (not angry) were a considerably large amount. This can be attributable to the imbalance nature of the dataset as with the JAFFE dataset, the angry images are 45 whilst the not angry images are 168 images. 
It was noted that the best performing model on JAFFE dataset was the ensemble learning model as it had no misclassifications (figure 5.4). 79 University of Ghana http://ugspace.ug.edu.gh Figure 5.4: Confusion matrices for the models: SVM, CNN and Ensemble learning for the JAFFE dataset – Experiment II. 80 University of Ghana http://ugspace.ug.edu.gh Figure 5.5: Confusion matrices for the models: SVM, CNN and Ensemble learning for the CK+ dataset – Experiment II. 81 University of Ghana http://ugspace.ug.edu.gh Figure 5.6: Confusion matrices for the models: SVM, CNN and Ensemble learning for the KDEF dataset – Experiment II. 82 University of Ghana http://ugspace.ug.edu.gh Figure 5.7: ROC curves on the JAFFE dataset. 83 University of Ghana http://ugspace.ug.edu.gh Figure 5.8: ROC curve on CK+ dataset. 84 University of Ghana http://ugspace.ug.edu.gh Figure 5.9: ROC curve on the KDEF dataset. Figures 5.7, 5.8 and 5.9 display the Receiver Operating Characteristic (ROC) curves, comparing the outcome of the models. The displayed ROC curves were obtained after the application of the data augmentation technique. As shown in figures 5.7, 5.8 and 5.9, the overall performance of ensemble learning model is higher than of SVM and CNN whilst SVM attains the overall lowest-performing model. Notwithstanding, all the classifiers produce excellent models, as there are all situated in the upper left corner of the ROC, the values are all above 0.5 (Shu et al., 2018). 85 University of Ghana http://ugspace.ug.edu.gh 5.5 Discussion 5.5.1 Introduction The findings from this study indicate that the recognition of the emotions anger and not-angry where not-angry is the combination of the remaining emotions using facial expression as well as facial expression algorithms, results in higher accuracy. This study revealed that, among the basic emotions introduced by Ekman, anger is the most frequently experienced yet poorly handled. And among the modes of expressing emotions, the facial expression is the most salient. However, the study conducted on the recognition of anger utilised speech data, physiological signals or general multiclassification of facial expressions that is all the facial expressions are been classified. However, we want to investigate how anger versus the other facial expressions will produce a higher accuracy. Therefore, facial expressions algorithms are evaluated, and the most utilised machine learning and deep learning algorithms are employed for our study. Also, we proposed an ensemble learning method for this study. 5.5.2 Performance of the models Although different datasets were used, all the images were pre-processed to contain a unify input which strengthens the experimental results. The experimental results were further improved in experiment II with the application of the data augmentation technique to make our datasets balanced. Our experiments achieved remarkable results in comparison to the results of Tables 5.5-5.7. Generally, the best model for our experiment is the ensemble learning classifier. The high performance obtained can be attributed to the fusion of the descriptors HOG and LBP and the classifiers. The average accuracy obtained was 97% and the overall best classifier was the ensemble classifier with an accuracy of 100%. Further, it was observed the deep learning model performed well on the CK+ dataset and the machine learning algorithms on JAFFE dataset. 
86 University of Ghana http://ugspace.ug.edu.gh 5.5.3 Comparison of the state-of-the-art In this section, in evaluating the effectiveness of our model, we perform a comparison of our results with state-of-art results (Lai & Ko, 2014). It is not possible to straightforward compare the state-of-the-art experiments to our experiment because of the different experimental setting and protocols. Also, as per literature survey, existing work has not performed or reported results on any binary classification of facial expressions ( that is categorising of the 7 emotional states into angry and not-angry) on the contrary, on several multiclass classifications. Therefore, we focus our attention on the performance of these state-of-art facial expression algorithms in comparison to ours. The state-of-the-art experiments are discussed in this section and are indicated in Tables 5.5 - 5.7 (Alphonse & Dharma, 2017). Although the feature extractors differ, the classifiers are the same. It can be observed that our proposed methods obtained noticeable results, particularly our ensemble learning algorithm. We believe our ensemble learning model made a difference due to the fusion of HOG and LBP as well as SVM, KNN, RF, NB, and LG. Our SVM classifier is outstandingly comparable to the expression classification accuracies in Table 5.5. The techniques used for building the SVM model is the same as the one in (Abdulrahman & Eleyan, 2015), however the results differ greatly. Also, diverse feature extraction techniques or descriptors were used such as LDA, LDN, HOG, DGLTP and Gabor, although the SVM classifier was utilised. Yet, the resulting expression recognition accuracies were not the best. Follow on, the recognition accuracy of our CNN model is compared to the state-of-the-arts experiment in Table 5.6. It can be observed our proposed CNN model obtained a promising result in comparison to the others, even with the ensemble of CNN in (W. Sun et al., 2019). 87 University of Ghana http://ugspace.ug.edu.gh Further, in table 5.7, our proposed novel ensemble model outperformed the state-of-the-arts ensemble methods. Table 5.5: Comparison of approaches on the JAFFE dataset Author Technique Classifier Dataset Accuracy Remarks (%) (Abdulrahman PCA/LBP SVM JAFFE 87 obtained an accuracy of & Eleyan, 87% from an investigation 2015) of the performance of PCA+LBP and SVM on JAFFE. (M. I. Revina & LDN/DGLTP SVM JAFFE 88.63 utilised SVM and LDN Emmanuel, and DGLTP as classifier 2018) and feature extractors respectively on JAFFE dataset achieving an accuracy of 88.63%. (Shah et al., LDA SVM JAFFE 93.97 used threefold SVM and 2017) LDA to recognise the basic emotions on the JAFFE dataset attaining accuracy of 93.97%. 88 University of Ghana http://ugspace.ug.edu.gh (Bellamkonda Kirsh+LBP SVM JAFFE 86 combined an edge & Gopalan, detection algorithm called 2019) kirsh with LBP and SVM as the classifier. Their method obtained an accuracy of 86%. Proposed LBP+PCA SVM JAFFE 97 Proposed LBP+PCA SVM model and SVM classifier on JAFFE, achieving an accuracy. Table 5.6: Comparison of the proposed method with state-of-the-art. Author Classifier Dataset Accuracy (%) (Z. Li, 2018) CNN CK+ 95.21 (W. 
Sun et al., Ensemble of CK+ 96.2 2019) CNN (Vo & Le, 2016) CNN CK+ 92 Ours CNN CK+ 97 Table 5.7: Comparison of different techniques on JAFFE+ dataset 89 University of Ghana http://ugspace.ug.edu.gh Author Technique Classifier Dataset Accuracy Remarks (%) (T H H Gabor+LBP Ensemble JAFFE/CK+ 96.2 Used NSGA as a Zavaschi & of SVM selection ranking Koerich, method. 2011) (Thiago H.H. LBP+Gabor Ensemble JAFFE/CK+ 96.2 Created a pool of Zavaschi, of classifiers and used Britto, classifiers MOGA to select the Oliveira, & (MOGA) best ensemble Koerich, model 2013) Our LBP+HOG An JAFFE 100 Proposed a novel proposed PCA (for ensemble ensemble model for ensemble dimension of KNN, recognition of learning reduction) SVM, NB, anger. model LG, and RF 5.6 Limitations There are some limitations in terms of the number of angry facial expression images in the existing facial expression databases. Therefore, it makes it difficult in training an accurate model to recognise anger as observed in experiment I, where the recognition accuracies obtained were generally high whilst the precision and recall which gives the actual values, on the contrary, were low. This untrue accuracy representation can be attributed to the imbalance 90 University of Ghana http://ugspace.ug.edu.gh nature of the datasets. As such, we resolved this issue by balancing the datasets using data augmentation (Bargshady et al., 2020). Future work will look at creating a database of Africans, use these proposed algorithms to detect anger and compare their performance to the standard existing databases as well as employing these algorithms to detect anger in a persuasive space and persuade the individual from angry to another emotion for example happy. 5.7 Chapter summary In this chapter, we looked at our experimental results. The experiment was conducted in two phases: with data augmentation and without data augmentation. It was observed that the accuracies were improved in the second experiment after the application of the data augmentation technique. Particularly, our proposed novel ensemble learning method outperformed the existing experiments in the literature. Thus, we conclude that the proposed methods are effective for the recognition of anger using facial expressions. 91 University of Ghana http://ugspace.ug.edu.gh Chapter 6 Conclusion Charles Darwin’s influential work served as the premise for research in emotions. These emotions are recognised using indicators such as speech data, physiological signals and so on. Among the indicators for recognition of emotions, facial expression is a significant and leading measure as 55% of what we communicate is expressed in our facial expressions. The current systems can detect facial expressions in general or a subset of emotions. To the best of our knowledge, there have not been any studies on how to detect only anger using facial expressions. Further, the multiclass classification of emotions has drawbacks such as the overlapping among the facial expressions which gives an untrue representation of the emotion when classified. We argued that anger detection needs to be done in an accurate way, giving a true representation of the emotion. Therefore, in this research work, we propose a framework to perform binary classification of facial expressions for recognition of the emotions: angry and not angry and compare the outcome to the state-of-the-art experiments; having identified facial expression as the leading and significant measure for detecting emotions. 
We employed the most utilised algorithms from literature as well as propose a novel ensemble learning algorithm. The algorithms are SVM, CNN and a novel ensemble learning algorithm. The ensemble learning algorithm is a fusion of the feature sets HOG and LBP as well as a fusion of SVM, KNN, RF, NB, and LG. The experiment was conducted in two phases due to the imbalance nature of the dataset and the proposed methods were evaluated on JAFFE, KDEF and CK+ datasets. The SVM, CNN and ensemble models achieved accuracies of 97% on JAFFE dataset, 97% on CK+ dataset and 100% on JAFFE dataset, respectively. Our novel proposed an ensemble learning algorithm is the best performing model. Also, our models perform better than the state-of-art models. 92 University of Ghana http://ugspace.ug.edu.gh Future work, we plan to create a database of Africans, use these proposed algorithms to detect anger and compare their performance to the standard existing databases as well as employing these algorithms to detect anger in a persuasive space and persuade the individual from angry to another emotion for example happy. 93 University of Ghana http://ugspace.ug.edu.gh Bibliography Abd El Meguid, M. K., & Levine, M. D. (2014). Fully automated recognition of spontaneous facial expressions in videos using random forest classifiers. IEEE Transactions on Affective Computing, 5(2), 141–154. https://doi.org/10.1109/TAFFC.2014.2317711 Abdi, H., & Williams, L. J. (2010). Principal component analysis. Wiley Interdisciplinary Reviews: Computational Statistics, 2(4), 433–459. https://doi.org/10.1002/wics.101 Abdulrahman, M., & Eleyan, A. (2015). Facial expression recognition using Support Vector Machines. 2015 23rd Signal Processing and Communications Applications Conference, SIU 2015 - Proceedings, 276–279. https://doi.org/10.1109/SIU.2015.7129813 Abhang, P. A., Gawali, B. W., & Mehrotra, S. C. (2016). Multimodal Emotion Recognition. Introduction to EEG- and Speech-Based Emotion Recognition, 113–125. https://doi.org/10.1016/b978-0-12-804490-2.00006-3 Abouyahya, A., El Fkihi, S., Thami, R. O. H., & Aboutajdine, D. (2016). Features extraction for facial expressions recognition. International Conference on Multimedia Computing and Systems -Proceedings, 16, 46–49. https://doi.org/10.1109/ICMCS.2016.7905642 Abouyahya, A., & Fkihi, S. El. (2018). An optimization of the k-nearest neighbor using dynamic time warping as a measurement similarity for facial expressions recognition. ACM International Conference Proceeding Series, 1–5. https://doi.org/10.1145/3230905.3230921 Agarwal, A., Baechle, C., Behara, R. S., & Rao, V. (2016). Multi-method approach to wellness predictive modeling. Journal of Big Data, 3(1), 1–23. https://doi.org/10.1186/s40537- 016-0049-0 Ahmed, F., Bari, A. S. M. H., & Gavrilova, M. L. (2020). Emotion Recognition from Body Movement. IEEE Access, 8, 11761–11781. 94 University of Ghana http://ugspace.ug.edu.gh https://doi.org/10.1109/ACCESS.2019.2963113 Ali, G., Iqbal, M. A., & Choi, T. S. (2016). Boosted NNE collections for multicultural facial expression recognition. Pattern Recognition, 55, 14–27. https://doi.org/10.1016/j.patcog.2016.01.032 Alizadeh, S., & Fazel, A. (2017). Convolutional Neural Networks for Facial Expression Recognition. Retrieved from http://arxiv.org/abs/1704.06756 Alphonse, A. S., & Dharma, D. (2017). Enhanced Gabor (E-Gabor), Hypersphere-based normalization and Pearson General Kernel-based discriminant analysis for dimension reduction and classification of facial emotions. 
Expert Systems with Applications, 90, 127–145. https://doi.org/10.1016/j.eswa.2017.08.013 An, S., Ji, L. J., Marks, M., & Zhang, Z. (2017). Two sides of emotion: Exploring positivity and negativity in six basic emotions across cultures. Frontiers in Psychology, 8(APR), 1– 14. https://doi.org/10.3389/fpsyg.2017.00610 Apte, A., Basavaraj, A., & Nithin, R. K. (2016). Efficient Facial Expression Ecognition and classification system based on morphological processing of frontal face images. 2015 IEEE 10th International Conference on Industrial and Information Systems, ICIIS 2015 - Conference Proceedings, 366–371. https://doi.org/10.1109/ICIINFS.2015.7399039 Bargshady, G., Zhou, X., Deo, R. C., Soar, J., Whittaker, F., & Wang, H. (2020). Enhanced deep learning algorithm development to detect pain intensity from facial expression images. Expert Systems with Applications, 149, 113305. https://doi.org/10.1016/j.eswa.2020.113305 Bellamkonda, S., & Gopalan, N. P. (2019). Facial Expression Recognition Using Kirsch Edge Detection, LBP and Gabor Wavelets. Proceedings of the 2nd International Conference on Intelligent Computing and Control Systems, ICICCS 2018, (Iciccs), 1457–1461. 95 University of Ghana http://ugspace.ug.edu.gh https://doi.org/10.1109/ICCONS.2018.8662971 Bhardwaj, N., & Dixit, M. (2016). A Review: Facial Expression Detection with its Techniques and Application. International Journal of Signal Processing, Image Processing and Pattern Recognition, 9(6), 149–158. https://doi.org/10.14257/ijsip.2016.9.6.13 Borui, Z., Liu, G., & Xie, G. (2017). Facial expression recognition using LBP and LPQ based on Gabor wavelet transform. 2016 2nd IEEE International Conference on Computer and Communications, ICCC 2016 - Proceedings, 365–369. https://doi.org/10.1109/CompComm.2016.7924724 Breiman, L. E. O. (2001). Random Forest(LeoBreiman).pdf, 5–32. https://doi.org/10.1023/A:1010933404324 Breuer, R., & Kimmel, R. (2017a). A Deep Learning Perspective on the Origin of Facial Expressions, 1–16. Retrieved from http://arxiv.org/abs/1705.01842 Breuer, R., & Kimmel, R. (2017b). A Deep Learning Perspective on the Origin of Facial Expressions. Israel Institute of Technology. Retrieved from http://arxiv.org/abs/1705.01842 Busso, C., Deng, Z., Yildirim, S., & Bulut, M. (2004). Analysis of Emotion Recognition using Facial Expressions, Speech and Multimodal Information. Icmi, 205–211. https://doi.org/10.1145/1027933.1027968 Calder, A. J., Burton, A. M., Miller, P., Young, A. W., & Akamatsu, S. (2001). A principal component analysis of facial expressions. Vision Research, 41(9), 1179–1208. https://doi.org/10.1016/S0042-6989(01)00002-5 Candra, H., Yuwono, M., Chai, R., Nguyen, H. T., & Su, S. (2016). Classification of facial- emotion expression in the application of psychotherapy using Viola-Jones and Edge- Histogram of Oriented Gradient. Proceedings of the Annual International Conference of 96 University of Ghana http://ugspace.ug.edu.gh the IEEE Engineering in Medicine and Biology Society, EMBS, 2016-Octob(Di), 423– 426. https://doi.org/10.1109/EMBC.2016.7590730 Chakladar, D. Das, & Chakraborty, S. (2018). EEG based emotion classification using “correlation Based Subset Selection.” Biologically Inspired Cognitive Architectures. https://doi.org/10.1016/j.bica.2018.04.012 Chang, C. Y., Lin, Y. M., & Zheng, J. Y. (2012). Physiological angry emotion detection using support vector regression. Proceedings of the 2012 15th International Conference on Network-Based Information Systems, NBIS 2012, 592–596. 
https://doi.org/10.1109/NBiS.2012.78 Chaparro, V., Gomez, A., Salgado, A., Quintero, O. L., Lopez, N., & Villa, L. F. (2018). Emotion Recognition from EEG and Facial Expressions: A Multimodal Approach. Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBS, 2018-July, 530–533. https://doi.org/10.1109/EMBC.2018.8512407 Chhabra, P., Vyas, G., Chatterjee, J., & Vob, S. H. (2017). An automatic system for recognition and assessment of anger using adaptive boost. Proceedings - 2016 International Conference on Micro-Electronics and Telecommunication Engineering, ICMETE 2016, 151–154. https://doi.org/10.1109/ICMETE.2016.89 Culjak, I., Abram, D., Pribanic, T., Dzapo, H., & Cifrek, M. (2012). A brief introduction to OpenCV. MIPRO 2012 - 35th International Convention on Information and Communication Technology, Electronics and Microelectronics - Proceedings, 1725– 1730. Dagher, I., Dahdah, E., & Al Shakik, M. (2019). Facial expression recognition using three- stage support vector machines. Visual Computing for Industry, Biomedicine, and Art, 2(1), 97 University of Ghana http://ugspace.ug.edu.gh 0–8. https://doi.org/10.1186/s42492-019-0034-5 Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. Proceedings - 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2005, I, 886–893. https://doi.org/10.1109/CVPR.2005.177 Darwin, C. (1872). The expression of the emotions in man and animals. The Expression of the Emotions in Man and Animals. https://doi.org/10.1037/10001-000 Deng, J., Eyben, F., Schuller, B., & Burkhardt, F. (2018). Deep neural networks for anger detection from real life speech data. 2017 7th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos, ACIIW 2017, 2018-Janua, 1–6. https://doi.org/10.1109/ACIIW.2017.8272614 Dev, V. A., & Eden, M. R. (2019). Gradient Boosted Decision Trees for Lithology Classification. Computer Aided Chemical Engineering (Vol. 47). Elsevier Masson SAS. https://doi.org/10.1016/B978-0-12-818597-1.50019-9 Dhall, S., & Sethi, P. (2014). Geometric and Appearance Feature Analysis for Facial Expression Recognition. International Journal of Advanced Engineering Technology, 5(3), 1–11. Dino, H. I., & Abdulrazzaq, M. B. (2019). Facial Expression Classification Based on SVM, KNN and MLP Classifiers. 2019 International Conference on Advanced Science and Engineering, ICOASE 2019, 70–75. https://doi.org/10.1109/ICOASE.2019.8723728 Domínguez-Jiménez, J. A., Campo-Landines, K. C., Martínez-Santos, J. C., Delahoz, E. J., & Contreras-Ortiz, S. H. (2020). A machine learning model for emotion recognition from physiological signals. Biomedical Signal Processing and Control, 55, 101646. https://doi.org/10.1016/j.bspc.2019.101646 Dubey, S., & Dixit, M. (2019). Facial expression recognition using deep convolutional neural 98 University of Ghana http://ugspace.ug.edu.gh networks. Computing Publications, 8(1), 130–135. https://doi.org/10.1109/KSE.2017.8119447 Duchenne de Boulogne, G. . (1862). The Mechanism of Human Facial Expression. Cambridge University Press. https://doi.org/10.1097/00006534-199203000-00032 Ekman, P. (1970). Universal-Facial-Expressions-of-Emotions. Calfornia Mental health. Ekman, P. (1977). Facial Expression, (1972), 97–116. Ekman, P. (1999). Basic Emotions. Encyclopedia of Personality and Individual Differences. https://doi.org/10.1007/978-3-319-28099-8_495-1 Ekman, P., & Friesen, W. (1976). 
Mesauring facial movement.pdf. Ekundayo, O., & Viriri, S. (2019). Facial expression recognition: A review of methods, performances and limitations. 2019 Conference on Information Communications Technology and Society, ICTAS 2019. https://doi.org/10.1109/ICTAS.2019.8703619 Fan, X., & Tjahjadi, T. (2019). Fusing dynamic deep learned features and handcrafted features for facial expression recognition. Journal of Visual Communication and Image Representation, 65, 1–6. https://doi.org/10.1016/j.jvcir.2019.102659 Farajzadeh, N., & Hashemzadeh, M. (2018). Exemplar-based facial expression recognition. Information Sciences, 460–461, 318–330. https://doi.org/10.1016/j.ins.2018.05.057 Fasel, B., Monay, F., & Gatica-Perez, D. (2004). Latent semantic analysis of facial action codes for automatic facial expression recognition. MIR’04 - Proceedings of the 6th ACM SIGMM International Workshop on Multimedia Information Retrieval, 181–188. https://doi.org/10.1145/1026711.1026742 Feidakis, M. (2016). A Review of Emotion-Aware Systems for e-Learning in Virtual Environments. Formative Assessment, Learning Data Analytics and Gamification: In ICT 99 University of Ghana http://ugspace.ug.edu.gh Education. Elsevier Inc. https://doi.org/10.1016/B978-0-12-803637-2.00011-7 Feldman, M. (2018). Google Offers Glimpse of Third-Generation TPU Processor. Retrieved from https://www.top500.org/news/google-offers-glimpse-of-third-generation-tpu- processor/ Frank, M. G. (2001). Facial Expression. International Encyclopedia of the Social & Behavioral Sciences, 5230–5234. https://doi.org/10.1016/B0-08-043076-7/01713-7 Gao, K., Mei, G., Piccialli, F., Cuomo, S., Tu, J., & Huo, Z. (2020). Julia language in machine learning: Algorithms, applications, and open issues. Computer Science Review, 37, 100254. https://doi.org/10.1016/j.cosrev.2020.100254 Goeleven, E., De Raedt, R., Leyman, L., & Verschuere, B. (2008). The Karolinska directed emotional faces: A validation study. Cognition and Emotion, 22(6), 1094–1118. https://doi.org/10.1080/02699930701626582 Gonzalez-sanchez, J., Baydogan, M., Chavez-echeagaray, M. E., Robert, K., & Burleson, W. (2017). Affect Measurement : A Roadmap Through Approaches, Technologies and Data Analysis. Emotions and Affect in Human Factors and Human-Computer Interaction. Elsevier Inc. https://doi.org/10.1016/B978-0-12-801851-4/00011-2 Happy, S. L., Patnaik, P., Routray, A., & Guha, R. (2017). The Indian Spontaneous Expression Database for Emotion Recognition. IEEE Transactions on Affective Computing, 8(1), 131–142. https://doi.org/10.1109/TAFFC.2015.2498174 Haq, S., & Jackson, P. (2010). Machine Audition: Principles, Algorithms and Systems, chapter 8. Multimodal Emotion Recognition, 398–423. Retrieved from http://scholar.google.com/scholar?hl=en&btnG=Search&q=intitle:Machine+Audition+: +Principles+,+Algorithms+and+Systems#1 Hossain, M. M. (2018). Facial Expression Recognition Based on LBP and CNN : A 100 University of Ghana http://ugspace.ug.edu.gh Comparative Study Using SVM Classifier. Rajshahi University of Engineering and Technology. Huang, H., Hu, Z., Wang, W., & Wu, M. (2020). Multimodal Emotion Recognition Based on Ensemble Convolutional Neural Network. IEEE Access, 8(2), 3265–3271. https://doi.org/10.1109/ACCESS.2019.2962085 Huang, X. (2014). Methods for Facial Expression Recognition With Applications in Challenging Situations. Emotion Review. University of Oulu, Finland. 
Retrieved from http://emr.sagepub.com/content/6/2/113.short Huang, X., Kortelainen, J., Zhao, G., Li, X., Moilanen, A., Seppänen, T., & Pietikäinen, M. (2015). Multi-modal emotion analysis from facial expressions and electroencephalogram. Computer Vision and Image Understanding, 147, 114–124. https://doi.org/10.1016/j.cviu.2015.09.015 Huang, Yongrui, Yang, J., Liao, P., & Pan, J. (2017). Fusion of Facial Expressions and EEG for Multimodal Emotion Recognition. Computational Intelligence and Neuroscience, 2017, 1–8. https://doi.org/10.1155/2017/2107451 Huang, Yunxin, Chen, F., Lv, S., & Wang, X. (2019). Facial expression recognition: A survey. Symmetry, 11(10), 1–28. https://doi.org/10.3390/sym11101189 Islam, B., Mahmud, F., Hossain, A., Mia, M. S., & Goala, P. B. (2019). Human facial expression recognition system using artificial neural network classification of gabor feature based facial expression information. 4th International Conference on Electrical Engineering and Information and Communication Technology, ICEEiCT 2018, 364–368. https://doi.org/10.1109/CEEICT.2018.8628050 Izard, C. E. (2007). Basic emotions, natural kinds, emotion schemas. Association for Psychological Science, 2(3), 260–280. 101 University of Ghana http://ugspace.ug.edu.gh J., W., & Watkins, C. (1999). Support Vector Machines for Multi-Class Pattern Recognition. ESANN, 219–224. Jain, D. K., Shamsolmoali, P., & Sehdev, P. (2019). Extended deep neural network for facial emotion recognition. Pattern Recognition Letters, 120, 69–74. https://doi.org/10.1016/j.patrec.2019.01.008 Jain, V., Lamba, P. S., Singh, B., Namboothiri, N., & Dhall, S. (2019). Facial expression recognition using feature level fusion. Journal of Discrete Mathematical Sciences and Cryptography, 22(2), 337–350. https://doi.org/10.1080/09720529.2019.1582866 Jakkula, V. (2011). Tutorial on Support Vector Machine (SVM). School of EECS, Washington State University, 1–13. Retrieved from http://www.ccs.neu.edu/course/cs5100f11/resources/jakkula.pdf Jameel, R., Singhal, A., & Bansal, A. (2016). A comprehensive study on Facial Expressions Recognition Techniques. Proceedings of the 2016 6th International Conference - Cloud System and Big Data Engineering, Confluence 2016, 478–483. https://doi.org/10.1109/CONFLUENCE.2016.7508167 Jan, A. (2017). Deep Learning Based Facial Expression Recognition and Its Applications. Brunel University, London. Kanade, T., Cohn, J. F., & Tian, Y. (2000). Comprehensive database for facial expression analysis. Proceedings - 4th IEEE International Conference on Automatic Face and Gesture Recognition, FG 2000, 46–53. https://doi.org/10.1109/AFGR.2000.840611 Karamizadeh, S., Abdullah, S. M., Manaf, A. A., Zamani, M., & Hooman, A. (2013). An Overview of Principal Component Analysis. Journal of Signal and Information Processing, 04(03), 173–175. https://doi.org/10.4236/jsip.2013.43b031 Kassinove, H., Sukhodolsky, D. G., Eckhardt, C. I., & Tsytsarev, S. V. (1997). Development 102 University of Ghana http://ugspace.ug.edu.gh of a Russian State-Trait anger Expression Inventory. Journal of Clinical Psychology, 53(6), 543–557. https://doi.org/10.1002/(SICI)1097-4679(199710)53:6<543::AID- JCLP3>3.0.CO;2-L Kaur, B., Singh, D., & Roy, P. P. (2018). EEG Based Emotion Classification Mechanism in BCI. In Procedia Computer Science (pp. 752–758). https://doi.org/10.1016/j.procs.2018.05.087 Keshari, T., & Palaniswamy, S. (2019). Emotion Recognition Using Feature-level Fusion of Facial Expressions and Body Gestures. 
Proceedings of the 4th International Conference on Communication and Electronics Systems, ICCES 2019, (Icces), 1184–1189. https://doi.org/10.1109/ICCES45898.2019.9002175 Khalid, S., Khalil, T., & Nasreen, S. (2014). A survey of feature selection and feature extraction techniques in machine learning. Proceedings of 2014 Science and Information Conference, SAI 2014, 372–378. https://doi.org/10.1109/SAI.2014.6918213 Kiran, T., & Kushal, T. (2016). Facial expression classification using Support Vector Machine based on bidirectional Local Binary Pattern Histogram feature descriptor. 2016 IEEE/ACIS 17th International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing, SNPD 2016, 115–120. https://doi.org/10.1109/SNPD.2016.7515888 Kirange, D. K., & Deshmukh, R. R. (2012). EMOTION CLASSIFICATION OF NEWS HEADLINES USING SVM, 5, 104–106. Koehrsen Will. (2018). Beyond Accuracy: Precision and Recall - Towards Data Science. Media.Com, 19, 1–4. Retrieved from https://towardsdatascience.com/beyond-accuracy- precision-and-recall-3da06bea9f6c Kotu, V., & Deshpande, B. (2015). Data Mining Process. Predictive Analytics and Data 103 University of Ghana http://ugspace.ug.edu.gh Mining, (1), 17–36. https://doi.org/10.1016/b978-0-12-801460-8.00002-1 Kudiri, K. M., Said, A. M., & Nayan, M. Y. (2013). Emotion detection using relative grid based coefficients through human facial expressions. International Conference on Research and Innovation in Information Systems, ICRIIS, 2013, 45–48. https://doi.org/10.1109/ICRIIS.2013.6716683 Kudiri, K. M., Said, A. M., & Nayan, M. Y. (2016). Human emotion detection through speech and facial expressions. 2016 3rd International Conference on Computer and Information Sciences, ICCOINS 2016 - Proceedings, 351–356. https://doi.org/10.1109/ICCOINS.2016.7783240 Kumar, G. A. R., Kumar, R. K., & Sanyal, G. (2018). Facial emotion analysis using deep convolution neural network. Proceedings of IEEE International Conference on Signal Processing and Communication, ICSPC 2017, 2018-Janua(July), 369–374. https://doi.org/10.1109/CSPC.2017.8305872 Kwong, J. C. T., Garcia, F. C. C., Abu, P. A. R., & Reyes, R. S. J. (2019). Emotion Recognition via Facial Expression: Utilization of Numerous Feature Descriptors in Different Machine Learning Algorithms. IEEE Region 10 Annual International Conference, Proceedings/TENCON, 2018-Octob(October), 2045–2049. https://doi.org/10.1109/TENCON.2018.8650192 Ladha, L., & Deepa, T. (2011). Feature Selection Methods And Algorithms. International Journal on Computer Science and Engineering, 3(5), 1787–1797. Retrieved from http://journals.indexcopernicus.com/abstract.php?icid=945099 Lai, C. C., & Ko, C. H. (2014). Facial expression recognition based on two-stage features extraction. Optik, 125(22), 6678–6680. https://doi.org/10.1016/j.ijleo.2014.08.052 Lalitha, S., Geyasruti, D., Narayanan, R., & Shravani, M. (2015). Emotion Detection using 104 University of Ghana http://ugspace.ug.edu.gh MFCC and Cepstrum Features, 70, 29–35. https://doi.org/10.1016/j.procs.2015.10.020 Lecun, Y., Bottou, L., Bengio, Y., & Ha, P. (1998). Gradient-Based Learning Applied to Document Recognition. Proceedings of the IEEE, (November), 1–46. https://doi.org/10.1109/5.726791 Li, S., & Deng, W. (2018). Deep Facial Expression Recognition: A Survey. IEEE, 1–25. Retrieved from http://arxiv.org/abs/1804.08348 Li, Z. (2018). A discriminative learning convolutional neural network for facial expression recognition. 
Appendix

Source code for SVM on the JAFFE dataset.
# OpenCV module
import cv2
import os
import numpy as np
from imutils import paths
from PIL import Image
from pyimagesearch.localbinarypatterns import LocalBinaryPatterns
from sklearn import svm
from sklearn.decomposition import PCA
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split, learning_curve, cross_val_score
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
from sklearn.svm import LinearSVC
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt


# Function to detect a face using the OpenCV Haar cascade.
# (Not called in the pipeline below; the call in prepare_training_data is commented out.)
def detect_face(img):
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    face_cascade = cv2.CascadeClassifier('../haarcascade_frontalface_default.xml')
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.2, minNeighbors=5)
    if len(faces) == 0:
        return None
    (x, y, w, h) = faces[0]
    return gray[y:y+w, x:x+h]


print("====================================================================")
print(" PROCESSING")
print("====================================================================\n")

# Binary labels: anger versus every other expression.
emotion = {'angry': 1, 'not-angry': 2}


# Detect the face in an image file, crop it, resize it to 128x128 pixels and
# overwrite the file. Returns True if a face was found and saved.
def face_det_crop_resize(img_path):
    img = cv2.imread(img_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    cv2.imwrite(img_path, gray)
    face_cascade = cv2.CascadeClassifier('../haarcascade_frontalface_default.xml')
    img = cv2.imread(img_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, 1.3, 5)
    saved = False
    for (x, y, w, h) in faces:
        face_clip = img[y:y+h, x:x+w]
        cv2.imwrite(img_path, cv2.resize(face_clip, (128, 128)))
        saved = True
    return saved


# Walk the dataset folder (one sub-folder per class) and pre-process every image.
def preprocessing(data_folder_path):
    dirs = os.listdir(data_folder_path)
    for dir_name in dirs:
        subject_dir_path = data_folder_path + "/" + dir_name
        subject_images_names = os.listdir(subject_dir_path)
        for image_name in subject_images_names:
            image_path = subject_dir_path + "/" + image_name
            print("Detecting, cropping, resizing, and saving : ", image_name)
            if face_det_crop_resize(image_path):
                print(image_path)
                print(image_name)


# Extract Local Binary Pattern histograms and the corresponding labels
# from the pre-processed images.
def prepare_training_data(data_folder_path):
    dirs = os.listdir(data_folder_path)
    faces = []
    labels = []
    data = []
    desc = LocalBinaryPatterns(24, 8)
    for dir_name in dirs:
        subject_dir_path = data_folder_path + "/" + dir_name
        subject_images_names = os.listdir(subject_dir_path)
        for image_name in subject_images_names:
            if image_name.startswith("."):
                continue
            image_path = subject_dir_path + "/" + image_name
            print(image_name)
            print(image_path)
            image = cv2.imread(image_path)
            print(image)
            gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
            for x, y in emotion.items():
                if dir_name == x:
                    labels.append(y)
            #face = detect_face(image)
            #cv2.resize(face, (128, 128))
            #faces.append(face)
            #smoothened_faces = image_smoothening(gray, image_path)
            hist = desc.describe(gray)
            #clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
            #cl1 = clahe.apply(smoothened_faces)
            #hist = desc.describe(cl1)
            #labels.append(label)
            data.append(hist)
    #new_data = np.array(data)
    cv2.destroyAllWindows()
    cv2.waitKey(1)
    cv2.destroyAllWindows()
    return data, labels


# Gaussian blur followed by a median filter to smoothen a face image.
def image_smoothening(image_name, image_path):
    #print("Smoothening face on {}".format(image_path))
    blur = cv2.GaussianBlur(image_name, (5, 5), 0)
    medianBlur = cv2.medianBlur(blur, 5)
    print("Smoothened face on {}".format(image_path))
    return medianBlur


print("Pre-processing..................")
preprocessing("data")
print("Preparing training data.........")
data, labels = prepare_training_data("data")
print('\n')
#data = np.array(data)
print("Data prepared")
#print total faces and labels
#print("Total faces: ", len(faces))
print("Total faces detected in the data: ", len(data))
print("Total labels detected in the data: ", len(labels))
print('\n')
#new_data = image_smoothening(data)
new_data = np.array(data)

# Standardise the LBP histograms before dimensionality reduction.
scaler = StandardScaler()
scaler.fit(new_data)
scaled_data = scaler.transform(new_data)

# Reduce the feature space to 26 principal components.
pca = PCA(n_components=26, whiten=False)
pca.fit(scaled_data)
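The excerpt above ends at the PCA fit. As a minimal sketch of the remaining steps (an illustration, not the original appendix code), the reduced features could be split into training and test sets and classified with a support vector machine tuned by grid search. The variables scaled_data, labels and pca are assumed to be those created above; the parameter grid, test split and random seed are illustrative choices.

# Illustrative continuation (assumption, not the original appendix code):
# project the scaled LBP features onto the fitted principal components,
# then train and evaluate an SVM with a small hyper-parameter grid search.
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import SVC
from sklearn.metrics import classification_report, confusion_matrix

# scaled_data, labels and pca are assumed to come from the code above.
reduced_data = pca.transform(scaled_data)

X_train, X_test, y_train, y_test = train_test_split(
    reduced_data, labels, test_size=0.2, random_state=42, stratify=labels)

# Small illustrative grid over the RBF-kernel SVM hyper-parameters.
param_grid = {'C': [0.1, 1, 10, 100],
              'gamma': ['scale', 0.01, 0.001],
              'kernel': ['rbf']}
grid = GridSearchCV(SVC(), param_grid, cv=5)
grid.fit(X_train, y_train)

predictions = grid.predict(X_test)
print("Best parameters:", grid.best_params_)
print(confusion_matrix(y_test, predictions))
print(classification_report(y_test, predictions))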