University of Ghana http://ugspace.ug.edu.gh

UNIVERSITY OF GHANA
COLLEGE OF BASIC & APPLIED SCIENCE

A FRAMEWORK TO DETERMINE SARCASTIC SENTIMENTS IN OPINION POLLS

BY
FREDRICK BOAFO (10495750)

THIS THESIS IS SUBMITTED TO THE UNIVERSITY OF GHANA, LEGON IN PARTIAL FULFILLMENT OF THE REQUIREMENT FOR THE AWARD OF MPHIL COMPUTER SCIENCE DEGREE

DEPARTMENT OF COMPUTER SCIENCE
JULY, 2020

DECLARATION

I hereby declare that this thesis is my original research work, carried out at the Department of Computer Science, University of Ghana, Legon under the supervision of my thesis supervisors. Contributions from other people have been duly cited and acknowledged.

STUDENT
Name: Fredrick Boafo
Signature: ..........................................................
Date: 23rd July 2020

SUPERVISOR
Name: Dr. Solomon Mensah
Signature: ..........................................................
Date: 03/02/2021

CO-SUPERVISOR
Name: Dr. Justice Kwame Appati
Signature: ..........................................................
Date: 3rd February, 2021

DEDICATION

To my family and loved ones.

ACKNOWLEDGMENT

I express my profound thanks to my LORD, God Almighty, for granting me His wisdom, knowledge, and understanding, and for guiding and protecting me throughout the successful writing of this thesis. I wish to tender my profound gratitude to my supervisors, Dr. Solomon Mensah and Dr. Justice Kwame Appati, and all other lecturers for their support and guidance throughout my studies and thesis work. May the Almighty Lord bless and replenish all that they have lost in the course of executing their duties. My next appreciation goes to the entire staff of the Department of Computer Science for their vital help and support in making my thesis a success. Finally, to my family members, Mr.
Alfred Koomson, Margaret Cudjoe, Rev. and Mrs. Anthony Owusu Sekyere Kwarteng, Rev. and Mrs. Gideon Boadu-yirenkyi, Christian Bossu, Priscilla Adjei-Amoako, Serwaa Owusu-Donkor, Simon Sackey, Samuel Abedu, Abigail Wiafi, Rev. Father William Abeiku Apprey, Jacqueline Kumi, Melody Kakraba, Akon Ekpezu, Dr. Ferdinard Katsriku, Dr. Abdulai Jamal, Dr. Winfred Yaokumah, Dr. Isaac Wiafi, Mr. Chris Armefio and Auntie Cynthia, Uncle Dan, Paul Yidana and all my colleagues in the Department who helped in assorted ways to make this work a success, I say God richly bless you and take you to higher heights in your academic and life endeavors.

ABSTRACT

Sentiment analysis is the examination of the opinions and attitudes expressed by social media and internet users toward a specific topic. In diverse fields such as commerce, politics and education, it is predominantly utilized to aid decision making. Sarcasm delivers implicit information that is opposite to what one positively declares or writes. It is often viewed as a witty language that expresses scorn, insult and reprimand in a hilarious manner. Sarcastic sentiments become a problem if they cannot be detected during sentiment classification, since undetected sarcasm can distort judgment and decision making. Previous works have indicated that the detection of sarcasm during sentiment analysis can be a very tedious and time-consuming task. This study seeks to classify sentiments as sarcastic or non-sarcastic and to determine the resulting prediction accuracy and loss. By so doing, one can investigate the significant impact of sarcastic sentiment in an opinion poll through theoretical and empirical experiments. The study reviews different approaches for detecting sarcasm in sentiment analysis and how they perform.
Lastly, the study introduces a framework that assists in the identification and classification of sarcastic sentiments in an opinion poll by classifying sentiments as sarcastic or non-sarcastic. The framework comprises three operators, namely Cluster+Expert Judgement, Train & Validate, and Classify & Predict, which together facilitate sentiment classification. Based on our empirical findings on state-of-the-art deep learning techniques such as BLSTM and sAtt-BLSTM, we adopted a simple, efficient and straightforward approach based purely on LSTM to produce the framework that helps in the determination of sarcastic sentiments. The driverless car dataset, which concerns the emergence of driverless cars powered by Google, is considered for the empirical analysis. We applied the X-means clustering algorithm with expert judgment to ensure the efficient labeling of the chosen unlabeled dataset into sarcastic and non-sarcastic classes. We also employed a broad-spectrum Twitter dataset consisting of both sarcastic and non-sarcastic tweets. The performance of our model has been evaluated using measures such as recall, precision, accuracy and F1-score derived from a generated confusion matrix. Our test result for a sarcastic tweet has a prediction value of 0.998875, while that for a non-sarcastic tweet has a prediction value of 0.000055. The results provide an accurate, efficient, and reliable prediction based on the generated confusion matrix and the derived loss figures. The results therefore indicate that although LSTM is computationally cheaper than BLSTM, it yielded improved classification performance. Based on our empirical investigations through a thorough review, a framework has been introduced to classify sentiments as sarcastic or non-sarcastic.
The X-means clustering algorithm with an expert judgment approach has been applied to label the extracted dataset, which lacks complete labels. A deep learning technique, LSTM, has been adopted because it is efficient yet cheaper than most state-of-the-art deep learning techniques.

Keywords: Sentiment analysis, Sarcasm, Classification techniques, Long Short Term Memory, Opinion poll

TABLE OF CONTENT

DECLARATION
DEDICATION
ACKNOWLEDGMENT
ABSTRACT
TABLE OF CONTENT
LIST OF TABLES
LIST OF FIGURES
NOMENCLATURE
CHAPTER 1: INTRODUCTION
1.1. Background and Motivation
1.2.
Problem Statement
1.3. Aim
1.4. Objectives
1.5. Scope
1.6. Research Contribution
1.7. Conclusion
1.8. Organization of Study and Research Plan
CHAPTER 2: SYSTEMATIC LITERATURE REVIEW
2.1. Introduction
2.2. Related Works on Sentiment Analysis
2.3. Reviews on Sarcasm in Sentiment Analysis
2.4. A Systematic Review of Sarcasm in Sentiment Analysis
2.4.1. Research Methodology
2.4.2. Research Problem
2.4.3. Research Questions
2.4.4.
Research Boundaries
2.4.5. Review Method
2.4.6. Classification of Papers
2.4.7. Research Process
2.4.8. Inclusion Criteria
2.4.9. Exclusion Criteria
2.4.10. Studies Selection
2.4.11. Results and Discussion
2.4.12. Threats to Validity
2.4.13. Conclusion
CHAPTER 3: METHODOLOGY
3.1. Introduction
3.2. Description of Datasets
3.2.1. Driverless Car Dataset
3.2.2.
Extensive Twitter Dataset
3.3. Experimental Language Used
3.4. Deep Learning
3.5. Deep Learning versus Machine Learning
3.6. Recurrent Neural Network
3.7. LSTM
3.8. Sentiment Analysis with LSTM
3.9. Implementation Using LSTM
3.9.1. Loading in and Visualizing the Data
3.9.2. Data Pre-processing
3.9.3. Sentiment Network with PyTorch
3.9.4. The Embedding Layer
3.9.5. The LSTM Layer
3.9.6. Instantiating the Network
3.9.7. Training
3.9.8.
Testing
3.9.9. Trying Out Test
3.10. Network Architecture
3.10.1. Passing in Words into an Embedding Layer
3.11. Performance Evaluation
3.12. Framework for Sarcasm Detection
3.12.1. CLUSTER+EXPERT JUDGEMENT (CLUSTEXPERT)
3.12.2. TRAIN & VALIDATE
3.12.3. CLASSIFY & PREDICT NEW INSTANCES
3.12.4. Pseudocode for Framework
3.13. Conclusion
CHAPTER 4: RESULTS AND DISCUSSION
4.1. Visualization
4.2. Training, Validation and Test Sets
4.3. Sampling
4.4.
Training
4.5. Testing
4.5.1. Inference on Test Review
CHAPTER 5: CONCLUSION
5.1. Summary and Conclusion
5.2. Threats to Validity
5.2.1. External Validity
5.2.2. Internal Validity
5.2.3. Constructive Validity
5.2.4. Conclusion Validity
5.3. Future Work
REFERENCES
Appendix A
Appendix B: Results obtained using our extensive Twitter dataset
Appendix C: Sampling results considering the driverless car dataset
Appendix D: Sampling results considering the extensive (broad-spectrum) dataset
Appendix E: Results obtained using our driverless car dataset
Appendix F: General Overview of Sarcasm Detection Operators Flowchart

LIST OF TABLES

Table 1 Distribution of search results obtained from different publishers' sites
Table 2 Confusion matrix for computing precision and recall
Table 3 Selected articles using SLR
Table 4 Training, validation and test set results
Table 5 Confusion matrix for trained driverless car dataset
Table 6 Confusion matrix for validated driverless car dataset
Table 7 Confusion matrix for validated extensive Twitter dataset
Table 8 Confusion matrix for test data using the driverless car dataset
Table 9 Confusion matrix for test data using the extensive Twitter dataset

LIST OF FIGURES

Fig 1.
Sentiment Analysis Approaches (Serrano-Guerrero, Olivas, Romero, & Herrera-Viedma, 2015)
Fig 2. Database search results of overall papers obtained after the application of inclusion and exclusion criteria
Fig 3. Statistics on papers obtained after inclusion and exclusion criteria applied
Fig 4. Chronology of the selection process (SLR protocol)
Fig 5. Scores of papers that were considered for research questions
Fig 6. Diagrammatical presentation of RNN (Srikanth, 2017)
Fig 7. Diagrammatical presentation of LSTM with gates (Srikanth, 2017)
Fig 8. Visualization of sarcastic and non-sarcastic sentiments for the driverless dataset
Fig 9. Visualization of sarcastic and non-sarcastic sentiments for the extensive Twitter dataset
Fig 10. 2D array features
Fig 11. Feature shapes of train, validation and test sets for the driverless car dataset
Fig 12. Feature shapes of train, validation and test sets for the extensive Twitter dataset
Fig 13. Data loading and batching results using tensor Dataset considering the driverless car dataset
Fig 14. Data loading and batching results using tensor Dataset considering the extensive Twitter dataset
Fig 15. Obtained feature from test review
NOMENCLATURE

LSTM: Long Short Term Memory
RNN: Recurrent Neural Network
SVM: Support Vector Machine
TF-IDF: Term Frequency and Inverse Document Frequency
NB: Naïve Bayes
LR: Logistic Regression
API: Application Program Interface
DNN: Deep Neural Network
CNN: Convolutional Neural Network
GRU: Gated Recurrent Unit
POS: Part of Speech
SLR: Systematic Literature Review
GPU: Graphics Processing Unit
sAtt-BLSTM: Soft Attention-Based Bidirectional Long Short-Term Memory Model with Convolution Network

CHAPTER 1
INTRODUCTION

Sentiment analysis is the examination of the opinions and attitudes expressed by social media and internet users towards a specific topic or subject. It is predominantly applied in commerce, politics, education, and the entertainment industry, among others, to help in decision making. Sarcasm, on the other hand, occurs when someone delivers implicit information that is opposite to what is said or written (Bouazizi & Ohtsuki, 2015). Most people view sarcasm as a witty language that conveys scorn or insult, and as a language used to reprimand something or someone hilariously. The authors Parwal et al. (2018) and Bharti et al. (2017) consider sarcasm analysis a very tedious task. The feature of sarcastic sentiments that renders their detection tedious is the gap that exists between literal and intended meaning. Sarcasm is often employed in day-to-day speech and is prevalent in online contexts (Teh, Boon, Chan, & Chuah, 2018). Sarcasm detection and scrutiny have become one of the core problems in natural language processing, and the detection of sarcastic sentiment on online media platforms including Twitter, Facebook and online blogs has become critical, as such sentiments go a long way to influence decision making in organizations (Bharti et al., 2018).
Most researchers ignore sarcasm when undertaking sentiment analysis because they see it as a complex task that consumes time and effort (Parmar et al., 2018; Rahayu et al., 2018; Chandankhede & Chaudhari, 2018). Some researchers also hold that sarcasm is used for criticism and mockery (Bouazizi & Otsuki, 2016; Porwal et al., 2019). Our focus in this research is on sarcasm in sentiment analysis. Sarcasm in sentiment analysis is an expression in which a person states opinions opposite to what they truly mean during an opinion solicitation on a particular platform (Rendalkar & Chandankhede, 2018a). In most cases, a sarcastic sentiment or statement can be both funny and irritating (Bhan & D'Silva, 2018). Sarcasm modifies the polarity of a positive or negative articulation to its opposite (Khullar & Singh, 2019). Due to the complexity of sarcasm detection, relatively little research has been undertaken on the topic. This thesis therefore seeks to take the research on sarcasm in sentiment analysis further by undertaking a systematic review and an in-depth analysis of sarcasm in sentiment analysis over the past five years, and then developing a system that helps in the identification of sarcastic statements when undertaking sentiment analysis.

1.1. Background and Motivation

Sentiment analysis is among the trending areas in artificial intelligence (AI), with researchers focusing much attention on it through a series of studies. The opinions of individuals are very critical in planning and decision making. Decision-makers, for that matter, cannot make suitable decisions or lead appropriately if they cannot understand the sentiment of the general public. Social media is flourishing, and the sentiments of individuals are made known and exposed on these platforms. What, then, is sentiment analysis? According to Agarwal et al.
(2015), sentiment analysis basically involves analyzing people's sentiments towards a particular subject matter. In one quarter of 2015, the Twitter social media network had three hundred and five million monthly active users, and by 2018 the quarterly figure of active monthly users had risen to three hundred and thirty-five million (Losada & Benito, 2018). To obtain real-time information for processing, social networks like Twitter, Facebook and the like become inevitable (Paredes-valverde, Colomo-palacios, Salas-zárate, & Valencia-garcía, 2017). Information shared on social media reveals users' emotions and attitudes on every topic, and such posts readily find readers and listeners (Li et al., 2013). Twitter, which was established in 2006, experienced a rapid increase in users within its first years of operation. Currently, it has over five hundred million registered users and over two hundred million active monthly users (Romanowski, 2015). In light of this effective Twitter drive, all major contenders and political parties now have some form of presence on social media. This growth of Twitter usage during elections by politicians, campaigners and even the public has led to ever-increasing research in the areas of social media sentiment analysis and data analytics (Park, Sung, Sharma, Jeong, & Yi, 2017). Sentiment analysis finds application in e-commerce, politics, business, education, media, and more. Many researchers have produced works that help in the classification of sentiments as positive, negative, or neutral, as seen from our related works. Current research in the field of sentiment analysis is contributing to solving the problem of sarcasm in the analysis of sentiments. Manohar and Kulkarni (2018) proposed a natural language processing and corpus-based approach to detect sarcasm on Twitter.
Others, such as Lunando and Purwarianti (2013), used machine learning algorithms in their classification, proposing the number of interjection words as a feature for determining sarcasm and negative sentiment. Kumar et al. (2020) claim that a multi-head attention-based Long Short Term Memory (LSTM) could yield better results than SVM; however, they did not perform a comparative analysis using plain LSTM in order to make a general claim. According to Son et al. (2019), a bidirectional LSTM with a convolutional network produces better results; however, their experiment was more tedious and time-consuming, and performed no better than the multi-head attention-based LSTM. In this study, therefore, we seek to use LSTM to introduce a framework to determine whether a sentiment extracted from an opinion poll is sarcastic or not. The study makes use of Twitter datasets on a given opinion poll and applies a text mining algorithm to perform the sentiment analysis.

1.2. Problem Statement

Previous studies have shown that a sentiment or opinion classified during sentiment classification may be sarcastic rather than carrying the literal connotation of its words. The question, then, is: how can one tell whether a sentiment is indeed positive and not just a sarcastic statement?

1.3. Aim

This study, therefore, seeks to classify sarcastic and non-sarcastic sentiments.

1.4. Objectives

The study seeks:
1. To conduct a systematic literature review of sarcasm in sentiment analysis.
2. To classify a sentiment into sarcastic or non-sarcastic.
3. To investigate the impact of sarcastic sentiment in an opinion poll.

1.5. Scope

The focus of this study covers sarcastic sentiment analysis using datasets on opinion polls. The data comprise a driverless car dataset and an extensive Twitter dataset.

1.6. Research Contribution

The major contributions of this study are:
1. Theoretical and empirical proof of the efficiency of LSTM
2.
Application of X-means clustering and expert judgement exclusively for sarcastic data labelling
3. Development of a framework that classifies sentiments as sarcastic or non-sarcastic.

1.7. Conclusion

This study focuses on developing a framework that will help in the identification of sarcastic sentiment in an opinion poll when undertaking sentiment analysis. This will assist in investigating the significant impact of sarcastic statements in opinion polls. Twitter datasets have been used for this study.

1.8. Organization of Study and Research Plan

Chapter 1 presents fundamental information on the background and motivation for the study, the problem statement, the scope of our study, and the research objectives and contributions. Chapter 2 presents a summary of related works pertaining to sarcasm in sentiment analysis; the chapter clearly shows the differences between previous and current studies. Chapter 3 presents the empirical analysis: a detailed description of the datasets and the preprocessing techniques is given, a theoretical foundation is established to address the sarcasm detection problem using LSTM, and finally the experimental setup and performance evaluation measures are discussed. Chapter 4 provides the experimental results and discussion from our empirical analysis using the defined datasets; evaluation measures derived from the confusion matrix, including precision, recall and F1-score, are used to assess the performance of our results. Finally, Chapter 5 gives a summary of this thesis together with concluding remarks on the output of our analysis. The threats that can possibly affect the validity of our results are discussed; they include external validity, internal validity, construct validity and conclusion validity.
Finally, future directions arising from this thesis are discussed.

CHAPTER 2
SYSTEMATIC LITERATURE REVIEW

This chapter presents a set of preliminaries needed to understand this dissertation, together with a review of related works and a systematic literature review concerning sarcasm in sentiment analysis.

2.1. Introduction

A great deal of work has been undertaken in the area of sentiment analysis as a whole, covering different approaches, classification techniques, and feature selection methods, most of which are discussed below.

2.2. Related Works on Sentiment Analysis

In their study, Khasawneh et al. (2013) performed a comparison between two free online sentiment analysis tools that support Arabic: Social Mention and SentiStrength. The methodology of the study proceeded in the following phases: build a large dataset of Arabic opinions with their emoticons, classified manually into five main categories (sports, economics, health, education, and news); build twelve dictionaries, a positive and a negative one for each of the five categories, plus one dictionary for positive emoticons and one for negative emoticons; run the Arabic sentiment analysis tools on the dataset to detect the polarity of every collected Arabic opinion; and finally compare and evaluate the tools under several considerations. Support Vector Machine (SVM) and Naïve Bayes (NB) were used as classifiers. No provision was made for sarcasm detection in their study, and hence its potential to support better judgement is limited. Alayba et al. (2020) presented an Arabic dataset of opinions on health services collected from Twitter. In this experiment, a mixture of "Unigram" and "Bigram" techniques was used for text feature selection.
TF-IDF (Term Frequency-Inverse Document Frequency) weighting was applied to every feature in the corpus, and at most the one thousand highest-weighted features were fed to their machine learning algorithms; TF-IDF has also been considered in our implementation. Three machine learning algorithms were used, Logistic Regression (LR), Support Vector Machines (SVM) and Naïve Bayes, to seek out the most influential parameters for obtaining good results (Wp et al., 2017). The detailed process involved retrieving data in the form of tweets using the Twitter Application Program Interface (API); the data were then stored in CSV format, and training or testing proceeded after preprocessing. The preprocessing entailed deleting URLs, checking punctuation, deleting stop words, converting slang words to their raw forms, and stemming. Feature extraction was then applied to the cleaned output of preprocessing; it comprised word grouping with the Bag-of-Words method and feature weighting with TF-IDF. The classification method produced recall, precision and accuracy values for the sentiments. In his study, Saeed (2018) proposed Deep Neural Network architectures that classify overlapping sentiments with high accuracy. The author showed that the proposed classification framework did not require any laborious manual text preprocessing, as the network handled such steps (e.g. stop-word removal and feature engineering) inherently. The traditional text classification pipeline comprises text preprocessing, text encoding, a learning model and evaluation. The primary and baseline architecture in the study was a Convolutional Neural Network (CNN) with 1D convolutions. Four other architectures were then studied, of which two were based on CNNs, one on Long Short-Term Memory (LSTM) and one on a Gated Recurrent Unit (GRU).
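A pipeline in the spirit of the setup described above (unigram and bigram TF-IDF features, capped at one thousand weighted features, fed to a classifier such as Logistic Regression) might look as follows. The toy tweets and labels are invented for illustration, and scikit-learn is assumed to be available.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical labelled tweets (1 = positive opinion, 0 = negative)
texts = ["the service was very good", "very good hospital staff",
         "the wait was very bad", "very bad experience overall"]
labels = [1, 1, 0, 0]

# Unigrams and bigrams, keeping at most 1000 TF-IDF-weighted features
vectorizer = TfidfVectorizer(ngram_range=(1, 2), max_features=1000)
X = vectorizer.fit_transform(texts)

# Logistic Regression on the weighted features
clf = LogisticRegression().fit(X, labels)
pred = clf.predict(vectorizer.transform(["good staff", "bad wait"]))
```

In a real experiment the texts would first pass through the preprocessing pipeline described above (URL deletion, stop-word removal, slang normalization, stemming) before vectorization.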
It is noted that the proposed deep learning methodology requires no preprocessing because of its inherent feature engineering capabilities. Based on their empirical results, and considering overlapping text classification, they recommended that there is no need to spend an excessive amount of time on data preprocessing: stop words, punctuation and similar tokens proved to be an important constituent of the data when training the models, and every DNN model showed improved performance metrics when trained on unprocessed data. Munandar et al. (2018) investigated public opinion and sentiment from Indonesian Twitter toward the utilization problem. Their method comprised a dataset description, a multichannel convolutional neural network (CNN) model, word embedding, stop-word handling and model experimentation. Since the CNN achieved higher accuracy than SVM, they proposed it as the better model. The methodology of Harvinder et al. (2015) utilizes features such as n-grams, bigrams, Twitter-specific features and semantic features to detect sentiments. In his study, Adri (2016) aimed to reach more reliable public opinion measurements, but claimed after experimentation that Twitter data are not consistent, especially when inference methods are applied to them; nevertheless, most sentiment analysis research utilizes Twitter data. Troussas et al. (2016) evaluated basic ensemble methods for effective sentiment analysis; their experiments aimed to increase the efficiency of machine learning algorithms and showed that ensembles can perform considerably better than traditional algorithms. The work of Paredes-Valverde et al.
(2017) proposed a deep-learning-based approach that permits companies and organizations to detect opportunities for improving the quality of their products or services through sentiment analysis. Convolutional neural networks (CNN) and word2vec were used. Experiments piloted on Twitter datasets of different sizes yielded a precision of 88.7% and an F-measure of 88.7% on 100,000 tweets. It should be noted, however, that none of the papers reviewed in this section extensively considered sarcasm detection in the analysis of sentiments. Figure 1 depicts the different sentiment analysis approaches. Akhoundzade and Devin (2019) and Sanagar and Gupta (2020), by contrast, proposed unsupervised learning approaches to sentiment analysis.

Fig 1. Sentiment Analysis Approaches (Serrano-Guerrero, Olivas, Romero, & Herrera-Viedma, 2015)

2.3. Reviews on Sarcasm in Sentiment Analysis
Bouazizi et al. (2015) devised a method that uses a minimal set of features yet efficiently classifies tweets regardless of topic. The study analyzed the relevance of determining sarcastic tweets automatically, showing that the correctness of sentiment analysis can be enhanced by knowledge of sarcastic and non-sarcastic sentiment. Their study, however, made no provision for unsupervised datasets. Bharti et al. (2015) used two approaches to detect sarcasm in text: a parsing-based lexicon generation algorithm and the occurrence of interjection words. The approaches were compared with the existing state-of-the-art approach for sarcasm detection. Lunando and Purwarianti (2013) proposed two features for detecting sarcasm after sentiment analysis is done: negativity information and the number of interjection words. SentiWordNet was used for sentiment classification. Bhan et al.
(2018) likewise proposed a system that measures sarcasm in tweets from Twitter. Different algorithms were proposed to detect the effect of sarcasm on texts; features generated from the received tweets are used to produce a sarcasm score. The system also provides a separate portal where a user can enter a sentence and obtain its score. Prasad et al. (2017) compared various classification algorithms for detecting sarcasm in tweets from the Twitter streaming API. The best classifier is chosen and paired with various preprocessing and filtering techniques, using emoji and slang dictionary mapping, to produce the best accuracy. Most of the papers reviewed in this section, however, made no provision for unsupervised or unlabeled datasets, and others still resort to traditional machine learning algorithms. Majumder et al. (2019) hold the view that knowledge of sarcasm detection can be relevant to sentiment classification and vice versa. Their paper shows that the two tasks are correlated and presents a multi-task learning framework using a deep neural network that models this correlation in order to improve the performance of both tasks in a multi-task setting. Razali et al. (2018) studied the trends in sarcasm detection and the proposed techniques, focusing on the argument that more than text is needed to properly detect sarcasm. Dharmavarapu and Bayana (2019) employed Naïve Bayes classification and the AdaBoost algorithm to detect sarcasm on Twitter: AdaBoost iteratively turns weak learners into a strong one by reweighting the training data, while Naïve Bayes classifies tweets into sarcastic and non-sarcastic.
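The interjection and negativity cues proposed by Lunando and Purwarianti, together with the score-based view taken by Bhan et al., suggest a crude rule-of-thumb detector. Everything below (word lists, weights, threshold) is a hypothetical sketch, not any reviewed author's actual system.

```python
# Hypothetical cue lexicons for a toy sarcasm score
INTERJECTIONS = {"wow", "yay", "woohoo", "aha", "ugh", "oh"}
POSITIVE = {"love", "great", "awesome", "fantastic"}
NEGATIVE = {"stuck", "broken", "delayed", "cancelled"}

def sarcasm_score(text):
    """Crude score: interjection count plus a bonus when positive and
    negative cues co-occur (the classic incongruity signal)."""
    tokens = text.lower().replace(",", " ").replace("!", " ").split()
    n_interj = sum(t in INTERJECTIONS for t in tokens)
    has_pos = any(t in POSITIVE for t in tokens)
    has_neg = any(t in NEGATIVE for t in tokens)
    incongruity = 2 if (has_pos and has_neg) else 0
    return n_interj + incongruity

def is_sarcastic(text, threshold=2):
    return sarcasm_score(text) >= threshold
```

A sentence such as "wow, I just love being stuck in traffic" scores on both the interjection and the incongruity cues, while a plainly positive sentence scores zero.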
In the study conducted by Khullar and Singh (2019), bagged gradient boosting with particle swarm optimization for feature selection is proposed and compared with other classifiers such as random forest and plain gradient boosting. After emoji and acronym dictionary mapping, part-of-speech (POS) labeling is introduced; hashtags and stop words are recognized and removed, and particle swarm optimization is employed to remove noisy data. Losada and Benito (2018) carried out further research to refine sentiment tools, enhancing their sensitivity and capability and optimizing them with sophisticated sarcasm detection. Bharti et al. (2017) proposed six algorithms to examine sarcasm in Twitter tweets; the experimental outputs were compared with some of the existing state-of-the-art. Porwal et al. (2019) use a recurrent neural network (RNN) model for sarcasm detection, since it automatically extracts the features required by machine learning approaches, and employ long short-term memory cells in TensorFlow to capture syntactic and semantic information in tweets. In Rendalkar et al. (2018b), the aim was to develop a system that groups posts by emotion and sentiment and identifies sarcastic posts where they exist; the authors proposed a prototype to aid inference about the emotion of posts. Parmar et al. (2018) used a Hadoop-based framework that ingests live tweets, processes them, and applies a hybrid algorithm that determines sarcastic sentiment efficiently; the hybrid approach used lexical and hyperbole features to improve system performance in terms of accuracy, precision and F-score. Raghav et al. (2018) likewise discussed the approaches, features, datasets and issues associated with sarcasm detection.
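The bagged gradient boosting that Khullar and Singh propose can be approximated with scikit-learn (their particle-swarm feature selection is omitted here). The synthetic feature matrix below is a hypothetical stand-in for real tweet features, so this is a sketch of the ensemble structure only.

```python
import numpy as np
from sklearn.ensemble import BaggingClassifier, GradientBoostingClassifier

rng = np.random.default_rng(0)
# Synthetic stand-in for extracted tweet features and sarcasm labels
X = rng.normal(size=(200, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Bagging an ensemble of gradient-boosting classifiers: each of the 10
# members is a boosted model fitted on a bootstrap sample, and the final
# prediction is a majority vote
model = BaggingClassifier(
    GradientBoostingClassifier(n_estimators=50, random_state=0),
    n_estimators=10,
    random_state=0,
).fit(X, y)
acc = model.score(X, y)
```

Bagging reduces the variance of the individual boosted models, which is the rationale the authors give for combining the two ensemble ideas.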
Chaudhari (2018) enumerated approaches, issues, challenges and future scopes in sarcasm detection. Hiai et al. (2016) use three stages (a rule-based judgment process for eight classes, boosting rules and rejection rules) to analyze and classify sentences into eight classes by focusing on evaluation expressions, and then generate classification rules for every class to extract sarcastic statements. To reiterate, most of the papers reviewed in this section made no provision for unsupervised or unlabeled datasets, and some current researchers still resort to traditional machine learning algorithms. Comparatively little work has been done on determining sarcasm in sentiment analysis using different classification models, methodologies, feature selection algorithms and datasets.

2.4. A Systematic Review of Sarcasm in Sentiment Analysis
A protocol was followed to undertake a systematic review of sarcasm in sentiment analysis over the preceding few years, clearly defining the inclusion and exclusion criteria used and the threats to its validity.

2.4.1. Research Methodology
The systematic literature review method provides a means of discovering, categorizing and probing the existing research linked to any questions of interest and research areas.

2.4.2. Research Problem
In many situations, a sentiment or an opinion classified during sentiment classification may be sarcastic rather than carrying the literal connotation of its words. The question is: how can one tell whether a sentiment is indeed positive and not just an ironic statement? This research seeks to pinpoint the various sarcasm detection methods and approaches in sentiment analysis, particularly the classification models used, the challenges encountered and the techniques applied, among others.

2.4.3. Research Questions
- What classification techniques could be employed in undertaking sentiment analysis?
Motivation: to find out the popular classification techniques considered, especially when studying sarcasm in sentiment analysis. We want to know the different classifiers considered by researchers and the most widely used technique. For instance, the paper by Prasad et al. (2017) helps us identify different classifiers and proposes a simple and widely used technique.
- What feature selection techniques could be used?
Motivation: to learn the feature selection approaches considered when undertaking sentiment analysis. These may be non-textual or textual; this would also reveal the sets of features derived from preprocessed text, such as unigrams, and the feature sets proposed in the literature.
- What type of dataset, or which datasets, could be used?
Motivation: different datasets are considered when undertaking sentiment analysis. What are these datasets? They may be publicly available tweets or privately obtained tweets that are classified and manually annotated by humans. We also want to know the particular topic each dataset pertains to.
- What challenges could be encountered when undertaking sarcasm detection in sentiment analysis?
Motivation: Bharti et al. (2015b) indicated that sarcasm detection is a very challenging task. We want to find out the challenges that make sarcasm detection in sentiment analysis so tedious.
- What performance evaluation could be obtained?
Motivation: to ascertain the prediction performance attained using diverse evaluation measures. When scrutinizing prediction performance in text mining, there are four possible outcomes, namely true positives, true negatives, false positives and false negatives; precision, recall, F-score and accuracy are computed from these.
2.4.4.
Research Boundaries
In this systematic review, the authors consider how sarcasm is detected when conducting sentiment analysis. The population observed therefore consists of those publications which consider sarcasm in sentiment analysis during an opinion poll.

2.4.5. Review Method
The review method is based on the research protocol; this section defines the search strategy, the sources, the studies selection and the selection execution, and provides the sources in which searches for primary studies were executed. All references used in this research were included for analysis based on the following criteria:
1. All publications are between 2015 and 2020.
2. Journal publications, conference papers, magazines and books.
3. Papers with most of the keywords in the title.
4. Publications whose title contains sarcasm and sentiment analysis.
5. Publications whose abstract provides much enlightenment on sarcasm in sentiment analysis.

2.4.6. Classification of Papers
Papers used for this research are classified according to their publishers. In this review, the following publishers' databases were searched for research material:
1. IEEE Xplore
2. The ACM Digital Library
3. Science Direct
4. Scopus
5. Elsevier
6. Oxford Academic
Table 1 presents the distribution of the search results from each of the aforementioned publishers' sites.

Table 1. Distribution of search results obtained from different publishers' sites.

Publisher           | Database search results | Final results after exclusion
IEEE Xplore         | 47                      | 7
ACM Digital Library | 2                       | 2
Science Direct      | 263                     | 1
Scopus              | 40                      | 6
Elsevier            | 0                       | 0
Oxford Academic     | 20                      | 0
Total               | 368                     | 16

2.4.7. Research Process
The databases chosen in this research study are publishers' sites containing published studies.
The search strings used in the databases were based on keywords; when too few results were obtained, alternative words from the research questions were used. Some keywords were concatenated to form a search string, and all the selected databases were searched using the search string "sarcasm in sentiment analysis".

2.4.8. Inclusion Criteria
The studies considered are those that focus on sarcasm in sentiment analysis. The selected articles must be available in English and must be full-text articles, published as conference papers, journal articles, magazine articles or book chapters. Studies that give empirical evaluations carry more weight; the intention was not to rate any work but to ascertain its importance within the proposed domain. The studies searched are papers from IEEE, the ACM Digital Library, Science Direct, Elsevier, Scopus and Oxford Academic.

2.4.9. Exclusion Criteria
The exclusion criteria consisted of eliminating duplicate articles and excluding studies that did not give many details about sarcasm in sentiment analysis.

2.4.10. Studies Selection
Once the sources have been defined, it is necessary to describe the process and criteria for study selection and evaluation in order to reduce the likelihood of bias. Selection criteria should be decided during protocol definition, and the inclusion and exclusion criteria are based on the research questions. The researchers therefore established that the studies must present new initiatives (from a maximum of approximately five years ago) which consider all kinds of discussion about sarcasm in sentiment analysis. The keywords are sentiment analysis, sarcasm, classification techniques, feature selection and opinion poll, so the phrase "sarcasm in sentiment analysis" was used for the search on all the publishers' sites.
In IEEE Xplore, conference papers and journals were considered, and 47 such papers showed up in the search; a decision was made to consider journal and conference papers together with magazines. Further exclusion was done using conditional formatting in a Microsoft Excel spreadsheet with the keywords "sarcasm" and "sentiment analysis", and by reading the abstract of each article to determine whether it contained the needed information as far as the topic and keywords were concerned; seven papers were obtained from this search. Searching the ACM Digital Library with the keywords sarcasm and sentiment analysis yielded only 2 conference papers published from 2015 to 2020. No search results were obtained from Elsevier. Forty articles were obtained from the Scopus search; Excel conditional formatting was applied using highlight-cell rules for texts containing both sarcasm and sentiment analysis, from which the authors retrieved five (5) publications, but the paper "Hybrid method for sarcasm target identification to assist the sentiment analysis systems" could not be accessed for free and was hence excluded. In the Oxford Academic database, 20 papers were retrieved from the search, but an advanced search with emphasis on the keywords sarcasm and sentiment analysis did not return any results. Of the 263 results obtained from Science Direct, only one article was retained after further exclusion based on article titles containing both sarcasm and sentiment analysis. The review removed papers duplicated across databases: all downloaded ACM Digital Library papers were also found in IEEE and hence discarded. Twelve distinct papers from all the other databases were considered for the study.
Figure 2 depicts statistics on the papers from the different databases before and after the application of the inclusion and exclusion criteria, while Figure 3 shows the percentage distribution of papers obtained after the criteria were applied. Figure 4 shows the selection process used to derive the papers, and Figure 5 is a graph showing the number of papers answering each research question.

Fig 2. Database search results of overall papers obtained, before and after the inclusion and exclusion mechanism was applied.
Fig 3. Statistics on papers obtained after the inclusion and exclusion criteria were applied (IEEE 44%, Scopus 38%, ACM Digital Library 12%, Science Direct 6%).
Fig 4. Chronology of the selection process (SLR protocol).
Fig 5. Scores of papers that were considered for the research questions.

2.4.11. Results and Discussion
RQ1: What classification techniques can be employed in undertaking sentiment analysis?
Four (4) papers, namely [LT1, LT11, LT16, LT14] (see Table 3 in the appendix), were considered for this question. In the research conducted by Bouazizi et al.
(2015), the classification was performed using Naïve Bayes, Support Vector Machine (SVM) and Maximum Entropy classifiers [LT1]. In the paper by Prasad et al. (2017), sarcasm detection in the proposed model is done using classifiers such as Decision Tree, Random Forest, Gradient Boosting, Adaptive Boosting, Logistic Regression and Gaussian Naïve Bayes; it was concluded that the Decision Tree classifier is a simple and widely used classification technique [LT11]. Majumder et al. (2019) applied a final softmax classification [LT16]. Dharmavarapu and Bayana (2019) extensively utilized AdaBoost and Naïve Bayes classification [LT14].

RQ2: What sort of dataset, or which datasets, could be used?
Five (5) papers, namely [LT1, LT2, LT11, LT16, LT15] (see Table 3 in the appendix), were considered for this question. Bouazizi et al. (2015) collected a group of publicly available tweets "classifiable" by humans and manually annotated them as "positive" or "negative"; the tweets were selected to belong to one of the following topics: politics, phone reviews, sports, movie reviews and electronic products [LT1]. Bharti et al. (2015a) collected 50,000 training tweets carrying the sarcasm hashtag (#sarcasm) from Twitter, with keywords such as love, amazing, good, hate, sad, happy, bad, hurt, awesome, excited, nice, great and sick; for testing, tweets were collected in two categories, (i) tweets with the sarcasm hashtag and (ii) tweets without a hashtag [LT2]. In the study by Prasad et al. (2017), the dataset is a collection of about 2000 pre-classified tweets with class labels of 1 or 0, where 1 means sarcastic and 0 means non-sarcastic; it contains two columns, Tweet and Label, where Tweet holds the tweet text and Label holds the binary sarcasm label [LT11].
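A comparison in the spirit of Prasad et al.'s under RQ1, a bank of classifiers trained on tweets labelled 1 (sarcastic) or 0 (non-sarcastic), can be sketched with scikit-learn. The synthetic feature matrix below is a hypothetical stand-in for real tweet feature vectors.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import (RandomForestClassifier, AdaBoostClassifier,
                              GradientBoostingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
# Synthetic stand-in for tweet feature vectors with 1/0 sarcasm labels
X = rng.normal(size=(300, 4))
y = (X[:, 0] - X[:, 2] > 0).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)

classifiers = {
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
    "adaboost": AdaBoostClassifier(random_state=0),
    "logistic_regression": LogisticRegression(),
    "gaussian_nb": GaussianNB(),
}
# Held-out accuracy for each candidate classifier
scores = {name: clf.fit(X_tr, y_tr).score(X_te, y_te)
          for name, clf in classifiers.items()}
```

Reporting held-out accuracy per classifier, as here, is what allows a study to single out one model (the Decision Tree, in Prasad et al.'s case) as simple and effective.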
According to Majumder et al. (2019), their dataset consisted of 994 samples, each containing a text snippet labeled with a sarcasm tag, a sentiment tag and the eye-movement data of seven readers; the authors ignored the eye-movement data in their experiments. Of those samples, 383 are positive and 350 are sarcastic [LT16]. Suhaimin et al. (2019) considered a manually curated dataset of about 1000 pre-classified tweets, each manually labeled sarcastic or non-sarcastic based on human intuition, yielding an accurate dataset for training; this manually curated dataset is one of the contributions of that paper [LT15].

RQ3: What feature selection is being used?
Two (2) papers, namely [LT5, LT7] (see Table 3 in the appendix), were considered for this question; different papers use different feature selection approaches. In the feature selection of Porwal et al. (2019), two types of features were extracted. 1) Non-textual features: from the raw tweets they first extracted 6 features by counting the numbers of positive and negative hashtags, positive and negative emoticons, and positive and negative slang words. 2) Textual features: after extraction of the non-textual features, several features are taken from the preprocessed text, namely unigrams, negativity and the number of interjection words [LT5]. To measure sarcasm accurately, Bhan et al. (2018) proposed a set of features, namely n-grams, sentiments, topics, POS tags and capitalization; their system uses the SentiWordNet dictionary to assign negative and positive scores to each word and stores them using the POS-ID.
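The six non-textual count features described above are straightforward to reproduce. The lexicons below are tiny hypothetical stand-ins, not the actual dictionaries used in the reviewed work.

```python
# Hypothetical cue lexicons for the six non-textual count features
POS_HASHTAGS = {"#love", "#win", "#happy"}
NEG_HASHTAGS = {"#fail", "#sad", "#worst"}
POS_EMOTICONS = {":)", ":-)", ":D"}
NEG_EMOTICONS = {":(", ":-(", ":'("}
POS_SLANG = {"lol", "yay"}
NEG_SLANG = {"smh", "ugh"}

def non_textual_features(tweet):
    """Six count features: +/- hashtags, +/- emoticons, +/- slang words."""
    tokens = tweet.lower().split()
    return [
        sum(t in POS_HASHTAGS for t in tokens),
        sum(t in NEG_HASHTAGS for t in tokens),
        sum(t in POS_EMOTICONS for t in tokens),
        sum(t in NEG_EMOTICONS for t in tokens),
        sum(t in POS_SLANG for t in tokens),
        sum(t in NEG_SLANG for t in tokens),
    ]
```

A tweet mixing positive slang with negative hashtags and emoticons produces a feature vector whose positive and negative counts disagree, which is precisely the incongruity a sarcasm classifier can exploit.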
Using the above features, they trained their topic modeler on all tweets, generated the features for all tweets, and then trained a classifier on these features [LT7].

RQ4: What challenges could be encountered when undertaking sarcasm detection in sentiment analysis?
One paper, [LT8] (see Table 3 in the appendix), answered this question. The study conducted by Khullar and Singh (2019) identified the following challenges in sarcasm detection: a) sarcasm may be used indirectly, and the incongruity involved makes the sentiments hectic and tedious to comprehend; b) snide tweets communicate a negative assessment using positive words, so a classifier would erroneously assign sentiments to these tweets; c) the wide usage of slang words, abbreviations, smileys, special symbols and unstructured data makes it quite tedious to identify sentiments [LT8].

RQ5: What performance evaluation could be obtained?
Three (3) papers, namely [LT1, LT4, LT16] (see Table 3 in the appendix), were considered for this question. Bouazizi et al. (2015) compared their proposed method to a baseline n-grams model, evaluating both with a single Key Performance Indicator (KPI), accuracy. The results showed that their approach outperforms the baseline, with accuracy exceeding 80% for all three algorithms; however, the SVM's accuracy is better than that of Naïve Bayes and Maximum Entropy [LT1]. In the study conducted by Bharti et al. (2015a), the first approach attains a precision, recall and F-score of 0.89, 0.81 and 0.84 respectively, and the second approach attains 0.85, 0.96 and 0.90 respectively on tweets with the sarcastic hashtag [LT4]. Majumder et al. (2019) stated that their method outperformed the state-of-the-art by 3-4% on the benchmark dataset [LT16].
2.4.12.
Threats to Validity
Since the researchers considered only a few databases, with few papers obtained and considered for the research questions, the study's scope may be narrow. Papers not written in English were not considered, which means key information in articles written in other languages will have been missed. Inevitably, there were biases, since some articles were not selected because their abstracts and conclusions did not convey what was expected. Because the motivations behind the research questions are subjective, data extraction inaccuracy and data synthesis biases are possible. Only a few individuals carried out this research, so limitations are probable, as their knowledge of the subject matter may not be as broad as expected.

2.4.13. Conclusion
Research in sentiment analysis affirms how tedious the determination of sarcasm in sentiment is. Quite a few works have been done on this topic using different techniques, classifiers and methodologies. The trends have been studied, and the basic questions asked have been answered by considering selected articles from different databases. The authors looked at the classifiers used, the challenges encountered, the datasets used, the performance evaluation and the feature selection employed. The study shows that 25% of the reviewed papers give details of the classification techniques used, 31.25% deliver details of the datasets and preprocessing techniques, 12.25% offer detailed information on their feature selection, and 6.25% throw light on the performance evaluation and the different challenges encountered in undertaking sarcasm detection. Our results indicate that much remains to be done in the area of sarcasm in sentiment analysis.
Therefore, more research on determining sarcastic sentiment in sentiment analysis can be undertaken using different techniques, classifiers, tools and methodologies. Table 3 in Appendix A presents the articles selected through the systematic literature review (SLR).

CHAPTER 3
METHODOLOGY

3.1. Introduction
Both qualitative and quantitative research approaches are adopted. The data used are secondary data obtained from an online repository and from Twitter, since raw data collection is labor-intensive and time-consuming. The interpretation of the derived values is treated as the principal scientific evidence of how the phenomenon works. This study is design science research in that it aims at creating artifacts that serve a human purpose; as such, a system is designed and programmed to help solve a challenge. We go through the design cycle by meeting all the basic requirements: problem identification and stakeholder consideration, which form the environment; the design, implementation and evaluation of a model; and consideration of the knowledge base, which involves scientific theories and methods (Hevner, 2007).

3.2. Description of Datasets
Two different datasets were considered for this study: the driverless car dataset and an extensive Twitter dataset, both described in detail below.

3.2.1. Driverless Car Dataset
The driverless car dataset concerns the emergence of driverless cars powered by Google. It comprises about 700 tweets obtained through the Twitter social media platform.
Since the tweets did not originally come with labels such as sarcastic (1) or non-sarcastic (0), we utilized the X-means clustering algorithm (chosen for its iterative partitioning driven by the Bayesian Information Criterion) together with expert judgment to generate labels for training; zero (0) was used as the label for non-sarcastic tweets and one (1) for sarcastic tweets. The dataset was retrieved online from the UC Irvine (UCI) Machine Learning Repository in 2019. Applying X-means clustering produced a number of clusters, after which the data were stratified according to these clusters, each cluster consisting of a group of chronologically positioned data points. Human expert judgment based on intuition was applied after the clustering in order to confirm the results it produced. The driverless car dataset was purposefully chosen because its moderate size makes it easy to demonstrate how sarcastic sentiments can be determined when handling unlabeled tweets.

3.2.2. Extensive Twitter Dataset
A broad-spectrum Twitter dataset consisting of both sarcastic and non-sarcastic tweets with their labels (0 for non-sarcastic and 1 for sarcastic) was obtained online from the UCI Machine Learning Repository in January 2020. The dataset consists of 31,962 labeled tweets used for testing.

3.3. Experimental Language Used
Python was chosen for this project because it is stable and flexible and has all the tools and packages needed for the implementation. Python supports the full path from development to deployment and maintenance. It is simple and consistent, and it provides access to powerful libraries for AI, particularly for machine learning and deep learning.
Moreover, it is platform-independent and has a wide community of users, which makes support and assistance easier and contributes to Python's popularity. It has concise and readable code, enables the development of reliable systems, is easy to learn, makes building machine learning models easier, is intuitive, facilitates the implementation of different functionalities, and is recommendable for collaborative implementation when multiple developers are involved. It is a general-purpose language that supports complex machine learning and deep learning tasks and enables one to build prototypes quickly. These and other reasons make Python preferable for this study over other programming languages such as R, Scala, Julia, and Java.

3.4. Deep Learning
A deep neural network is a feedforward artificial neural network architecture with one or more hidden layers and their respective interconnections of neurons. Deep learning models have yielded enhanced prediction accuracy in a range of earlier studies (Mensah, Keung, Bennin, & Bosu, 2016; LeCun, Bengio, & Hinton, 2015).

3.5. Deep Learning versus Machine Learning
We learned from the research conducted by Srikanth (2017) that sentiment analysis is one of the prominent areas researched in natural language processing. With the improvement in artificial intelligence, machine learning algorithms played a critical role in sentiment analysis applications after the age of conventional lexicon-based processing. Currently, nonetheless, deep learning is the latest approach being used in the prediction of sentiments, and research has been undertaken using this approach.
The paper by Goularas (2019) presents a comparison of assorted deep learning approaches for conducting sentiment analysis using Twitter data. According to their research, two main categories of neural networks are used: convolutional neural networks (CNN) and recurrent neural networks (RNN). Convolutional neural networks perform especially well in image processing, whereas recurrent neural networks are used in natural language processing tasks. Moreover, the researchers evaluated and compared ensembles and combinations of convolutional neural networks and a category of RNN called long short-term memory (LSTM) networks. El-Jawad & Hodhod (2017) introduced a new hybrid system that uses text mining with neural networks for sentiment classification. They claimed that the hybrid learning approach was more efficient than a standard supervised learning approach such as SVM, a conclusion based on the 83.7% accuracy obtained. A deep learning approach consisting of an RNN with word2vec obtained an accuracy of 83%. Their study, however, failed to take sarcastic sentiments into consideration. Their work also did not consider the combination of emotions and text for sentiment analysis.

3.6. Recurrent Neural Network
We learned from the paper by Srikanth (2017) that an RNN employs backpropagation, unlike a CNN, which uses a feedforward network. An RNN therefore processes sequential data with the aid of internal memory and is built on the principle that humans do not always reason from scratch. An RNN thus makes use of previous words to forecast and predict the next word in the context. Natural language text analysis, voice recognition, and language translation systems are among the applications that employ RNNs. Baktha et al.
(2017) employed RNNs in undertaking sentiment analysis. They analyzed the performance of three RNN variants: vanilla RNNs, long short-term memory (LSTM) and gated recurrent units (GRU). Pre-trained word vectors from Google News were used, and the Amazon health product review dataset served to evaluate performance. Figure 6 gives a diagrammatic illustration of an RNN.

Fig 6. Diagrammatic representation of RNN (Srikanth, 2017)

3.7. LSTM
According to Srikanth (2017), LSTM is an extension of RNN that can recall inputs over a long period. LSTM has an advanced memory rather than a simple internal memory: it can read, write, and delete data from its memory. It also provides an antidote to the challenges of RNNs, including the vanishing gradient, and can decide which information to keep or discard, with the memory being gated. The gates are input, forget, and output. Research by Ayata et al. (2017) shows that applying LSTM to sentiment analysis yields better performance than machine learning approaches such as SVM. Figure 7 gives a pictorial representation of LSTM with the gates.

Fig 7. Diagrammatic representation of LSTM with gates (Srikanth, 2017)

According to Kumar et al. (2020), their experimental results showed that a multi-head attention mechanism improves the performance of BiLSTM and outperforms feature-rich SVM models. However, they failed to perform a comparative analysis using LSTM in order to make a general claim. In the study of Son et al. (2019), training and test set accuracy metrics were acquired to compare the proposed deep neural model (Soft Attention-Based Bidirectional Long Short-Term Memory Model with Convolution Network) with ConvNet, LSTM, and bidirectional LSTM with and without attention.
It was observed that the novel Soft Attention-Based Bidirectional Long Short-Term Memory Model with Convolution Network (sAtt-BLSTM convNet) outperformed the others, with superior sarcasm classification accuracy on the proposed datasets. Their paper reveals that the proposed model is derived from LSTM with multiple connected hidden layers, which consequently leads to higher processing cost and complexity compared to plain LSTM. Considering their SemEval dataset, LSTM performs better than BLSTM as far as precision is concerned: the precision for LSTM was 86.78 whilst that of BLSTM was 81.61. This indicates that LSTM can perform better depending on the type of dataset. It can therefore be concluded that LSTM, though cheaper, can also perform efficiently and well.

3.8. Sentiment Analysis with LSTM
In this project, a Long Short-Term Memory network is implemented to perform sentiment analysis and detect sarcastic comments. Using LSTM rather than a strict feedforward network is more accurate since we can include information about the sequence of words. The driverless car dataset is used, accompanied by sentiment labels: sarcastic or non-sarcastic.

3.9. Implementation Using LSTM
This section provides a detailed and pragmatic discussion of the implementation of our framework using LSTM.

3.9.1. Loading in and Visualizing the Data
The data is loaded by opening text files and reading in data as text from our directory. Data is read from an Excel file and can be printed in shape and columns. This is possible in Python with the help of libraries such as NumPy and pandas. Libraries such as seaborn and matplotlib also help with data visualization to view the loaded data. This becomes clearer once newline characters are removed from each record in the data frame.
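As an illustration of this inspection step, the label distribution of a loaded dataset can be summarized with the standard library alone; the `records` list below is hypothetical sample data, not the actual thesis dataset:

```python
from collections import Counter

# Hypothetical (tweet, label) records; 1 = sarcastic, 0 = non-sarcastic.
records = [
    ("Oh great, another traffic jam. Just what I needed.", 1),
    ("Driverless cars could reduce accidents.", 0),
    ("Sure, because software never crashes.", 1),
    ("The trial run went smoothly today.", 0),
]

# Strip newline characters from each tweet, as done before visualization.
cleaned = [(text.replace("\n", " "), label) for text, label in records]

# Count how many tweets fall under each label, mirroring the bar chart
# produced with seaborn/matplotlib.
label_counts = Counter(label for _, label in cleaned)
print(label_counts[0], label_counts[1])  # non-sarcastic count, sarcastic count
```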
The results of our visualized data present a graph of counts of sarcastic and non-sarcastic labels.

3.9.2. Data Pre-processing
Getting the data into the proper form to feed into the network is the first step in building a neural network model. Since we are using embedding layers, each word needs to be encoded with an integer, and cleaning up the data is necessary. The processing steps are:
• Get rid of periods and extraneous punctuation.
• The reviews are delimited with newline characters (\n); split the text into individual comments using \n as the delimiter.
• Then combine all the comments back together into one big string.
To begin with, punctuation is removed. Then all the text, without the newlines, is obtained and split into individual words. This is possible with the help of the string and punctuation libraries.

3.9.2.1. Encoding the Words
The embedding lookup requires that we pass integers to our network. The easiest way to do this is to create a dictionary that maps the words in the vocabulary to integers. Each of the comments can then be converted into integers and passed into the network. We do this by encoding the words with integers, building dictionaries that map words to integers. The integers ought to start at 1, before storage in a list, so that input vectors can be padded with zeros. In the implementation we make use of the collections and Counter libraries and then build dictionaries that map words to integers before storing the tokenized reviews. Building the dictionary produces 3,084 unique words for the driverless car dataset and 47,521 for the broad-spectrum dataset.

3.9.2.2. Removing Outliers
As an additional pre-processing step, the comments or reviews ought to be in good shape for standard processing.
That is, the network will expect a standard input text size; hence, the reviews or comments need to be shaped to a specific length. This task can be approached in two main steps:
1. Getting rid of extremely long or short comments (the outliers).
2. Padding/truncating the remaining data so that we have reviews of the same length.
Before padding the comment or review text, we should check for reviews or comments of extremely short or long length: outliers that may interfere with training. With the driverless car dataset, one zero-length review and a maximum review length of 30 were obtained. Likewise, one zero-length review and a maximum review length of 30 were obtained for the extensive Twitter dataset. The challenge is that a zero-length review is useless, and the maximum review lengths would require too many steps for our LSTM; hence, removing any super-short review is expedient, and removing super-long reviews is also crucial. By so doing, our model can be trained efficiently as a result of the removal of the outliers. To implement this:
1. Remove any review with zero length from the review list.
2. Get the indices of reviews whose length is zero.
3. Remove the zero-length reviews and their labels.

3.9.2.3. Padding Sequences
In order to handle both very short and very long comments and reviews, we pad or truncate the comments to a specific length (seq_length). Comments shorter than seq_length are padded with zeros, and reviews longer than seq_length are cut down to their first seq_length words. A good seq_length in this case is 71 (the maximum length of a tweet) for the driverless car dataset, and a seq_length of 71 was also defined for the extensive Twitter dataset. Since the maximum length of each tweet in both datasets does not exceed 71, it is a justifiable sequence length that also reduces processing time.
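A minimal, stdlib-only sketch of these pre-processing steps (encoding words as integers starting at 1, dropping zero-length reviews with their labels, and left-padding or truncating to seq_length); the sample reviews and the small seq_length are hypothetical, chosen only to keep the example readable:

```python
from collections import Counter
from string import punctuation

reviews = ["Oh sure, that will definitely work!", "", "Cars drive themselves now."]
labels = [1, 0, 0]
seq_length = 8  # the thesis uses 71; a small value keeps the example readable

# 3.9.2.1: strip punctuation, tokenize, and map words to integers starting at 1
words = [w for r in reviews
         for w in "".join(c for c in r.lower() if c not in punctuation).split()]
vocab_to_int = {w: i for i, (w, _) in enumerate(Counter(words).most_common(), 1)}
encoded = [[vocab_to_int[w]
            for w in "".join(c for c in r.lower() if c not in punctuation).split()]
           for r in reviews]

# 3.9.2.2: remove zero-length reviews together with their labels
keep = [i for i, r in enumerate(encoded) if len(r) > 0]
encoded = [encoded[i] for i in keep]
labels = [labels[i] for i in keep]

# 3.9.2.3: left-pad with zeros, or truncate to the first seq_length words
features = [([0] * (seq_length - len(r)) + r)[:seq_length] for r in encoded]
print(features)
```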
Our final features array should be a 2D array, with as many rows as there are comments or reviews and as many columns as the specified seq_length. This is not trivial, and there are a number of ways to do it; but if you are going to build your own deep learning networks, you have to get used to preparing your data. In the implementation, our data becomes an integer array of reviews to be fed into our network. Each row ought to have the maximum tweet length (seq_length), i.e. it should be seq_length elements long. If a review is shorter than seq_length words, it is left-padded with zeros. Conversely, if the review length is longer than seq_length, only the first seq_length words are used as the feature vector. For instance, if seq_length is 12 and the review is [10, 112, 127, 130], the resulting padded sequence is [0, 0, 0, 0, 0, 0, 0, 0, 10, 112, 127, 130]. Our features for both the driverless car and the extensive Twitter datasets can be saved as CSV files with the aid of NumPy's asarray and savetxt functions.

3.9.2.4. Training, Validation, Test
With our data in good shape, it needs to be divided into training, validation, and test sets. We create sets for the features and the labels, for example train_x and train_y. Next, we define a split fraction, split_frac, as the fraction of data to keep in the training set; usually this is set to 0.8 or 0.9. Whatever data is left is split in half to create the validation and test data. The fractions 0.8 and 0.9 are experimental values that give us an accurate model for training without problems such as overfitting, unlike values such as 0.3, 0.4 or 0.5. 3.9.2.5.
DataLoaders and Batching
Following the creation of training, test, and validation data, we create DataLoaders for the data in two steps:
1. Create a known format for accessing the data using TensorDataset, which takes an input set of data and a target set of data with the same first dimension and creates a dataset.
2. Create DataLoaders and batch our training, validation, and test Tensor datasets.
This is a substitute for writing a generator function to batch the data into full batches. The output is displayed as a tensor matrix. Libraries such as torch, torch.utils.data (TensorDataset, DataLoader), and the oversampling and SMOTE utilities of imbalanced-learn should be imported to create Tensor datasets, define the DataLoader batch size, shuffle the training data, and perform sampling. The length derived after sampling is then used to obtain a batch of training data.

3.9.3. Sentiment Network with PyTorch
This is where the network is defined. The layers are as follows:
1. An embedding layer that converts word tokens into embeddings of a particular size.
2. An LSTM layer defined by a hidden state size and a number of layers.
3. A fully-connected layer that maps the LSTM output to the desired output size.
4. A sigmoid activation layer that converts the result to a value between 0 and 1; only the last sigmoid output is returned as the output of this network.

3.9.4. The Embedding Layer
It is advisable to add an embedding layer because there are 74,000+ words in our vocabulary; one-hot encoding that many classes would be highly inefficient. So, instead of one-hot encoding, one can have an embedding layer and use it as a lookup table. You could train an embedding layer using Word2Vec and load it here, but it is fine to just create a new layer, use it only for dimensionality reduction, and let the network learn the weights. 3.9.5.
The LSTM Layer
Here we create the Long Short-Term Memory layer to be used in our recurrent network; it takes an input size, a hidden dimension, a number of layers, a dropout probability (for dropout between multiple layers), and a batch_first parameter. Most of the time, the network performs better with more layers, typically two to three; adding layers allows the network to learn complex relationships. The implementation first checks whether a Graphics Processing Unit (GPU) is available and, if so, performs training on the GPU. Here we also define the LSTM model that will be used to execute sentiment analysis. The model is initialized by setting up the layers: embedding, LSTM, dropout, linear, and sigmoid layers. A forward pass of our model on some input and hidden state is performed. This entails defining a forward function that applies the embedding and LSTM, stacks up the LSTM outputs, applies dropout, the fully-connected layer, and the sigmoid function, reshapes by batch size, and returns the last sigmoid output and the hidden state. The hidden states are initialized by creating two new tensors.

3.9.6. Instantiating the Network
Here, the network is instantiated, starting with the definition of the model hyperparameters. Some terminology to understand here:
• vocab_size: the size of our vocabulary, or the range of values for our input word tokens.
• output_size: the size of our desired output; the number of class scores we want to output.
• embedding_dim: the number of columns in the embedding lookup table; the size of our embeddings.
• hidden_dim: the number of units in the hidden layers of our LSTM cells. Usually, larger is better performance-wise. Common values are 128, 256, 512, etc.; these are common experimental powers of two found to give accurate results.
• n_layers: the number of LSTM layers in the network, typically between 1 and 3. This is one of the basic rules in choosing the number of hidden layers; two layers are often considered better (Marco, 2013).

3.9.7. Training
It is expedient to use a kind of cross-entropy loss designed to work with a single sigmoid output: BCELoss, or Binary Cross-Entropy Loss, applies cross-entropy loss to a single value between 0 and 1. We also have some data and training hyperparameters defined as:
• lr: the learning rate for our optimizer.
• epochs: the number of times to iterate through the training dataset.
• clip: the maximum gradient value to clip at (to prevent exploding gradients).

3.9.8. Testing
There are a few ways to test the network:
• Test data performance: we see how our trained model performs on all of the test data defined above, calculating the average loss and accuracy over the test data.
• Inference on user-generated data: we input just one example comment at a time, without a label, and see what the trained model predicts. Inference means looking at new, user-input data like this and predicting an output label.

3.9.9. Trying Out the Test
Now that we have a trained model and a predict function, we can pass in any kind of text and the model will predict whether the text has a sarcastic or non-sarcastic sentiment.

3.10. Network Architecture
This section presents a discussion of the network architecture used and how data is passed to the LSTM cells.

3.10.1. Passing Words into an Embedding Layer
An embedding layer was necessary because we have a large number of words, so we needed an efficient representation of our input data, preferably more compact than one-hot encoded vectors.
We could train embeddings with the skip-gram Word2Vec model and use them as input. Nevertheless, it is advisable to have the embedding layer and let the network learn its own embedding table. In this case, the embedding layer serves dimensionality reduction rather than learning semantic representations. The input words are passed to an embedding layer, and the resulting embeddings are passed on to the LSTM cells. LSTM cells add recurrent connections to the network and enable us to include information about the sequence of words in the data under consideration. Additionally, the LSTM output goes to a sigmoid output layer. The sigmoid function is used because non-sarcastic and sarcastic correspond to 0 and 1, respectively, and a sigmoid outputs predicted sentiment values between 0 and 1. Only the very last sigmoid output matters; the rest can be ignored. The loss is calculated by comparing the output at the last time step with the training label (1 or 0).

3.11. Performance Evaluation

Table 2. Confusion matrix for computing precision and recall

                  Predicted Positive   Predicted Negative
Actual Positive          TP                   FN
Actual Negative          FP                   TN

For each corpus of extracted sentiment from a given opinion poll, we partition the corpus into q strata or subsets. We calculate the confusion matrix and assess the prediction performance using the precision, recall, and F1-score evaluation indicators. When probing the prediction performance of text mining, there are four possible outcomes: true positives, true negatives, false positives, and false negatives. For instance, in a classification task, when the text mining algorithm appropriately classifies the relevant documents for a given query, the number of relevant documents so classified is referred to as the true positives (Jain, Agrawal, Goyal, & Aggrawal, 2018).
Again, when irrelevant documents are wrongly classified as relevant documents after the search process, the number of such irrelevant documents is referred to as false positives. Conversely, if relevant documents are wrongly classified as irrelevant documents after the search process, the number of such relevant documents is termed false negatives (Jain et al., 2018). When irrelevant documents are correctly classified as irrelevant documents after the search process, the number of such irrelevant documents is referred to as true negatives (Jain et al., 2018). The confusion matrix for computing the precision, F1-score, and recall measures is expressed in Table 2. Precision is the ratio of the number of true positives (correctly predicted positives) to the total of the misclassified irrelevant documents (false positives) plus the relevant documents (true positives) recovered by the same search process. Recall, on the other hand, is the probability that a relevant document is recovered by the search process; that is, recall is the ratio of the correctly predicted positive observations to the total number of existing or actual relevant documents. The F1-score is defined by Jain et al. (2018) as the harmonic mean of precision and recall; it considers both false positives and false negatives (Mensah, Keung, Svajlenko, Ebo, & Mi, 2018).

Precision (P) = TP / (TP + FP)                 (3.11.1)

Recall (R) = TP / (TP + FN)                    (3.11.2)

F1 Score = 2 * (P * R) / (P + R)               (3.11.3)

To plot our confusion matrix, we import the itertools, NumPy, and matplotlib libraries. The matrix can be shown with or without normalization, using nearest interpolation, with true and predicted labels. For stacked data for the confusion matrix of training and validation, we track the loss.
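Equations 3.11.1 to 3.11.3 can be computed directly from the confusion-matrix counts; a small stdlib-only sketch with hypothetical counts:

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from confusion-matrix counts
    (equations 3.11.1-3.11.3)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts from a sarcasm classifier's confusion matrix.
p, r, f1 = precision_recall_f1(tp=40, fp=10, fn=20)
print(round(p, 3), round(r, 3), round(f1, 3))  # 0.8 0.667 0.727
```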
The other processes involve tracking loss, initializing the hidden state, getting predictions, iterating over the test data, and stacking predictions and labels before calling a get-confusion-matrix function with its associated data loader as a parameter. Data is trained for the confusion matrix, and the classes involved are sarcastic and non-sarcastic statements.

3.12. Framework for Sarcasm Detection
The pseudocode in the framework section below describes our framework, whilst Appendix F presents a general overview of the sarcasm detection operator flowchart. Our prediction model employs the Long Short-Term Memory learning algorithm. If the dataset being used is unlabeled, a Cluster+Expert Judgement operator does the labelling through clustering and the application of human expert judgement. The data is then trained and validated with the assistance of the Train & Validate operator, and finally the Classify & Predict function is employed to determine sarcastic sentiments. The three major operators considered are therefore Cluster+Expert Judgement, Train & Validate, and Classify & Predict.

3.12.1. Cluster+Expert Judgement (ClustExpert)
Where the dataset obtained from the repository is unlabeled, we utilize the X-means clustering algorithm owing to its iterative partitioning guided by the Bayesian Information Criterion. In applying X-means clustering, different clusters are obtained, after which stratification is attained based on these clusters. Each cluster consists of a group of chronologically positioned data. The result of the clustering is noted, and expert judgment based on human intuition and wisdom is then applied manually for verification. These two steps assist us in generating accurate labels for training and validation. Consequently, zero (0) represents the label for non-sarcastic sentiment and one (1) the label for sarcastic sentiment.
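The clustering half of this operator can be sketched in miniature. X-means itself is not in the standard library, so the sketch below substitutes a plain two-cluster k-means on hypothetical one-dimensional per-tweet scores; the stand-in algorithm, the scores, and the function names are all illustrative assumptions, not the thesis implementation:

```python
import random

def kmeans_2(points, iters=20, seed=0):
    """Plain 2-means on 1-D points; stands in for X-means, which
    additionally searches for the number of clusters using the
    Bayesian Information Criterion."""
    rng = random.Random(seed)
    c0, c1 = rng.sample(points, 2)
    for _ in range(iters):
        a = [p for p in points if abs(p - c0) <= abs(p - c1)]
        b = [p for p in points if abs(p - c0) > abs(p - c1)]
        if a:
            c0 = sum(a) / len(a)
        if b:
            c1 = sum(b) / len(b)
    return [0 if abs(p - c0) <= abs(p - c1) else 1 for p in points]

# Hypothetical 1-D "sarcasm scores" per tweet (e.g. from a lexicon heuristic).
scores = [0.1, 0.2, 0.15, 0.9, 0.85, 0.95]
labels = kmeans_2(scores)

# Expert-judgement step: a human inspects each cluster and confirms (or
# flips) which cluster corresponds to sarcastic (1) vs non-sarcastic (0).
print(labels)
```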
3.12.2. Train & Validate
With our data shaped into sets of features, we can proceed with training and validation for an unbiased evaluation using a supervised learning method. Following the creation of training and validation data, we create DataLoaders to batch the dataset and apply it to a defined sentiment network. In our case, an embedding layer converts the tokens into embeddings, which feed an LSTM layer with a defined hidden state size and number of layers. The layer output is mapped to the desired size by a fully-connected layer, and a sigmoid activation layer performs the final conversion.

3.12.3. Classify & Predict New Instances
Any kind of text can be passed in, and the model predicts through our prediction function whether the text has a sarcastic or non-sarcastic sentiment. This can be done using our test data performance or by inference on user-generated data. In our case, we calculated the average loss and confusion matrix indicators such as accuracy over the test data to analyze the prediction values, and classified instances of the sentiments as sarcastic or non-sarcastic.

3.12.4. Pseudocode for Framework

Procedure determineSarcasm(Dataset)
1.  Begin
2.  foreach Dataset do
3.      if Dataset_labelled do
4.          train_validate(Dataset)
5.      else applyClustExpert(Dataset)
6.      if Dataset_labelling_successful do
7.          train_validate(Dataset)
8.      foreach sentiment in Dataset do
9.          classify_predict(sentiment)
10.         // Determine whether the statement is sarcastic
11. End

3.13. Conclusion
In conclusion, this section provides descriptive details of our methodology. A framework has been introduced that comprises three operators, namely Cluster+Expert Judgement, Train & Validate, and Classify & Predict, which facilitate sentiment classification. We considered LSTM for the implementation of our framework.
The studies of Kumar et al. (2020) and Son et al. (2019) employed multi-head attention-based and bidirectional LSTM respectively. However, their experiments reveal that those methodologies are expensive and considerably more complicated; by inference, it can be deduced that substantial resources need to be utilized to ensure optimization. We therefore employed a cheap, simple, efficient and straightforward approach using purely LSTM to produce a framework that helps determine sarcastic statements when undertaking sentiment analysis.

CHAPTER 4
RESULTS AND DISCUSSION

4.1. Visualization
The dataset is read from an Excel file and displayed in data shape and columns. About seven hundred (700) driverless car tweets and thirty-one thousand nine hundred and sixty-two (31,962) extensive tweets are displayed with their associated sarcastic or non-sarcastic labels. The use of libraries such as seaborn and matplotlib helps us visualize the label values, as seen in Figures 8 and 9.

Fig 8. Visualization of sarcastic and non-sarcastic sentiments for the driverless dataset
Fig 9. Visualization of sarcastic and non-sarcastic sentiments for the extensive twitter dataset

Moreover, results may be put in a data frame and then written to a CSV file, which can later be read as a text file. During data pre-processing, the result of the padding sequences undertaken after encoding and removing outliers is a 2D feature array, with as many rows as there are reviews and as many columns as the specified sequence length, as depicted in Figure 10 using the driverless car dataset.

Fig 10. 2D array features

4.2. Training, Validation and Test Sets
Normally, there is a need to split our data into training, validation, and testing sets.
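The split described in Section 3.9.2.4 (a split_frac share for training, with the remainder halved into validation and test sets) can be sketched with the standard library alone; the feature rows below are hypothetical:

```python
def split_data(features, labels, split_frac=0.8):
    """Keep split_frac of the data for training and halve the rest
    into validation and test sets (as in Section 3.9.2.4)."""
    n_train = int(len(features) * split_frac)
    train_x, train_y = features[:n_train], labels[:n_train]
    rest_x, rest_y = features[n_train:], labels[n_train:]
    half = len(rest_x) // 2
    val_x, val_y = rest_x[:half], rest_y[:half]
    test_x, test_y = rest_x[half:], rest_y[half:]
    return (train_x, train_y), (val_x, val_y), (test_x, test_y)

# Ten hypothetical feature rows with alternating labels.
features = [[i] * 4 for i in range(10)]
labels = [i % 2 for i in range(10)]
train, val, test = split_data(features, labels)
print(len(train[0]), len(val[0]), len(test[0]))  # 8 1 1
```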
Again, using the driverless car and the extensive Twitter datasets, the results obtained after creating the training, validation, and test sets are depicted in Figures 11 and 12 in Table 4, for the driverless and extensive Twitter datasets respectively.

Table 4. Training, validation and test set results
Fig 11. Feature shapes of train, validation and test sets for the driverless car dataset
Fig 12. Feature shapes of train, validation and test sets for the extensive twitter dataset
Fig 13. Data loading and batching results using the tensor dataset, considering the driverless car dataset
Fig 14. Data loading and batching results using the tensor dataset, considering the extensive twitter dataset

Following the creation of training, testing, and validation data, we create DataLoaders by defining a batch size for the data using the tensor dataset, then construct data loaders and batch the dataset. In the process, the training data needs to be shuffled and a batch of training data obtained. This is an alternative to writing a generator function for batching the data into full batches. Figures 13 and 14 in Table 4 depict the results after data loading and batching using the tensor dataset for the driverless car dataset and the extensive Twitter dataset respectively.

4.3. Sampling
Appendix C shows the result of our sampling for the driverless dataset: length after sampling, 1,094; count of 0-labelled (non-sarcastic) statements after sampling, 547. Appendix D shows the result of our sampling for the extensive or broad-spectrum dataset: length after sampling, 47,542; count of 0-labelled statements after sampling, 23,771.

4.4. Training
In building the model, a cross-entropy loss designed to work with a single sigmoid output was used: BCELoss, or Binary Cross-Entropy Loss, applies cross-entropy loss to a single value between 0 and 1.
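Binary cross-entropy on a single sigmoid output can be illustrated numerically; this is a generic stdlib sketch of the formula BCELoss implements, not the thesis code:

```python
import math

def bce(y_true, p_pred):
    """Binary cross-entropy for one prediction p_pred in (0, 1)
    against a 0/1 label y_true."""
    return -(y_true * math.log(p_pred) + (1 - y_true) * math.log(1 - p_pred))

# A confident correct prediction incurs a small loss ...
low = bce(1, 0.9)
# ... while a confident wrong prediction is penalized heavily.
high = bce(1, 0.1)
print(round(low, 4), round(high, 4))  # 0.1054 2.3026
```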
Some data and training hyperparameters we had are: the learning rate for the optimizer, represented by lr; the number of times to iterate through the training dataset, our epochs; and the maximum gradient value to clip at, our clip. In defining our loss and optimization functions, we used a learning rate of 0.001 (lr=0.001), with the Binary Cross-Entropy Loss function as our criterion. The learning rate is a positive-valued hyperparameter used in training neural networks, commonly in the range 0.0 to 0.1. The defined learning rate is not too large, so training neither converges too quickly to a suboptimal solution nor gets stuck. Our optimizer therefore made use of the defined learning rate and the other network parameters needed to instantiate it. Moreover, during training, an epoch count of 4 was assigned; three to four epochs is approximately where the validation loss stops decreasing. After defining an initial counter and a print interval, the important element known as the clip, or gradient clipping, is defined to prevent exploding gradients. With these in place, we move our model to a GPU and call the training routine: we train by initializing the hidden state over a number of epochs and perform a batch loop, creating new variables for the hidden state so that we do not backprop over the entire training history. We also zero the accumulated gradients, get the output from the model, calculate the loss, and perform backprop. RNNs and LSTMs suffer from the exploding gradient problem; hence, the clip_grad_norm function helps avoid it. Loss statistics: we can compute the loss statistics by obtaining the validation loss.
In getting the validation loss, we create new variables for the hidden state (rather than backpropagating through the entire training history) and then run the network training to print each epoch, step, loss, and validation loss in the defined format, by calling the format functions defined in our program. Appendix E shows the results obtained using our driverless car dataset; those for the extensive Twitter dataset are referenced in Appendix B instead, owing to their size.

4.5. Testing

As already discussed, our network can be tested by test-data performance, which lets us view how the trained model performs on all of the defined test data and then calculate the average loss and accuracy over it. Another way to test the network is by inference on user-generated data, which makes it possible to input just one comment or review at a time, without a label, and see what the trained model predicts. Because we can take new, user-input data in this way and predict an output, we are justified in referring to this as inference. So, in the course of testing, we obtain the data loss and accuracy by tracking the loss, initializing the hidden state using the batch size, iterating over the test data (here also creating a new variable for the hidden state rather than backpropagating through the entire training history), getting the predicted output, calculating the loss, converting output probabilities to a predicted class (0 or 1), and comparing predictions to the true labels. We can then obtain and print the average test loss and the accuracy over all the test data. Using our driverless car dataset, a test loss of 0.136 and a test accuracy of 0.757 were obtained. Moreover, a test loss of 0.309 and a test accuracy of 0.905 were obtained using our extensive Twitter dataset. These results depend on the size and composition of the datasets.
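The test-data evaluation just described can be sketched as follows. The network, criterion, and test loader are toy stand-ins (in the pipeline they come from the trained LSTM model and the batched test set), and the hidden-state handling is omitted for brevity:

```python
import torch
import torch.nn as nn

# Toy stand-ins so the sketch runs on its own.
net = nn.Sequential(nn.Linear(8, 1), nn.Sigmoid())
criterion = nn.BCELoss()
test_loader = [(torch.rand(10, 8), torch.randint(0, 2, (10, 1)).float())
               for _ in range(3)]

test_losses, num_correct, total = [], 0, 0
net.eval()
with torch.no_grad():
    for inputs, labels in test_loader:
        output = net(inputs)                       # sigmoid probabilities
        test_losses.append(criterion(output, labels).item())
        pred = torch.round(output)                 # predicted class, 0 or 1
        num_correct += (pred == labels).sum().item()
        total += labels.numel()

avg_test_loss = sum(test_losses) / len(test_losses)  # mean of batch losses
test_acc = num_correct / total                       # correct / total examples
print(f"Test loss: {avg_test_loss:.3f}  Test accuracy: {test_acc:.3f}")
```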
We obtain a lower test loss for the driverless car dataset because it is smaller than the extensive Twitter dataset, so less loss is accumulated. Moreover, we obtain a higher test accuracy for the extensive Twitter dataset compared with the driverless car dataset because the extensive Twitter dataset is far larger than the driverless car dataset, and hence there is a higher probability of encountering more sarcastic sentiments in the tweets.

4.5.1 Inference on Test Review

An inference on our test review can now be conducted by changing the test review or comment to any desired text and then checking manually whether the model makes a correct prediction. Consider these examples: i. Sarcastic statement: "The day I never have to deal with Fred will be one of the best days of my life indeed". ii. Non-sarcastic review: "everything will alright". How does the test work with each of these examples? It starts by tokenizing the review: getting rid of punctuation and splitting on spaces to obtain the needed tokens in an array, and then deriving their integer values. We then undertake test-sequence padding. We generate the features by passing in the generated integers and the sequence length; the features can then be printed. Figure 15 shows the feature obtained from our test review.

Fig 15. Obtained feature from test review

We can then convert the test to a tensor and pass it into our model; here we obtained torch.Size([1, 14]) as the feature tensor size. So, basically, a predict function accepts as parameters a test review, a sequence length, and the network.
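A self-contained sketch of such a predict function is shown below. The vocabulary and network are toy stand-ins (in the pipeline they come from the trained LSTM model and its corpus vocabulary), so the printed class here is illustrative only:

```python
import torch
import torch.nn as nn
from string import punctuation

# Toy stand-ins for the corpus vocabulary and the trained network.
vocab_to_int = {"everything": 1, "will": 2, "alright": 3}
net = nn.Sequential(nn.Linear(141, 1), nn.Sigmoid())

def predict(net, test_review, sequence_length=141):
    # Tokenize: drop punctuation, split on spaces, map words to integers
    # (unknown words map to 0 in this sketch).
    cleaned = "".join(c for c in test_review.lower() if c not in punctuation)
    tokens = [vocab_to_int.get(w, 0) for w in cleaned.split()]
    # Left-pad with zeros up to the fixed sequence length.
    padded = [0] * (sequence_length - len(tokens)) + tokens[:sequence_length]
    features = torch.tensor([padded], dtype=torch.float)
    net.eval()
    with torch.no_grad():
        output = net(features)              # sigmoid probability in (0, 1)
    pred = int(torch.round(output).item())  # predicted class, 0 or 1
    print("Sarcastic!" if pred == 1 else "Non-sarcastic.")
    return pred

label = predict(net, "everything will alright")
```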
Through the predict function, we tokenize the review, pad the tokenized sequence, convert it to a tensor to pass into our model, initialize the hidden state, get the output from the model, convert the output probability to a predicted class (0 or 1) by rounding the output value, and then print a custom response of sarcastic or non-sarcastic. With our reviews, we defined a sequence length of 141 and passed them through the predict function. Our test result for the sarcastic statement was a pre-rounding prediction value of 0.998875 when we used a sarcastic tweet, and that of the non-sarcastic review was a pre-rounding prediction value of 0.000055. The same pre-rounding prediction values would be obtained if we used different values for the sequence length. A sequence length of 141 was defined because it is the maximum number of characters a tweet can contain; any review shorter than that is simply padded with zeros.

4.6. Confusion Matrix

The confusion matrix for computing the precision, recall, and F1-score measures is as defined in our methodology. Precision is the ratio of the number of true positives (correctly predicted positives) returned by the search process to the number of misclassified observations (false positives) plus the number of relevant documents (true positives) recovered by the same search process. Recall is the ratio of the correctly predicted positive observations to the total number of existing or actual relevant documents. The F1-score is the weighted harmonic mean of precision and recall; it considers both false positives and false negatives. With our confusion matrix without normalization, using the driverless car dataset for training, the derived precision was 0.995, the recall was 1.000, and the F1-score was 0.997. Table 5 shows our confusion matrix for the training data using the driverless car dataset.
Table 5. Confusion matrix for the trained driverless car dataset

True label \ Predicted label     Non-sarcastic    Sarcastic
Non-sarcastic statements              543               3
Sarcastic statements                    0             544

With the validation data, our precision was 0.632, the recall was 0.977, and the F1-score was 0.761 using the driverless car dataset. Table 6 depicts the confusion matrix of the validation data for the driverless car dataset, with its true and predicted labels.

Table 6. Confusion matrix for the validated driverless car dataset

True label \ Predicted label     Non-sarcastic    Sarcastic
Non-sarcastic statements               43              26
Sarcastic statements                    1               0

Using the extensive Twitter dataset, our precision was 0.975, the recall was 0.997, and the F1-score was 0.986. These differing results across datasets depend on the size and composition of each dataset under each performance-evaluation measure considered. Table 7 depicts the results of our validation data, without normalization, using the extensive Twitter dataset.

Table 7. Confusion matrix for the validated extensive Twitter dataset

True label \ Predicted label     Non-sarcastic    Sarcastic
Non-sarcastic statements             5766             147
Sarcastic statements                   18             459

Additionally, a confusion matrix is plotted for the test data, with the same defined formulas for precision, recall, and F1-score. The precision was 0.679, the recall was 0.939, and the F1-score was 0.800 using the driverless car dataset. These differing values likewise arise from the different parameters entering the performance-evaluation measures. Table 8 depicts the confusion matrix for the test data using the driverless car dataset.

Table 8. Confusion matrix for test data using the driverless car dataset
True label \ Predicted label     Non-sarcastic    Sarcastic
Non-sarcastic statements               46              20
Sarcastic statements                    1               3

On the other hand, the extensive Twitter dataset yielded a precision of 0.977, a recall of 0.997, and an F1-score of 0.987 after testing. Table 9 depicts the confusion matrix for the test data using the extensive Twitter dataset.

Table 9. Confusion matrix for test data using the extensive Twitter dataset

True label \ Predicted label     Non-sarcastic    Sarcastic
Non-sarcastic statements             5809             137
Sarcastic statements                   15             429

Accuracy is determined by the number of correct test predictions divided by the length of the total test dataset, while the test loss, reported as the average test loss, is the mean of the per-batch test losses. The results of our setup show a test loss of 0.309 and an accuracy of 0.905 using the extensive Twitter dataset; the driverless car dataset shows a test loss of 0.136 and a test accuracy of 0.757 (figures rounded to 3 d.p.). Again, it should be noted that these results depend on the size and composition of the datasets. We obtain a lower test loss for the driverless car dataset because it is smaller than the extensive Twitter dataset, so less loss is accumulated; and we obtain a higher test accuracy for the extensive Twitter dataset because it is far larger than the driverless car dataset, giving a higher probability of encountering more sarcastic sentiments in the tweets. The results provide an accurate, efficient, and reliable prediction based on the derived confusion matrices.

CHAPTER 5

CONCLUSION

5.1.
Summary and Conclusion

Our study concentrates on developing a framework that helps identify sarcastic sentiment in an opinion poll when undertaking sentiment analysis. This assists in investigating the significant impact of sarcastic statements in opinion polls. Twitter datasets were used for this study. The study consists of five chapters, and an extensive discussion has been presented in each. Sentiment analysis has been defined as the examination of the opinions and attitudes expressed by social media and internet users towards a specific topic or subject. Sarcasm detection during sentiment analysis is not an easy task, and hence few works have been done on the subject; it has been viewed as consuming time and effort. The study undertakes a review and in-depth analysis of sarcasm in sentiment analysis over the past five years and then develops a system that helps identify sarcastic statements when undertaking sentiment analysis. The study brings about the development of a framework that enables us to classify sentiments as sarcastic or non-sarcastic, determine the level of predicted accuracy and loss, and investigate the significant impact of sarcastic sentiments in an opinion poll. The significance of that impact can be assessed because, once the number of sarcastic statements is known, we can readily identify the effect they have on the poll as a whole. Theoretically and empirically, the effect of sarcastic statements in an opinion poll has been demonstrated, and a system has been developed that classifies sentiments into sarcastic and non-sarcastic. Consequently, we are able to grade the overall level of sarcasm as high, low, or medium based on the ratio of detected sarcastic sentiments to the total comments in an opinion poll.
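The grading of the overall sarcasm level mentioned above can be sketched as a simple ratio test; the 1/3 and 2/3 thresholds are illustrative assumptions, not values fixed by the study:

```python
def sarcasm_level(num_sarcastic, total_comments):
    """Grade the overall sarcasm level of an opinion poll from the ratio
    of detected sarcastic comments to total comments. The thresholds
    (1/3 and 2/3) are assumed for illustration."""
    ratio = num_sarcastic / total_comments
    if ratio < 1 / 3:
        return "low"
    if ratio < 2 / 3:
        return "medium"
    return "high"

print(sarcasm_level(10, 100))  # low
```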
A framework has been introduced that comprises three operators, namely Cluster + Expert Judgement, Train & Validate, and Classify & Predict, which together facilitate sentiment classification. The datasets used were a driverless car dataset, which concerned the emergence of driverless cars powered by Google and to which we applied the X-means clustering algorithm (chosen for its iterative partitioning that optimizes the Bayesian Information Criterion) together with intuition-based expert judgment for labelling, and a broad-spectrum Twitter dataset consisting of both sarcastic and non-sarcastic tweets. The study adopted LSTM for the implementation of our framework. The studies of Kumar et al. (2020) and Son et al. (2019) employed a multi-head attention-based bidirectional LSTM and sAtt-BLSTM respectively; however, their experiments reveal that those methodologies are expensive and considerably more complicated. We therefore employed a cheap, simple, efficient, and straightforward approach using purely LSTM to produce a framework that helps determine sarcastic statements when undertaking sentiment analysis. BLSTM leads to higher processing costs and complexity compared with LSTM, based on the experimental results of Son et al. (2019): on their SemEval dataset, LSTM performed better than BLSTM as far as precision is concerned, with a precision of 86.78 for LSTM against 81.61 for BLSTM. This indicates that LSTM can perform better depending on the type of dataset, and it performed well when we passed our datasets through our model. We chose LSTM for our study because it is effective and efficient in language processing; it makes the prediction of sarcasm easier compared with conventional lexicon-based and machine learning approaches, and its use extensively helps ensure better performance. An LSTM implementation that could detect sarcastic comments was produced.
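A minimal PyTorch sketch of one such LSTM classifier (embedding, LSTM, fully connected output, sigmoid) is shown below; the vocabulary size and layer dimensions are illustrative assumptions, not the thesis's exact hyperparameters:

```python
import torch
import torch.nn as nn

class SentimentLSTM(nn.Module):
    """Embedding -> LSTM -> fully connected -> sigmoid, returning only
    the last sigmoid output per sequence as the network output."""

    def __init__(self, vocab_size, embedding_dim, hidden_dim, n_layers,
                 output_size=1):
        super().__init__()
        # Embedding layer converts integer tokens to dense vectors.
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        # LSTM defined by a hidden-state size and a number of layers.
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, n_layers,
                            batch_first=True)
        # Fully connected layer maps LSTM outputs to the desired size.
        self.fc = nn.Linear(hidden_dim, output_size)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        embeds = self.embedding(x)        # (batch, seq_len, embedding_dim)
        lstm_out, _ = self.lstm(embeds)   # (batch, seq_len, hidden_dim)
        out = self.sigmoid(self.fc(lstm_out))
        return out[:, -1]                 # last sigmoid output only

net = SentimentLSTM(vocab_size=1000, embedding_dim=64, hidden_dim=32,
                    n_layers=2)
probs = net(torch.randint(0, 1000, (10, 71)))
print(probs.shape)  # torch.Size([10, 1])
```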
Using LSTM rather than a strict feedforward network is more accurate, since we can include information from a sequence of words. In our implementation using LSTM, we started by loading and visualizing the data. Data pre-processing followed, involving encoding the words, removing outliers, and padding the sequences. A decision was made on how to split the training, validation, and test data, and we proceeded to create data loaders with full batching. The network was defined using PyTorch, and its layers were: an embedding layer that converts tokens into embeddings of a specific size; an LSTM layer defined by a hidden-state size and a number of layers; a fully connected output layer that maps the LSTM layer outputs to the desired output size; and a sigmoid activation, with only the last sigmoid output returned as the output of the network. We then instantiated the network by setting its hyperparameters. In training our data, it was advantageous to use a cross-entropy loss designed to work with a single sigmoid output: BCELoss, or Binary Cross-Entropy Loss, applies cross-entropy loss to a single value between 0 and 1. The hyperparameters were defined. Two ways were proposed for testing: test-data performance and inference on user-generated data. A confusion matrix was derived, and values for precision, recall, F1-score, and accuracy for the different datasets and stages were recorded accordingly; the results of our experiment are shown in the results and discussion section of this work. The loss statistics were obtained through the validation loss, which is derived by creating new variables for the hidden state (rather than backpropagating through the entire training history) and running the network training to print each epoch, step, loss, and validation loss. Using our driverless car dataset, a test loss of 0.136 and a test accuracy of 0.757 were obtained.
Moreover, a test loss of 0.309 and a test accuracy of 0.905 were obtained using our extensive Twitter dataset. We could then convert a test review to a tensor and pass it into our model; here we obtained torch.Size([1, 14]) as the feature tensor size. Basically, a predict function accepted as parameters a test review, a sequence length, and the network. With our reviews, we defined a sequence length of 141 and passed them through our predict function. Our test result for the sarcastic statement was a pre-rounding prediction value of 0.998875 using a sarcastic tweet, and that of the non-sarcastic review was a pre-rounding prediction value of 0.000055. The results indicate that the prediction accuracy for sarcasm detection was very high for the tweet considered; on the other hand, only a very slight loss was obtained. This could be confirmed by expert judgment. The results therefore provide an accurate, efficient, and reliable prediction.

5.2. Threats to Validity

5.2.1 External Validity
This study took into consideration two different, independent Twitter datasets. The datasets were convenience samples and do not represent every dataset, since some datasets may contain images and other emoji. Outcomes from this study therefore might not generalize boldly beyond these datasets.

5.2.2 Internal Validity
Threats to internal validity consist of factors relating to the selection of the independent variables used for setting up the predictive models, which could impact the results of this study. This study chose independent variables that are known to exist before the start of a project.

5.2.3 Construct Validity
This study considered LSTM for building the prediction model when sampling. The model was chosen because it has been shown to improve prediction accuracy and hence can be considered an effective and reliable model.

5.2.4.
Conclusion Validity
The models employed in the study were programmed and hence can result in automation bias: automating a process involves sequences of assumptions which can end in bias. Nonetheless, the assumptions made in this study are based on prior knowledge from previously specified studies on building data mining and prediction models (Kumar et al., 2020; Son et al., 2019). The validity of the results can therefore be trusted.

5.3 Future Work

In a future study, further theoretical and empirical work will be conducted on the determination and differentiation of images and emoji as sarcastic or otherwise. Moreover, sarcastic statements should be differentiated from ironic ones. There could also be an implementation using an unsupervised approach, with other clustering algorithms and an expert-judgment algorithm applied to all raw data collected. Sarcasm has become very predominant in Ghanaian society; therefore, a future study could also consider the identification of sarcasm in Akan when undertaking sentiment analysis in the Ghanaian Akan language.

REFERENCES

Adri, J. (2016). A sentiment analysis system of Spanish tweets and its application in Colombia 2014 presidential election. https://doi.org/10.1109/BDCloud-SocialCom-SustainCom.2016.47
Agarwal, B., Mittal, N., Bansal, P., & Garg, S. (2015). Sentiment analysis using common-sense and context information. Computational Intelligence and Neuroscience, 2015, 1–9. https://doi.org/10.1155/2015/715730
Akhoundzade, R., & Devin, K. H. (2019). Persian sentiment lexicon expansion using unsupervised learning methods. 2019 9th International Conference on Computer and Knowledge Engineering, ICCKE 2019, 461–465. https://doi.org/10.1109/ICCKE48569.2019.8964692
Alayba, A. M., Palade, V., England, M., & Iqbal, R. (2020). Arabic language sentiment analysis on health services, 114–118.
Ayata, D., Saraclar, M., & Ozgur, A. (2017).
Political opinion/sentiment prediction via long short-term memory recurrent neural networks on Twitter [Uzun-Kısa Süreli Bellek Yinelemeli Ağlar ile Politik Yönelimlerin/Duyguların Twitter Üzerinden Tahminlenmesi]. 2017 25th Signal Processing and Communications Applications Conference (SIU), 1–4. https://doi.org/10.1109/SIU.2017.7960733
Baktha, K., & Tripathy, B. K. (2017). Investigation of recurrent neural networks in the field of sentiment analysis, 2047–2050.
Bhan, N., & D'Silva, M. (2018). Sarcasmometer using sentiment analysis and topic modeling. International Conference on Advances in Computing, Communication and Control 2017, ICAC3 2017, 2018-January, 1–6. https://doi.org/10.1109/ICAC3.2017.8318782
Bharti, D. K., Pradhan, R., Babu, K. S., & Jena, S. K. (2017). Sarcastic sentiment detection based on types of sarcasm occurring in Twitter data. https://doi.org/10.4018/IJSWIS.2017100105
Bharti, S. K., Babu, K. S., & Jena, S. K. (2015a). Parsing-based sarcasm sentiment recognition in Twitter data. Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2015, 1373–1380. https://doi.org/10.1145/2808797.2808910
Bharti, S. K., Babu, K. S., & Jena, S. K. (2015b). Parsing-based sarcasm sentiment recognition in Twitter data. Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2015, 1373–1380. https://doi.org/10.1145/2808797.2808910
Bharti, S. K., Naidu, R., & Babu, K. S. (2018). Hyperbolic feature-based sarcasm detection in tweets: A machine learning approach. 2017 14th IEEE India Council International Conference, INDICON 2017. https://doi.org/10.1109/INDICON.2017.8487712
Bouazizi, M., & Ohtsuki, T. (2015). Opinion mining in Twitter: How to make use of sarcasm to enhance sentiment analysis.
2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), 1594–1597. https://doi.org/10.1145/2808797.2809350
Chaudhari, P., & Chandankhede, C. (2018). Literature survey of sarcasm detection. Proceedings of the 2017 International Conference on Wireless Communications, Signal Processing and Networking, WiSPNET 2017, 2018-January, 2041–2046. https://doi.org/10.1109/WiSPNET.2017.8300120
Dharmavarapu, B. D., & Bayana, J. (2019). Sarcasm detection in Twitter using sentiment analysis, (1), 642–644.
El-jawad, M. H. A., & Hodhod, R. (2017). Sentiment analysis of social media networks using machine learning. 2018 14th International Computer Engineering Conference (ICENCO), 174–176.
Goularas, D., & Kamis, S. (2019). Evaluation of deep learning techniques in sentiment analysis from Twitter data. 2019 International Conference on Deep Learning and Machine Learning in Emerging Applications (Deep-ML), 12–17. https://doi.org/10.1109/Deep-ML.2019.00011
Harvinder, M., & Kaur, J. (2015). Sentiment analysis from social media in crisis situations. International Conference on Computing, Communication & Automation, 251–256. https://doi.org/10.1109/CCAA.2015.7148383
Hevner, A. R. (2007). A three cycle view of design science research, 19(2).
Hiai, S., & Shimada, K. (2016). A sarcasm extraction method based on patterns of evaluation expressions. Proceedings - 2016 5th IIAI International Congress on Advanced Applied Informatics, IIAI-AAI 2016, 31–36. https://doi.org/10.1109/IIAI-AAI.2016.198
Jain, T., Agrawal, N., Goyal, G., & Aggrawal, N. (2018). Sarcasm detection of tweets: A comparative study. 2017 10th International Conference on Contemporary Computing, IC3 2017, 2018-January, 1–6. https://doi.org/10.1109/IC3.2017.8284317
Khasawneh, R. T., Wahsheh, H. A., & Alsmadi, I. M. (2013). Sentiment analysis of Arabic social media content: A comparative study, 101–106.
https://doi.org/10.1109/ICITST.2013.6750171
Khullar, H., & Singh, A. (2019). A proposed approach for sentiment analysis and sarcasm detection on textual data, (1), 3387–3391.
Kumar, A., Narapareddy, V. T., Srikanth, V. A., Malapati, A., & Neti, L. B. M. (2020). Sarcasm detection using multi-head attention based bidirectional LSTM. IEEE Access, 8, 6388–6397. https://doi.org/10.1109/ACCESS.2019.2963630
Li, L., Zhao, Y., Jiang, D., Zhang, Y., Wang, F., Gonzalez, I., … Sahli, H. (2013). Hybrid deep neural network-hidden Markov model (DNN-HMM) based speech emotion recognition. In Proceedings of the 2013 Humaine Association Conference on Affective Computing and Intelligent Interaction (pp. 312–317). https://doi.org/10.1109/ACII.2013.58
Losada, J. C., & Benito, R. M. (2018). Recurrent patterns of user behavior in different electoral campaigns: A Twitter analysis of the Spanish general elections of 2015 and 2016, 2018.
Lunando, E., & Purwarianti, A. (2013). Indonesian social media sentiment analysis with sarcasm detection. 2013 International Conference on Advanced Computer Science and Information Systems, ICACSIS 2013, 195–198. https://doi.org/10.1109/ICACSIS.2013.6761575
Majumder, N., Poria, S., Peng, H., Chhaya, N., Cambria, E., & Gelbukh, A. (2019). Sentiment and sarcasm classification with multitask learning. IEEE Intelligent Systems, 34(3), 38–43. https://doi.org/10.1109/MIS.2019.2904691
Manohar, M. Y., & Kulkarni, P. (2018). Improvement sarcasm analysis using NLP and corpus based approach. Proceedings of the 2017 International Conference on Intelligent Computing and Control Systems, ICICCS 2017, 2018-January, 618–622. https://doi.org/10.1109/ICCONS.2017.8250536
Mensah, S., Keung, J., Bennin, K. E., & Bosu, M. F. (2016). Multi-objective optimization for software testing effort estimation. In Proceedings of the 28th International Conference on Software Engineering and Knowledge Engineering (SEKE) (pp. 527–530).
San Francisco Bay, California, USA: SEKE. https://doi.org/10.18293/SEKE2016-017
Mensah, S., Keung, J., Svajlenko, J., Ebo, K., & Mi, Q. (2018). On the value of a prioritization scheme for resolving self-admitted technical debt. The Journal of Systems & Software, 135, 37–54. https://doi.org/10.1016/j.jss.2017.09.026
Munandar, D., Arisal, A., Riswantini, D., & Rozie, A. F. (2018). Text classification for sentiment prediction of social media dataset using multichannel convolution neural network. 2018 International Conference on Computer, Control, Informatics and Its Applications (IC3INA), 104–109.
Paredes-Valverde, M. A., Colomo-Palacios, R., Salas-Zárate, M. P., & Valencia-García, R. (2017). Sentiment analysis in Spanish for improvement of products and services: A deep learning approach, 2017.
Park, J. H., Sung, Y., Sharma, P. K., Jeong, Y.-S., & Yi, G. (2017). Novel assessment method for accessing private data in social network security services. The Journal of Supercomputing, 73(7), 3307–3325. https://doi.org/10.1007/s11227-017-2018-6
Parmar, K., Limbasiya, N., & Dhamecha, M. (2018). Feature based composite approach for sarcasm detection using MapReduce. Proceedings of the 2nd International Conference on Computing Methodologies and Communication, ICCMC 2018, 587–591. https://doi.org/10.1109/ICCMC.2018.8488096
Porwal, S., Ostwal, G., Phadtare, A., Pandey, M., & Marathe, M. V. (2019). Sarcasm detection using recurrent neural network. Proceedings of the 2nd International Conference on Intelligent Computing and Control Systems, ICICCS 2018, 746–748. https://doi.org/10.1109/ICCONS.2018.8663147
Prasad, A. G., Sanjana, S., Bhat, S. M., & Harish, B. S. (2017). Sentiment analysis for sarcasm detection on streaming short text data. 2017 2nd International Conference on Knowledge Engineering and Applications, ICKEA 2017, 2017-January, 1–5.
https://doi.org/10.1109/ICKEA.2017.8169892
Raghav, S., & Kumar, E. (2018). Review of automatic sarcasm detection. 2nd International Conference on Telecommunication and Networks, TEL-NET 2017, 2018-January, 1–6. https://doi.org/10.1109/TEL-NET.2017.8343562
Razali, M. S., Halin, A. A., Norowi, N. M., & Doraisamy, S. C. (2018). The importance of multimodality in sarcasm detection for sentiment analysis. IEEE Student Conference on Research and Development: Inspiring Technology for Humanity, SCOReD 2017 - Proceedings, 2018-January, 56–60. https://doi.org/10.1109/SCORED.2017.8305421
Rendalkar, S., & Chandankhede, C. (2018a). Sarcasm detection of online comments using emotion detection. Proceedings of the International Conference on Inventive Research in Computing Applications, ICIRCA 2018, 1244–1249. https://doi.org/10.1109/ICIRCA.2018.8597368
Rendalkar, S., & Chandankhede, C. (2018b). Sarcasm detection of online comments using emotion detection. Proceedings of the International Conference on Inventive Research in Computing Applications, ICIRCA 2018, 1244–1249. https://doi.org/10.1109/ICIRCA.2018.8597368
Romanowski, A. (2015). Sentiment analysis of Twitter data within big data distributed environment for stock prediction, 5, 1349–1354. https://doi.org/10.15439/2015F230
Saeed, H. H. (2018). Overlapping toxic sentiment classification using deep neural architectures. 2018 IEEE International Conference on Data Mining Workshops (ICDMW), 1361–1366. https://doi.org/10.1109/ICDMW.2018.00193
Sanagar, S., & Gupta, D. (2020). Unsupervised genre-based multidomain sentiment lexicon learning using corpus-generated polarity seed words. IEEE Access, 8. https://doi.org/10.1109/access.2020.3005242
Serrano-Guerrero, J., Olivas, J. A., Romero, F. P., & Herrera-Viedma, E. (2015). Sentiment analysis: A review and comparative analysis of web services. Information Sciences, 311, 18–38.
https://doi.org/10.1016/j.ins.2015.03.040
Son, L. H., Kumar, A., Sangwan, S. R., Arora, A., Nayyar, A., & Abdel-Basset, M. (2019). Sarcasm detection using soft attention-based bidirectional long short-term memory model with convolution network. IEEE Access, 7, 23319–23328. https://doi.org/10.1109/ACCESS.2019.2899260
Srikanth, G. U. (n.d.). Survey of sentiment analysis using deep learning techniques.
Suhaimin, M. S., Hanafi, M., Hijazi, A., Alfred, R., & Coenen, F. (2019). Modified framework for sarcasm detection and classification in sentiment analysis, 13(3), 1175–1183. https://doi.org/10.11591/ijeecs.v13.i3.pp1175-1183
Teh, P. L., Boon, O. P., Chan, N. N., & Chuah, Y. K. (2018). A comparative study of the effectiveness of sentiment tools and human coding in sarcasm detection, (January 2019), 0–15. https://doi.org/10.1108/JSIT-12-2017-0120
Troussas, C., Krouska, A., & Virvou, M. (n.d.). Evaluation of ensemble-based sentiment classifiers for Twitter data. 2016 7th International Conference on Information, Intelligence, Systems & Applications (IISA), 1–6. https://doi.org/10.1109/IISA.2016.7785380
Wp, R., T, A. N. S., & T, C. S. S. (2017). Sentiment analysis using multinomial logistic regression, 46–49.

Appendix A

Table 3. Selected articles using SLR

No.  Tracking code  Citation                                                                 Source
1    LT1   M. Bouazizi and T. Ohtsuki, "Opinion Mining in Twitter: How to Make Use of Sarcasm to Enhance Sentiment Analysis," 2015 IEEE/ACM Int. Conf. Adv. Soc. Networks Anal. Min., pp. 1594–1597, 2015.   IEEE/ACM
2    LT2   D. K. Bharti, R. Pradhan, K. S. Babu, and S. K. Jena, "Sarcastic Sentiment Detection Based on Types of Sarcasm Occurring in Twitter Data," 2017.   Scopus
3    LT3   P. L. Teh, O. P. Boon, N. N. Chan, and Y. K. Chuah, "A Comparative Study of the Effectiveness of Sentiment Tools and Human Coding in Sarcasm Detection," pp. 0–15, 2018.   Scopus
4    LT4   M. Bouazizi and T. Otsuki, "A Pattern-Based Approach for Sarcasm Detection on Twitter," IEEE Access, vol. 4, pp. 5477–5488, 2016.   IEEE
5    LT5   S. Porwal, G. Ostwal, A. Phadtare, M. Pandey, and M. V. Marathe, "Sarcasm Detection Using Recurrent Neural Network," Proc. 2nd Int. Conf. Intell. Comput. Control Syst. ICICCS 2018, pp. 746–748, 2019.   IEEE
6    LT6   S. Rendalkar and C. Chandankhede, "Sarcasm Detection of Online Comments Using Emotion Detection," Proc. Int. Conf. Inven. Res. Comput. Appl. ICIRCA 2018, pp. 1244–1249, 2018.   IEEE
7    LT7   N. Bhan and M. D'Silva, "Sarcasmometer Using Sentiment Analysis and Topic Modeling," Int. Conf. Adv. Comput. Commun. Control 2017, ICAC3 2017, vol. 2018-January, pp. 1–6, 2018.   IEEE
8    LT8   H. Khullar and A. Singh, "A Proposed Approach for Sentiment Analysis and Sarcasm Detection on Textual Data," no. 1, pp. 3387–3391, 2019.   Scopus
9    LT9   Y. Wang, K. Kim, B. Lee, and H. Y. Youn, "Word Clustering Based on POS Feature for Efficient Twitter Sentiment Analysis," Human-centric Comput. Inf. Sci., 2018.   Springer
10   LT10  S. K. Bharti, K. S. Babu, and S. K. Jena, "Parsing-Based Sarcasm Sentiment Recognition in Twitter Data," Proc. 2015 IEEE/ACM Int. Conf. Adv. Soc. Networks Anal. Mining, ASONAM 2015, pp. 1373–1380, 2015.   IEEE/ACM
11   LT11  A. G. Prasad, S. Sanjana, S. M. Bhat, and B. S. Harish, "Sentiment Analysis for Sarcasm Detection on Streaming Short Text Data," 2017 2nd Int. Conf. Knowl. Eng. Appl. ICKEA 2017, vol. 2017-January, pp. 1–5, 2017.   IEEE
12   LT12  A. G. Prasad, S. Sanjana, S. M. Bhat, and B. S. Harish, "Sentiment Analysis for Sarcasm Detection on Streaming Short Text Data," 2017 2nd Int. Conf. Knowl. Eng. Appl. ICKEA 2017, vol. 2017-January, pp. 1–5, 2017.   IEEE
13   LT13  M. S. Razali, A. A. Halin, N. M. Norowi, and S. C. Doraisamy, "The Importance of Multimodality in Sarcasm Detection for Sentiment Analysis," IEEE Student Conf. Res. Dev. SCOReD 2017 - Proc., vol. 2018-January, pp. 56–60, 2018.   IEEE/ACM
14   LT14  B. D. Dharmavarapu and J. Bayana, "Sarcasm Detection in Twitter Using Sentiment Analysis," no. 1, pp. 642–644, 2019.   Scopus
15   LT15  M. S. Suhaimin, M. Hanafi, A. Hijazi, R. Alfred, and F. Coenen, "Modified Framework for Sarcasm Detection and Classification in Sentiment Analysis," vol. 13, no. 3, pp. 1175–1183, 2019.   Scopus
16   LT16  N. Majumder, S. Poria, H. Peng, N. Chhaya, E. Cambria, and A. Gelbukh, "Sentiment and Sarcasm Classification with Multitask Learning," IEEE Intell. Syst., vol. 34, no. 3, pp. 38–43, 2019.   IEEE

Appendix B: the results obtained using our extensive Twitter dataset. Available online at: https://colab.research.google.com/drive/100Hfb_Igyx_Z8M5Af7gZInPBXFHSpeR0#scrollTo=RrPOFIeJLCCA

Appendix C: the result of our sampling for the driverless car dataset.

Sample input size: torch.
Size([10, 71])
Sample input: tensor of integer token ids (10 rows of length 71, each row left-padded with zeros; full printout omitted).
Sample label size: torch.Size([10])
Sample label: tensor([0, 0, 1, 1, 1, 1, 1, 0, 1, 0])

Appendix D: shows the result of our sampling considering the extensive or broad-spectrum dataset.
Sample input size: torch.Size([10, 71])
Sample input: tensor of integer token ids (10 rows of length 71, each row left-padded with zeros; full printout omitted).
Sample label size: torch.Size([10])
Sample label: tensor([0, 0, 1, 0, 0, 0, 1, 1, 0, 1])

Appendix E: shows the results obtained using our driverless car dataset.
Epoch: 1/4... Step: 10... Loss: 0.308221... Val Loss: 0.091225
Epoch: 1/4... Step: 20... Loss: 0.003682... Val Loss: 0.102582
Epoch: 1/4... Step: 30... Loss: 0.027171... Val Loss: 0.090254
Epoch: 1/4... Step: 40... Loss: 0.046117... Val Loss: 0.098415
Epoch: 1/4... Step: 50... Loss: 0.753728... Val Loss: 0.090104
Epoch: 1/4... Step: 60... Loss: 0.351336... Val Loss: 0.088068
Epoch: 1/4... Step: 70... Loss: 0.028270... Val Loss: 0.085113
Epoch: 1/4... Step: 80... Loss: 0.024787... Val Loss: 0.081984
Epoch: 1/4... Step: 90... Loss: 0.016288... Val Loss: 0.081004
Epoch: 1/4... Step: 100... Loss: 0.027513...
Val Loss: 0.079952
Epoch: 1/4... Step: 110... Loss: 0.253308... Val Loss: 0.077065
Epoch: 1/4... Step: 120... Loss: 0.329916... Val Loss: 0.076185
Epoch: 1/4... Step: 130... Loss: 0.295668... Val Loss: 0.074330
Epoch: 1/4... Step: 140... Loss: 0.019882... Val Loss: 0.072703
Epoch: 1/4... Step: 150... Loss: 0.016161... Val Loss: 0.067215
Epoch: 1/4... Step: 160... Loss: 0.005318... Val Loss: 0.068186
Epoch: 1/4... Step: 170... Loss: 0.002181... Val Loss: 0.078808
Epoch: 1/4... Step: 180... Loss: 0.034508... Val Loss: 0.080732
Epoch: 1/4... Step: 190... Loss: 0.008713... Val Loss: 0.051247
Epoch: 1/4... Step: 200... Loss: 0.847413... Val Loss: 0.050752
Epoch: 1/4... Step: 210... Loss: 0.165887... Val Loss: 0.042462
Epoch: 1/4... Step: 220... Loss: 0.032503... Val Loss: 0.033708
Epoch: 2/4... Step: 230... Loss: 0.006288... Val Loss: 0.022869
Epoch: 2/4... Step: 240... Loss: 0.212718... Val Loss: 0.018187
Epoch: 2/4... Step: 250... Loss: 0.009774... Val Loss: 0.014826
Epoch: 2/4... Step: 260... Loss: 0.002254... Val Loss: 0.007697
Epoch: 2/4... Step: 270... Loss: 0.018461... Val Loss: 0.003845
Epoch: 2/4... Step: 280... Loss: 0.000709... Val Loss: 0.002665
Epoch: 2/4... Step: 290... Loss: 0.000634... Val Loss: 0.002383
Epoch: 2/4... Step: 300... Loss: 0.059355... Val Loss: 0.001749
Epoch: 2/4... Step: 310... Loss: 0.000500... Val Loss: 0.001550
Epoch: 2/4... Step: 320... Loss: 0.001336... Val Loss: 0.002043
Epoch: 2/4... Step: 330... Loss: 0.000684... Val Loss: 0.001425
Epoch: 2/4... Step: 340... Loss: 0.000218... Val Loss: 0.000794
Epoch: 2/4... Step: 350... Loss: 0.000231... Val Loss: 0.000818
Epoch: 2/4... Step: 360... Loss: 0.000346... Val Loss: 0.000770
Epoch: 2/4... Step: 370... Loss: 0.000268... Val Loss: 0.000734
Epoch: 2/4... Step: 380... Loss: 0.000173... Val Loss: 0.000699
Epoch: 2/4... Step: 390... Loss: 0.000275... Val Loss: 0.000666
Epoch: 2/4... Step: 400... Loss: 0.000431... Val Loss: 0.000858
Epoch: 2/4... Step: 410... Loss: 0.000888... Val Loss: 0.001160
Epoch: 2/4... Step: 420... Loss: 0.000726... Val Loss: 0.001105
Epoch: 2/4... Step: 430... Loss: 0.000593... Val Loss: 0.000954
Epoch: 2/4... Step: 440... Loss: 0.001276... Val Loss: 0.000815
Epoch: 3/4... Step: 450... Loss: 0.000467... Val Loss: 0.000716
Epoch: 3/4... Step: 460... Loss: 0.000421... Val Loss: 0.000644
Epoch: 3/4... Step: 470... Loss: 0.000431... Val Loss: 0.000584
Epoch: 3/4... Step: 480... Loss: 0.001586... Val Loss: 0.000535
Epoch: 3/4... Step: 490... Loss: 0.002078... Val Loss: 0.000494
Epoch: 3/4... Step: 500... Loss: 0.000290... Val Loss: 0.000455
Epoch: 3/4... Step: 510... Loss: 0.000451... Val Loss: 0.000428
Epoch: 3/4... Step: 520... Loss: 0.000849... Val Loss: 0.000389
Epoch: 3/4... Step: 530... Loss: 0.000214... Val Loss: 0.000364
Epoch: 3/4... Step: 540... Loss: 0.000246... Val Loss: 0.000342
Epoch: 3/4... Step: 550... Loss: 0.000238... Val Loss: 0.000322
Epoch: 3/4... Step: 560... Loss: 0.000267... Val Loss: 0.000305
Epoch: 3/4... Step: 570... Loss: 0.000278... Val Loss: 0.000292
Epoch: 3/4... Step: 580... Loss: 0.000153... Val Loss: 0.000279
Epoch: 3/4... Step: 590... Loss: 0.000145... Val Loss: 0.000265
Epoch: 3/4... Step: 600... Loss: 0.000663... Val Loss: 0.000255
Epoch: 3/4... Step: 610... Loss: 0.000175... Val Loss: 0.000243
Epoch: 3/4... Step: 620... Loss: 0.000238... Val Loss: 0.000232
Epoch: 3/4... Step: 630... Loss: 0.000354... Val Loss: 0.000224
Epoch: 3/4... Step: 640... Loss: 0.000138... Val Loss: 0.000216
Epoch: 3/4... Step: 650... Loss: 0.000542... Val Loss: 0.000210
Epoch: 3/4... Step: 660... Loss: 0.000121... Val Loss: 0.000204
Epoch: 4/4... Step: 670... Loss: 0.000098... Val Loss: 0.000198
Epoch: 4/4... Step: 680... Loss: 0.000137... Val Loss: 0.000194
Epoch: 4/4... Step: 690... Loss: 0.000134... Val Loss: 0.000191
Epoch: 4/4... Step: 700... Loss: 0.000125... Val Loss: 0.000189
Epoch: 4/4... Step: 710... Loss: 0.000164... Val Loss: 0.000178
Epoch: 4/4... Step: 720... Loss: 0.000126... Val Loss: 0.000173
Epoch: 4/4... Step: 730... Loss: 0.000081... Val Loss: 0.000168
Epoch: 4/4... Step: 740... Loss: 0.000342... Val Loss: 0.000164
Epoch: 4/4... Step: 750... Loss: 0.000103... Val Loss: 0.000160
Epoch: 4/4... Step: 760... Loss: 0.000155... Val Loss: 0.000155
Epoch: 4/4... Step: 770... Loss: 0.000095... Val Loss: 0.000149
Epoch: 4/4... Step: 780... Loss: 0.000104... Val Loss: 0.000145
Epoch: 4/4... Step: 790... Loss: 0.000105... Val Loss: 0.000139
Epoch: 4/4... Step: 800... Loss: 0.000207... Val Loss: 0.000135
Epoch: 4/4... Step: 810... Loss: 0.000096... Val Loss: 0.000131
Epoch: 4/4... Step: 820... Loss: 0.000110... Val Loss: 0.000127
Epoch: 4/4... Step: 830... Loss: 0.000311... Val Loss: 0.000124
Epoch: 4/4... Step: 840... Loss: 0.000286... Val Loss: 0.000120
Epoch: 4/4... Step: 850... Loss: 0.000110... Val Loss: 0.000116
Epoch: 4/4... Step: 860... Loss: 0.000080... Val Loss: 0.000113
Epoch: 4/4... Step: 870... Loss: 0.000149... Val Loss: 0.000110
Epoch: 4/4... Step: 880... Loss: 0.000088... Val Loss: 0.000107
Epoch: 4/4... Step: 890... Loss: 0.000058... Val Loss: 0.000104

Appendix F: General Overview of Sarcasm Detection Operators Flowchart
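The left-zero-padded rows of token ids printed in Appendices C and D can be produced by an encode-and-pad step of roughly the following shape. This is a minimal sketch, not the thesis code: the function names (`build_vocab`, `encode_and_pad`), the example tweets, and the use of plain Python lists in place of torch tensors are illustrative assumptions; only the sequence length of 71 and the convention of left-padding with zeros come from the appendix output.

```python
def build_vocab(texts):
    # Map each word to a positive integer id; 0 is reserved for padding.
    vocab = {}
    for text in texts:
        for word in text.lower().split():
            if word not in vocab:
                vocab[word] = len(vocab) + 1
    return vocab

def encode_and_pad(texts, vocab, seq_len=71):
    # Encode each text as word ids, truncate to seq_len, and LEFT-pad
    # with zeros so every row has length seq_len, matching the rows
    # shown in Appendices C and D.
    batch = []
    for text in texts:
        ids = [vocab.get(w, 0) for w in text.lower().split()][:seq_len]
        batch.append([0] * (seq_len - len(ids)) + ids)
    return batch

# Hypothetical example tweets (not from the thesis datasets).
tweets = ["driverless cars are so safe", "great another software update"]
vocab = build_vocab(tweets)
padded = encode_and_pad(tweets, vocab)
```

Wrapping `padded` with `torch.tensor(...)` would give a batch of shape torch.Size([batch_size, 71]), matching the sample input sizes reported in Appendices C and D.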