SELF-REPORTING SYSTEM FOR INCIDENTS DETECTION IN AUTOMATED 

TELLER MACHINE (ATM) USING MACHINE LEARNING TECHNIQUES 

 
BY 

IVY NKRUMAH PAYNE 

  (10701873) 

 
THIS THESIS IS SUBMITTED TO THE UNIVERSITY OF GHANA, 

LEGON, IN PARTIAL FULFILMENT OF THE REQUIREMENT FOR THE 

AWARD OF MPHIL COMPUTER ENGINEERING DEGREE 

 
DEPARTMENT OF COMPUTER ENGINEERING 

SCHOOL OF ENGINEERING SCIENCES 

UNIVERSITY OF GHANA, LEGON 

 
SEPTEMBER 2021

University of Ghana http://ugspace.ug.edu.gh 



DECLARATION 

 
I, Ivy Nkrumah Payne, author of this thesis, hereby declare that the work presented in this thesis, 

Self-Reporting System for Incidents Detection In Automated Teller Machine (ATM) 

Using Deep Learning Techniques, is my work, produced from research undertaken under 

supervision in the Department of Computer Engineering, School of Engineering Sciences, 

University of Ghana, Legon from September 2019 to July 2021. This work has never been 

presented either in whole or in part for any other degree in this University or elsewhere. 

 
                                           October 11, 2022 

…………………………………………                       ………………………………………. 

Ivy Nkrumah Payne        Date 

(Student) 

 
                    October 11, 2022 

……………………………………                        ………………………………………. 

Prof. Robert Adjetey Sowah       Date 

(Principal Supervisor) 


DEDICATION 

 
This work is dedicated to YAHWEH, GOD ALMIGHTY, and the memory of my late mother, 

Faustina Serwah Agyarkoh. 

 

ACKNOWLEDGEMENT  

 
First of all, thanks and exaltation go to God Almighty for granting me strength and wisdom throughout this academic journey. I wish to extend my appreciation to my supervisor, Dr. Robert Adjetey Sowah, for all the ideas, guidance, and encouragement toward the successful completion of this research. I also wish to thank all members of the Department of Computer Engineering, especially Dr. Wiafe Owusu-Banahene, Dr. Godfrey A. Mills, Dr. Nii Longdon Sowah, and the entire department graduate committee, for their input in this work.

  

ABSTRACT 

 
Automated Teller Machines (ATMs) have increased in number over the past decade due to their advantages in the banking sector. ATMs provide convenience to customers, optimize banking operations, and minimize transaction costs. However, undesirable security incidents such as tampering, skimming, physical attacks, robbery, and transaction reversal fraud may occur on ATM systems and negatively affect both the user experience and banking institutions. ATM incidents occur either through a system defect or through a deliberate physical attack by an intruder. In most security incidents, financial losses follow, and customers' confidence in banking is reduced. A Self-Reporting System for ATM Incident Detection (SRSAID) is therefore needed to avert the threats posed by security incidents on ATM systems. This research uses a machine learning approach to solve this problem. Regional Convolutional Neural Network (R-CNN) and Support Vector Machine (SVM) algorithms are used to develop a detection model that detects occurrences of security incidents on an ATM system. The datasets used in developing the machine learning model were obtained from NCR Ghana and an online repository. Experimental results showed that two CNN architecture models, ALEXNET and ssdlite_mobilenet_V2, obtained accuracy scores of 80% and 96%, respectively. SVM classifiers were developed using the linear, polynomial, and radial basis kernels, obtaining accuracy scores of 70.6%, 72.56%, and 81.21%, respectively. These initial results necessitated hyperparameter optimization to improve the performance of the classifiers, which raised the accuracy scores of the SVM models to 76%, 77%, and 86% for the linear, polynomial, and radial basis kernels, respectively. The machine learning model was later deployed on a Raspberry Pi system connected to a web application that provided a graphical user interface for user interaction and viewing of reports.


TABLE OF CONTENTS 

DECLARATION
DEDICATION
ACKNOWLEDGEMENT
ABSTRACT
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
LIST OF ABBREVIATIONS

CHAPTER ONE
INTRODUCTION
1.0 Introduction
1.1 Problem Statement
1.2 Objectives of Study
1.4 Justification of Study
1.5 Research Limitation
1.6 Thesis Outline

CHAPTER TWO
LITERATURE REVIEW
2.0 Introduction
2.1 ATM Network Architecture
2.2 ATM Incident Detection
2.2.1 Object Detection
2.2.2 Computer Vision
2.3 Machine Learning Classification Algorithms
2.3.1 SVM-OVO-OVR Approach
2.3.2 Convolutional Neural Networks (CNN)
2.3.3 Regional Convolutional Neural Networks (RCNN)
2.4 Transfer Learning
2.5 Performance Metrics for Security Incidents Detection


2.6 Summary of Papers
2.7 Conclusion and Summary of Literature Review

CHAPTER THREE
METHODOLOGY
3.0 Introduction
3.1 Proposed System Design
3.2 Field Survey
3.2.1 Data Collection
3.4 Data Pre-processing
3.4.1 Security Incident Data Pre-processing
3.4.2 Image Dataset and Description
3.5 Machine Learning Classification Algorithms
3.5.1 Regional Convolutional Neural Network
3.5.2 Support Vector Machines
3.6 Tools Used
3.6.1 Software Component
3.6.2 Hardware Component
3.6.3 System Specifications

CHAPTER FOUR
3.7 Experimental Method
3.7.1 Security Incident Detection Training
SYSTEM DESIGN AND DEVELOPMENT
4.0 Introduction
4.1 Proposed System Design
4.2 Incident Detection
4.2.1 CNN and SVM Incident Detection System Implementation and Testing
4.3 Implementation of SRSAID
4.4 The Hardware Architecture
4.5 System Integration Architecture

SYSTEM IMPLEMENTATION AND TESTING
5.0 Introduction


5.1 Testing of SRSAID
5.2 Results and Discussions for Security Incident Detection
5.3 Security Incident Detection Using ssdlite_mobilenet_V2
5.3.1 Regional Proposal Extraction
5.3.2 Performance Analysis for Different Mini Batch Size and Epoch
5.4 Results and Discussion for Defects Incident Detection
However, the plots of accuracy, AUC, G-mean, and MCC of the classifier are shown in APPENDIX B.
5.5 Research Contribution

CHAPTER SIX
CONCLUSION AND RECOMMENDATION
6.1 Conclusion
6.2 Recommendations

APPENDICES
APPENDIX A
I. EXPERIMENTAL PROCESS IMPLEMENTATION (SECURITY INCIDENT DETECTION)
APPENDIX B
I. FULL IMPLEMENTATION OF SRSAID (SECURITY INCIDENT DETECTION)
II. CODE IMPLEMENTATION FOR SOFTWARE: DJANGO VIEW
APPENDIX C
EXPERIMENTAL PROCESS IMPLEMENTATION (DEFECTS INCIDENT DETECTION)

APPENDICES
APPENDIX B
I. PLOT OF PERFORMANCE METRICS FOR DEFECT INCIDENT DETECTION USING SVM CLASSIFIER
APPENDIX C
I. PERFORMANCE METRIC FOR ATM SECURITY INCIDENTS IMAGE DATASET USING R-CNN (SSDLITE_MOBILENET_V2)


II. OTHER PERFORMANCE METRIC FOR ATM SECURITY INCIDENTS IMAGE DATASET USING R-CNN (SSDLITE_MOBILENET_V2)
III. PERFORMANCE METRIC FOR ATM TAMPERING IMAGE DATASET USING R-CNN (ALEXNET)
IV. OTHER PERFORMANCE METRIC FOR ATM TAMPERING IMAGE DATASET USING R-CNN (ALEXNET)
APPENDIX E
V. PERFORMANCE METRIC FOR 2018 ATM SYSTEM DEFECT DATASETS USING SVM
II. OTHER PERFORMANCE METRICS FOR 2018 ATM SYSTEM DEFECT DATASETS USING SVM
OTHER PERFORMANCE METRICS FOR 2018NCR ATM SYSTEM DEFECT DATASETS USING RANDOM FOREST
OTHER PERFORMANCE METRICS FOR 2018NCR ATM SYSTEM DEFECT DATASETS USING DECISION TREE

 

LIST OF FIGURES 

Figure 1.1 Fraud type and Gross loss in Ghana [7]

Figure 2.1 ATM Network Diagram
Figure 2.2 Class boundaries for SVM-OVR formulation of the three-class problem
Figure 2.3 The distance between and of the multiclass classification
Figure 2.4 CNN Architecture
Figure 2.5 Stages of R-CNN forward Computation

Figure 3.1 Flowcharts for Proposed Design
Figure 3.2 Sample Images for ATM Incident Attacks
Figure 3.3 Summaries of defect incident datasets
Figure 3.4 Plot of summaries of defect incident datasets
Figure 3.5 Stages of R-CNN forward computation
Figure 3.6 RoI Pooling Layers
Figure 3.7 Standard formulation of SVM
Figure 3.8 Linear separating hyperplanes for the nonseparable case of SVC by introducing the slack variable (ξ).
Figure 3.9 Nonlinear separating hyperplane for the nonseparable case of SVM
Figure 3.10 OVO approach on multiclass
Figure 3.11 OVR approach on multiclass
Figure 3.12 OVO approach on multiclass taking all points into account.
Figure 3.13 Raspberry pi 3 model B
Figure 3.14 Pi Camera
Figure 3.15 GPS and GSM module
Figure 3.16 Main Building Block of ssdlite_MobileNet_V2
Figure 3.17 Operations of MobileNet V2
Figure 3.18 Architecture for R-CNN
Figure 3.19 Stages of R-CNN forward computation
Figure 3.20 Conceptual model design and development of CNN with ssdlite MobileNet V2 Architecture.
Figure 3.21 Flow chart model design and development of CNN / R-CNN


Figure 3.22 CNN Security Incident Detection Architecture
Figure 3.23 CNN Classification Model Architecture
Figure 3.24 CNN Classification Model Architecture
Figure 3.25 Illustration of transformation between predicted and ground-truth bounding boxes
Figure 3.26 Conceptual model design and development of the support vector machines.
Figure 3.27 Data preprocessing for SVM training and testing.
Figure 3.28 Flow chart for design and development of the support vector machines
Figure 3.29 OVO classification for the multiclass problem.

Figure 4.1 System Implementation Architecture for SRSAID
Figure 4.2 Detection results control portal interface.
Figure 4.3 Proposed System Block Diagram
Figure 4.4 Hardware Implementation Architecture for SRSAID
Figure 4.5 The SRSAID Integration Architecture

Figure 5.1 Use case Diagram for SRSAID
Figure 5.2 GSM alerts received from the Raspberry PI on a mobile phone
Figure 5.3 Autocreation of Results Database
Figure 5.4 Dashboard Results
Figure 5.5 Location on a map through GPS.
Figure 5.6 Security Incident Detection Using Faster R-CNN
Figure 5.7 Region Proposal Extraction using types of security incidents.
Figure 5.8 Performance analysis mini-batch size of ssdlite_mobilenet_V2 and ALEXNET.
Figure 5.9 Performance analysis of Epoch from steps 10-100.
Figure 5.10 Performance analysis of Epoch from step 110-200
Figure 5.11 Performance analysis of Epoch from step 210-300
Figure 5.12 Performance analysis of Epoch from step 310-400
Figure 5.13 Performance analysis of Epoch from step 410-499
Figure 5.14 Performance analysis of cross-entropy and validation accuracy at every epoch.
Figure 5.15 The SVM Optimization History Plot


Figure 5.16 The slice plot of the SVM model optimization
Figure 5.17 Graph showing the comparison of accuracy for classifiers in the raw state.
Figure 5.18 Graph showing the comparison of accuracy for classifiers after hyperparameter tuning.
Figure 5.19 Plot of Confusion Matrix with class labels
Figure 5.20 Plot of ACC of the SVM classifier of the predicted labels.
Figure 5.21 Classification Report
Figure 5.22 Comparative Analysis on precision of the classifiers.
Figure 5.23 Comparative Analysis on roc_auc_score of the classifiers.

 

LIST OF TABLES 

Table 2.1 Confusion Matrix
Table 2.2 Summary of Papers

Table 3.1 Summary of Image Dataset
Table 3.2 Kernel Functions
Table 3.3 System Specifications

Table 5.1 Security Incident Detection using ssdlite_mobilenet_V2
Table 5.2 Security Incident Detection using ALEXNET
Table 5.3 Averages performance analysis of SVM classifiers
Table 5.4 Comparative Analysis on roc_auc_score of the classifiers

 

LIST OF ABBREVIATIONS 

 
ATM       Automated Teller Machine
CNN       Convolutional Neural Network
FN        False Negative
FP        False Positive
GPS       Global Positioning System
GSM       Global System for Mobile Communication
MCC       Matthews Correlation Coefficient
R-CNN     Regional Convolutional Neural Network
SRSAID    Self-Reporting System for ATM Incident Detection
SVM       Support Vector Machines
TN        True Negative
TP        True Positive

 

CHAPTER ONE 

INTRODUCTION 

 
1.0 Introduction  

Technology has brought about many improvements in human livelihood, one of which is the use of Automated Teller Machines (ATMs) by financial institutions. An ATM is an electronic banking outlet that allows customers to complete basic transactions without the aid of a branch representative or teller. An ATM provides clients of a Bank with 24-hour deposit and withdrawal services [1][2]. Cash dispenser machines, or ATMs, are installed in Banks and other strategic locations for the convenience of the customer. As we move towards social distancing and a cashless society, ATMs will continue to play a vital role in paperless financial transactions. All these show the importance of ATMs in financial transactions; however, security and technical system defects remain a key challenge for Banks and their ATMs [2].

In security management, Banks deploy security personnel to monitor their ATMs in addition to other technologies such as Closed-Circuit Television (CCTV) cameras. Monitoring ATMs can become tedious due to the complexity and multiplicity of ATM devices located on and off the Bank’s premises. Some of the monitoring activities performed by the security personnel include the following [3]:

❑ Monitoring security incidents and the operational status of ATMs. 

❑ Monitoring system defects incidents. 

❑ Protection from attempted ATM tampering, theft, and vandalism. 


❑ Dealing with perpetrators [4]. 

While some form of security is incorporated in ATM installations, these security mechanisms 

are often subjected to several escalating security incidents. These security incidents can be 

categorized as follows:

❑ Card skimming 

❑ Card trapping 

❑ Transaction Reversal Fraud 

❑ Cash trapping 

❑ Physical attack 

❑ Robbery 

❑ Jackpotting 

❑ Logical attacks 

❑ Occlusion. 

Banks and other financial institutions work with third-party companies like NCR Corporation 

for their ATM installations and related services. NCR Corporation is a leading tech company 

that provides digital solutions to companies, hospitals, and financial institutions. Its subsidiary 

company, NCR Ghana, is a significant industry player in the financial sector in Ghana. Their 

core services include installing, configuring, and troubleshooting ATM devices for the banks 

they work with. The problems associated with ATM systems, especially on the technical side, 

become a shared responsibility between financial institutions and the ATM Service Companies 

like NCR Ghana [5]. Some of the technical problems associated with ATMs include the 

following: 

❑ Dispenser Problem 


❑ Non-remedial call 

❑ Card reader problem 

❑ Pick drive mechanism 

To mitigate problems associated with ATMs, there is a need for a distributed solution that 

integrates system defect solutions with security incident solutions using a machine learning 

approach. This solution offers an advantage over the physical monitoring of ATMs; it also 

serves as a proactive mechanism for system defects and security incidents.   

1.1 Problem Statement 

Research on the ATM system, its management, and related challenges in the banking sector has been covered in many forms. While some of these studies are large-scale, few focus on electronic banking systems and networks such as ATMs [6]. Although the proliferation of ATMs in the country has aided business, it has also created a breeding ground for fraudsters. Security incidents in ATMs lead to financial losses for the banks. According to the Bank of Ghana (BoG) 2017 Annual Report, banks lost over GH₵ 30 million, a significant portion of which was due to ATM fraud [7].

Recently, most ATMs have in-built cameras that capture activities around them. While these cameras serve as security mechanisms, they do so passively. For instance, the cameras can record footage of security incidents such as skimming and tampering on ATMs, but they cannot prevent fraudsters from perpetrating the act. This security mechanism (CCTV cameras in ATMs) has proven inadequate because of its inherent limitation: it cannot actively prevent ATM incidents. Technical defect incidents share this challenge with ATM security incidents, because ATMs with technical problems can be left unattended for months owing to the lack of a self-reporting mechanism in ATMs.

To ensure the availability, reliability, and security of ATM services, answers should be provided 

to the following fundamental questions: 

❑ What would be the reaction of the ATM user after reading a message from the screen 

that the machine is temporarily down?  

❑ How do we automatically detect ATM problems or incidents at the onset and make an intelligent decision on how to respond to them in order to bring perpetrators to book, improve profitability, reduce operational support costs, and deliver an amazing customer experience?

According to the 2017 Bank of Ghana Annual Report, ATM security incidents were reported to have cost approximately GH₵ 1.7 million. Figure 1.1 shows the distribution of fraud types and losses in the banking sector, most of which fall under cybercrime, the category that includes ATM fraud.


Figure 1.1 Fraud type and Gross loss in Ghana [7]

 
According to the NCR Ghana 2018 ATM incidents report, 3970 of the reported ATM faults were cases of the dispenser problem.

ATM security and system defect incidents that cause financial loss and temporarily put the machine out of service can be mitigated by automatically self-reporting them to the Banks, ATM service providers, and security officials, which would make the handling of incidents more efficient.

The proposed self-reporting system for ATM incident detection uses Machine Learning techniques for image classification of attempted ATM security incidents. The same approach is extended to the ATM system defects aspect of the problem. The system uses Internet of Things (IoT) components, such as Global System for Mobile Communications (GSM) alerts, embedded in a software application that interacts with the Global Positioning System (GPS) and points to the location of a detected incident using the Google Maps Application Programming Interface (API).

1.2 Objectives of Study  

This study aims to design and implement a self-reporting security and system defect application 

for ATMs in Ghana.  

The specific objectives of this study are as follows:  

❑ Design and develop a security incident detection model for ATMs based on 

Convolutional Neural Networks (CNN). 

❑  Design and develop a defect incident detection model based on Support Vector Machines (SVMs).

❑  Develop GSM alerts and Google Maps indications for locating ATM incidents.

❑ Develop and deploy a user interface (UI) that provides access to monitoring all ATMs 

and system resources on the ATM network. 

❑ Develop an integrated system for incident and defect detection modules with 

microcontroller-based hardware for overall proof of concept. 

❑ Test the proposed model on a real-world bank ATM terminal by injecting a range of common incidents to evaluate its efficacy.

1.4 Justification of Study 

As we move towards a cashless economy, ATMs and Information and Communication Technology (ICT) related technologies will play a vital role in banking activities. ATM security and defect management continue to be a significant concern for banks, ATM service providers, and other third-party stakeholders such as state security, the Bank of Ghana, and the general public. The financial losses due to ATM fraud associated with tampering have been enormous, and there is an urgent need for improvements to mitigate such occurrences.

While contributing to the scientific community, this research also addresses a real-world banking sector problem that has persisted for decades. This research is not intended to phase out the work of ATM technicians; rather, it seeks to augment their services and make them more timely and efficient.

1.5 Research Limitation 

The research takes into account two main aspects of ATM problems: security incidents and system defect incidents. Security incidents are reported in real time through SMS alerts sent to the mobile devices of the banks and ATM service providers, and a dashboard interface also receives these reports. System defect problems, on the other hand, are reported only on the dashboard. This implies that a worker from the bank needs to monitor the system for periodic updates.

1.6 Thesis Outline 

The study presented in this thesis details Machine Learning techniques for detecting ATM incidents and is organized into six chapters.

The remaining chapters continue as follows:

Chapter two provides a literature review of recent research on ATM incidents, Machine Learning techniques and algorithms, fine-tuning techniques, and their applications to detecting ATM incidents.

Chapter three discusses system requirements, specification analysis, blueprints, and the careful 

selection of development tools for the incident detection models. 

Chapter four focuses on the development of the models and discussions on accuracy metrics. 

Results and findings are interpreted as well. 

Chapter five presents the system simulation and experiments, the implementation process, testing, and a discussion of the results. The outcome of the experiments illustrates the capability of the proposed system to detect and categorize security incidents. ATM system defect incident detection is discussed as well.

Chapter six summarizes the findings of the work reported in this thesis, challenges and 

observations, and suggestions for future work.  


CHAPTER TWO 

LITERATURE REVIEW 

 
2.0 Introduction 

This chapter reviews related work on ATM security and defect incidents. It also reviews work 

on several methodologies and proposed solutions within the scope of ATM fraud and technical 

defects. These solutions' algorithms and performance metrics are presented in a tabular format.  

2.1 ATM Network Architecture 

 
Figure 2.1 ATM Network Diagram

 

Figure 2.1 shows the network diagram for an ATM system. In the diagram, the subcomponents 

of the network setup are listed below: 

❑ ATM  

❑ Switch 

❑ Router 

❑ File server 

❑ Firewall 

❑ Customer/services such as VISA card and Mastercard.

Based on technical information gathered at NCR, the components of an ATM system can be grouped into two: the upper part and the lower part. The upper part comprises the central processing unit (CPU) and the card reader. The lower part of the ATM consists of two parts: the suction technology and the domain safe.

The Suction Technology 

The suction technology is a mechanism the ATM uses to dispense money to the upper part of 

the ATM. It enables the cassettes to present the dispensed cash in belts [8]. 

The Domain Safe 

This part of the ATM contains the presenter, the network card, and other networking 

components. The NCR ATMs run a middleware that enhances the network communication 

between the switch and the card reader [8].   


2.2 ATM Incident Detection 

Incident detection is the process of finding the activities of attackers who deliberately seek to circumvent protocols on network infrastructure. Some malicious activities are mitigated by retracing and containing the threats and removing their foothold [9]. Learning how attackers compromise systems and move around the network makes it possible to detect and stop attacks before valuable data is stolen or the system collapses.

2.2.1 Object Detection 

Object detection is one of the classical problems of computer vision and is often described as a difficult task. Like other computer vision tasks, it requires a solution that is invariant to deformation and to changes in lighting and viewpoint. What makes it a distinct problem is that it involves both locating and classifying regions of an image [10]. To detect an object, we need to know where it might be and how the image is segmented, which creates a type of chicken-and-egg problem.

To recognize the shape (class) of an object, its location must be known, and to identify the 

location of an object, its shape must be known [11]. Some visually dissimilar features, such as 

the clothes and faces of a human being, may be parts of the same object, but it is difficult to 

know this without recognizing the object first. On the other hand, some objects stand out only 

slightly from the background and must be separated from it before recognition [12]. Low-level visual 

features of an image, such as a saliency map, may be used as a guide for locating candidate 

objects [13].  

The location, shape, and size of an object are typically defined using a bounding box stored as corner coordinates. Using a rectangle is more straightforward than using an arbitrarily shaped polygon, and many operations, such as convolution, are performed on rectangles in any case. The sub-image in the bounding box is then classified by an algorithm trained using machine learning [14].
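As an illustration of the bounding-box representation described above, the following minimal Python sketch stores a box as corner coordinates and extracts the sub-image it encloses. The `BoundingBox` type, the `crop` helper, and the toy image are hypothetical names and data introduced only for this example; they are not part of the system developed in this thesis.

```python
from typing import List, NamedTuple


class BoundingBox(NamedTuple):
    """Axis-aligned box stored as corner coordinates (hypothetical layout).

    (x_min, y_min) is the top-left corner; (x_max, y_max) is exclusive.
    """
    x_min: int
    y_min: int
    x_max: int
    y_max: int

    @property
    def width(self) -> int:
        return self.x_max - self.x_min

    @property
    def height(self) -> int:
        return self.y_max - self.y_min


def crop(image: List[List[int]], box: BoundingBox) -> List[List[int]]:
    """Return the sub-image inside the box; rows index y, columns index x."""
    return [row[box.x_min:box.x_max] for row in image[box.y_min:box.y_max]]


# Toy 4x6 grayscale image whose pixel value encodes its position (10*y + x).
image = [[10 * y + x for x in range(6)] for y in range(4)]
box = BoundingBox(x_min=1, y_min=1, x_max=4, y_max=3)
sub_image = crop(image, box)  # this is what a classifier would receive
```

Because the box is a rectangle, the crop reduces to two slices, which is one reason rectangles are preferred over arbitrary polygons.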

The boundaries of the object can be further refined iteratively after making an initial guess [15]. During the 2000s, popular solutions for object detection utilized feature descriptors, such as the scale-invariant feature transform (SIFT) [16], developed by David Lowe in 1999, and the histogram of oriented gradients (HOG), popularized in 2005 [16]. In the 2010s, there was a shift towards utilizing convolutional neural networks [14][10][17].

Before the wide-scale adoption of CNNs, there were two competing solutions for generating 

bounding boxes. A dense set of region proposals is generated in the first solution, and most of 

these are rejected [18]. This typically involves a sliding window detector. The second solution 

generates a sparse set of bounding boxes using a region proposal method, such as Selective 

Search [12]. Combining sparse region proposals with convolutional neural networks has 

provided good results and is currently popular [10]. 
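The dense approach can be sketched as a sliding window detector that enumerates candidate boxes at a fixed stride. The Python example below is a minimal illustration under assumed values: the image size, window size, and stride are hypothetical, and a real detector would also vary the window scale and pass each box to a classifier.

```python
from typing import Iterator, Tuple

# (x_min, y_min, x_max, y_max), with the max corner exclusive
Box = Tuple[int, int, int, int]


def sliding_windows(img_w: int, img_h: int,
                    win_w: int, win_h: int, stride: int) -> Iterator[Box]:
    """Yield every window position that fits entirely inside the image."""
    for y in range(0, img_h - win_h + 1, stride):
        for x in range(0, img_w - win_w + 1, stride):
            yield (x, y, x + win_w, y + win_h)


# Hypothetical sizes: a 64x48 image scanned with a 16x16 window, stride 8.
dense_proposals = list(sliding_windows(64, 48, 16, 16, 8))
```

Even this small example yields dozens of candidate boxes for a single window size, which is why most dense proposals must be rejected and why sparse region proposal methods became attractive.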

2.2.2 Computer Vision 

Computer vision is concerned with detecting objects in, and extracting meaningful information from, digital images or video content. This is distinct from mere image processing, which involves manipulating visual information at the pixel level. Applications of computer vision include image classification, visual detection, 3D scene reconstruction from 2D images, image retrieval, augmented reality, machine vision, and traffic automation [19].

Today, deep learning and machine learning are necessary for many computer vision algorithms, which can be described as a combination of image processing and machine learning. Effective solutions require algorithms that can cope with the vast amount of information in visual images and, critically for many applications, can perform the computation in real time [20].

2.3 Machine Learning Classification Algorithms 

Classification is a technique for categorizing datasets into a desired number of distinct classes, where a label can be assigned to each class. Applications of classification include image classification, speech recognition, handwriting recognition, and others [21].

 
Classification algorithms weigh the input features so that the output separates one class from the other. Classifier training is performed to identify the weights (functions) that provide the most accurate separation of the data classes. Classification decisions are made based on the results of classifiers. Such classification algorithms include Linear Discriminant Analysis (LDA), Support Vector Machines (SVM), Random Forest, and Artificial Neural Networks [21][22].

LDA is the most basic classifier that identifies the linear weighting of multifactorial data as a 

means to maximize the distance between the means of the classes. Classification algorithms 

such as SVM, ANN, and others perform well on large datasets and are the recent computational 

approaches that generate more complex divisions between classes of the datasets[15][19][23]. 

Multiple classifiers can also be used, and training and classification decisions can be made 

based on the results of all the classifiers. The success of the classification algorithms is 

determined by the performance metrics[22]. 

The classification algorithm processes the features to produce a class decision. These processes 

involve a statistical algorithm that maps the different classes into different regions of feature 

space.   


2.3.1 SVM-OVO-OVR Approach 

For ATM defect incident datasets, the most accurate methodology is a combination of the OVR approach and an SVM using all components, which provides a powerful technique. A multi-class classification problem can be defined as:

Given $n$ i.i.d. data points:

\[ (x_1, y_1), \ldots, (x_n, y_n), \quad \text{such that } x_i \in \mathbb{R}^d \ \text{for } i = 1, \ldots, n \ \text{and } y_i \in \{1, \ldots, K\} \tag{2.1} \]

The task is to determine a classifier with decision function $f(x)$ that predicts the class label of each data point $x_i$, such that $y = f(x)$, where $y$ is the class label for $x$ [24]. The performance of the classifier is measured in terms of total classification error or classification accuracy over a set of testing data. The classification error for data point $x$ is defined as:

\[ E(y, f(x)) = \begin{cases} 0 & \text{if } y = f(x) \\ 1 & \text{otherwise} \end{cases} \tag{2.2} \]

There are two approaches suggested for multi-class SVM in literature [21]. One is considering 

all data in one optimization. The other is decomposing multi-class into a series of binary SVMs, 

such as "One-versus-Rest" (OVR) and "One-versus-One" (OVO) [24][25][22]. 

OVR-SVM is the most basic scheme used for the implementation of SVM multiclass classification. In this simplest extension of the SVM to the $k$-class problem, $k$ binary SVM models are constructed. In the $j$-th binary SVM problem, class $C_j$ is separated from the remaining classes. All $k$ binary SVM classifiers are combined to make a final multiclass classifier. Here, the remaining classes means that all the data points from classes other than $C_j$ are combined to form one combined class $C_i$. Given the ATM defects training dataset, corresponding to $X_n : \mathbb{R}^n \to F$ in the feature space $F$, the optimal hyperplane that separates the data points of class $C_j$ from those of the combined class $C_i$ is found using the SVM methodology. The optimal separating hyperplane distinguishing class $C_j$ from the combined class $C_i$ is represented as:

\[ g_j(x) = (w_j \cdot x) + b_j, \quad j = 1, \ldots, k \tag{2.3} \]

Assuming the training dataset is correctly classified, as shown in Figure 4.2.4, the SVC 

computes the hyperplane to maximize the margin separating the classes [26] (area of failure 

description, cause code description, and problem code description). 

The problem is a quadratic optimization problem (QP), which gives the dual formulation of the Lagrangian. The dual Lagrangian ($L_D$) is maximized subject to non-negativity of the multipliers $\alpha_i$, and $g_j(x)$ can be determined by solving the following dual form:

\[ \text{Maximize: } L_D = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j \, k(x_i, x_j) \tag{2.4} \]

\[ \text{subject to } 0 \le \alpha_i \le C, \quad i = 1, \ldots, n, \quad \text{and} \quad \sum_{i=1}^{n} \alpha_i y_i = 0 \tag{2.5} \]

                                                                                                                                

The decision rule $f_j(x)$ that assigns the vector $x$ to class $C_j$ or to the combined class $C_i$ is given by:

\[ f_j(x) = \operatorname{sign}\left(g_j(x)\right) \tag{2.6} \]

If no classifier casts a positive vote, or if more than one classifier casts a positive vote, then no decision about the class label is made.

[Figure: decision boundaries $g_1(x) = 0$, $g_2(x) = 0$, and $g_3(x) = 0$ separating Class 1, Class 2, and Class 3]

Figure 2.2 Class boundaries for SVM-OVR formulation of the three-class problem

 
The main difficulty in this approach is that the outputs of the classifiers $f_j(x)$ are binary values. The usual way to handle this problem is to ignore the sign operator in Equation (2.6). After finding all the optimal hyperplanes given by $g_j(x)$ for $j = 1, \ldots, k$, we say $x$ is in the class that has the largest value of the decision function, given by:

\[ \text{class label of } x = \arg\max_{j = 1, \ldots, k} g_j(x) \tag{2.7} \]

In this approach, the index of the largest component of the discriminant functions

\[ g_j(x), \quad j = 1, \ldots, k \tag{2.8} \]

is assigned to the data point $x$ [24]. This approach is called winner-takes-all. To make a final decision, $k$ binary problems must be solved, and each binary problem, which contains $n$ data points, is solved through its dual. Hence, $k$ quadratic programming problems in $n$ variables each are to be solved. Class boundaries for the three-class problem are shown above in Figure 2.2.
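The winner-takes-all rule of Equations (2.7) and (2.8) can be sketched in a few lines of Python. The sketch assumes that the $k$ linear decision functions $g_j(x) = w_j \cdot x + b_j$ have already been trained; the weight vectors and biases below are hypothetical values chosen only to illustrate the rule, not parameters learned from the ATM defect data.

```python
from typing import List, Sequence


def decision_value(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Linear decision function g_j(x) = w_j . x + b_j (Equation 2.3)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def ovr_predict(weights: List[Sequence[float]], biases: Sequence[float],
                x: Sequence[float]) -> int:
    """Winner-takes-all: return the 1-based index j with the largest g_j(x)."""
    scores = [decision_value(w, b, x) for w, b in zip(weights, biases)]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best + 1


# Hypothetical parameters for a three-class problem in two dimensions.
weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
biases = [0.0, 0.0, 0.0]

label_a = ovr_predict(weights, biases, [2.0, 0.5])    # only g_1 is positive
label_b = ovr_predict(weights, biases, [-1.0, -2.0])  # only g_3 is positive
label_c = ovr_predict(weights, biases, [0.5, 0.4])    # g_1 and g_2 both positive
```

In the third case two classifiers vote positively, so the sign-based rule of Equation (2.6) would give no decision, whereas comparing the raw decision values still selects a single class.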

Mathematical Foundation for Support Vector Machines (SVM) 

Support vector machines (SVM) are statistical machine learning algorithms that classify by finding a hyperplane that maximizes the margins between classes. SVM is limited on very large datasets due to the dense nature and memory requirement of the quadratic form of the dataset [26]. The ATM defects datasets are given, corresponding to $X_n : \mathbb{R}^n \to F$ in the feature space $F$.

A hyperplane is defined by its normal vector $w$. Given a hyperplane $w$ and a point $x$, define $x_0$ to be the closest point to $x$ on the hyperplane, that is, the closest point to $x$ that satisfies $w \cdot x_0 = 0$, as in Figure 2.3.

 

[Figure: hyperplane labelled $w \cdot x = k$, with the points $x^*$ and $x^*_0$]

Figure 2.3 The distance between $x$ and $x_0$ of the multiclass classification

 
The following two equations are obtained:

\[ w \cdot x = k \ \text{for some } k, \qquad w \cdot x_0 = 0 \tag{2.9} \]

Subtracting the second equation from the first, we obtain the following:

\[ w \cdot (x - x_0) = k \tag{2.10} \]

Dividing by the norm of $w$, this is obtained:

\[ \frac{w}{\|w\|} \cdot (x - x_0) = \frac{k}{\|w\|} \tag{2.11} \]

Since $\frac{w}{\|w\|}$ is a unit vector and $x - x_0$ is parallel to $w$:

\[ \|x - x_0\| = \frac{k}{\|w\|} \tag{2.12} \]
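The distance result in Equation (2.12) can be checked numerically. The following Python sketch is a worked example under assumed values: the normal vector and the point are hypothetical, and the absolute value of $k$ is taken so that the returned distance is non-negative for points on either side of the hyperplane.

```python
import math
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def distance_to_hyperplane(w: Sequence[float], x: Sequence[float]) -> float:
    """Distance from x to the hyperplane w . x0 = 0, i.e. |k| / ||w|| with k = w . x."""
    k = dot(w, x)
    return abs(k) / math.sqrt(dot(w, w))


def closest_point(w: Sequence[float], x: Sequence[float]) -> List[float]:
    """Closest point x0 on the hyperplane: move from x against w by k / ||w||^2."""
    scale = dot(w, x) / dot(w, w)
    return [xi - scale * wi for xi, wi in zip(x, w)]


# Hypothetical values: w = (3, 4) has norm 5, and k = w . x = 10.
w = [3.0, 4.0]
x = [2.0, 1.0]
distance = distance_to_hyperplane(w, x)
x0 = closest_point(w, x)
```

Here $k / \|w\| = 10 / 5 = 2$, and the length of $x - x_0$ agrees with that value, as Equation (2.12) states.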

Nevertheless, SVM is an excellent example of supervised learning that maximizes the 

generalization by maximizing the margin and supports the nonlinear separation using 

kernelization. SVM tries to avoid over-fitting and under-fitting. The margin in SVM denotes 

the distance from the boundary to the closest data point in the feature space [22][21][25].

In general, there may be many separating hyperplanes. In this problem, the separating hyperplane is the boundary separating a given ATM defect incident class from the rest (OVR) or separating two different ATM defect incident classes (all-pairs, AP). The hyperplane computed by the SVM is the maximal margin hyperplane, that is, the hyperplane with maximal distance to the nearest data point. Finding the SVM solution requires training an SVM, which entails solving a convex quadratic program with as many variables as training points [27].

The OVR methodology is used to combine binary SVM classifiers into a multiclass classifier. A separate SVM is trained for each class, and the winning class is the one with the largest margin, which can be considered a signed confidence measure. In the experiments described in this thesis, there were few data points in many dimensions. Therefore, a kernel corresponding to a linear (regularized) classifier was used as the SVM solution. Although the hyperplane was allowed to make misclassifications, in all cases involving the full 16,063 dimensions, each OVR hyperplane fully separated the training data with no errors. Some training errors occurred in experiments involving explicit feature selection with very few features. This may indicate that a very small number of features could be selected and a nonlinear kernel function then used to improve classification; however, preliminary experiments with this approach yielded no improvement over the linear case. An SVM is trained using all features. The features are then ranked according to the magnitude of the elements of the resulting hyperplane, so the importance of feature $i$ is the weight of the $i$th element of the hyperplane.
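The ranking step described above can be sketched as follows. The Python example sorts features by the absolute value of the corresponding hyperplane element. The feature names echo the defect attributes mentioned earlier in this section, but the weight values are hypothetical and serve only to show the ordering.

```python
from typing import List, Sequence, Tuple


def rank_features(names: Sequence[str],
                  hyperplane: Sequence[float]) -> List[Tuple[str, float]]:
    """Order features by |w_i|, the magnitude of the i-th hyperplane element."""
    return sorted(zip(names, hyperplane), key=lambda pair: abs(pair[1]), reverse=True)


# Hypothetical weights of a trained linear SVM over three defect attributes.
feature_names = ["area_of_failure", "cause_code", "problem_code"]
weights = [0.2, -1.5, 0.7]
ranking = rank_features(feature_names, weights)
```

The sign of a weight indicates which side of the hyperplane a feature pushes towards, so only the magnitude is used when judging importance.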

2.3.2 Convolutional Neural Networks (CNN) 

The convolutional neural network (CNN) is a deep learning algorithm that can take an input image, assign learnable weights and biases to various aspects or objects in the image, and distinguish one from the other, as indicated in Figure 2.4. Convolutional neural networks are used mainly for image classification.

The basic idea of the CNN was inspired by a concept in biology called the receptive field [13][20]. Receptive fields are a feature of the animal visual cortex [28]. They act as detectors sensitive to certain stimulus types, for example, edges. They are found across the visual field and overlap each other.

[Figure: Input f (28×28×1) → Conv-1 (convolution with filter matrix g, 5×5 kernel, valid padding; n1 channels, 24×24) → Max pooling (2×2; n1 channels, 12×12) → Conv-2 (convolution, 5×5 kernel, valid padding; n2 channels, 8×8) → Max pooling (2×2; n2 channels) → flattened → FC-3 (fully connected neural network, n3 units, with activation) → FC-4 (fully connected neural network) → output classes: Robbery, Physical attack, Skimming, Occlusion]

Figure 2.4 CNN Architecture


This biological function can be approximated in computers using convolution [20]. In image processing, images can be filtered using convolution to produce different visible effects. Figure 2.4 shows how a hand-selected convolutional filter detects horizontal edges from an image, functioning similarly to a receptive field. The discrete convolution operation between an image $f$ and a filter matrix $g$ is defined as:

\[ h[x, y] = f[x, y] * g[x, y] = \sum_{m=1}^{M} \sum_{n=1}^{N} f[n, m] \, g[x - n, y - m] \tag{2.13} \]

In effect, the dot product of the filter $g$ and a sub-image of $f$ (with the same dimensions as $g$) centered on coordinates $(x, y)$ produces the pixel value of $h$ at coordinates $(x, y)$. The size of the filter matrix determines the size of the receptive field. Aligning the filter successively with every sub-image of $f$ produces the output pixel matrix $h$. In neural networks, the output matrix is also called a feature map (or an activation map after the activation function has been computed). Edges need to be treated as a special case [20]. If an image is not padded, the output size decreases slightly with every convolution.
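The sliding dot product described above can be illustrated with a short Python sketch. It assumes no padding ("valid" convolution) and, as is common in CNN implementations, applies the filter without flipping it, which is strictly a cross-correlation. The toy image and the horizontal edge filter are hypothetical examples.

```python
from typing import List

Matrix = List[List[float]]


def convolve_valid(f: Matrix, g: Matrix) -> Matrix:
    """Slide filter g over image f without padding; each output pixel is the
    dot product of g with the sub-image of f it currently covers."""
    f_h, f_w = len(f), len(f[0])
    g_h, g_w = len(g), len(g[0])
    output = []
    for y in range(f_h - g_h + 1):
        row = []
        for x in range(f_w - g_w + 1):
            row.append(sum(f[y + m][x + n] * g[m][n]
                           for m in range(g_h) for n in range(g_w)))
        output.append(row)
    return output


# Toy 6x6 image: dark top half (0) and bright bottom half (1).
image = [[0.0] * 6 for _ in range(3)] + [[1.0] * 6 for _ in range(3)]
# Hand-selected filter that responds to horizontal edges.
edge_filter = [[-1.0, -1.0, -1.0],
               [0.0, 0.0, 0.0],
               [1.0, 1.0, 1.0]]
feature_map = convolve_valid(image, edge_filter)

# Without padding, a 28x28 input and a 5x5 filter give a 24x24 output,
# matching the first convolution stage in Figure 2.4.
shrunk = convolve_valid([[0.0] * 28 for _ in range(28)],
                        [[0.0] * 5 for _ in range(5)])
```

The feature map is non-zero only in the rows where the filter straddles the dark-to-bright boundary, and the 6×6 input shrinks to 4×4 because the filter cannot be centered on border pixels.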

2.3.3 Regional Convolutional Neural Networks (RCNN) 

The Regional Convolutional Neural Network (R-CNN) is a state-of-the-art visual object detection system that combines bottom-up region proposals with rich features computed by a convolutional neural network. R-CNN forward computation has several stages [10][13][20]. First, the regions of interest (ROIs) are generated. The ROIs are category-independent bounding boxes with a high likelihood of containing an interesting object. A separate method called Selective Search is used to generate these regions [12][17][29].

Given an image, the R-CNN uses Selective Search to generate around 2000 region proposals, for which features are computed using a Convolutional Neural Network (CNN). Region proposals are regions that may contain an object. Each proposal is warped to a 227 × 227 RGB image to fit the input of the CNN. Feature extraction is done in the CNN layers, and the features are passed to multiple binary classifiers to determine the class of each region. Figure 2.5 illustrates the R-CNN stages.

[Figure: (1) original input image → (2) feature proposal extraction → (3) a warped proposal → (4) classification using CNN → (5) classification, with outputs Label 1, Label 2, Label 3, Label 9, Label 10, Label 11]

Figure 2.5 Stages of R-CNN forward computation

 
As noted above, the first stage of R-CNN forward computation generates the regions of interest using Selective Search [12]. Next, a convolutional network extracts features from each region proposal. The sub-image contained in the bounding box is warped to match the input size of the CNN and then fed to the network. After the network has extracted features from the input, the features are passed to the classification stage of the R-CNN, which provides the final classification. A convolutional neural network requires much less pre-processing than other classification algorithms, because CNNs are able to learn their filters or detectors given enough training.
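The warping step can be sketched with nearest-neighbour resampling. The Python example below is only an illustration: the toy image, the region proposal, and the small target size are hypothetical, and practical R-CNN implementations use library resizing routines with interpolation rather than this hand-written loop.

```python
from typing import List, Tuple

# (x_min, y_min, x_max, y_max), with the max corner exclusive
Box = Tuple[int, int, int, int]


def warp_region(image: List[List[int]], box: Box,
                out_w: int, out_h: int) -> List[List[int]]:
    """Resize the sub-image inside box to out_w x out_h by nearest neighbour,
    ignoring the aspect ratio of the proposal."""
    x_min, y_min, x_max, y_max = box
    box_w, box_h = x_max - x_min, y_max - y_min
    return [[image[y_min + (j * box_h) // out_h][x_min + (i * box_w) // out_w]
             for i in range(out_w)]
            for j in range(out_h)]


# Toy 4x6 image whose pixel value encodes its position (10*y + x).
image = [[10 * y + x for x in range(6)] for y in range(4)]
proposal = (1, 1, 4, 3)                      # a 3x2 region proposal
warped = warp_region(image, proposal, 6, 4)  # fixed "network input" size
```

Every proposal, whatever its shape, is mapped to the same fixed size, which is what allows a single CNN to process all region proposals.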


2.4 Transfer Learning 

Transfer learning (TL) is a research problem in machine learning (ML) that focuses on storing knowledge gained while solving one problem and applying it to a different but related problem. In practice, part of a network that has already been trained on a similar task is transferred, one or more layers are added at the end, and the model is then retrained. Transfer learning thus overcomes the isolated learning paradigm by utilizing knowledge acquired for one task to solve related ones [30][31]. Data scientists and researchers believe that transfer learning can further progress toward artificial general intelligence (AGI) [32]. AGI is the hypothetical intelligence that can understand or learn any intellectual task that a human being can do [31]. It is a primary goal of some artificial intelligence research and a common topic in science fiction and futures studies. AGI is also referred to as strong AI, full AI, or general intelligent action [31], although some academic sources reserve "strong AI" for machines that can experience consciousness. Today's AI is speculated to be decades away from AGI [31].

In transfer learning, the neural network (NN) such as CNN is trained in two stages, namely: 

❑ Pre-training  

❑ Fine-tuning. 


In the pre-training stage, the network is trained on a large-scale benchmark dataset representing a wide range of categories. In the fine-tuning stage, the network is further trained on the specific target task of interest, which usually has fewer labeled examples than the pre-training dataset. Architectures commonly used in this way include AlexNet, ResNet50, Inception, and MobileNetV2, typically pre-trained on the ImageNet dataset.

Studies of transfer learning suggest that deep learning models trained for one classification task can be employed for another. Thus, CNN models trained on a specific dataset or task can be fine-tuned for a new task, even in a different domain [33]. This approach has been applied successfully to visual categorization tasks in object recognition, image classification, and human action recognition [33].
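The two-stage idea can be illustrated with a deliberately small Python sketch in which a fixed feature extractor stands in for the pre-trained layers and only a new output layer is trained on the target task. Everything in it is an assumption made for illustration: the frozen weights, the toy samples, and the logistic output layer are hypothetical and far simpler than the pre-trained CNNs discussed above.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical "pre-trained" layer: these weights are frozen and never updated.
FROZEN_WEIGHTS = [[1.0, -1.0], [0.5, 0.5]]


def frozen_features(x: Sequence[float]) -> List[float]:
    """ReLU features from the frozen layer (stand-in for a pre-trained CNN body)."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in FROZEN_WEIGHTS]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fine_tune_head(samples: List[Sequence[float]], labels: List[int],
                   epochs: int = 200, lr: float = 0.5) -> Tuple[List[float], float]:
    """Train only the new output layer (logistic regression) on frozen features."""
    w, b = [0.0] * len(FROZEN_WEIGHTS), 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            z = frozen_features(x)
            error = sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + b) - y
            w = [wi - lr * error * zi for wi, zi in zip(w, z)]
            b -= lr * error
    return w, b


def predict(head: Tuple[List[float], float], x: Sequence[float]) -> float:
    w, b = head
    z = frozen_features(x)
    return sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + b)


# Toy target task with few labeled examples: label 1 when x[0] > x[1].
samples = [[2.0, 0.0], [3.0, 1.0], [0.0, 2.0], [1.0, 3.0]]
labels = [1, 1, 0, 0]
head = fine_tune_head(samples, labels)
```

Only the two weights and the bias of the new layer change during training, which is why fine-tuning can work with far fewer labeled examples than training a whole network from scratch.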

 
2.5 Performance Metrics for Security Incidents Detection 

Among the commonly used performance metrics for security incident detection are the mini-batch size and epoch.

Mini-Batch size: A mini-batch is a subset of the training images that is used to update the weights, and the mini-batch size is the number of images in it. A different mini-batch is used in each iteration. The mini-batch accuracy reported during training relates to the accuracy of the particular mini-batch at the given iteration [34].

Epoch: The number of epochs is a hyperparameter set before training a deep learning model. One epoch is completed when the whole dataset has been passed forward and backward through the neural network once [34][33].
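The relationship between mini-batch size, iterations, and epochs can be made concrete with a short Python sketch. The dataset size, mini-batch size, and number of epochs below are hypothetical values, and the sketch omits the shuffling that training frameworks usually apply at each epoch.

```python
import math
from typing import Iterator


def iterate_minibatches(num_samples: int, batch_size: int) -> Iterator[range]:
    """Yield index ranges that together cover the dataset once (one epoch)."""
    for start in range(0, num_samples, batch_size):
        yield range(start, min(start + batch_size, num_samples))


# Hypothetical training configuration.
num_images, batch_size, num_epochs = 1000, 32, 5

batches = list(iterate_minibatches(num_images, batch_size))
iterations_per_epoch = len(batches)          # one weight update per mini-batch
total_iterations = num_epochs * iterations_per_epoch
last_batch_size = len(batches[-1])           # final mini-batch may be smaller
```

With these values, one epoch consists of `math.ceil(1000 / 32)` iterations, and the last mini-batch holds only the images left over after the full batches.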


Confidence Score: A confidence score is a value from an ordered set, so that scores can be easily compared. It is a decimal number between 0 and 1, which can be interpreted as a percentage of confidence.

For defect incident detection, in a multiclass classification, there are four (4) main categories to which a predicted sample of a classifier can belong [35]:

❑ True positives (TP) 

❑ False positives (FP) 

❑ True negatives (TN) 

❑ False negatives (FN) 

These four categories are used to form the confusion matrix [35], as indicated in Table 2.1.

Table 2.1 Confusion Matrix

                              Actual condition positive    Actual condition negative
Predicted condition positive  TP                           FP
Predicted condition negative  FN                           TN

 
Among the commonly used performance metrics are accuracy and error rate. Accuracy is the proportion of instances that are correctly classified, and the error rate is the proportion of instances that are incorrectly classified [35][25].

The overall accuracy is given by:

\[ A = \frac{TP + TN}{TP + TN + FP + FN} \tag{2.14} \]

and the error rate by:

\[ E_r = \frac{FP + FN}{TP + TN + FP + FN} \tag{2.15} \]

However, accuracy can be misleading for highly imbalanced datasets, as this metric favors the majority class [36][35].

For instance, in a dataset where the majority instances outnumber the minority instances by a ratio of 99:1, the classifier will likely classify all the majority instances correctly and misclassify the single minority instance. With the minority class taken as the positive class, there will be 99 true negatives, 1 false negative, 0 false positives, and 0 true positives, resulting in an accuracy of 99%, which is very misleading since the accurate prediction of the positive class is usually more desirable [35].

\[ Acc = \frac{0 + 99}{0 + 99 + 0 + 1} = 0.99 \tag{2.16} \]

To avoid this inconsistency, it is recommended that performance measures based on class metrics are used [35][37]. These metrics are listed below:

\[ TPR = \frac{TP}{TP + FN} \tag{2.17} \]

\[ TNR = \frac{TN}{TN + FP} \tag{2.18} \]

\[ FPR = \frac{FP}{TN + FP}, \qquad FNR = \frac{FN}{TP + FN} \tag{2.19} \]

\[ AUC = \frac{1 + TPR - FPR}{2} \tag{2.20} \]

\[ G\text{-}Mean = \sqrt{TPR \times TNR} \tag{2.21} \]

\[ MCC = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \tag{2.22} \]
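The metrics in Equations (2.14) to (2.22) can be computed directly from the four confusion matrix counts. The Python sketch below applies them to the 99:1 scenario in which the single minority (positive) instance is missed, that is, one false negative and no true positives. The function name is hypothetical, and returning 0 when a denominator is zero is a convention assumed here, since the equations leave that case undefined.

```python
import math
from typing import Dict


def class_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Compute accuracy, error rate, and class-based metrics from the counts."""
    def ratio(num: float, den: float) -> float:
        return num / den if den else 0.0  # assumed convention for 0/0

    total = tp + tn + fp + fn
    tpr = ratio(tp, tp + fn)
    tnr = ratio(tn, tn + fp)
    fpr = ratio(fp, tn + fp)
    fnr = ratio(fn, tp + fn)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, total),
        "error_rate": ratio(fp + fn, total),
        "tpr": tpr,
        "tnr": tnr,
        "fpr": fpr,
        "fnr": fnr,
        "auc": (1 + tpr - fpr) / 2,
        "g_mean": math.sqrt(tpr * tnr),
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }


# 99 majority instances classified correctly, the single minority instance missed.
metrics = class_metrics(tp=0, tn=99, fp=0, fn=1)
```

Although accuracy is 0.99 in this scenario, TPR, G-Mean, and MCC are all 0 and AUC is 0.5, which exposes the failure on the positive class that accuracy hides.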


2.6 Summary of Papers 

Table 2.2 shows the summary of relevant journal articles that were reviewed.  

Table 2.2 Summary of Papers

Columns: Security Incident Paper Title | General Overview | Methodology | Strengths and Weaknesses | Accuracy

1. Paper title: Fast R-CNN for object detection
   Author(s): Girshick, Ross

   General overview: The author(s) looked at ways to improve the efficacy of neural network models in object detection. Datasets used in the project: (1) VGG16, (2) ImageNet. [10]

   Methodology: The paper proposed the Fast Region-based Convolutional Network method (Fast R-CNN) for object detection. The experiments use three pre-trained ImageNet models that are available online. The first is CaffeNet (essentially AlexNet) from R-CNN. The second network is VGG_CNN_M_1024. The final network is the very deep VGG16 model. Fast R-CNN was implemented in Python and C++ (using Caffe) and is available under the open-source MIT License.

   Strengths:
   1. Fast R-CNN trains the very deep VGG16 network 9 times faster than R-CNN.
   2. Fast R-CNN is 213 times faster at test time and achieves a higher Mean Average Precision (mAP) on PASCAL VOC 2012.
   3. Compared to SPPnet, Fast R-CNN trains VGG16 3× faster, tests 10× faster, and is more accurate.

   Weakness: The author(s) acknowledge the need to develop this method further, because techniques that would allow dense boxes to perform as well as sparse proposals are yet to be discovered.

   Accuracy: R-CNN achieved a Mean Average Precision (mAP) of 66.0%, SPPnet 63.1%, and Fast R-CNN 66.6%.

 
2. ATM-Security using machine learning techniques in IoT
Author(s): Udhaya Kumar N, Sri Vasu R, Subash S, Sharmila Rani D.

General overview: In this paper, the authors designed a security system that gives a user access to an ATM only after identifying the user's image taken by the CCTV in the ATM and comparing it with the user's image stored in a database created during account creation, which comes under the banking session of banks. Sometimes, the authorized user cannot use the ATM for emergency purposes. In such cases, an OTP is sent to the user's registered mobile number, and the person who came instead of the authorized user has to enter the OTP that the authorized user received. This method will reduce the risk of ATM usage by the common people [38]. Datasets used in the project: 1. Dataset1; 2. Dataset2.

Methodology: The paper proposed a security system for ATMs using deep learning techniques for face detection and recognition. IoT components such as a camera, RFID reader, tag, relay, motor, and a Raspberry Pi 3 (2015 version) were used. The authors used the OpenCV platform and Python to implement the Local Binary Patterns (LBP) algorithm. An alert is sent to the authorized user as a text message if the user is found to be a third party.

Strength: The proposed solution operates in real time, thereby providing security in an active state.

Weaknesses: The system is fully reliant on face detection and recognition. This makes it imperative for the user to be physically present to be granted access to the ATM. Although the authors provided an alternative, with an OTP sent to the user in such circumstances, it is less effective in real-world scenarios.

Accuracy: The method detected faces, but no accuracy figures were given.

3. Fast Edge Detection Using Structured Forests
Author(s): Piotr Dollar and C. Lawrence Zitnick

General overview: In this paper, the author(s) analyzed edge detection as a critical component of many vision systems, including object detectors and image segmentation algorithms. Patches of edges exhibit well-known forms of local structure, such as straight lines or T-junctions. The paper takes advantage of the structure in local image patches to learn an accurate and computationally efficient edge detector. Finally, the authors show the potential of their approach as a general-purpose edge detector by showing that the learned edge models generalize well across datasets [39]. Datasets used in the project: 1. BSDS500 Segmentation dataset; 2. NYU Depth dataset.

Methodology: The authors predict local edge masks in a structured learning framework applied to random decision forests. Their approach to learning decision trees robustly maps the structured labels to a discrete space where standard information gain measures may be evaluated. The result is an approach that obtains real-time performance, orders of magnitude faster than many competing state-of-the-art approaches, while also achieving state-of-the-art edge detection results on the BSDS500 Segmentation dataset and the NYU Depth dataset.

Strength: The proposed solution operates in real time, thereby providing security in an active state.

Weaknesses: The system is fully reliant on face detection and recognition. This makes it imperative for the user to be physically present to be granted access to the ATM. Although the authors provided an alternative, with an OTP sent to the user in such circumstances, it is less effective in real-world scenarios.

Accuracy: Initially, the methodology achieved 71% ODS accuracy, which increased to 75% ODS after the parameter sweep.

4. Deep Learn Helmets-Enhancing Security at ATMs
Author(s): K. Bavithra Devi, S. Mohamed Mansoor Roomi, M. Meena.

General overview: The authors took a comprehensive survey of ATM infrastructure in India, a country of over 1.2 billion people. Their survey took into perspective the perennial security challenges of ATMs. The author(s) argued that real-time intelligent video analytics offers advanced monitoring capabilities that give video surveillance the sophistication to recognize abnormal activities. Persons wearing helmets in the ATM center is one of the anomalous activities. In such a scenario, an automatic helmet detection algorithm is required to alert the person wearing a helmet in the ATM [39].

Methodology: The author(s) designed helmet detection in images using deep learning Convolutional Neural Network (CNN) architectures such as VGGNET (Visual Geometry Group) and ALEXNET. The helmet region is detected using a Region Convolutional Neural Network (RCNN) with 15 layers. The performance of this technique was tested on 880 test images out of 1880 images in a database. The parameters were chosen to compare different mini-batch sizes and epochs in ALEXNET.

Strength: The proposed solution operates in real time, thereby providing security in an active state.

Weaknesses: The system is fully reliant on helmet detection and recognition, which may be confused when the person's head is bald or when an object resembles a helmet. A person without a helmet is detected with high accuracy.

Accuracy: The accuracy for helmet detection using ALEXNET is 96.03.

5. Selective Search for Object Recognition
Author(s): J. R. R. Uijlings, K. E. A. van de Sande, T. Gevers, A. W. M. Smeulders

General overview: The authors addressed the problem of generating possible object locations for use in object recognition. Their selective search results in a small set of data-driven, class-independent, high-quality locations, yielding 99% recall and a Mean Average Best Overlap of 0.879 at 10,097 locations. Compared to an exhaustive search, the reduced number of locations enables the use of more robust machine-learning techniques and appearance models for object recognition. The paper shows that selective search allows the use of the powerful Bag-of-Words model for recognition. The selective search software is made publicly available [12].

Methodology: The authors introduced selective search, combining the strengths of exhaustive search and segmentation. Like segmentation, their work used image structure to guide the sampling process. Like exhaustive search, it aimed to capture all possible object locations. Instead of a single technique to generate possible object locations, the authors diversified the search and used a variety of complementary image partitionings to deal with as many image conditions as possible.

Strength: The proposed solution operates in real time, thereby providing security in an active state.

Weaknesses: The system is fully reliant on helmet detection and recognition, which may be confused when the person's head is bald or when an object resembles a helmet. A person without a helmet is detected with high accuracy.

Accuracy: The selective search for object recognition achieved 99%.

 
6. Anomaly Detection on ATMs via Time Series Motif Discovery
Author(s): Dirk Walther, Maximilian Riesenhuber, Tomaso Poggio

General overview: The authors looked at the situation in 2014, when skimming attacks on ATMs resulted in approximately 280 million Euros in losses within the European Union sub-region [40].

Methodology: The authors used an innovative piezoelectric sensor network to capture the ATM state and analyze the occurring vibrations. The captured signals were inspected with the complex quad-tree wavelet packet transform, which provided a broad frequency analysis of a signal at various scales. Features were extracted from the selected scale, based on the information content, to detect motifs. The detected motifs provided the prototype patterns for anomaly detection or classification tasks.

Strength: The practical results showed that the proposed approach could classify normal and abnormal signals via motif discovery.

Weaknesses: The system is fully reliant on signal classification and does not report incidents in time.

Accuracy: Motif discovery achieved 60.0% classification results, 58.8 F-Measure, 71.4 sensitivity, and 50.0 precision.

 
 

DEFECT INCIDENTS

 
7. Automated Teller Machine Analysis under Host-Bank Systems through Telephone Network
Author(s): Kuldeep Nagiya, Mangey Ram.

General overview: This work demonstrates the performance of an ATM network. Different types of component failure, such as failure of the ATM, the telephone network, and the power supply, were taken for the study. The author(s) considered ATM functionality problems for two kinds of power supply: 1. power supply through the electricity board; 2. power supply through a generator [41].

Methodology: The various performance and reliability characteristics of the ATM network were assessed using mathematical modeling, the supplementary variable technique, Laplace transformation, and the Markov process.

Strength: The reliability characteristics of the ATM network were found through the approach.

Weaknesses: The technique relies on the power supply components of the ATM and not on the essential components, such as the card reader and dispenser, which are the main components that ATM users access. The technique is inefficient because detecting only power failure does not establish that the ATM is defective.

Accuracy: The technique achieved the following reliabilities: 1 at time 0; 0.985042 at time 1; 0.970173 at time 2; and so on through time 15, with a reliability of 0.787659.

8. ATM management prediction using Artificial Intelligence techniques: A survey
Author(s): Seyed Mohammad Hossein Hasheminejad and Zahra Reisjafari

General overview: In this paper, the author(s) discussed forecasting cash demand, fraud detection, ATM failure, user interface, replenishment strategy, ATM location, and customer behavior [42].

Methodology: Artificial Intelligence (AI) techniques were discussed to detect fraud, failure, replenishment, and crash prediction. Several statistical methods used to evaluate these forecasts are also covered. The paper reviews AI techniques such as neural networks, regressions, and support vector machines, and presents their results in graphs in different sections. The literature covered spans ten years (2006-2016). The approaches studied are compared with regard to data sets, prediction performance, accuracy, and so on. The paper also provides a list of data sets available to the scientific community for research in this field. Finally, open issues and future work are presented for each of these items.

Strength: The techniques used yielded appreciable results.

Weaknesses: Different datasets were used for each technique; hence, there is no comparison point.

Accuracy: The ANN model achieved a mean absolute percentage error (MAPE) of 20.6, SVR achieved 25.1, Stepwise Autoregressive achieved 46.55, Holt-Winters Additive achieved 53.05, and Exponential Smoothing achieved 55.87.

9. Agent-Based Faults Monitoring in Automatic Teller Machines
Author(s): Bashir Sulaimon Adebayo, Mohammed Idris Kolo.

General overview: In this paper, the authors worked on ATM systems in Nigeria. The main focus of the research was the challenge of maximizing the uptime of ATMs, given a wide gap in fault detection, notification, and correction. One way to alleviate this situation is intelligent monitoring of ATMs by resident software agents that monitor the device and report faulty components in real time to facilitate quick response [43].

Methodology: The authors proposed an architecture for rule-based and intelligent agent-based monitoring and management of ATMs. The agents remotely monitor the ATMs and control functions such as software maintenance. A system administrator can securely modify the agents' monitoring policies and control functions. The framework presented includes a software fault monitor, a hardware fault monitor, and a transaction monitor. Finally, a set of utility support agents (caller and log agents) alert the network operator and log error and transaction information in a database.

Strength:
• Reduction in the mean time to repair (MTTR) by quickly isolating problems in critical business transactions.
• Ability to use remote diagnosis information to minimize the number of trips made to the ATM.
• Ability to monitor individual processes on the ATM and reset them when necessary.
• Ability to dynamically update diagnosis rules with changing environments.

Weaknesses: The agents residing on the ATM devices can declare the state of the ATM only after components have failed, which does not minimize the downtime experienced.

Accuracy: The paper is purely a software approach and does not have a prototype to determine the accuracy of the technique.

10. ATM Availability Management System
Author(s): Sujata Rao and Hrushikesh Mane

General overview: The author(s) worked on the ATM monitoring process as a key component of an ATM Availability Management Solution. Handling the incidents related to ATMs and monitoring the device-level health of the entire ATM fleet is the prime study of this paper. The Master View helps a financial institution improve the availability of its self-service terminals and minimize downtime [44].

Methodology: Various reports and statistics about transactions across multiple demographics provide helpful information, such as a) the cities using ATMs heavily, b) the most popular transactions, and c) the ATMs with heavy transaction volumes. Monitoring ATMs helps significantly in identifying the various causes behind slow or failed customer transactions, to reduce service incidents and maximize ATM uptime through Master View Resolve (MVR).

Strength: The Master View Resolve (MVR) can carry out maintenance tasks.

Weaknesses: The MVR is limited in detecting ATM problems at the onset.

Accuracy: Master View Resolve (MVR) can handle more than 90% of ATM network failures.
 

2.7 Conclusion and Summary of Literature Review 

The literature reviewed in this chapter supports the adoption of a machine learning approach for this work. Machine learning offers a dynamic, efficient, and lightweight methodology. According to the literature, the main setbacks of machine learning approaches are the lack of datasets for specific problems and, in some instances, biased datasets. These challenges have been addressed with machine learning techniques and with synthetic data. Consequently, the primary technique in this work uses machine learning models for incident detection.

  
 

CHAPTER THREE 

METHODOLOGY 

 
3.0 Introduction 

This chapter discusses the research strategy, method, and approach, the methods of data collection, and the sample selection. It also covers the research process, the type of data analysis, the ethical considerations, and the research limitations of the project. The study involves a field survey and software development.

3.1 Proposed System Design 

The flowchart for the SRSAID system design is shown in figure 3.1. The figure shows the 

various stages in blocks as well as the processes involved for each block. 

 
[Figure 3.1 flowchart. Blocks: Start; Collected ATM camera feeds and system faults; Process data; Processing using R-CNN with SSDLite MobileNet V2 architecture; Processing using SVM; Put the processed dataset in classes; Categorization of incidents from the ATM; Send GSM alerts and indicate on Google Map; Display ATM state on software dashboard; End. Decisions (yes/no): Is dataset noisy?; Is processed data camera feeds?; Is there incident?]

Figure 3.1 Flowchart for Proposed Design
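The routing logic of the flowchart can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the SRSAID implementation: the record fields (`source`, `noisy`), the action names, and the classifier callbacks are all hypothetical, and the assumed branches are that camera feeds go to the R-CNN path, other records go to the SVM path, and alerts are raised only when an incident is detected.

```python
def run_pipeline(record, rcnn_classify, svm_classify, preprocess):
    """Route one record through the stages of the proposed design (sketch)."""
    # Noisy inputs are cleaned before classification.
    if record["noisy"]:
        record = preprocess(record)
    # Camera feeds take the R-CNN branch; system fault records take the SVM branch.
    if record["source"] == "camera_feed":
        label = rcnn_classify(record)
    else:
        label = svm_classify(record)
    # An incident triggers alerts; otherwise only the dashboard state is shown.
    if label is not None:
        return {"incident": label,
                "actions": ["send_gsm_alert", "mark_on_google_map"]}
    return {"incident": None, "actions": ["display_state_on_dashboard"]}

# Stub classifiers stand in for the trained models (hypothetical outputs).
result = run_pipeline(
    {"source": "camera_feed", "noisy": False, "payload": "frame-001"},
    rcnn_classify=lambda r: "skimming",
    svm_classify=lambda r: None,
    preprocess=lambda r: r,
)
quiet = run_pipeline(
    {"source": "fault_log", "noisy": True, "payload": "status ok"},
    rcnn_classify=lambda r: "skimming",
    svm_classify=lambda r: None,
    preprocess=lambda r: {**r, "noisy": False},
)
print(result, quiet)
```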

 

3.2 Field Survey 

A field survey was conducted to solicit information on the user experience of people who had used ATMs over the preceding 12 months. The survey population was a cross-section of bank customers on the University of Ghana campus, some of the banks' staff, and NCR staff. A structured questionnaire was presented to the selected respondents, and their answers were used in the analysis.

The main reason for using these instruments was to collect enough firsthand information from respondents. According to [45], a semi-structured interview gives the interviewer more freedom to pursue ideas and to improvise questions. The use of the interview is supported by [46], which describes it as face-to-face questioning of respondents to obtain information. The study was based on primary and secondary sources of data.

The primary data were obtained from interviews and questionnaires administered to ten (10) officials of NCR and ten (10) clients and staff from different banks who use the banks' electronic banking service (ATMs). Secondary data were collected from research reports; Agricultural Development Bank (ADB), Ghana; Ghana Commercial Bank (GCB); Barclays Bank, Ghana; Republic Bank, Ghana; and other published materials.

Purposive sampling, which allows the researcher to select the particular participants needed for specific information, was adopted to select the NCR officials directly involved in the ATM banking services of all the banks in Ghana. The primary respondents were ten (10) officials of NCR, Ghana: the manager of the ATMs in Ghana, the Head of Engineering, and eight (8) other NCR staff. Random sampling was adopted to select ten (10) customers of different banks that use NCR ATMs.

3.2.1 Data Collection 

Computer-vision-based approaches depend on collected data or image datasets, which are essential for analyzing object features and reviewing the performance of detection algorithms. Some databases are available for object, character, and scene recognition [34]. However, there is no database of ATM incidents (images showing security and defect incidents). Hence, more than one thousand (1,000) images showing the various ATM security incidents were collected from the internet, and the faults dataset, in Excel format, was collected from the ATM service provider (NCR).

3.3 Data Pre-processing  

3.3.1 Security Incident Data Pre-processing 

Data preprocessing is the first significant stage in developing the incident detection system. This stage uses data mining techniques to transform the raw data into the format required by the convolutional neural network classifier (CNN-C) to detect and identify ATM security incidents. It involves removing unwanted images and creating bounding boxes to select the parts of the images that depict security incidents. This ensures that only valid and relevant information is passed to the next process.

Before preprocessing, images were downloaded from the internet, and others were captured with a TECHNO POP 2 Plus camera, into a folder named image2 on the Raspberry Pi. The image data preprocessing involves these steps:

 

❑ Image data filtering and selection 

❑ Feature selection and extraction 

❑ Feature adjustment. 
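As an illustration of the bounding-box step, the sketch below reads the labeled boxes from an annotation file. It assumes the annotations were saved in the Pascal VOC XML format that LabelImg writes by default; the file name, class label, and coordinates in `SAMPLE_XML` are hypothetical and do not come from the study's dataset.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation in Pascal VOC layout (the LabelImg default format).
SAMPLE_XML = """<annotation>
  <filename>atm_001.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>skimming</name>
    <bndbox><xmin>120</xmin><ymin>200</ymin><xmax>260</xmax><ymax>310</ymax></bndbox>
  </object>
</annotation>"""

def read_boxes(xml_text):
    """Return a list of (class_name, (xmin, ymin, xmax, ymax)) tuples."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        bb = obj.find("bndbox")
        coords = tuple(int(bb.findtext(tag))
                       for tag in ("xmin", "ymin", "xmax", "ymax"))
        boxes.append((obj.findtext("name"), coords))
    return boxes

boxes = read_boxes(SAMPLE_XML)
print(boxes)
```

Each tuple pairs an incident class with the image region that depicts it, which is the information a region-based detector needs for training.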

3.3.2 Image Dataset and Description 

The security incident experimental setup used one thousand, one hundred and sixty-five (1,165) images. One thousand, one hundred and fifty-five (1,155) were obtained from the internet, and the remaining ten (10) were captured with the Raspberry Pi for testing purposes. The images from the internet depict the various ATM security incident activities. For this study, the datasets sourced from the internet were improved and grouped into a multiclass classification problem. Table 3.1 summarizes the image datasets and their class distribution:

Table 3.1 Summary of Image Dataset

Dataset                        Number of Attributes
ATM out of service             62
Card trapping                  50
Cash trapping                  64
Jackpotting                    46
Logical attack                 44
Malware                        78
Occlusion                      73
Physical attack                204
Robbery                        126
Skimming                       55
Transaction reversal fraud     50

 
[Figure 3.2 panels: Samples of Skimming Attack Images; Samples of Physical Attack Images; Samples of Robbery Images]

Figure 3.2 Sample Images for ATM Incident Attacks

 
Defects datasets

The experimental setup used eleven thousand, four hundred and fifty-two (11,452) ATM defect incidents from 2018, obtained from NCR, Ghana. The NCR defects dataset consists of the various ATM system technical problems encountered in 2018, drawn from the company's database or logs. For this study, the multiclass dataset from NCR Ghana was converted into binary classification problems. Figure 3.3 shows part of the multiclass dataset used by the company and its class distribution:
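Converting a multiclass label column into a binary problem is commonly done in a one-vs-rest fashion: one class is labeled +1 and every other class -1. The sketch below assumes that scheme; the helper name `to_binary` and the defect labels are hypothetical and are not taken from the NCR data.

```python
def to_binary(labels, positive_class):
    """Map a multiclass label list to +1 (positive_class) / -1 (all other classes)."""
    return [1 if label == positive_class else -1 for label in labels]

# Hypothetical defect labels, for illustration only.
labels = ["cash jam", "card reader fault", "cash jam", "network down"]
y = to_binary(labels, positive_class="cash jam")
print(y)
```

Repeating the conversion once per class of interest yields one binary problem per class, each suitable for a two-class SVM.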

 

Figure 3.3 Summaries of defect incident datasets

Figure 3.4 Plot of summaries of defect incident datasets
 

In addition to the images, data obtained from NCR Ghana for ATM Defects was used for the 

system defect aspect of this project. 

3.4 Machine Learning Classification Algorithms 

Machine learning techniques were used to assess the effectiveness and efficiency of the proposed approach. Two algorithms were considered: the deep learning Regional Convolutional Neural Network (R-CNN), fine-tuned for image classification, and Support Vector Machines (SVM) for defect classification.

3.4.1 Regional Convolutional Neural Network 

R-CNN forward computation has several stages, shown in Figure 3.5. First, the regions of interest (RoIs) are generated. The RoIs are category-independent bounding boxes that are highly likely to contain an object of interest [10]. This study generates them through LabelImg, using an integrated method that works similarly to selective search [20] [12]. Next, a convolutional network extracts features from each region proposal. The sub-image contained in the bounding box is warped to match the input size of the CNN and then fed to the network. After the network has extracted features from the input, the features are passed to a faster regional convolutional network (F-RCNN) that provides the final classification.
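The warping step can be illustrated with a small sketch that crops a bounding box from an image and resizes the crop to a fixed size. Nearest-neighbour sampling on a plain nested list is an assumption made here to keep the example dependency-free; a real pipeline would normally use an image library and a smoother interpolation.

```python
def crop_and_warp(image, box, out_h, out_w):
    """Crop box=(xmin, ymin, xmax, ymax) from a 2-D image (list of rows) and
    resize the crop to out_h x out_w using nearest-neighbour sampling."""
    xmin, ymin, xmax, ymax = box
    crop = [row[xmin:xmax] for row in image[ymin:ymax]]
    in_h, in_w = len(crop), len(crop[0])
    return [
        [crop[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]

# Toy 6x6 "image" whose pixel value encodes its position (10 * row + column).
image = [[10 * r + c for c in range(6)] for r in range(6)]
patch = crop_and_warp(image, box=(1, 1, 3, 3), out_h=4, out_w=4)
for row in patch:
    print(row)
```

Whatever the size of the proposal, the output always has the fixed shape that the CNN input layer expects.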

 
 

[Figure 3.5 stages: 1. Original input image; 2. Feature proposal extraction (a warped proposal); 3. Compute features using CNN; 4. Classification. Example classification output: Skimming (Yes); Physical attack (No); Robbery (No); Logical attack (No); Malware (No); Card trapping (No); Cash trapping (No); Occlusion (No); Jackpotting (No); TRF (No).]

Figure 3.5 Stages of R-CNN forward computation

 
The method is trained in multiple stages, beginning with the convolutional network 

[10][14][17]. After the CNN has been trained, the Faster-RCNN is fitted to the CNN features. 

Finally, the region proposal-generating method is trained. 

Fast R-CNN 

The method receives an input image plus regions of interest computed from the image. As in R-CNN, the RoIs are generated using an external method [10][14][17]. The image is processed using a CNN containing several convolutional and max-pooling layers. The convolutional feature map generated after these layers is input to an RoI pooling layer, as shown in Figure 3.6. This layer extracts a fixed-length feature vector for each RoI from the feature map [20]. The feature vectors are then input to fully connected layers that feed two output layers: a softmax layer that produces probability estimates for the object classes, and a real-valued layer that outputs bounding-box coordinates computed using regression (that is, refinements to the initial candidate boxes) [20][47][12][10].
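The key property of RoI pooling, that regions of different sizes are reduced to the same fixed-size output, can be seen in the following sketch. It is a simplified single-channel version written with plain lists; the function name and the toy feature map are assumptions for illustration, not the layer used in this work.

```python
import math

def roi_max_pool(feature_map, roi, out_h, out_w):
    """Max-pool the region roi=(x0, y0, x1, y1), with x1 and y1 exclusive,
    of a 2-D feature map into a fixed out_h x out_w grid."""
    x0, y0, x1, y1 = roi
    h, w = y1 - y0, x1 - x0
    pooled = []
    for i in range(out_h):
        # Row range of bin i: floor for the start, ceil for the end, so no bin is empty.
        r0 = y0 + (i * h) // out_h
        r1 = y0 + math.ceil((i + 1) * h / out_h)
        row = []
        for j in range(out_w):
            c0 = x0 + (j * w) // out_w
            c1 = x0 + math.ceil((j + 1) * w / out_w)
            row.append(max(feature_map[r][c]
                           for r in range(r0, r1) for c in range(c0, c1)))
        pooled.append(row)
    return pooled

# Toy 8x8 feature map with value 8 * row + column.
fm = [[8 * r + c for c in range(8)] for r in range(8)]
small = roi_max_pool(fm, roi=(0, 0, 4, 4), out_h=2, out_w=2)   # 4x4 region
large = roi_max_pool(fm, roi=(2, 1, 8, 7), out_h=2, out_w=2)   # 6x6 region
print(small, large)
```

Both regions produce a 2x2 output, so the fully connected layers that follow always receive a vector of the same length.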

 

[Figure 3.6: diagram of alternating convolution and max-pooling stages (labeled T1 to T4), with feature-map sizes 100*100, 96*96, 48*48, 44*44, 22*22, 18*18, and 9*9.]

Figure 3.6 RoI Pooling Layers

 
3.4.2 Support Vector Machines 

Support Vector Machines (SVM) is a machine learning tool for classification and regression. It is based on supervised learning and classifies points into one of two disjoint half-spaces [24][22][21]. It uses a nonlinear mapping to transform the original data into a higher dimension. Its objective is to construct a function that correctly predicts the class to which new and old points belong. With an appropriate nonlinear mapping, two data sets can always be separated by a hyperplane.

A hyperplane separates the tuples of one class from another and defines the decision boundary. Many hyperplanes may separate the data, but only one achieves maximum separation. The reason for seeking the maximum margin is that an arbitrary decision boundary may end up nearer to one class than to the other [21][24]. This holds when the data are linearly separable; in practice, data are mostly nonlinear and inseparable in the original space, so kernels are used.

The core purpose of SVM is to separate the data with decision boundaries and to extend these to nonlinear boundaries using the kernel trick [24]. A significant benefit of SVM is its versatility: different kernel functions can be specified for the decision function. Common kernels are provided, but it is also possible to specify custom kernels. SVM became prominent when, using pixel maps as input, it gave accuracy equivalent to neural networks with elaborate features in a handwriting recognition task. SVM is used in many classification and regression applications, such as text categorization, pattern recognition, face recognition, and handwriting analysis.

Neural networks are easier to apply than support vector machines, but they sometimes provide unsatisfactory results. For example, gradient descent, even in perceptron learning algorithms, is slower than SVM learning. SVM performs very well on pattern classification problems. One of the significant challenges is choosing a suitable kernel for a given application [24]. There are standard choices, such as the Gaussian or polynomial kernel, but if these prove ineffective, more elaborate kernels are needed.
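For reference, the standard kernel choices mentioned above have simple closed forms. The sketch below writes them out in plain Python; the parameter names and default values (`degree`, `coef0`, `gamma`) are illustrative assumptions, not the settings used in this study.

```python
import math

def linear_kernel(x, z):
    """k(x, z) = x . z"""
    return sum(a * b for a, b in zip(x, z))

def polynomial_kernel(x, z, degree=2, coef0=1.0):
    """k(x, z) = (x . z + coef0) ** degree"""
    return (linear_kernel(x, z) + coef0) ** degree

def gaussian_kernel(x, z, gamma=0.5):
    """k(x, z) = exp(-gamma * ||x - z||^2), also called the RBF kernel."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

x, z = [1.0, 2.0], [2.0, 0.0]
k_lin = linear_kernel(x, z)
k_poly = polynomial_kernel(x, z)
k_rbf = gaussian_kernel(x, z)
print(k_lin, k_poly, k_rbf)
```

Each function returns the inner product of the two points in some feature space without constructing that space explicitly, which is what the kernel trick relies on.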

Traditional classification approaches perform poorly when applied directly to high-dimensional data, whereas support vector machines can avoid explicit very-high-dimensional representations. This makes the support vector machine a promising technique compared to other approaches. Support vector machines scale fairly well to high-dimensional data, and the trade-off between classifier complexity and error can be controlled explicitly.

Another benefit of SVMs and kernel methods is that a kernel can be designed for a particular problem and applied directly to the data without a feature extraction process. This is important in situations where much of the data structure is lost during feature extraction, as in text processing. The limitations of SVM are speed and size, in both training and testing [24][21]. Discrete data presents another problem. The most severe difficulty with SVMs is their high algorithmic complexity and extensive memory requirements. The development of SVM differs from that of standard learning algorithms, and SVM provides fresh insight into learning.

SVM is an example of supervised learning that maximizes generalization by maximizing the margin, and it supports nonlinear separation through kernelization. SVM tries to avoid both overfitting and underfitting. The margin in SVM denotes the distance from the boundary to the closest data points in the feature space. Consider the 2018 incident dataset with the corresponding mapping $\phi : X \subseteq \mathbb{R}^n \rightarrow F$ into the feature space $F$.

The linear hyperplane dividing the data into two labeled classes (problem code description and all other classes) can be expressed mathematically as:

$$w^T x_i + b = 0, \qquad w \in \mathbb{R}^n,\ b \in \mathbb{R} \qquad (3.1)$$

Assuming the training dataset is correctly classified, as shown in figure 3.7: 

 

[Figure 3.7: two classes, Problem Code Description (PCD) and the remaining classes, separated by the hyperplane $w^T \phi(x) + b = 0$, with margin hyperplanes $w^T \phi(x) + b = +1$ and $w^T \phi(x) + b = -1$; the support vectors and a misclassified point are marked.]

Figure 3.7 Standard formulation of SVM

 
This means the support vector classifier (SVC) computes the hyperplane that maximizes the margin separating the classes (the problem class and all other classes). In its simplest linear form, the SVC is a hyperplane that separates the problem state from all other classes with maximum margin. Finding this hyperplane involves obtaining two parallel hyperplanes, as shown in Figure 3.7, equidistant from the separating hyperplane. Suppose all the training data satisfy the following constraints:

1,for 1

1,for 1

T

i i

T

i i

w x b y

w x b y

 +  = +


+  − = −

     (3.2) 

where $w$ is the normal to the hyperplane, $|b| / \|w\|$ is the perpendicular distance from the
hyperplane to the origin, and $\|w\|$ is the Euclidean norm of $w$. The separating hyperplane is
defined by $w^{T}x_i + b = 0$, and the constraints in (3.2) are combined to form:

\[ y_i \left( w^{T}x_i + b \right) \ge +1 \tag{3.3} \]
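
The combined constraint can be checked numerically. The following is a minimal sketch, not part of the original study: the four 2-D points, the labels, and the values of `w` and `b` are hypothetical and were chosen by hand for illustration only.

```python
# Toy 2-D data (hypothetical, not taken from the incident dataset).
# Label +1 stands for the problem class, -1 for all other classes.
X = [(1.0, 1.0), (2.0, 2.0), (-1.0, -1.0), (-3.0, -1.0)]
y = [+1, +1, -1, -1]

# Hypothetical hyperplane parameters chosen by hand for this toy data.
w = (0.5, 0.5)
b = 0.0


def decision_value(w, b, x):
    """Return w^T x + b, the left-hand side of (3.1)."""
    return sum(w_j * x_j for w_j, x_j in zip(w, x)) + b


def functional_margins(w, b, X, y):
    """Return y_i (w^T x_i + b) for every sample, the left-hand side of (3.3)."""
    return [y_i * decision_value(w, b, x_i) for x_i, y_i in zip(X, y)]


def satisfies_constraints(w, b, X, y, tol=1e-12):
    """Return True when every sample satisfies y_i (w^T x_i + b) >= 1."""
    return all(m >= 1.0 - tol for m in functional_margins(w, b, X, y))


margins = functional_margins(w, b, X, y)
feasible = satisfies_constraints(w, b, X, y)
print(margins, feasible)
```

Samples whose value equals exactly 1 lie on one of the two bounding hyperplanes; larger values lie further from the decision boundary.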


The pair of hyperplanes that gives the maximum margin (c) can be found by minimizing $\|w\|^{2}$
subject to the constraint in (3.3). This leads to a quadratic optimization problem formulated as:

Minimize

\[ f(w, b) = \frac{\|w\|^{2}}{2} \tag{3.5} \]

Subject to

\[ y_i \left( w^{T}x_i + b \right) \ge 1, \quad i = 1, \dots, n \tag{3.6} \]
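
The link between the objective in (3.5) and the margin can be made explicit with a short worked step. The symbols $x_{+}$ and $x_{-}$ are introduced here for illustration only; they denote points lying on the two bounding hyperplanes.

```latex
\begin{aligned}
w^{T}x_{+} + b &= +1, \qquad w^{T}x_{-} + b = -1 \\
w^{T}\left(x_{+} - x_{-}\right) &= 2 \\
\text{margin} &= \frac{w^{T}\left(x_{+} - x_{-}\right)}{\|w\|} = \frac{2}{\|w\|}
\end{aligned}
```

Maximizing $2 / \|w\|$ is therefore equivalent to minimizing $\|w\|^{2} / 2$, which is the form of the objective in (3.5).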

This problem is reformulated by introducing Lagrange multipliers $\alpha_i$ ($i = 1, \dots, n$),
one for each constraint, and subtracting the constraint terms from the objective function. With
$f(x_i) = w^{T}x_i + b$, this results in the primal Lagrangian function:

\[ L_P(w, b, \alpha) = \frac{\|w\|^{2}}{2} + \sum_{i=1}^{n} \alpha_i \left( 1 - y_i \left( w^{T}x_i + b \right) \right) \tag{3.7} \]

Taking the partial derivatives of $L_P(w, b, \alpha)$ with respect to $w$, $b$, and $\alpha$,
respectively, and applying duality theory yields:

\[ \frac{\partial L_P}{\partial w} = 0 \;\rightarrow\; w = \sum_{i=1}^{n} \alpha_i y_i x_i \tag{3.8} \]

The problem defined in (3.5) is a quadratic optimization (QP) problem. Maximizing the primal
problem $L_P$ with respect to $\alpha_i$, subject to the constraints that the gradient of $L_P$
with respect to $w$ and $b$ vanishes and that $\alpha_i \ge 0$, gives the following two conditions:

\[ w = \sum_{i=1}^{n} \alpha_i y_i x_i \tag{3.9} \]


\[ \sum_{i=1}^{n} \alpha_i y_i = 0 \tag{3.10} \]

Substituting these constraints gives the dual formulation of the Lagrangian: 

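\[ \text{maximize} \quad L_D(\alpha) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle \tag{3.11} \]

\[ \text{subject to} \quad \sum_{i=1}^{n} \alpha_i y_i = 0, \quad \alpha_i \ge 0; \quad i = 1, \dots, n \tag{3.12} \]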
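
The substitution step can be written out explicitly. The following worked derivation is a sketch that expands the primal Lagrangian (3.7) using the two stationarity conditions $w = \sum_i \alpha_i y_i x_i$ and $\sum_i \alpha_i y_i = 0$; the name $L_D$ for the resulting dual objective follows the notation used later in this section.

```latex
\begin{aligned}
L_P &= \frac{1}{2} w^{T}w + \sum_{i=1}^{n} \alpha_i
       - w^{T} \sum_{i=1}^{n} \alpha_i y_i x_i
       - b \sum_{i=1}^{n} \alpha_i y_i \\
    &= \frac{1}{2} w^{T}w + \sum_{i=1}^{n} \alpha_i - w^{T}w - 0 \\
    &= \sum_{i=1}^{n} \alpha_i - \frac{1}{2} w^{T}w \\
    &= \sum_{i=1}^{n} \alpha_i
       - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n}
         \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle = L_D(\alpha)
\end{aligned}
```

The result depends on the data only through the inner products $\langle x_i, x_j \rangle$, and $w$ and $b$ no longer appear as free variables.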

The values of $\alpha_i$, $w$, and $b$ are then obtained from the following equations:

\[ w = \sum_{i=1}^{n} \alpha_i y_i x_i \tag{3.13} \]

\[ b = -\frac{1}{2} \left( \min_{i:\, y_i = +1} w^{T}x_i + \max_{i:\, y_i = -1} w^{T}x_i \right) \tag{3.14} \]

Also, the Lagrange multipliers are determined using the following condition:

\[ \alpha_i \left( 1 - y_i \left( w^{T}x_i + b \right) \right) = 0 \tag{3.15} \]

Hence, the dual Lagrangian $L_D$ is maximized with respect to the nonnegative $\alpha_i$ to give a
standard quadratic optimization problem. The training vectors $x_i$ with nonzero Lagrange
multipliers $\alpha_i$ are called support vectors. For each such $x_i$,

\[ y_i \left( w^{T}x_i + b \right) = 1 \tag{3.16} \]
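
The recovery of $w$, $b$, and the support vectors from the multipliers can be illustrated on a small example. The sketch below is not the study's implementation: the data are hypothetical, and the values in `alpha` are an assumed dual solution worked out by hand for this toy data rather than produced by a QP solver. The bias uses the sign convention of the hyperplane $w^{T}x + b = 0$ in (3.1).

```python
# Toy 2-D data (hypothetical, not taken from the incident dataset).
X = [(1.0, 1.0), (2.0, 2.0), (-1.0, -1.0), (-3.0, -1.0)]
y = [+1, +1, -1, -1]

# Assumed dual solution for this toy data, derived by hand: only the two
# closest opposite-class points receive nonzero multipliers.
alpha = [0.25, 0.0, 0.25, 0.0]


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(u_j * v_j for u_j, v_j in zip(u, v))


def weight_vector(alpha, X, y):
    """w = sum_i alpha_i y_i x_i, as in (3.13)."""
    dim = len(X[0])
    return tuple(
        sum(a_i * y_i * x_i[j] for a_i, x_i, y_i in zip(alpha, X, y))
        for j in range(dim)
    )


def bias(w, X, y):
    """b = -1/2 (min over y_i=+1 of w^T x_i + max over y_i=-1 of w^T x_i)."""
    closest_pos = min(dot(w, x_i) for x_i, y_i in zip(X, y) if y_i == +1)
    closest_neg = max(dot(w, x_i) for x_i, y_i in zip(X, y) if y_i == -1)
    return -0.5 * (closest_pos + closest_neg)


def support_vector_indices(alpha, tol=1e-9):
    """Indices of samples with nonzero Lagrange multipliers."""
    return [i for i, a_i in enumerate(alpha) if a_i > tol]


w = weight_vector(alpha, X, y)
b = bias(w, X, y)
sv = support_vector_indices(alpha)

# Complementary slackness (3.15): each product should be zero.
slackness = [
    a_i * (1.0 - y_i * (dot(w, x_i) + b)) for a_i, x_i, y_i in zip(alpha, X, y)
]
# Equality constraint of the dual: sum_i alpha_i y_i should be zero.
balance = sum(a_i * y_i for a_i, y_i in zip(alpha, y))
print(w, b, sv)
```

Only the samples listed in `sv` contribute to `w`; removing the other samples from the toy data would leave the hyperplane unchanged.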

The equation above gives the support vectors (SVs). Although the SVM classifier can only 

have a linear hyperplane as its decision surface, its formulation can be extended to build a 

nonlinear SVM. SVMs with nonlinear decision surfaces can classify nonlinearly separable 


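data by introducing a soft margin hyperplane, as shown in Figure 3.8. Introducing the slack
variables $\xi_i$ into the constraints yields:

\[
\begin{aligned}
w^{T}x_i + b &\ge 1 - \xi_i, && \text{for } y_i = +1, \\
w^{T}x_i + b &\le -1 + \xi_i, && \text{for } y_i = -1, \\
\xi_i &\ge 0 && \forall i.
\end{aligned}
\tag{3.17}
\]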

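These slack variables help to find the hyperplane that provides the minimum number of training
errors. Modifying the objective in (3.5) to include the slack variables yields:

\[
\begin{aligned}
\underset{w,\, b,\, \xi}{\text{Minimize}} \quad & \frac{\|w\|^{2}}{2} + C \sum_{i=1}^{n} \xi_i \\
\text{subject to} \quad & y_i \left( w^{T}x_i + b \right) + \xi_i - 1 \ge 0, \quad \xi_i \ge 0.
\end{aligned}
\tag{3.18}
\]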
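
The role of the slack variables in (3.18) can be seen on a small example. This is a minimal sketch under stated assumptions: the data, the hyperplane parameters `w` and `b`, and the penalty weight `C` are all hypothetical, and the last point is deliberately placed on the wrong side of the boundary. For a fixed hyperplane, the smallest feasible slack for each sample is $\max(0,\, 1 - y_i(w^{T}x_i + b))$.

```python
# Toy 2-D data (hypothetical); the last point is misclassified on purpose.
X = [(1.0, 1.0), (2.0, 2.0), (-1.0, -1.0), (1.0, 0.0)]
y = [+1, +1, -1, -1]

w = (0.5, 0.5)  # hypothetical hyperplane parameters
b = 0.0
C = 10.0        # hypothetical penalty weight on the total slack


def slack_variables(w, b, X, y):
    """Smallest xi_i >= 0 that satisfies y_i (w^T x_i + b) + xi_i - 1 >= 0."""
    slacks = []
    for x_i, y_i in zip(X, y):
        margin = y_i * (sum(w_j * x_j for w_j, x_j in zip(w, x_i)) + b)
        slacks.append(max(0.0, 1.0 - margin))
    return slacks


def soft_margin_objective(w, slacks, C):
    """||w||^2 / 2 + C * sum_i xi_i, the objective of (3.18)."""
    return 0.5 * sum(w_j * w_j for w_j in w) + C * sum(slacks)


xi_values = slack_variables(w, b, X, y)
objective = soft_margin_objective(w, xi_values, C)
print(xi_values, objective)
```

A larger `C` penalizes margin violations more heavily, while a smaller `C` tolerates more violations in exchange for a wider margin.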