SELF-REPORTING SYSTEM FOR INCIDENTS DETECTION IN AUTOMATED TELLER MACHINE (ATM) USING MACHINE LEARNING TECHNIQUES

BY

IVY NKRUMAH PAYNE
(10701873)

THIS THESIS IS SUBMITTED TO THE UNIVERSITY OF GHANA, LEGON, IN PARTIAL FULFILMENT OF THE REQUIREMENT FOR THE AWARD OF MPHIL COMPUTER ENGINEERING DEGREE

DEPARTMENT OF COMPUTER ENGINEERING
SCHOOL OF ENGINEERING SCIENCES
UNIVERSITY OF GHANA, LEGON

SEPTEMBER 2021

University of Ghana http://ugspace.ug.edu.gh

DECLARATION

I, Ivy Nkrumah Payne, author of this thesis, hereby declare that the work presented in this thesis, Self-Reporting System for Incidents Detection In Automated Teller Machine (ATM) Using Deep Learning Techniques, is my work, produced from research undertaken under supervision in the Department of Computer Engineering, School of Engineering Sciences, University of Ghana, Legon from September 2019 to July 2021. This work has never been presented either in whole or in part for any other degree in this University or elsewhere.

Ivy Nkrumah Payne (Student)
Date: October 11, 2022

Prof. Robert Adjetey Sowah (Principal Supervisor)
Date: October 11, 2022

DEDICATION

This work is dedicated to YAHWEH, GOD ALMIGHTY, and the memory of my late mother, Faustina Serwah Agyarkoh.

ACKNOWLEDGEMENT

First of all, thanks and exaltation go to God Almighty for granting me strength and wisdom throughout this academic journey. I wish to extend my appreciation to my supervisor, Dr. Robert Adjetey Sowah, for all the ideas, guidance, and encouragement toward the successful completion of this research. I also wish to thank all members of the Department of Computer Engineering, especially Dr. Wiafe Owusu-Banahene, Dr. Godfrey A. Mills, Dr. Nii Longdon Sowah, and the entire department graduate committee for their input in this work.
ABSTRACT

The number of Automated Teller Machines (ATMs) has increased over the past decade due to their advantages in the banking sector. ATMs provide convenience to customers, optimize banking operations, and minimize transaction costs. However, undesirable security incidents such as tampering, skimming, physical attacks, robbery, and transaction reversal fraud may occur on ATM systems and negatively affect the user experience and banking institutions. ATM incidents occur either by system defect or through a deliberate act of physical attack by an intruder. Most security incidents lead to financial losses and reduce customers' confidence in banking. A Self-Reporting System for ATM Incident Detection (SRSAID) is needed to avert the threats posed by security incidents on ATM systems.

This research uses a machine-learning approach to solve this problem. Regional Convolutional Neural Network (R-CNN) and Support Vector Machine (SVM) algorithms are used to develop a detection model that detects occurrences of security incidents on an ATM system. Datasets used in the machine learning model development were obtained from NCR Ghana and an online repository. Experimental results showed that two CNN architecture models, ALEXNET and ssdlite_mobilenet_V2, obtained accuracy scores of 80% and 96%, respectively. SVM classifiers were developed using the linear, polynomial, and radial basis kernels, with accuracy scores of 70.6%, 72.56%, and 81.21%, respectively. These initial results necessitated hyperparameter optimization to improve the performance of the classifiers, which raised the accuracy scores of the SVM models to 76%, 77%, and 86% for the linear, polynomial, and radial basis kernels, respectively. The machine learning model was later deployed on a Raspberry Pi system connected to a web application that provides a graphical user interface for user interaction and viewing of reports.
TABLE OF CONTENTS

DECLARATION
DEDICATION
ACKNOWLEDGEMENT
ABSTRACT
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
LIST OF ABBREVIATIONS
CHAPTER ONE
INTRODUCTION
1.0 Introduction
1.1 Problem Statement
1.2 Objectives of Study
1.4 Justification of Study
1.5 Research Limitation
1.6 Thesis Outline
CHAPTER TWO
LITERATURE REVIEW
2.0 Introduction
2.1 ATM Network Architecture
2.2 ATM Incident Detection
2.2.1 Object Detection
2.2.2 Computer Vision
2.3 Machine Learning Classification Algorithms
2.3.1 SVM-OVO-OVR Approach
2.3.2 Convolutional Neural Networks (CNN)
2.3.3 Regional Convolutional Neural Networks (RCNN)
2.4 Transfer Learning
2.5 Performance Metrics for Security Incidents Detection
2.6 Summary of Papers
2.7 Conclusion and Summary of Literature Review
CHAPTER THREE
METHODOLOGY
3.0 Introduction
3.1 Proposed System Design
3.2 Field Survey
3.2.1 Data Collection
3.4 Data Pre-processing
3.4.1 Security Incident Data Pre-processing
3.4.2 Image Dataset and Description
3.5 Machine Learning Classification Algorithms
3.5.1 Regional Convolutional Neural Network
3.5.2 Support Vector Machines
3.6 Tools Used
3.6.1 Software Component
3.6.2 Hardware Component
3.6.3 System Specifications
CHAPTER FOUR
3.7 Experimental Method
3.7.1 Security Incident Detection Training
SYSTEM DESIGN AND DEVELOPMENT
4.0 Introduction
4.1 Proposed System Design
4.2 Incident Detection
4.2.1 CNN and SVM Incident Detection System Implementation and Testing
4.3 Implementation of SRSAID
4.4 The Hardware Architecture
4.5 System Integration Architecture
SYSTEM IMPLEMENTATION AND TESTING
5.0 Introduction
5.1 Testing of SRSAID
5.2 Results and Discussions for Security Incident Detection
5.3 Security Incident Detection Using ssdlite_mobilenet_V2
5.3.1 Regional Proposal Extraction
5.3.2 Performance Analysis for Different Mini Batch Size and Epoch
5.4 Results and Discussion for Defects Incident Detection
However, the plots of accuracy, AUC, G-mean, and MCC of the classifier are shown in APPENDIX B.
5.5 Research Contribution
CHAPTER SIX
CONCLUSION AND RECOMMENDATION
6.1 Conclusion
6.2 Recommendations
APPENDICES
APPENDIX A
I. EXPERIMENTAL PROCESS IMPLEMENTATION (SECURITY INCIDENT DETECTION)
APPENDIX B
I. FULL IMPLEMENTATION OF SRSAID (SECURITY INCIDENT DETECTION)
II. CODE IMPLEMENTATION FOR SOFTWARE: DJANGO VIEW
APPENDIX C
EXPERIMENTAL PROCESS IMPLEMENTATION (DEFECTS INCIDENT DETECTION)
APPENDICES
APPENDIX B
I. PLOT OF PERFORMANCE METRICS FOR DEFECT INCIDENT DETECTION USING SVM CLASSIFIER
APPENDIX C
I. PERFORMANCE METRIC FOR ATM SECURITY INCIDENTS IMAGE DATASET USING R-CNN (SSDLITE_MOBILENET_V2)
II. OTHER PERFORMANCE METRIC FOR ATM SECURITY INCIDENTS IMAGE DATASET USING R-CNN (SSDLITE_MOBILENET_V2)
III. PERFORMANCE METRIC FOR ATM TAMPERING IMAGE DATASET USING R-CNN (ALEXNET)
IV. OTHER PERFORMANCE METRIC FOR ATM TAMPERING IMAGE DATASET USING R-CNN (ALEXNET)
APPENDIX E
V. PERFORMANCE METRIC FOR 2018 ATM SYSTEM DEFECT DATASETS USING SVM
II. OTHER PERFORMANCE METRICS FOR 2018 ATM SYSTEM DEFECT DATASETS USING SVM
OTHER PERFORMANCE METRICS FOR 2018 NCR ATM SYSTEM DEFECT DATASETS USING RANDOM FOREST
OTHER PERFORMANCE METRICS FOR 2018 NCR ATM SYSTEM DEFECT DATASETS USING DECISION TREE

LIST OF FIGURES

Figure 1.1 Fraud type and Gross loss in Ghana [7]
Figure 2.1 ATM Network Diagram
Figure 2.2 Class boundaries for SVM-OVR formulation of the three-class problem
Figure 2.3 The distance between and of the multiclass classification
Figure 2.4 CNN Architecture
Figure 2.5 Stages of R-CNN forward Computation
Figure 3.1 Flowcharts for Proposed Design
Figure 3.2 Sample Images for ATM Incident Attacks
Figure 3.3 Summaries of defect incident datasets
Figure 3.4 Plot of summaries of defect incident datasets
Figure 3.5 Stages of R-CNN forward computation
Figure 3.6 RoI Pooling Layers
Figure 3.7 Standard formulation of SVM
Figure 3.8 Linear separating hyperplanes for the nonseparable case of SVC by introducing the slack variable (ξ)
Figure 3.9 Nonlinear separating hyperplane for the nonseparable case of SVM
Figure 3.10 OVO approach on multiclass
Figure 3.11 OVR approach on multiclass
Figure 3.12 OVO approach on multiclass taking all points into account
Figure 3.13 Raspberry Pi 3 Model B
Figure 3.14 Pi Camera
Figure 3.15 GPS and GSM module
Figure 3.16 Main Building Block of ssdlite_MobileNet_V2
Figure 3.17 Operations of MobileNet V2
Figure 3.18 Architecture for R-CNN
Figure 3.19 Stages of R-CNN forward computation
Figure 3.20 Conceptual model design and development of CNN with ssdlite MobileNet V2 Architecture
Figure 3.21 Flow chart model design and development of CNN / R-CNN
Figure 3.22 CNN Security Incident Detection Architecture
Figure 3.23 CNN Classification Model Architecture
Figure 3.24 CNN Classification Model Architecture
Figure 3.25 Illustration of transformation between predicted and ground-truth bounding boxes
Figure 3.26 Conceptual model design and development of the support vector machines
Figure 3.27 Data preprocessing for SVM training and testing
Figure 3.28 Flow chart for design and development of the support vector machines
Figure 3.29 OVO classification for the multiclass problem
Figure 4.1 System Implementation Architecture for SRSAID
Figure 4.2 Detection results control portal interface
Figure 4.3 Proposed System Block Diagram
Figure 4.4 Hardware Implementation Architecture for SRSAID
Figure 4.5 The SRSAID Integration Architecture
Figure 5.1 Use case Diagram for SRSAID
Figure 5.2 GSM alerts received from the Raspberry PI on a mobile phone
Figure 5.3 Autocreation of Results Database
Figure 5.4 Dashboard Results
Figure 5.5 Location on a map through GPS
Figure 5.6 Security Incident Detection Using Faster R-CNN
Figure 5.7 Region Proposal Extraction using types of security incidents
Figure 5.8 Performance analysis mini-batch size of ssdlite_mobilenet_V2 and ALEXNET
Figure 5.9 Performance analysis of Epoch from steps 10-100
Figure 5.10 Performance analysis of Epoch from step 110-200
Figure 5.11 Performance analysis of Epoch from step 210-300
Figure 5.12 Performance analysis of Epoch from step 310-400
Figure 5.13 Performance analysis of Epoch from step 410-499
Figure 5.14 Performance analysis of cross-entropy and validation accuracy at every epoch
Figure 5.15 The SVM Optimization History Plot
Figure 5.16 The slice plot of the SVM model optimization
Figure 5.17 Graph showing the comparison of accuracy for classifiers in the raw state
Figure 5.18 Graph showing the comparison of accuracy for classifiers after hyperparameter tuning
Figure 5.19 Plot of Confusion Matrix with class labels
Figure 5.20 Plot of ACC of the SVM classifier of the predicted labels
Figure 5.21 Classification Report
Figure 5.22 Comparative Analysis on precision of the classifiers
Figure 5.23 Comparative Analysis on roc_auc_score of the classifiers

LIST OF TABLES

Table 2.1 Confusion Matrix
Table 2.2 Summary of Papers
Table 3.1 Summary of Image Dataset
Table 3.2 Kernel Functions
Table 3.3 System Specifications
Table 5.1 Security Incident Detection using ssdlite_mobilenet_V2
Table 5.2 Security Incident Detection using ALEXNET
Table 5.3 Averages performance analysis of SVM classifiers
Table 5.4 Comparative Analysis on roc_auc_score of the classifiers

LIST OF ABBREVIATIONS

ATM      Automated Teller Machine
CNN      Convolutional Neural Network
FN       False Negative
FP       False Positive
GPS      Global Positioning System
GSM      Global System for Mobile Communication
MCC      Matthews Correlation Coefficient
R-CNN    Regional Convolutional Neural Network
SRSAID   Self-Reporting System for ATM Incident Detection
SVM      Support Vector Machines
TN       True Negative
TP       True Positive

CHAPTER ONE
INTRODUCTION

1.0 Introduction

Technology has brought about many improvements in human livelihood, and one of these is the use of Automated Teller Machines (ATMs) by financial institutions. An ATM is an electronic banking outlet that allows customers to complete basic transactions without the aid of a branch representative or teller. An ATM provides clients of a Bank with 24-hour deposit and withdrawal services [1][2].
Cash dispenser machines or ATMs are installed in Banks and other strategic locations for the convenience of the customer. As we move towards social distancing and a cashless society, ATMs will continue to play a vital role in paperless financial transactions. This underlines the importance of ATMs in financial transactions; however, security incidents and technical system defects remain a key challenge for Banks and their ATMs [2].

In security management, Banks deploy security personnel to monitor their ATMs in addition to other technologies such as Closed-Circuit Television (CCTV) cameras. Monitoring ATMs can become tedious due to the complexity and multiplicity of ATM devices located on and off the Bank's premises. Some of the monitoring activities performed by the security personnel include the following [3]:
❑ Monitoring security incidents and the operational status of ATMs.
❑ Monitoring system defect incidents.
❑ Protection from attempted ATM tampering, theft, and vandalism.
❑ Dealing with perpetrators [4].

While some form of security is incorporated in ATM installations, these security mechanisms are often subjected to escalating security incidents. These security incidents can be categorized as follows:
❑ Card skimming
❑ Card trapping
❑ Transaction Reversal Fraud
❑ Cash trapping
❑ Physical attack
❑ Robbery
❑ Jackpotting
❑ Logical attacks
❑ Occlusion

Banks and other financial institutions work with third-party companies like NCR Corporation for their ATM installations and related services. NCR Corporation is a leading technology company that provides digital solutions to companies, hospitals, and financial institutions. Its subsidiary, NCR Ghana, is a significant industry player in the financial sector in Ghana. Its core services include installing, configuring, and troubleshooting ATM devices for the banks it works with.
The problems associated with ATM systems, especially on the technical side, become a shared responsibility between financial institutions and ATM service companies like NCR Ghana [5]. Some of the technical problems associated with ATMs include the following:
❑ Dispenser problem
❑ Non-remedial call
❑ Card reader problem
❑ Pick drive mechanism

To mitigate problems associated with ATMs, there is a need for a distributed solution that integrates system defect solutions with security incident solutions using a machine learning approach. This solution offers an advantage over the physical monitoring of ATMs; it also serves as a proactive mechanism for system defects and security incidents.

1.1 Problem Statement

Research on the ATM system, its management, and related challenges in the banking sector has been covered in many forms. While some of these studies are large-scale, few focus on electronic banking systems and networks such as ATMs [6]. Although the proliferation of ATMs in the country has aided business, it has also created a breeding ground for fraudsters. Security incidents in ATMs lead to financial losses for the banks. According to the Bank of Ghana (BoG) 2017 Annual Report, banks lost over GH₵ 30 million, a significant share of which was due to ATM fraud [7].

Most ATMs now have in-built cameras that capture activities around them. While these cameras serve as a security mechanism, they do so passively. For instance, the cameras can record footage of security incidents such as skimming and tampering on ATMs, but they cannot prevent fraudsters from perpetrating the act. This security mechanism (CCTV cameras in ATMs) has proven inefficient because of this inherent limitation: it cannot actively prevent ATM incidents. Technical defect incidents, on the other hand, share this challenge with ATM security incidents.
This is because ATMs with technical problems can be left unattended for months, owing to the lack of a self-reporting mechanism in ATMs. To ensure the availability, reliability, and security of ATM services, answers should be provided to the following fundamental questions:
❑ What would be the reaction of the ATM user after reading a message on the screen that the machine is temporarily down?
❑ How do we automatically detect ATM problems or incidents at the onset and make an intelligent decision on how to respond to them in order to bring perpetrators to book, improve profitability, reduce operational support costs, and deliver a better customer experience?

ATM security incidents were reported to account for approximately GH₵ 1.7 million in the 2017 Bank of Ghana Annual Report. Figure 1.1 shows the distribution of fraud types and losses in the banking sector; most of these fall under cybercrime, the category that includes ATM fraud.

Figure 1.1 Fraud type and Gross loss in Ghana [7]

According to the NCR Ghana 2018 ATM incidents report, 3970 cases of the dispenser problem were recorded among ATM faults. ATM security and system defect incidents that cause financial loss and temporarily put the machine down can be prevented by automatically self-reporting these incidents to the Banks, ATM service providers, and security officials, which would make the handling of incidents more efficient.

The proposed self-reporting system for ATM incident detection uses Machine Learning techniques for image classification of attempted instances of ATM security incidents. The machine learning approach is also extended to the ATM system defects aspect of the problem.
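The self-reporting flow described in this section can be illustrated with a minimal sketch. It assumes, following Section 1.5, that security incidents are sent both as SMS alerts and to the dashboard, while system defect incidents are reported on the dashboard only. All names here (Incident, IncidentKind, report_channels, build_alert, the ATM identifiers, and the coordinates) are hypothetical and chosen for illustration; the sketch does not call any real GSM or Google Maps interface and is not the implementation used in this thesis.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class IncidentKind(Enum):
    SECURITY = "security"      # e.g. skimming or tampering found by the image model
    SYSTEM_DEFECT = "defect"   # e.g. dispenser or card reader problem


@dataclass(frozen=True)
class Incident:
    atm_id: str        # hypothetical terminal identifier
    kind: IncidentKind
    label: str         # class predicted by the detection model
    latitude: float    # position assumed to come from the GPS module
    longitude: float


def report_channels(incident: Incident) -> List[str]:
    """Return the channels on which a detected incident is reported."""
    channels = ["dashboard"]           # every incident reaches the dashboard
    if incident.kind is IncidentKind.SECURITY:
        channels.insert(0, "sms")      # security incidents also raise an SMS alert
    return channels


def build_alert(incident: Incident) -> str:
    """Format a short alert text that carries the ATM location."""
    return (
        f"ATM {incident.atm_id}: {incident.label} detected at "
        f"({incident.latitude:.4f}, {incident.longitude:.4f})"
    )


# Illustrative values only; these are not taken from the thesis datasets.
skimming = Incident("ATM-001", IncidentKind.SECURITY, "card skimming", 5.6508, -0.1870)
dispenser = Incident("ATM-002", IncidentKind.SYSTEM_DEFECT, "dispenser problem", 5.6037, -0.1870)

for incident in (skimming, dispenser):
    print(report_channels(incident), build_alert(incident))
```

Keeping the routing rule in one small function makes the reporting policy easy to change, for example if defect incidents were later given SMS alerts as well.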
The system uses Internet of Things (IoT) components such as Global System for Mobile Communications (GSM) alerts embedded in a software application that interacts with a Global Positioning System (GPS) module, which points to the location of a detected incident using the Google Maps Application Programming Interface (API).

1.2 Objectives of Study

This study aims to design and implement a self-reporting security and system defect application for ATMs in Ghana. The specific objectives of this study are as follows:
❑ Design and develop a security incident detection model for ATMs based on Convolutional Neural Networks (CNN).
❑ Design and develop a defect incident detection model based on Support Vector Machines (SVMs).
❑ Develop GSM alerts and Google Maps indications for locating ATM incidents.
❑ Develop and deploy a user interface (UI) that provides access to monitoring all ATMs and system resources on the ATM network.
❑ Integrate the incident and defect detection modules with microcontroller-based hardware as an overall proof of concept.
❑ Test the proposed model on a real-world ATM terminal of a Bank and inject a range of common incidents to evaluate efficacy.

1.4 Justification of Study

As we move towards a cashless economy, ATMs and Information and Communication Technology (ICT) related technologies will play a vital role in banking activities. ATM security and defect management continue to be a significant concern for Banks, ATM Service Providers, and other third-party stakeholders like state security, the Bank of Ghana, and the general public. The financial losses due to ATM fraud associated with tampering have been enormous, and there is an urgent need for improvements to mitigate such occurrences. While contributing to the scientific community, this research also addresses a real-world banking sector problem that has persisted for decades.
This research is not intended to phase out the work of ATM technicians; rather, it seeks to augment their services and make them more timely and efficient.

1.5 Research Limitation

The research takes into account two main aspects of ATM problems: security incidents and system defect incidents. Security incidents are reported in real time through SMS alerts sent to the mobile devices of the banks and ATM service providers, and are also shown on a dashboard interface. System defect problems, on the other hand, are reported only on the dashboard. This implies that a bank worker needs to monitor the dashboard periodically for updates.

1.6 Thesis Outline

The study presented in this thesis gives details of Machine Learning techniques for detecting incidents on ATMs and is organized into six chapters. The remaining chapters are as follows: Chapter two provides a literature review of recent research on ATM incidents, Machine Learning techniques and algorithms, fine-tuning techniques, and their applications to detecting ATM incidents. Chapter three discusses system requirements, specification analysis, blueprints, and the selection of development tools for the incident detection models. Chapter four focuses on the development of the models and discusses accuracy metrics; results and findings are interpreted as well. Chapter five presents the system simulation and experiment, implementation process, testing, and discussion of results. The outcome of the experimental processes illustrates the capabilities of the proposed system to perform security incident detection and security incident categorization. ATM system defect incident detection is discussed as well.
Chapter six summarizes the findings of the work reported in this thesis, the challenges and observations, and suggestions for future work.

CHAPTER TWO
LITERATURE REVIEW

2.0 Introduction

This chapter reviews related work on ATM security and defect incidents. It also reviews several methodologies and proposed solutions within the scope of ATM fraud and technical defects. The algorithms and performance metrics of these solutions are presented in tabular form.

2.1 ATM Network Architecture

Figure 2.1 ATM Network Diagram

Figure 2.1 shows the network diagram for an ATM system. The subcomponents of the network setup are listed below:
❑ ATM
❑ Switch
❑ Router
❑ File server
❑ Firewall
❑ Customer/services such as VISA card and MasterCard.

Based on technical information gathered at NCR, the components of an ATM system can be grouped into two: the upper part and the lower part. The upper part comprises the central processing unit (CPU) and the card reader. The lower part of the ATM consists of two parts: the suction technology and the domain safe.

The Suction Technology
The suction technology is the mechanism the ATM uses to dispense money to the upper part of the ATM. It enables the cassettes to present the dispensed cash on belts [8].

The Domain Safe
This part of the ATM contains the presenter, the network card, and other networking components. The NCR ATMs run a middleware that enhances the network communication between the switch and the card reader [8].

2.2 ATM Incident Detection

Incident detection is the process of identifying the activities of an attacker who deliberately seeks to circumvent protocols on network infrastructure. Some malicious activities are mitigated by retracing and containing the threats and removing the attacker's foothold [9].
Understanding how attackers compromise systems and move around the network makes it possible to detect and stop attacks before valuable data is stolen or the system collapses.

2.2.1 Object Detection

Object detection is one of the classical problems of computer vision and is often described as a difficult task. It is difficult because it requires a solution that is invariant to deformation and to changes in lighting and viewpoint, and because it involves both locating and classifying regions of an image [10]. To detect an object, we need to know where the object might be and how the image is segmented. That creates a chicken-and-egg problem: to recognize the shape (class) of an object, its location must be known, and to identify the location of an object, its shape must be known [11]. Some visually dissimilar features, such as the clothes and face of a human being, may be parts of the same object, but it is difficult to know this without recognizing the object first. On the other hand, some objects stand out only slightly from the background, requiring separation before recognition [12]. Low-level visual features of an image, such as a saliency map, may be used as a guide for locating candidate objects [13].

The location, shape, and size of an object are typically defined using a bounding box stored as corner coordinates. Using a rectangle is more straightforward than using an arbitrarily shaped polygon, and many operations, such as convolution, are performed on rectangles in any case. The sub-image in the bounding box is then classified by an algorithm trained using machine learning [14]. The boundaries of the object can be further refined iteratively after making an initial guess [15].
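The bounding-box representation described above can be illustrated with a minimal sketch. The names `BoundingBox` and `crop` are hypothetical and are not taken from the thesis implementation; the image is modelled as a plain list of pixel rows, and the maximum corner is assumed to be exclusive.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BoundingBox:
    """Axis-aligned box stored as corner coordinates (max corner exclusive)."""
    x_min: int
    y_min: int
    x_max: int
    y_max: int

    @property
    def width(self) -> int:
        return self.x_max - self.x_min

    @property
    def height(self) -> int:
        return self.y_max - self.y_min


def crop(image: List[List[int]], box: BoundingBox) -> List[List[int]]:
    """Return the sub-image inside the box; this is what a classifier would see."""
    return [row[box.x_min:box.x_max] for row in image[box.y_min:box.y_max]]


# A toy 4x6 grey-scale image whose pixel value encodes (row, column).
image = [[10 * r + c for c in range(6)] for r in range(4)]
box = BoundingBox(x_min=1, y_min=1, x_max=4, y_max=3)
patch = crop(image, box)
```

Storing only two corners keeps the crop a simple slicing operation, which is one reason rectangles are preferred over arbitrary polygons.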
During the 2000s, popular solutions for object detection utilized feature descriptors, such as the scale-invariant feature transform (SIFT) [16], developed by David Lowe in 1999, and the histogram of oriented gradients (HOG), popularized in 2005 [16]. In the 2010s, there was a shift towards convolutional neural networks [14][10][17].

Before the wide-scale adoption of CNNs, there were two competing solutions for generating bounding boxes. In the first solution, a dense set of region proposals is generated, and most of these are rejected [18]. This typically involves a sliding window detector. The second solution generates a sparse set of bounding boxes using a region proposal method, such as Selective Search [12]. Combining sparse region proposals with convolutional neural networks has provided good results and is currently popular [10].

2.2.2 Computer Vision

Computer vision deals with detecting objects in, and extracting meaningful information from, digital images or video content. This is distinct from mere image processing, which involves manipulating visual information at the pixel level. Computer vision applications include image classification, visual detection, 3D scene reconstruction from 2D images, image retrieval, augmented reality, machine vision, and traffic automation [19].

Today, many computer vision algorithms rely on machine learning, and on deep learning in particular. The algorithms can be described as a combination of image processing and machine learning. Effective solutions require algorithms that can cope with the vast amount of information in visual images and, critically for many applications, can perform the computation in real time [20].

2.3 Machine Learning Classification Algorithms

Classification is a technique for categorizing data into a desired number of distinct classes, where a label can be assigned to each class.
Applications of classification include image classification, speech recognition, handwriting recognition, and others [21]. Classification algorithms weigh the input features so that the output separates one class from the other. Classifier training is performed to identify the weights (functions) that provide the most accurate separation of the data classes. Classification decisions are made based on the results of classifiers. Such classification algorithms include Linear Discriminant Analysis (LDA), Support Vector Machines (SVM), Random Forest, and Artificial Neural Networks (ANN) [21][22]. LDA is the most basic classifier; it identifies a linear weighting of multifactorial data that maximizes the distance between the means of the classes. Classification algorithms such as SVM, ANN, and others perform well on large datasets and are more recent computational approaches that generate more complex divisions between the classes of a dataset [15][19][23]. Multiple classifiers can also be used, with training and classification decisions based on the results of all the classifiers. The success of a classification algorithm is determined by its performance metrics [22]. The classification algorithm processes the features to produce a class decision. This process involves a statistical algorithm that maps the different classes into different regions of feature space.

2.3.1 SVM-OVO-OVR Approach

For the ATM defect incident datasets, combining the OVR approach with an SVM that uses all components provided the most accurate methodology. A multi-class classification problem can be defined as follows. Given $n$ i.i.d. data points

$$(x_1, y_1), \ldots, (x_n, y_n), \quad \text{such that } x_i \in \mathbb{R}^d \text{ for } i = 1, \ldots, n \text{ and } y_i \in \{1, \ldots, K\} \tag{2.1}$$

the task is to determine a classifier with decision function $f(x)$ such that $y = f(x)$, where $y$ is the class label for $x$ [24].
The performance of the classifier is measured in terms of total classification error or classification accuracy over a set of testing data. The classification error for data point $x$ is defined as:

$$E(y, f(x)) = \begin{cases} 0 & \text{if } y = f(x) \\ 1 & \text{otherwise} \end{cases} \tag{2.2}$$

Two approaches are suggested for multi-class SVM in the literature [21]. One considers all data in one optimization. The other decomposes the multi-class problem into a series of binary SVMs, such as "One-versus-Rest" (OVR) and "One-versus-One" (OVO) [24][25][22].

OVR-SVM is the most basic scheme used for the implementation of SVM multiclass classification. With this simplest SVM extension to the $k$-class problem, $k$ binary SVM models are constructed. In the $j$-th binary SVM problem, class $C_j$ is separated from the remaining classes. All $k$ binary SVM classifiers are combined to make a final multiclass classifier. "Remaining classes" means that all the data points from classes other than $C_j$ are combined to form one combined class. Given the ATM defects training dataset and the corresponding mapping $\Phi : X \subset \mathbb{R}^n \rightarrow F$ into the feature space $F$, the optimal hyperplane that separates data points of class $C_j$ from those of the combined class is found using the SVM methodology. The optimal separating hyperplane distinguishing class $C_j$ from the combined class is represented as:

$$g_j(x) = w_j \cdot \Phi(x) + b_j, \quad j = 1, \ldots, k \tag{2.3}$$

Assuming the training dataset is correctly classified, as shown in Figure 4.2.4, the SVC computes the hyperplane that maximizes the margin separating the classes [26] (area of failure description, cause code description, and problem code description). The problem is a quadratic optimization problem (QP), which gives the dual formulation of the Lagrangian.
The dual Lagrangian ($L_D$) is maximized subject to non-negativity constraints on the multipliers, and $g_j(x)$ can be determined by solving the following dual form:

$$\text{Maximize: } L_D = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j \, k(x_i, x_j) \tag{2.4}$$

$$\text{Subject to: } 0 \le \alpha_i \le C, \; i = 1, \ldots, n, \quad \text{and} \quad \sum_{i=1}^{n} \alpha_i y_i = 0 \tag{2.5}$$

The decision rule $f_j(x)$ that assigns the vector $x$ to class $C_j$ or to the combined class is given by:

$$f_j(x) = \operatorname{sign}\left(g_j(x)\right) \tag{2.6}$$

If there are no positive votes, or more than one classifier with positive votes, then no decision about the class label is made.

[Figure 2.2: the decision boundaries $g_1(x) = 0$, $g_2(x) = 0$, and $g_3(x) = 0$ separating Class 1, Class 2, and Class 3]

Figure 2.2 Class boundaries for SVM-OVR formulation of the three-class problem

The main difficulty in this approach is that the outputs of the classifiers $f_j(x)$ are binary values. The usual way to handle this problem is to ignore the sign operator in Equation (2.6). After finding all the optimal hyperplanes given by $g_j(x)$ for $j = 1, \ldots, k$, we say $x$ is in the class which has the largest value of the decision function:

$$\text{class label of } x = \arg\max_{j = 1, \ldots, k} g_j(x) \tag{2.7}$$

In this approach, the index of the largest component of the discriminant functions

$$g_j(x), \quad j = 1, \ldots, k \tag{2.8}$$

is assigned to the data point $x$ [24]. This approach is called winner-takes-all. To make a final decision, $k$ binary problems have to be solved, and a dual problem has to be solved for each binary problem containing $n$ data points. Hence, $k$ quadratic programming problems, each in $n$ variables, are to be solved. Class boundaries for the three-class problem are shown in Figure 2.2.

Mathematical Foundation for Support Vector Machines (SVM)

Support vector machines (SVM) are statistical machine learning algorithms that classify by finding a hyperplane that maximizes the margin between classes.
The use of SVM on very large datasets is limited by the dense nature and memory requirement of the quadratic form of the dataset [26]. Consider the ATM defects datasets and the corresponding mapping $\Phi : X \subset \mathbb{R}^n \rightarrow F$ into the feature space $F$. A hyperplane is defined by its normal vector $w$. Given a hyperplane $w$ and a point $x$, define $x_0$ to be the point on the hyperplane closest to $x$, that is, the closest point to $x$ that satisfies $w \cdot x_0 = 0$, as in Figure 2.3.

[Figure 2.3: the parallel hyperplanes $w \cdot x = k$ and $w \cdot x = 0$, with the point $x$ and its closest point $x_0$]

Figure 2.3 The distance between $x$ and $x_0$ of the multiclass classification

The following two equations are obtained:

$$w \cdot x = k \text{ for some } k, \qquad w \cdot x_0 = 0 \tag{2.9}$$

Subtracting the second equation from the first gives:

$$w \cdot (x - x_0) = k \tag{2.10}$$

Dividing by the norm of $w$ gives:

$$\frac{w}{\|w\|} \cdot (x - x_0) = \frac{k}{\|w\|} \tag{2.11}$$

Since $\frac{w}{\|w\|}$ is a unit vector and is parallel to $x - x_0$, it follows that:

$$\|x - x_0\| = \frac{k}{\|w\|} \tag{2.12}$$

SVM is an excellent example of supervised learning that improves generalization by maximizing the margin and supports nonlinear separation through kernelization. SVM tries to avoid over-fitting and under-fitting. The margin in SVM denotes the distance from the boundary to the closest data point in the feature space [22][21][25].

In general, there may be many separating hyperplanes. In this problem, the separating hyperplane is the boundary separating a given ATM defect incident class from the rest (OVR) or separating two different ATM defect incident classes (all pairs, AP). The hyperplane computed by the SVM is the maximal margin hyperplane, that is, the hyperplane with maximal distance to the nearest data point. Finding the SVM solution requires training an SVM, which entails solving a convex quadratic program with as many variables as training points [27]. The OVR methodology is used to combine binary SVM classifiers into a multiclass classifier.
A separate SVM is trained for each class, and the winning class is the one with the largest margin, which can be considered a signed confidence measure. In the experiments described in this thesis, there were few data points in many dimensions. Therefore, a kernel corresponding to a linear (regularized) classifier was used as the SVM solution. Although the hyperplane was allowed to make misclassifications, in all cases involving the full 16,063 dimensions, each OVR hyperplane fully separated the training data with no errors. Some training errors occurred in experiments that involved explicit feature selection with very few features. This may indicate that a very small number of features could be selected and a nonlinear kernel function then used to improve classification; however, preliminary experiments with this approach yielded no improvement over the linear case. An SVM is trained using all features. The features are ranked according to the magnitude of the elements of the resulting hyperplane, so the importance of feature $i$ is the weight of the $i$-th element of the hyperplane.

2.3.2 Convolutional Neural Networks (CNN)

The convolutional neural network is a deep learning algorithm that can take an input image, assign weights and biases to various aspects or objects in the image, and distinguish one from the other, as indicated in Figure 2.4. A convolutional neural network is mainly used for image classification. The basic idea of the CNN was inspired by a concept in biology called the receptive field [13][20]. Receptive fields are a feature of the animal visual cortex [28]. They act as detectors sensitive to certain stimulus types, for example, edges. They are found across the visual field and overlap each other.
[Figure 2.4: input $f$ (28×28×1) → Conv-1 (5×5 kernel, valid padding) → max pooling (2×2) → Conv-2 (5×5 kernel, valid padding) → max pooling (2×2) → flattened → FC-3 (fully connected, with activation) → FC-4 (fully connected) → output classes: robbery, physical attack, skimming, occlusion]

Figure 2.4 CNN Architecture

This biological function can be approximated in computers using convolution [20]. In image processing, images can be filtered using convolution to produce different visible effects. Figure 2.4 shows how a hand-selected convolutional filter detects horizontal edges from an image, functioning similarly to a receptive field. The discrete convolution operation between an image $f$ and a filter matrix $g$ is defined as:

$$h[x, y] = f[x, y] * g[x, y] = \sum_{n=1}^{N} \sum_{m=1}^{M} f[n, m] \, g[x - n, y - m] \tag{2.13}$$

In effect, the dot product of the filter $g$ and a sub-image of $f$ (with the same dimensions as $g$) centered on coordinates $(x, y)$ produces the pixel value of $h$ at coordinates $(x, y)$. The size of the filter matrix determines the size of the receptive field. Aligning the filter successively with every sub-image of $f$ produces the output pixel matrix $h$. In neural networks, the output matrix is also called a feature map (or an activation map, after the activation function has been applied). Edges need to be treated as a special case [20]. If an image is not padded, the output size decreases slightly with every convolution.

2.3.3 Regional Convolutional Neural Networks (RCNN)

The Regional Convolutional Neural Network (R-CNN) is a state-of-the-art visual object detection system that combines bottom-up region proposals with rich features computed by a convolutional neural network. R-CNN forward computation has several stages [10][13][20]. First, the regions of interest (ROIs) are generated.
The ROIs are category-independent bounding boxes with a high likelihood of containing an interesting object. A separate method called Selective Search is used to generate these regions [12][17][29]. Given an image, the R-CNN uses selective search to generate around 2000 region proposals, from which features are computed using a Convolutional Neural Network (CNN). Region proposals are regions that potentially contain an object. Each proposal is warped to a 277 × 277 RGB image to fit the CNN input. Feature extraction is done in the CNN layers, and the features are passed to multiple binary classifiers to determine the class of each region. Figure 2.5 illustrates the R-CNN stages.

[Figure 2.5: (1) original input image → (2) feature proposal extraction → (3) a warped proposal → (4) classification using CNN → (5) classification, with output labels Label 1 to Label 11]

Figure 2.5 Stages of R-CNN forward computation

Next, a convolutional network extracts features from each region proposal. The sub-image contained in the bounding box is warped to match the input size of the CNN and then fed to the network. After the network has extracted features from the input, the features are passed to the classification stage of the R-CNN, which provides the final classification. The pre-processing required by a convolutional neural network is much lower than in other classification algorithms. CNNs have the ability to learn their filters or detectors given enough training. The architecture of the CNN was inspired by a concept in biology called the receptive field [10]. Receptive fields are a feature of the animal visual cortex [28].
They act as detectors sensitive to certain stimulus types, for example, edges. They are found across the visual field and overlap each other [20].

2.4 Transfer Learning

Transfer learning (TL) is a research problem in machine learning (ML) that focuses on storing knowledge gained while solving one problem and applying it to a different but related problem. In practice, a part of a network that has already been trained on a similar task is reused, one or more layers are added at the end, and the model is re-trained. Transfer learning is the idea of overcoming the isolated learning paradigm and utilizing knowledge acquired for one task to solve related ones [30][31]. Data scientists and researchers believe that transfer learning can further progress toward Artificial General Intelligence (AGI) [32]. AGI is the hypothetical intelligence that can understand or learn any intellectual task that a human being can do [31]. It is a primary goal of some artificial intelligence research and a common topic in science fiction and futures studies. AGI is also referred to as strong AI, full AI, or general intelligent action [31]. Some academic sources reserve "strong AI" for machines that can experience consciousness. Today's AI is speculated to be decades away from AGI [31].

In transfer learning, a neural network (NN) such as a CNN is trained in two stages, namely:
❑ Pre-training
❑ Fine-tuning.

In the pre-training stage, the network is trained on a large-scale benchmark dataset representing a wide range of categories.
In the fine-tuning stage, the network is further trained on the specific target task of interest, which usually has fewer labeled examples than the pre-training dataset. Neural network architectures used in this way include AlexNet, ResNet50, Inception, MobileNet V2, and others, typically pre-trained on the ImageNet dataset. Studies in transfer learning suggest that deep learning models trained for one classification task can be employed for another. Thus, CNN models trained on a specific dataset or task can be fine-tuned for a new task, even in a different domain [33]. Transfer learning has been applied successfully to visual categorization tasks in object recognition, image classification, and human action recognition [33].

2.5 Performance Metrics for Security Incidents Detection

Among the quantities commonly reported for security incident detection are the mini-batch size, the epoch, and the confidence score.

Mini-batch size: A mini-batch is a subset of the training images that is used to update the weights; the mini-batch size is the number of images in it. A different mini-batch is used in each iteration. The mini-batch accuracy reported during training is the accuracy on the particular mini-batch at the given iteration [34].

Epoch: The number of epochs is a hyperparameter set before training a deep learning model. One epoch is one complete pass of the whole dataset forward and backward through the neural network [34][33].

Confidence score: A confidence score is a value from an ordered set of values that can be easily compared. It is a decimal number between 0 and 1, which can be interpreted as a percentage of confidence.
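The relationship between mini-batches, iterations, and epochs described above can be sketched as follows. This is an illustrative example only: the helper names (`iterate_minibatches`, `confidence_as_percent`), the toy dataset of ten sample indices, and the batch size of four are assumptions, not values from the thesis experiments.

```python
import random
from typing import Iterator, List, Sequence


def iterate_minibatches(samples: Sequence[int], batch_size: int,
                        rng: random.Random) -> Iterator[List[int]]:
    """Shuffle the dataset once and yield it in consecutive mini-batches."""
    order = list(range(len(samples)))
    rng.shuffle(order)
    for start in range(0, len(order), batch_size):
        yield [samples[i] for i in order[start:start + batch_size]]


def confidence_as_percent(score: float) -> str:
    """Render a confidence score in [0, 1] as a percentage string."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("confidence score must lie between 0 and 1")
    return f"{score:.0%}"


samples = list(range(10))   # stand-in for ten training images
rng = random.Random(0)      # fixed seed so the run is repeatable
num_epochs, batch_size = 3, 4

iterations = 0
seen_per_epoch = []
for epoch in range(num_epochs):
    seen = []
    for batch in iterate_minibatches(samples, batch_size, rng):
        iterations += 1     # one weight update would happen here
        seen.extend(batch)
    seen_per_epoch.append(sorted(seen))
```

With ten samples and a mini-batch size of four, each epoch consists of three iterations (batches of 4, 4, and 2), and every sample is visited exactly once per epoch.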
For defect incident detection, in a multiclass classification, there are four (4) main categories to which a predicted sample of a classifier can belong [35]:
❑ True positives (TP)
❑ False positives (FP)
❑ True negatives (TN)
❑ False negatives (FN)

These four categories are used to form the confusion matrix [35], as indicated in the table below:

Table 2.1 Confusion Matrix

                               Actual condition positive   Actual condition negative
Predicted condition positive   TP                          FP
Predicted condition negative   FN                          TN

Among the commonly used performance metrics are accuracy and error rate. The accuracy of the classifier is the proportion of instances that are correctly classified, and the error rate is the proportion of instances that are incorrectly classified [35][25]. They are given by:

$$A = \frac{TP + TN}{TP + TN + FP + FN} \tag{2.14}$$

$$E_r = \frac{FP + FN}{TP + TN + FP + FN} \tag{2.15}$$

However, accuracy can be misleading for highly imbalanced datasets, as these metrics favor the majority class [36][35]. For instance, in a dataset where the majority instances outnumber the minority instances by a ratio of 99:1, the classifier will likely classify all the majority instances correctly and misclassify the single minority instance. Taking the minority class as the positive class, there will be 99 true negatives, 1 false negative, 0 false positives, and 0 true positives, resulting in an accuracy of 99%, which is very misleading since the accurate prediction of the positive class is usually more desirable [35].

$$Acc = \frac{0 + 99}{0 + 99 + 0 + 1} = 0.99 \tag{2.16}$$

To avoid this inconsistency, it is recommended that performance measures based on class metrics are used [35][37].
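The 99:1 example can be checked numerically with a short sketch. The function names `accuracy` and `error_rate` are illustrative, and the counts assume that the minority class is the positive class, so the single missed minority instance is counted as a false negative.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all instances that were classified correctly."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return (tp + tn) / total


def error_rate(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all instances that were classified incorrectly."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return (fp + fn) / total


# 99 majority (negative) instances classified correctly,
# 1 minority (positive) instance missed.
tp, tn, fp, fn = 0, 99, 0, 1

acc = accuracy(tp, tn, fp, fn)
err = error_rate(tp, tn, fp, fn)
detected_positives = tp
```

The accuracy is 0.99 even though no positive instance is detected, which is why class-based metrics are preferred for imbalanced incident data.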
These metrics are listed below: TP TPR TP FN = + (2.17) TN TNR TN FP = + (2.18) FP FPR TN FP = + FN FNR TP FN = + (2.19) 1 2 TPR FPR AUC + − = (2.20) 1 TPR N G Mean  − = (2.21) University of Ghana http://ugspace.ug.edu.gh 27 ( )( )( )( ) TP TN FP FN MCC TP FP TP FN TN FP TN FN  −  = + + + + (2.22) University of Ghana http://ugspace.ug.edu.gh 28 2.6 Summary of Papers Table 2.2 shows the summary of relevant journal articles that were reviewed. Table 2. 2 Summary of Papers SECURITY INCIDENT PAPER TITLE GENERAL OVERVIEW METHODOLOGY STRENGTH AND WEAKNESSES ACCURACY 1. Fast R-CNN for object detection Author(s): Girshick, Ross The author(s) looked at ways to improve the efficacy of neural network models in object detection. Datasets used in the project: 1. VGG16 2. ImageNet [10] The paper proposed the Fast Region-based Convolutional Network method (Fast R- CNN) for object detection. Three experiments are used to pre-trained ImageNet models that are available online. The first is the CaffeNet (essentially AlexNet from R- CNN. The second net- work is VGG CNN M 1024 The final network is the very deep VGG16 model. Fast R-CNN was implemented in Python and C++ (using Caffe) and is Strength: 1. Fast R-CNN trains very deep VGG16 network 9 times faster than R-CNN. 2. Simulations of Fast R-CNN show 213 times faster at test- time and achieve a higher Mean Average Precision (mAP) on PASCAL VOC 2012. 3. Compared to SPPnet, Fast R-CNN trains VGG16 3× faster, tests 10× faster, and is more accurate. Weakness: The author(s) acknowledge the need to further develop this method due to undiscovered techniques that allow dense R-CNN achieved a Mean Average Precision (mAP) of 66.0%, SPPnet, 63.1% And Fast R-CNN, 66.6% University of Ghana http://ugspace.ug.edu.gh 29 available under the open- source MIT License. boxes to perform and sparse proposals. 2. ATM- Security using machine learning techniques in IoT Author(s): Udhaya Kumar N, Sri Vasu R, Subash S, Sharmila Rani D. 
In this paper, the authors designed a security system that gives access to the user of an ATM only after identifying the user's image taken by the CCTV. The image captured will be compared to the image stored in a database. This project will give access to the user only after identifying the user's image taken by the CCTV in the ATM and comparing the specified image with the user's image stored in the database created during the account creation, which comes under the banking session of banks. Sometimes, the authorized user cannot use the ATM for emergency purposes. In such cases, the OTP is sent to the user's registered mobile This paper proposed a security system for ATMs using deep learning techniques for face detection and recognition. IoT components like a Camera, RFID reader, Tag, Relay, Motor, and a Raspberry pi 3 (2015 version) were used. The authors used the OpenCV platform and Python to implement the Local Binary Patterns (LBP) algorithm. And an alert message is sent to the authorized user as a text message if the user is found to be the third Strength: The proposed solution operates in real-time, thereby providing security in an active state. Weaknesses: The system is fully reliant on face detection and recognition system. This makes it imperative for the user to be physically present and granted access to the ATM. As much as they provided an alternative with an OTP sent to the user in such circumstances, it is less effective in the real-world scenario. The method detected faces without given accuracies University of Ghana http://ugspace.ug.edu.gh 30 number, and the person who came instead of the authorized user has to enter the OTP that the authorized user received. This method will reduce the risk of ATM usage by the common people [38]. Datasets used in the project: 1. Dataset1 2. Dataset2 3. Fast Edge Detection Using Structured Forests Author(s): Piotr Dollar and C. 
Lawrence Zitnick In this paper, the author(s) analyzed edge detection as a critical component of many vision systems, including object detectors and image segmentation algorithms. Patches of edges exhibit well-known forms of local structure, such as straight lines or T- junctions. This paper takes advantage of the structure in local image patches to learn an accurate and We predict local edge masks in a structured learning framework applied to random decision forests. Our novel approach to learning decision trees robustly maps the structured labels to a discrete space where standard information gain measures may be evaluated. A result is an approach that obtains real-time performance that is orders of magnitude faster than many competing state-of-the-art approaches while also achieving state-of-the-art edge detection results on the Strength: The proposed solution operates in real-time, thereby providing security in an active state. Weaknesses: The system is fully reliant on face detection and recognition system. This makes it imperative for the user to be physically present and granted access to the ATM. As much as they provided an alternative with an OTP sent to the user in such circumstances, it is less effective in the real-world scenario. Initially, their methodology achieved 71% ODS accuracy and increased to 75% ODS after the parameter sweep. University of Ghana http://ugspace.ug.edu.gh 31 computationally efficient edge detector. Finally, we show the potential of our approach as a general- purpose edge detector by showing our learned edge models generalize well across datasets [39]. Datasets used in the project: 1. BSDS500 Segmentation dataset 2. NYU Depth dataset 4. Deep Learn Helmets-Enhancing Security at ATMs Author(s): K. Bavithra Devi, S. Mohamed Mansoor Roomi, M. Meena. The authors took a comprehensive survey on ATM infrastructure in India, with over 1.2 billion people. Their survey took into perspective the perennial security challenges of ATMs. 
The author(s) advanced their analysis that real-time intelligent video analytics offers advanced monitoring capabilities that give sophisticated video surveillance to recognize abnormal activities. A person wearing a helmet in the ATM center is one of the anomalous activities. In such a scenario, an automatic helmet detection algorithm is required to flag a person wearing a helmet in the ATM center [39].
The author(s) designed a helmet image detector using Deep Learning Convolutional Neural Network (CNN) architectures such as VGGNET (Visual Geometry Group) and ALEXNET. The helmet region is detected using a Region Convolutional Neural Network (RCNN) with 15 layers. The performance of this technique was tested on 880 test images out of 1880 images in a database. The parameters are chosen to compare the different mini-batch sizes and epochs in ALEXNET.
Strength: The proposed solution operates in real-time, thereby providing security in an active state.
Weaknesses: The system is fully reliant on a helmet detection and recognition system, which is a concern if the person's head is bald or if any object resembles a helmet. A person without a helmet is detected with high accuracy.
The accuracy for helmet detection using ALEXNET is 96.03

5. Selective Search for Object Recognition
Author(s): J. R. R. Uijlings, K. E. A. van de Sande, T. Gevers, A. W. M. Smeulders
The authors in this paper addressed the problem of generating possible object locations for use in object recognition. Their selective search results in a small set of data-driven, class-independent, high-quality locations, yielding 99% recall and a Mean Average Best Overlap of 0.879 at 10,097 locations. Compared to an exhaustive search, the reduced number of locations enables more robust machine-learning techniques and appearance models for object recognition. The authors show that selective search allows the powerful Bag-of-Words model to be used for recognition. The selective search software is made publicly available [12].
The authors introduced selective search, combining the strength of an exhaustive search and segmentation. Like segmentation, their work used image structure to guide the sampling process. As in exhaustive search, the aim was to capture all possible object locations. Instead of a single technique to generate possible object locations, the authors diversify the search and use various complementary image partitionings to deal with as many image conditions as possible.
Strength: The proposed solution operates in real-time, thereby providing security in an active state.
Weaknesses: The system is fully reliant on a helmet detection and recognition system, which is a concern if the person's head is bald or if any object resembles a helmet. A person without a helmet is detected with high accuracy.
The selective search for object recognition is 99%

6. Anomaly Detection on ATMs via Time Series Motif Discovery.
Author(s): Dirk Walther, Maximilian Riesenhuber, Tomaso Poggio
The authors looked at a 2014 incident where skimming attacks on ATMs resulted in approximately 280 million Euros in losses within the European Union sub-region [40].
The authors used an innovative piezoelectric sensor network to capture the ATM state and analyze the occurring vibrations. The captured signals were inspected with the complex quad-tree wavelet packet transform, which provided a broad frequency analysis of a signal at various scales. Features were extracted from the selected scale based on the information content to detect motifs. The detected motifs provided the prototype patterns for anomaly detection or classification tasks.
Strength: The practical results showed that the proposed approach could classify normal and abnormal signals via motif discovery.
Weaknesses: The system is fully reliant on signal classification and does not report on time.
Motif discovery achieved 60.0% classification results, with 58.8 F-Measure, 71.4 sensitivity, and 50.0 precision.

DEFECT INCIDENTS

7. Automated Teller Machine Analysis under Host-Bank Systems through Telephone Network
Author(s): Kuldeep Nagiya, Mangey Ram.
This work demonstrates the performance of an ATM network. Different types of component failure, such as failure of the ATM, the telephone network, and the power supply, were taken for the study. The author(s) considered ATM functionality problems of power supply: 1. Power supply through the electricity board and 2. Power supply through a generator [41].
The various performance and reliability characteristics of the ATM network were assessed using mathematical modeling, a supplementary variable technique, Laplace transformation, and the Markov process.
Strength: The reliability characteristics of the ATM network were found through the approach.
Weaknesses: The technique relies on the power supply components of the ATM and not on essential components such as the card reader, dispenser, and others, which are the main components that users of ATMs access. The technique is inefficient because a power failure alone does not make the ATM defective.
The technique achieved the following reliabilities: 1 at time 0, 0.985042 at time 1, 0.970173 at time 2, and so on through time 15, with a reliability of 0.787659.

8. ATM management prediction using Artificial Intelligence techniques: A survey
Author(s): Seyed Mohammad Hossein Hasheminejad and Zahra Reisjafari
In this paper, the author(s) discussed forecasting cash demand, fraud detection, ATM failure, user interface, replenishment strategy, ATM location, and customer behavior [42]. Moreover, they review AI techniques such as neural networks, regressions, and support vector machines, and present their results in graphs in different sections. The literature covered in the paper spans ten years (2006-2016). The approaches studied are compared regarding data sets, prediction performance, accuracy, etc. The authors also provide a list of data sets available to the scientific community for research in this field. Finally, open issues and future works are presented for each of these items.
Artificial Intelligence (AI) techniques were discussed to detect fraud, failure, replenishment, and crash prediction. Several statistical methods used to evaluate these forecasts are also covered in the paper.
Strength: The techniques used yielded appreciable results.
Weaknesses: Different datasets were used for each technique; hence, there is no comparison point.
The ANN model achieved a mean absolute percentage error (MAPE) of 20.6, SVR achieved 25.1, Stepwise Autoregressive achieved 46.55, Holt-Winters Additive achieved 53.05, and Exponential Smoothing achieved 55.87.

9. Agent-Based Faults Monitoring in Automatic Teller Machines.
Author(s): Bashir Sulaimon Adebayo, Mohammed Idris Kolo.
In this paper, the authors worked on ATM systems in Nigeria. The main area of this research was the challenges in maximizing the uptime of ATMs due to a wide gap in fault detection, notification, and correction of the ATMs. One way to alleviate this situation is through intelligent monitoring of ATMs by resident software agents that monitor the device and report faulty components in real-time to facilitate quick response [43].
The authors proposed an architecture for rule-based and intelligent agent-based monitoring and management of ATMs. The agents remotely monitor the ATMs and control functions such as software maintenance. A system administrator can securely modify the agents' monitoring policies and control functions. The framework presented includes a software fault monitor, a hardware fault monitor, and a transaction monitor. Finally, a set of utility support agents (caller and log agents) alert the network operator and log error and transaction information in a database.
Strength:
• Reduction in the mean time to repair (MTTR) by quickly isolating problems in critical business transactions.
• Ability to use remote diagnosis information to minimize the number of trips made to the ATM.
• Ability to monitor individual processes on the ATM and reset when necessary.
• Ability to dynamically update diagnosis rules with changing environments.
Weaknesses: The agents residing on the ATM devices can declare the state of the ATM only after the components have failed, which does not minimize the downtime experienced.
The paper is purely a software approach and does not have a prototype to determine the accuracy of the technique.

10. ATM Availability Management System
Author(s): Sujata Rao and Hrushikesh Mane
The author(s) worked on the ATM monitoring process as a key component of an ATM Availability Management Solution. Handling the incidents related to ATMs and monitoring the device-level health of the entire ATM fleet is the prime study of this paper. The Master View helps a financial institution improve the availability of its self-service terminals and minimize downtime [44].
Various reports and statistics about the transactions across multiple demographics provide helpful information such as a) the cities using ATMs heavily, b) the most popular transactions, and c) ATMs having heavy transaction volumes. Monitoring ATMs through Master View Resolve (MVR) helps significantly in identifying the various causes behind slow or failed customer transactions, to reduce service incidents and maximize the ATM uptime.
Strength: The Master View Resolve (MVR) can carry out maintenance tasks.
Weaknesses: The MVR is limited in detecting ATM problems at the onset.
Master View Resolve (MVR) can handle ATM network failure of more than 90%.

2.7 Conclusion and Summary of Literature Review
The literature reviewed in this chapter supports the need to adopt the machine learning approach for this work. The machine learning approach provides an avenue for a dynamic, efficient, and lightweight methodology. From the literature, significant setbacks in using a machine learning approach are the lack of datasets for specific problems and, in some instances, biased datasets. These challenges have been addressed using machine learning techniques and synthetic data. In effect, the primary technique for this work uses machine learning models for incident detection.

CHAPTER THREE
METHODOLOGY
3.0 Introduction
This chapter discusses the research strategy, the research method, the research approach, the methods of data collection, and the sample selection. It also includes the research process, the type of data analysis, the ethical considerations, and the research limitations of the project. This study involves a field survey and software development.
3.1 Proposed System Design
The flowchart for the SRSAID system design is shown in Figure 3.1. The figure shows the various stages in blocks as well as the processes involved for each block.
[Flowchart blocks: Start; Collected ATM camera feeds and system faults; Is dataset noisy?; Process data; Is the processed data from ATM camera feeds?; Processing using R-CNN with ssdlite mobilenet V2 architecture; Processing using SVM; Put the processed dataset in classes; Categorization of incidents; Is there an incident?; Send GSM alerts and indicate on Google Map; Display ATM state on software dashboard; End]
Figure 3.1 Flowchart for Proposed Design

3.2 Field Survey
A field survey was conducted to solicit information on the user experience of persons who have used ATMs over the past 12 months. The population group for this survey was a cross-section of bank customers on the University of Ghana campus, some of the banks' staff, and NCR staff. A structured questionnaire was presented to the selected people, and the answers provided were used in the analysis. The main reason for using these instruments was to collect enough firsthand information from respondents. Drawing from [45], it is argued that with a semi-structured interview, the interviewer has more freedom to pursue an idea and can improvise the questions. [46] confirmed the use of the interview by stating that it is a face-to-face questioning of respondents to obtain information.
The study was based on primary and secondary sources of data. The primary data was obtained from interviews and questionnaires, which were administered to ten (10) officials of NCR and ten (10) clients and staff from different banks who use the electronic-banking service (ATMs) of the banks. Secondary data was also collected from research reports; Agricultural Development Bank (ADB), Ghana; Ghana Commercial Bank (GCB); Barclays Bank, Ghana; Republic Bank, Ghana; and other published materials.
The purposive sampling method, which allows the researcher to select particular participants needed for specific information, was adopted to select the NCR officials directly related to the ATM banking services of all the banks in Ghana. Ten (10) officials of NCR, Ghana, comprising the manager of the ATMs in Ghana, the Head of Engineering, and eight (8) staff from NCR, served as primary respondents. Random sampling was adopted to select 10 customers of different banks that use NCR ATMs.

3.2.1 Data Collection
Computer vision-based approaches depend on collected data or image datasets, which are essential for analyzing object features and reviewing the performance of detection algorithms. Some databases are available for object, character, and scene recognition [34]. There is no database for ATM incidents (different images showing security and defect incidents). Hence, over one thousand (1,000) images showing the various ATM security incidents were collected from the internet, and the faults dataset, in Excel format, was collected from the ATM service provider (NCR).

3.3 Data Pre-processing
3.3.1 Security Incident Data Pre-processing
The data preprocessing stage is the first significant stage in developing the incident detection system. This stage involves using data mining techniques to transform the data from its raw form into the format required by the convolutional neural network classifier (CNN-C) to detect and identify ATM security incidents. The data preprocessing stage involves removing unwanted images and creating bounding boxes to select the parts of the images that show security incidents. This ensures that only valid and relevant information is extracted for the next process. Before the data preprocessing, images were downloaded from the internet, and others were captured with a TECNO POP 2 Plus camera, into a folder on the Raspberry Pi named image2.
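As a minimal sketch of the image filtering and selection idea described above, the following Python fragment keeps only non-empty files with a recognized image extension from a folder such as image2. The set of accepted extensions and the function name `select_images` are illustrative assumptions and are not taken from the original pipeline.

```python
from pathlib import Path

# Hypothetical set of accepted image extensions (an assumption, not from the thesis).
VALID_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def select_images(folder):
    """Return the sorted paths of non-empty image files found directly in `folder`."""
    selected = []
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue  # skip sub-folders
        if path.suffix.lower() not in VALID_EXTENSIONS:
            continue  # skip files that are not images
        if path.stat().st_size == 0:
            continue  # skip empty (failed) downloads
        selected.append(path)
    return selected
```

Calling `select_images("image2")` would return the candidate images that remain for annotation with bounding boxes; unwanted images that pass this basic filter would still need to be removed by inspection.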
The image data preprocessing involves these steps:
❑ Image data filtering and selection
❑ Feature selection and extraction
❑ Feature adjustment.

3.3.2 Image Dataset and Description
The security incident experimental setup used one thousand, one hundred and sixty-five (1,165) images. One thousand, one hundred and fifty-five (1,155) were obtained from the internet, and the remaining ten (10) were captured with the Raspberry Pi for testing purposes. The images from the internet depict all the activities of security incidents of the ATM. For this study, the multiclass datasets sourced from the internet were improved and grouped into a multiclass classification problem. Table 3.1 shows a summary of the image datasets and their class distributions:

Table 3.1 Summary of Image Dataset
Dataset                       Number of Attributes
ATM out of service            62
card trapping                 50
cash trapping                 64
Jackpotting                   46
logical attack                44
Malware                       78
Occlusion                     73
physical attack               204
Robbery                       126
Skimming                      55
transaction reversal fraud    50

[Image panels: Samples of Skimming Attack Images; Samples of Physical Attack Images; Samples of Robbery Images]
Figure 3.2 Sample Images for ATM Incident Attacks

Defects datasets
Eleven thousand, four hundred and fifty-two (11,452) ATM defect incidents from 2018, obtained from NCR, Ghana, were used in the experimental setup. The NCR defects datasets consist of various ATM system technical problems encountered in 2018, obtained through the company's database or logs. The multi-class datasets from NCR Ghana were modified into binary classification problems for this study. Figure 3.3 shows part of the multiclass datasets used by the company and their class distributions:

Figure 3.3 Summaries of defect incident datasets
Figure 3.4 Plot of summaries of defect incident datasets

In addition to the images, data obtained from NCR Ghana for ATM defects was used for the system defect aspect of this project.

3.4 Machine Learning Classification Algorithms
To assess the effectiveness and efficiency of the proposed technique, two machine learning algorithms are considered: a deep learning Regional Convolutional Neural Network (R-CNN), fine-tuned for image classification, and Support Vector Machines (SVM) for defect classification.

3.4.1 Regional Convolutional Neural Network
R-CNN forward computation has several stages, shown in Figure 3.5. First, the regions of interest (RoIs) are generated. The RoIs are category-independent bounding boxes that are highly likely to contain an object of interest [10]. This study uses a different method called the integrated method, which works similarly to selective search [20][12], to generate these through LabelImg. These stages are shown in further detail in Figure 3.5. Next, a convolutional network extracts features from each region proposal. The sub-image contained in the bounding box is warped to match the input size of the CNN and then fed to the network. After the network has extracted features from the input, the features are input to a faster regional convolutional network (F-RCNN) that provides the final classification.

[Figure stages: 1. Original input image; 2. Feature proposal extraction (a warped proposal); 3. Compute features using CNN; 4. Classification: Skimming (Yes), Physical attack (No), Robbery (No), Logical attack (No), Malware (No), Card trapping (No), Cash trapping (No), Occlusion (No), Jackpotting (No), TRF (No)]
Figure 3.5 Stages of R-CNN forward computation

The method is trained in multiple stages, beginning with the convolutional network [10][14][17]. After the CNN has been trained, the Faster-RCNN is fitted to the CNN features. Finally, the region proposal-generating method is trained.

Fast R-CNN
The method receives an input image plus regions of interest computed from the image. As in R-CNN, the RoIs are generated using an external method [10][14][17]. The image is processed using a CNN containing several convolutional and max-pooling layers. The convolutional feature map generated after these layers is input to an RoI pooling layer, as shown in Figure 3.6. This extracts a fixed-length feature vector for each RoI from the feature map [20]. The feature vectors are then input to fully connected layers that are connected to two output layers: a SoftMax layer that produces probability estimates for the object classes and a real-valued layer that outputs bounding box co-ordinates computed using regression (meaning refinements to the initial candidate boxes) [20][47][12][10].

[Figure layers: alternating convolution and max-pooling layers, with feature map sizes 100*100, 96*96, 48*48, 44*44, 22*22, 18*18, and 9*9]
Figure 3.6 RoI Pooling Layers

3.4.2 Support Vector Machines
Support Vector Machines (SVM) is a machine learning tool for classification and regression. The Support Vector Machine is based on supervised learning and classifies points to one of two disjoint half-spaces [24][22][21]. It uses a nonlinear mapping to convert the original data into a higher dimension. Its objective is to construct a function that correctly predicts the class to which the new and old points belong. With an appropriate nonlinear mapping, two data sets can always be divided by a hyperplane. The hyperplane separates the tuples of one class from another and defines the decision boundaries.
Many hyperplanes separate the data, but only one achieves maximum separation. The main reason for seeking the maximum margin is that an arbitrary decision boundary used for classification may end up nearer to one set of data points than to the others [21][24]. This holds when the data is linearly separable, but data is mostly non-linear and inseparable, so kernels are used. The core purpose of SVM is to separate the data with decision boundaries and to extend them to non-linear boundaries using the kernel trick [24]. The significant benefit of SVM is its versatility, meaning that different kernel functions can be specified for the decision function. Common kernels are provided, but it is also possible to specify custom kernels. SVM performs notably well when pixel maps are used as input; in a handwriting recognition task, it gives accuracy equivalent to neural networks with elaborate features. Support vector machines are used for many applications, such as text categorization, pattern recognition, face recognition, and handwriting analysis, especially for classification and regression. Neural networks are easier to apply than support vector machines, but they sometimes provide unsatisfactory results. For example, even in perceptron learning algorithms, gradient descent is slower than SVM learning. SVM is highly competitive when used for pattern classification problems. One of the significant challenges is choosing a suitable kernel for a given application [24]. There are many standard or default choices, such as the Gaussian or polynomial kernel, but if these prove ineffective, more elaborate kernels are needed. Traditional classification approaches perform poorly when working directly on the data because of high data dimensionality, but support vector machines can avoid very high dimensionality representations. The support vector machine is among the most promising techniques compared to others.
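The three kernel families used for the SVM classifiers in this work (linear, polynomial, and radial basis) can be sketched in plain Python as follows. This is an illustrative sketch only: the parameter names `degree`, `coef0`, and `gamma` and their default values are assumptions chosen for the example, not the settings used in the experiments.

```python
import math


def linear_kernel(x, z):
    """k(x, z) = x . z"""
    return sum(a * b for a, b in zip(x, z))


def polynomial_kernel(x, z, degree=3, coef0=1.0):
    """k(x, z) = (x . z + coef0) ** degree"""
    return (linear_kernel(x, z) + coef0) ** degree


def rbf_kernel(x, z, gamma=0.5):
    """k(x, z) = exp(-gamma * ||x - z||^2), the Gaussian (radial basis) kernel."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


def gram_matrix(points, kernel):
    """Matrix of pairwise kernel values, which is all a kernel SVM needs from the data."""
    return [[kernel(p, q) for q in points] for p in points]


points = [(1.0, 2.0), (3.0, 4.0)]
print(gram_matrix(points, linear_kernel))  # [[5.0, 11.0], [11.0, 25.0]]
```

Because the SVM only ever uses the data through such pairwise kernel values, swapping the kernel function changes the shape of the decision boundary without changing the training procedure.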
The support vector machine scales fairly well to high dimensional data, and the trade-off between classifier complexity and error can be controlled explicitly. Another benefit of SVMs and kernel methods is that one can design and use a kernel for a particular problem that can be applied directly to the data without needing a feature extraction process. This is particularly important in situations where a lot of data structure is lost by the feature extraction process. An example is text processing. Limitations of SVM are speed and size in training and testing [24][21]. Discrete data presents another problem. The most severe difficulty with SVMs is the high algorithmic complexity and extensive memory requirements. The development of SVM is entirely different from standard algorithms used for learning, and SVM provides fresh insight into this learning. SVM is an excellent example of supervised learning that maximizes generalization by maximizing the margin and supports nonlinear separation through kernelization. SVM tries to avoid overfitting and underfitting. The margin in SVM denotes the distance from the boundary to the closest data points in the feature space.
Given the 2018 incident dataset with the corresponding mapping $\phi : X \subseteq \mathbb{R}^n \to F$ into the feature space $F$, the linear hyperplane dividing it into two labeled classes (problem code description and all other classes) can be mathematically obtained as:
$$ w^T x_i + b = 0, \quad w \in \mathbb{R}^n, \; b \in \mathbb{R} \tag{3.1} $$
Assuming the training dataset is correctly classified, as shown in Figure 3.7:
[Figure labels: support vectors; Problem Code Description (PCD) class; remaining classes; misclassified point; kernel $k(x_i, x_j) = \phi(x_i)^T \phi(x_j)$; hyperplanes $w^T \phi(x) + b = -1$, $w^T \phi(x) + b = 0$, and $w^T \phi(x) + b = +1$]
Figure 3.7 Standard formulation of SVM
This means the SVC computes the hyperplane to maximize the margin separating the classes (problem and all other classes).
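To make the role of the hyperplane in (3.1) concrete, the short sketch below evaluates $w^T x + b$ for a point, assigns the class by its sign, and computes the distance of the point from the hyperplane. The values of `w` and `b` are hypothetical and chosen only for illustration; they are not parameters learned from the NCR dataset.

```python
import math


def decision_value(w, b, x):
    """Evaluate w^T x + b for a single point x."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict(w, b, x):
    """Assign +1 (problem class) or -1 (all other classes) by the side of the hyperplane."""
    return 1 if decision_value(w, b, x) >= 0 else -1


def distance_to_hyperplane(w, b, x):
    """Perpendicular distance |w^T x + b| / ||w|| from x to the hyperplane."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(decision_value(w, b, x)) / norm


# Hypothetical hyperplane parameters, for illustration only.
w, b = (3.0, 4.0), -5.0
print(predict(w, b, (3.0, 4.0)), distance_to_hyperplane(w, b, (3.0, 4.0)))  # 1 4.0
print(predict(w, b, (0.0, 0.0)), distance_to_hyperplane(w, b, (0.0, 0.0)))  # -1 1.0
```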
In its simplest linear form, the SVC is a hyperplane that separates the problem state from all other classes with a maximum margin. Finding this hyperplane involves obtaining two parallel hyperplanes, as shown in Figure 3.7 above, at equal distance from the maximum-margin hyperplane, such that all the training data satisfy the following constraints:
$$ w^T x_i + b \ge +1 \ \text{for } y_i = +1, \qquad w^T x_i + b \le -1 \ \text{for } y_i = -1 \tag{3.2} $$
where $w$ is the normal to the hyperplane, $|b| / \|w\|$ is the perpendicular distance from the hyperplane to the origin, and $\|w\|$ is the Euclidean norm of $w$. The separating hyperplane is defined by the plane $w^T x_i + b = 0$, and the constraints in (3.2) are combined to form:
$$ y_i \left( w^T x_i + b \right) \ge 1 \tag{3.3} $$
The pair of hyperplanes that gives the maximum margin can be found by minimizing $\|w\|^2$ subject to the constraint in (3.3). This leads to a quadratic optimization problem formulated as:
$$ \text{Minimize} \quad f(w, b) = \frac{\|w\|^2}{2} \tag{3.5} $$
$$ \text{subject to} \quad y_i \left( w^T x_i + b \right) \ge 1, \quad \forall i = 1, \dots, n \tag{3.6} $$
This problem is reformulated by introducing Lagrange multipliers $\alpha_i$ $(i = 1, \dots, n)$, one for each constraint, and subtracting the constraint terms from the objective function. With the decision function $f(x) = w^T x + b$, this results in the primal Lagrangian function:
$$ L_P(w, b, \alpha) = \frac{\|w\|^2}{2} + \sum_{i=1}^{n} \alpha_i \left\{ 1 - y_i \left( w^T x_i + b \right) \right\}, \quad i = 1, \dots, n \tag{3.7} $$
Taking the partial derivatives of $L_P(w, b, \alpha)$ with respect to $w$, $b$, and $\alpha$, respectively, and applying the duality theory yields:
$$ \frac{\partial L_P}{\partial w} = 0 \;\Rightarrow\; w = \sum_{i=1}^{n} \alpha_i y_i x_i \tag{3.8} $$
The problem defined in (3.5) is a quadratic optimization (QP) problem.
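The hard-margin formulation can be checked numerically on a toy example. The sketch below tests whether a candidate hyperplane satisfies constraint (3.3) for every training point, evaluates the objective in (3.5), and reports the resulting margin width, which for the canonical hyperplanes $w^T x + b = \pm 1$ is $2 / \|w\|$. The data points and the values of `w` and `b` are hypothetical and serve only as an illustration.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def satisfies_hard_margin(w, b, X, y):
    """True if y_i (w^T x_i + b) >= 1 holds for every training point, as in (3.3)."""
    return all(label * (dot(w, x) + b) >= 1 for x, label in zip(X, y))


def objective(w):
    """Objective f(w, b) = ||w||^2 / 2 from (3.5)."""
    return dot(w, w) / 2


def margin_width(w):
    """Distance 2 / ||w|| between the hyperplanes w^T x + b = +1 and w^T x + b = -1."""
    return 2 / math.sqrt(dot(w, w))


# Hypothetical, linearly separable toy data: +1 = problem class, -1 = all other classes.
X = [(2.0, 2.0), (3.0, 3.0), (0.0, 0.0), (-1.0, 0.0)]
y = [1, 1, -1, -1]
w, b = (0.5, 0.5), -1.0

print(satisfies_hard_margin(w, b, X, y))  # True
print(objective(w))                       # 0.25
print(margin_width(w))
```

A smaller $\|w\|$ gives a wider margin, which is why minimizing (3.5) subject to (3.6) yields the maximum-margin hyperplane.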
Maximizing the primal problem $L_P$ with respect to $\alpha_i$, subject to the constraints that the gradient of $L_P$ with respect to $w$ and $b$ vanish and that $\alpha_i \ge 0$, gives the following two conditions:
$$ w = \sum_{i=1}^{n} \alpha_i y_i x_i \tag{3.9} $$
$$ \sum_{i=1}^{n} \alpha_i y_i = 0 \tag{3.10} $$
Substituting these constraints gives the dual formulation of the Lagrangian:
$$ \text{maximize} \quad L_D(\alpha) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle \tag{3.11} $$
$$ \text{subject to} \quad \sum_{i=1}^{n} \alpha_i y_i = 0, \quad \alpha_i \ge 0; \quad i = 1, \dots, n \tag{3.12} $$
The values of $\alpha_i$, $w$, and $b$ are obtained from the following respective equations:
$$ w = \sum_{i=1}^{n} \alpha_i y_i x_i \tag{3.13} $$
$$ b = -\frac{1}{2} \left( \min_{i:\, y_i = +1} w^T x_i + \max_{i:\, y_i = -1} w^T x_i \right) \tag{3.14} $$
Also, the Lagrange multiplier is computed using the following:
$$ \alpha_i \left( 1 - y_i \left( w^T x_i + b \right) \right) = 0 \tag{3.15} $$
Hence, the dual Lagrangian $L_D$ is maximized with respect to its nonnegative $\alpha_i$ to give a standard quadratic optimization problem. The training vectors $x_i$ with nonzero Lagrange multipliers $\alpha_i$ are called support vectors, and they satisfy:
$$ y_i \left( w^T x_i + b \right) = 1 \tag{3.16} $$
The equation above gives the support vectors (SVs). Although the SVM classifier can only have a linear hyperplane as its decision surface, its formulation can be extended to build a nonlinear SVM. SVMs with nonlinear decision surfaces can classify nonlinearly separable data by introducing a soft margin hyperplane, as shown in Figure 3.8. Introducing the slack variable $\xi_i$ into the constraints yields:
$$ w^T x_i + b \ge 1 - \xi_i \ \text{for } y_i = +1, \qquad w^T x_i + b \le -1 + \xi_i \ \text{for } y_i = -1, \qquad \xi_i \ge 0 \ \ \forall i \tag{3.17} $$
These slack variables help to find the hyperplane that provides the minimum number of training errors. Modifying the objective in (3.5) to include the slack variable yields:
$$ \underset{w,\, b,\, \xi}{\text{Minimize}} \quad \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{n} \xi_i \quad \text{subject to} \quad y_i \left( w^T x_i + b \right) + \xi_i - 1 \ge 0, \quad \xi_i \ge 0 \tag{3.18} $$
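The effect of the slack variables can be illustrated with a small numerical sketch. For a fixed hyperplane, the smallest slack that satisfies the constraints in (3.17) is $\xi_i = \max(0,\, 1 - y_i(w^T x_i + b))$, and the soft-margin objective in (3.18) adds $C$ times the total slack to $\frac{1}{2}\|w\|^2$. The data, the hyperplane, and the value of `C` below are hypothetical and chosen only for illustration.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def slack_variables(w, b, X, y):
    """Smallest xi_i >= 0 satisfying y_i (w^T x_i + b) + xi_i - 1 >= 0 for each point."""
    return [max(0.0, 1 - label * (dot(w, x) + b)) for x, label in zip(X, y)]


def soft_margin_objective(w, b, X, y, C):
    """Soft-margin objective (1/2) ||w||^2 + C * sum(xi_i), as in (3.18)."""
    return dot(w, w) / 2 + C * sum(slack_variables(w, b, X, y))


# Hypothetical toy data; the last point lies on the wrong side of the hyperplane.
X = [(2.0, 2.0), (0.0, 0.0), (1.5, 1.5)]
y = [1, -1, -1]
w, b = (0.5, 0.5), -1.0

print(slack_variables(w, b, X, y))                # [0.0, 0.0, 1.5]
print(soft_margin_objective(w, b, X, y, C=10.0))  # 15.25
```

A larger `C` penalizes margin violations more heavily, whereas a smaller `C` tolerates more training errors in exchange for a wider margin.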