Hindawi Applied Computational Intelligence and Soft Computing
Volume 2021, Article ID 6672578, 13 pages
https://doi.org/10.1155/2021/6672578

Research Article
Analysis and Implementation of Optimization Techniques for Facial Recognition

Justice Kwame Appati,1 Huzaifa Abu,1 Ebenezer Owusu,1 and Kwaku Darkwah2
1Department of Computer Science, University of Ghana, Accra, Ghana
2Department of Mathematics, Kwame Nkrumah University of Science and Technology, Kumasi, Ghana

Correspondence should be addressed to Ebenezer Owusu; ebeowusu@ug.edu.gh

Received 1 November 2020; Revised 25 February 2021; Accepted 5 March 2021; Published 12 March 2021

Academic Editor: Cheng-Jian Lin

Copyright © 2021 Justice Kwame Appati et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Amidst the wide spectrum of recognition methods proposed, there is still the challenge of these algorithms not yielding optimal accuracy against illumination, pose, and facial expression. In recent years, considerable attention has been on the use of swarm intelligence methods to help resolve some of these persistent issues. In this study, the principal component analysis (PCA) method with the inherent property of dimensionality reduction was adopted for feature selection. The resultant features were optimized using the particle swarm optimization (PSO) algorithm. For the purpose of performance comparison, the resultant features were also optimized with the genetic algorithm (GA) and the artificial bee colony (ABC). The optimized features were used for recognition with the Euclidean distance (EUD), K-nearest neighbor (KNN), and support vector machine (SVM) classifiers. Experimental results of these hybrid models on the ORL dataset reveal an accuracy of 99.25% for PSO with KNN, followed by ABC with 93.72% and GA with 87.50%.
On the contrary, experimentation with PSO, GA, and ABC on the YaleB dataset results in 100% accuracy, demonstrating their efficiency over state-of-the-art methods.

1. Introduction

Automated biometric recognition is fast gaining recognition as the most trusted security system of the 21st century. This is perhaps attributed to the recent significant advances in parallel processing techniques and also to the search for more reliable security systems occasioned by the sharp increase in crimes worldwide. The earliest biometric features that were automated for recognition include fingerprints, where the unique ridge skin patterns are utilized. Others include the retina, iris, palm, skin, and nose tip. Fingerprint, retina, and iris recognition systems are known to yield very accurate results [1], but hardened criminals, being sensitively aware of the security implications, mostly avoid presenting their biometric features to be captured into databases. Thus, automated face recognition systems are now the obvious choice [2] because people cannot hide their facial images from installed CCTV cameras all the time. This makes the technology the least intrusive and a hotbed research area, as researchers continue to propose newer algorithms that outperform existing ones.

Since automated face recognition is a newer field of study compared with fingerprint recognition and the others already stated, the problems associated with it are still prominent. For example, Zhang, Luo, Loy, and Tang [3] identified the problem of facial landmark detection, which is among the central concerns of system development. Most face detection algorithms are slow and produce poor recognition accuracies (Owusu, Zhan, and Mao, 2014). Other unresolved challenges in face recognition research have to do with occlusion, pose variation, illumination normalization, age, and gender [4]. In unconstrained environments, there is a significant decrease in recognition accuracy, making it difficult to identify faces reliably. Therefore, there is a need for techniques that improve face recognition in these environments. Tu, Li, and Zhao [5] attempted to solve the problem of illumination and pose by using the DL-Net and N-Net methods. However, this approach could not adequately account for large-scale normalized albedo images and face recognition in the wild. Another challenge in face recognition research has to do with testing the efficacy of experimental results. There is no standard dataset generally recognized by the research community for testing; the use of a specific dataset depends on the individual researcher's choice. Most of the datasets are premeditated and therefore do not represent a real-world scenario. Ethnicity poses a challenge as well: currently, there is no dataset that is well balanced for race, gender, and age.

The problem of nonuniform illumination also arises when the lighting conditions vary at different angles, so that the proportion of light reflected by the face differs. This phenomenon can lead to the misidentification of an individual [6]. Similarly, a random gyration due to individual movement can lead to misclassifications in 4D recognition: an input image and an interperson image can appear dissimilar due to the rotation of the image [7]. The main purpose of this study is to explore the popular techniques and bring forth an approach that leverages computational cost while taking illumination, pose, and facial expression into account. The proposed approach enhances the outcomes of the principal component analysis (PCA) technique using optimization techniques. Additionally, the improvement in accuracy in this research translates to a general improvement in the security and integrity of biometric locks.

This study explores which optimization algorithm is best suited to maximize recognition, which classifier suits the recommended approach, and which method utilizes the least computational resources and time. The proposed method requires the preprocessing of the image; features are then examined and extracted using PCA, followed by the augmentation of these features using PSO, ABC, and GA, with classification culminating the entire process.

2. Related Works

Face recognition is mainly performed in four phases, viz., feature extraction, face detection, face synthesis, and recognition [8]. Chihaoui et al. [9] stated that face recognition techniques fall into three main categories: procedures that use the whole face as input, approaches that consider only some features or regions of the face, and methods that use global and local facial traits simultaneously. Furthermore, numerous datasets are geared towards the solution of specific face recognition problems, and these datasets are taken under laboratory conditions; however, some datasets attempt to address multiple problems and are captured under real-world conditions [9].

Fazilov, Mirzaev, and Mirzaeva [10] examined an algorithm to enhance the classification of objects in higher dimensions. The proposed algorithm formed a subset of correlated images, and a feature representation was then elected to build elementary transformation models in the representative features' subspace. The algorithm pursues improvements in recognition accuracy, learning time, and object recognition time. The problem of low face recognition accuracy due to large sample spaces and the limited availability of training samples was addressed by He, Wu, Sun, and Tan [11] when they proposed cross-modality images for heterogeneous face recognition (HFR). The study proposed the Wasserstein CNN framework, which utilizes one network to project near-infrared (NIR) and visual (VIS) images into a Euclidean space. The proposed method is a modality-invariant deep feature learning architecture for NIR-VIS HFR. The Wasserstein distance separating the NIR and VIS distributions is computed, and a correlation constraint is then imposed on the fully connected layers to mitigate overfitting on small NIR datasets.

Similarly, Rahimzadeh, Arashloo, and Kittler [12] addressed the optimization problem of MAP inference in the Markov random field (MRF) model by utilizing the processing power of GPUs. The multiresolution analysis technique, an incremental subgradient approach, and an efficient message passing approach were used to obtain the maximum efficiency gain. Efficiency was further enhanced by using multiresolutional daisy features to attain invariance against occlusion and illumination. The proposed approach reduced the computational cost by 200% when compared with baseline methods. Likewise, Chan et al. [13] attempted the problem of training and adapting deep learning networks to different data and tasks. They offered a method of passing images into a cascaded principal component analysis (PCA) filter for training PCANet, which is subsequently used for feature extraction on the MultiPIE, Extended YaleB, AR, FERET, and LFW databases. Moreover, PCANet also serves as a reference for reviewing advanced deep learning architectures on large image classification tasks. Also, Deng, Hu, Wu, and Guo [14] put forward the creation of a face image to mitigate varying illumination and pose, respectively, using only one frontal face image to develop an extended generic elastic model (GEM) and a multidepth model. Pose-aware metric learning (PAML) was learned by means of linear regression to synthesize each pose in its corresponding metric space, and it yielded an accuracy of 100%. Chen et al. [15], on the other hand, proposed a residual-based deep face reconstruction neural network for the extraction of features from varying poses and illumination. This method changes images with varying illumination and pose into frontal face images with an average lighting condition. Comparing the proposed triplet loss and the Euclidean loss, the experimentation showed better performance for the latter over the former. However, only one database was used for this study, and there were no results against which to compare the proposed method.
Tu, Li, and Zhao [5] also addressed the problem of illumination, pose, and expression by using a DL-Net and a normalization network (N-Net). The DL-Net purges the illumination and then rebuilds the input image into an albedo image; the N-Net normalizes the albedo image and extracts features by supervised learning. Experiments on the MultiPIE database establish the efficiency of the proposed method in augmenting face recognition accuracy under illumination, expression, and varying poses. The study concludes by stating that the extracted features can improve conventional feature extraction methods. Zhang et al. [16] also proposed an emotion recognition model with better accuracy than the SOTA model. They extracted the facial expressions of seven different emotions. The extracted image is filtered through a combination of Shannon entropy and multiscale feature extraction, and the result is classified using a fuzzy support vector machine (SVM). The study used stratified cross-validation, and an overall accuracy of 96.77% was achieved. Ghazi and Ekenel [17] improved the accuracy under occlusion, variations in illumination, and misalignment of facial features by using two deep CNN models, VGG-Face and Lightened CNN, pretrained on large datasets, which were then used to extract facial features. They used five databases to attempt a solution to the problem: the AR face dataset as the analytical tool for the effects of facial obstruction, CMU PIE and the Extended Yale B dataset to analyze variation in illumination, the color FERET database for impact analysis on view invariance, and the FRGC dataset for the evaluation of multiview catalogues. The authors then used the Facial Bounding Box Extension to scan the entire head and extract deep features, thus improving the results. They compared the results of the Facial Bounding Box Extension with other methods and found a significant improvement [18]. However, Zhang et al. optimized face landmark detection by taking advantage of supplementary data from the attributes of the features. The study proposed feature extraction using four convolutional layers, each producing several feature maps that are activated using rectified linear units. The layers are then coupled using max-pooling to produce a shared vector. The Multi-Attribute Facial Landmark (MAFL), AFLW, and Caltech Occluded Faces in the Wild (COFW) datasets were subjected to mean error and failure rate validation. The study concluded that the auxiliary task is made more efficient by learning the dynamic task coefficient, which in turn makes the proposed method more robust to occluded faces and to significant view variance [19].

This approach encouraged Ding and Tao [20] to propound a homographic pose normalization approach which handles the loss of semantic correspondence, occlusion, and nonlinear facial texture warping in pose-invariant face recognition (PIFR). The proposed method first projects a lattice of three-dimensional facial landmarks onto a two-dimensional face for feature extraction. Second, an optimal warp is appraised using a homographic corrective texture deformation due to pose variation; this is performed around each landmark on the local patch. The restored occluded features are used for face recognition using established face descriptors [20]. However, Sharma and Patterh [21] proposed a technique whereby the face is identified by the Viola–Jones algorithm. The eyes, nose, and mouth are then discovered by means of the proposed hybrid PCA. Features are subsequently mined using LBP for every part found, and PCA is then applied to each extracted feature for recognition. The ORL face dataset was used with the recognition rate as the evaluation metric. The study concluded that the proposed hybrid PCA approach achieves a higher recognition rate under varying facial expressions and pose when pitted against SOTA, PCA+wavelet, CA, 2DPCA+DWT, and local binary pattern algorithms. They claimed that this approach can be extended to illumination, age, or partial occlusion problems.

Interestingly, Duong, Luu, Quach, and Bui [22] presented an approach to deep appearance models (DAM) that accurately captures shape and texture variation under large variations using the deep Boltzmann machine (DBM). DAM replaced the active appearance model (AAM). This method begins by employing the DBM to ascertain the landmark distribution points on the face data; the facial data are then vectorized as a texture model. The two layers (shape and texture) are then interpreted by constructing and using a high-level layer. The LFW, Helen, and FG-NET databases were used for the experimentation. The RMSE values of the proposed method against the controlled methods (bicubic and AAM) showed a significant improvement in the recognition rate [22].

Duan and Tan [23] also proposed a low-complexity method of learning pose-invariant features without the need for prior pose information. The proposed approach removes the pose from a face image and, in so doing, extracts local features. Self-similarity features are first generated from a face image by evaluating the distance separating the features of different nonoverlapping blocks. The linear transformation is then subtracted from the local features, and the transformation matrix is acquired by reducing the distance between pose-variant features. This matrix is created while discriminative information across persons is retained. Nevertheless, Singh, Zaveri, and Raghuwanshi [24] have proposed a rough membership classifier (RMF) for the classification of pose images. Feature extraction was performed using log-Gabor filters, and SVDs were used for the reduction of redundant features. A KNN classifier is finally applied to the reduced Gabor features. The ORL, Georgian Face, CMU PIE, and Head Pose Image databases were used with performance metrics similar to Duan and Tan [23]. The study concluded that the proposed method is best suited for mug shots in law enforcement. Moreover, it improves the recognition of face images with occlusion, and the method is augmented using modeling techniques to gain improved results. However, the use of three methods for testing reduces the optimality of the proposed methods for substantial datasets with varying images. Nevertheless, Zhao, Li, and Liu [25] have proposed MSA+PCA for pose-invariant face recognition. First, features are extracted using the affine-invariant multiscale autoconvolution (MSA) transformation. The decorrelation of these traits and the reduction of the MSA dimensions are then performed using principal component analysis. Finally, the principal components with the highest eigenvalues are classified using KNN. The experimentation points out how computationally expensive the proposed method is during the MSA feature extraction phase.
Abdalhamid and Jeberson [26] presented an able pose-invariant face recognition system via an artificial bee colony optimized K-nearest neighbor classifier (ABC-KNN). The method used video as input, converted into frames. During the preprocessing of the converted images, the adaptive Lee filter (ALF) was applied for image enhancement by removing noise. The Viola–Jones (VJ) algorithm is then used to segment the face from the eyes, nose, and mouth. Complete LBP (CLBP), center-symmetric local binary pattern (CS-LBP) features, Gabor features (GF), and patterns of oriented edge magnitudes (POEM) descriptors are used when features are extracted from the segmented image, and ABC-KNN is applied for classification. Recognition accuracy was the performance evaluation metric. Consequently, F. Zhang, Yu, Mao, Gou, and Zhan [27] propounded an approach for the PIFER framework based on feature learning using deep learning. The PCA-Net used frontal images that were not labeled during the learning process of the features. The latter are consequently used by a CNN for feature mapping across the space separating the nonfrontal and frontal faces. The novel description generated by the maps is then used to describe nonfrontal faces, thereby achieving a standard characteristic for describing arbitrary faces. The multiview robust features are then trained using a single classifier for varying poses. BU-3DFE Static FEW was used during the experimentation stage with recognition accuracy as the performance evaluation metric. When contrasted with other techniques and frameworks, the proposed process outperforms SOTA techniques. Additionally, this method can be used for pose-robust feature extraction when trained, instead of training the model for different pose variations.

Finally, Sang, Li, and Zhao's [28] method for PIFR fuses texture and depth into a framework using joint Bayesian classifiers. The output is then identified using a similarity estimator between the input and the face database. However, there is a high computational cost for the recognition of face images in large face databases. Furthermore, although the experimentation was extensive for various poses, multiple methods were not compared with the current method.
3. Research Methodology

The research design for this study includes image preprocessing, feature extraction with PCA, the optimization of these features using PSO, ABC, and GA, and finally the classification of objects using KNN, SVM, and EUD. The datasets for the study are YaleB and AT&T, popularly known as ORL. These datasets were selected with the justification that they present well-defined challenges necessary for validating the facial recognition algorithm. Subsequent sections explain the major parts of the study design in detail.

3.1. Feature Extraction. This component of the design acquires relevant biometric descriptors from a given image. In the process, a high volume of data is obtained, making it necessary to select only highly contributing descriptors. Several techniques exist for this task; however, PCA is adopted for this study due to its popularity and efficiency in this domain [29].

3.1.1. Principal Component Analysis. The primary goal of principal component analysis for facial recognition is the transformation of higher-dimensional data into a lower feature subspace known as the eigenface. This eigenspace represents the locus of the covariance matrix of the feature landmarks. Despite its usefulness, PCA is computationally expensive for higher-dimensional data. This necessitates the adoption of an alternative algorithm with properties and structure similar to PCA [30] but relatively inexpensive, known as singular value decomposition (SVD). Taking a matrix X with dimension n × m, PCA can be defined as the eigendecomposition of the covariance matrix X^T X. This yields eigenvalues λ with corresponding eigenvectors W. These eigenvectors are used as the transformation operator on X to obtain a new matrix T with the same dimension as X, as shown in

T = XW. (1)

Equation (1) assumes that all components (i.e., columns) in W are principal. In practice, however, some of these components are expected to be redundant; hence, W is ordered by λ. With the ordered W, truncation can be performed using the first r components for analysis. By implication, W_r is an m × r matrix, giving the new transformed matrix T_r shown in

T_r = XW_r. (2)

As stated earlier, the operations of PCA are expensive, and SVD, with properties mathematically identical to PCA, is preferred for implementation. Equation (3) shows the SVD of X:

X = μΣV*, (3)

where μ is the left singular vector, V* is the conjugate transpose of the right singular vector, and Σ contains the singular values on its diagonal. Substituting equation (3) into the eigendecomposition of X^T X gives X^T X = VΣ²V*, so W is identical to V, while the ordered singular values (σ1, σ2, σ3, ...) correspond to λ, with λ_i = σ_i². Again, with the property that μ and V are unitary matrices, we have

μ*μ = I, (4)

V*V = I, (5)

where I is the identity matrix. From equations (1) and (3), and noting that W is identical to V, we have

T = XV = μΣV*V = μΣI, (6)

T = μΣ. (7)

These equations further justify why SVD is computationally inexpensive compared with PCA, which computes the covariance X^T X. Taking the principal components of equation (7), we have

T_r = μ_r Σ_r. (8)

Finally, since the requirement is W and not the eigendecomposition of X^T X, SVD can be used to compute W efficiently.
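The equivalence in equations (1)–(8) can be sketched in a few lines. This is an illustrative NumPy version (the study's own implementation was in MATLAB); the image count, dimensions, and random data are placeholders:

```python
import numpy as np

# Illustrative sketch of PCA via SVD, as in equations (1)-(8).
# X holds one flattened face per row, centered by subtracting the mean face.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 96 * 84))      # 40 hypothetical face images
X = X - X.mean(axis=0)                  # subtract the "mean face"

# SVD route: X = mu @ diag(S) @ Vt, so W = Vt.T and T = mu @ diag(S)  (eq. 7)
U, S, Vt = np.linalg.svd(X, full_matrices=False)
T_svd = U * S                           # T = mu.Sigma, scaled columns of U

# Direct PCA route for comparison: T = XW with W = V  (eq. 1).
# The eigendecomposition of X^T X would be far more expensive at full scale,
# which is why the paper prefers SVD; its eigenvalues equal sigma_i^2.
T_pca = X @ Vt.T

r = 10                                  # keep the first r principal components
T_r = T_svd[:, :r]                      # T_r = mu_r.Sigma_r  (eq. 8)
```

Both routes produce the same projected features, so the truncation in equation (8) can be read directly off the SVD factors.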
3.2. Feature Optimization. This section of the study describes the swarm intelligence algorithms used for feature optimization. Among these methods are the artificial bee colony, the genetic algorithm, and particle swarm optimization.

3.2.1. Artificial Bee Colony. The artificial bee colony (ABC) is a swarm-based algorithm designed around the foraging actions of honeybees. The four components of the behavioral model of ABC are the food source, scout bees, onlooker bees, and employed bees. The food source denotes a possible solution to the clustering problem as the scout bee carries out a global search. This search is performed stochastically, while the onlooker and employed bees search for adjacent solutions. The employed bees subsequently evaluate the precision of the solution against the solutions previously stored in memory. This information is successively passed on to the onlooker bees in the dance area. This ensures that the best food source is chosen, and stagnated food sources within an already set cycle are abandoned and replaced with new sources. This process is repeated until convergence to the optimal solution. Mathematically, we have the following steps.

Step one: randomly initialize solutions x_i for i = {1, 2, ..., FS}, where i indexes each food source and FS is the total number of food sources. Furthermore, initialize the onlooker and employed bees using a random function generator:

x_ij = x_j^min + rand(0, 1)(x_j^max − x_j^min), (9)

where x_i = [x_i1, x_i2, ..., x_iD] is a vector of length D, with x_j^max and x_j^min denoting the maximum and minimum values of the jth dimension.

Step two: new solutions are iteratively found by each employed bee using

v_ij = x_ij + φ_ij(x_ij − x_kj), (10)

where v_i = [v_i1, v_i2, ..., v_iD] signifies a new solution within the local range of x_i = [x_i1, x_i2, ..., x_iD], k indexes a randomly chosen neighboring food source, and φ_ij ∈ (−1, 1). The sum of the Euclidean distances between the sample points and their cluster midpoints is known to be inversely proportional to the fitness value of all candidate sources. In the selection of the sources, a greedy approach is employed by comparing the fitness values of the old and new positions.

Step three: the probability p_i of the solution x_i is computed using

p_i = fit_i / Σ_{n=1}^{FS} fit_n, (11)

where fit_i is the fitness value of x_i. Onlooker bees use this probability to select new x_i values by searching the local optima, following step two to calculate the fitness value.

Step four: if the onlooker and employed bees are unable to identify a new and better candidate solution through the local search after a predefined number of iterations, the solution x_i is discarded and substituted with a scout bee's new solution. The scout bees then use random global selection to search for new solutions.

Step five: steps two to four are repeated until the defined stopping criterion is met, returning the optimal output.
3.2.2. Genetic Algorithm. The genetic algorithm (GA), on the other hand, is based on genetics and the theory of natural selection. It is a stochastic algorithm that finds the best solution by effectively searching for the global optimum in a large space. A nonnegative fitness value is obtained using the fitness function; this value summarizes how close a solution is to the global best (Mahmud, Haque, Zuhori, and Pal, 2014). A GA begins by generating random numbers (called chromosomes) with population size n. Each chromosome has its fitness value computed, and the stopping criterion is checked. The GA operators that drive the chromosomes toward convergence, namely selection, crossover, and mutation, are explained below.

Selection. This operator creates offspring from an existing population by a process comparable to natural selection in biological lifeforms. Selection accentuates the better-performing individuals in the population, with the expectation that their offspring are more likely to carry the genetic information on to a successive generation. Consequently, convergence is impacted greatly by the magnitude of the selection pressure. Hence, the selection criteria should prevent premature convergence by maintaining population diversity and balance with the crossover and mutation operations.

Crossover. The crossover operator mixes information between two parents in a manner matching sexual reproduction. The objective of the crossover procedure is to give "birth" to improved offspring. This is achieved by exploring different portions of the search space.

Mutation. The mutation procedure changes the values of randomly selected bits within each string, thereby preventing the GA from getting stuck at a local minimum through the scattering of genetic data, hence maintaining variation in the population. This process is repeated until the optimal solution is achieved or the predetermined number of generations elapses.
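The three operators can be sketched on a bitstring toy problem. This is an illustrative Python example, not the study's code; the ones-max objective, population size, and mutation rate are assumptions made for the demonstration:

```python
import numpy as np

# Minimal GA sketch of selection, crossover, and mutation:
# maximize the number of 1-bits in a 20-bit chromosome.
rng = np.random.default_rng(2)
N, BITS, GENS, P_MUT = 30, 20, 60, 0.01

pop = rng.integers(0, 2, size=(N, BITS))     # random chromosomes

def fitness(pop):
    return pop.sum(axis=1)                   # nonnegative fitness values

for _ in range(GENS):
    f = fitness(pop)
    probs = f / f.sum()                      # fitness-proportional selection
    parents = pop[rng.choice(N, size=N, p=probs)]
    # single-point crossover between consecutive parent pairs
    children = parents.copy()
    for i in range(0, N - 1, 2):
        cut = rng.integers(1, BITS)
        children[i, cut:], children[i + 1, cut:] = (
            parents[i + 1, cut:].copy(), parents[i, cut:].copy())
    # bit-flip mutation keeps diversity and helps escape local minima
    flips = rng.random(children.shape) < P_MUT
    pop = np.where(flips, 1 - children, children)

best = pop[fitness(pop).argmax()]
```

With these settings the population converges toward the all-ones string; raising the mutation rate slows convergence but preserves more diversity, mirroring the balance described above.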
3.2.3. Particle Swarm Optimization. Particle swarm optimization (PSO) is also an optimization algorithm influenced by biology. It was derived by observing the collective behavior and swarming of flocks of birds and schools of fish [30]. The algorithm comprises candidate solutions known as particles, each having a series of parameters which represent a coordinate in a multidimensional space. A collection of these particles forms a population, with the particles probing the search space to find the optimal solution. Each particle keeps its former optimal solution in memory, labeling these solutions the personal best and the global best. The position of the ith particle is defined in the D-dimensional space as

x_i = [x_i1, x_i2, x_i3, ..., x_iD], (12)

and the population of the swarm as

X = [x_1, x_2, x_3, ..., x_N]. (13)

The particles then iteratively update their respective positions in the parameter space when searching for the optimal solution using

x_i(t + 1) = x_i(t) + v_i(t + 1), (14)

where v_i is the velocity component of the ith particle along the D dimensions, with t and t + 1 indicating two consecutive runs of the process. The velocity of the ith particle is defined in equation (15) with three terms: the first is inertia, which prevents the particle from drastically changing direction; the second describes the ability of the particle to return to the previously known best position; and the last describes the particle moving (swarming) closer to the best position:

v_i(t + 1) = v_i(t) + c1(p_i − x_i(t))R1 + c2(g − x_i(t))R2, (15)

where p_i is the personal best of the particle, g is the global best, and c1 and c2, in the range 0 ≤ c1, c2 ≤ 4, are the cognitive and social coefficients, respectively. Finally, R1 and R2 are two diagonal matrices randomly generated from a uniform distribution in [0, 1]. This ensures that the social and cognitive components have a random effect on the velocity update in equation (15). Since the particles are driven toward the convergence of the personal and global best solutions, the stochastic weighting of the two accelerating terms makes the trajectories semirandom. Equations (14) and (15) are iterated until a stopping criterion is met. Algorithmically, we have the following pseudocode.

3.3. PSO Algorithm

(1) Initialize N particles:
(a) Initialize the position x_i(0) ∀ i ∈ 1 : N
(b) Initialize each particle's best position to its initial position: p_i(0) = x_i(0)
(c) Calculate the fitness of each particle, and if f(x_j(0)) ≥ f(x_i(0)) ∀ i ≠ j, initialize the global best as g = x_j(0)
(2) Repeat until the stopping condition is met:
(a) Update the particle velocity in accordance with equation (15):
v_i(t + 1) = v_i(t) + c1(p_i − x_i(t))R1 + c2(g − x_i(t))R2. (16)
(b) Update the particle position using equation (14):
x_i(t + 1) = x_i(t) + v_i(t + 1). (17)
(c) Evaluate the fitness of the particle, f(x_i(t + 1)).
(d) If f(x_i(t + 1)) ≥ f(p_i), update the personal best: p_i = x_i(t + 1).
(e) If f(x_i(t + 1)) ≥ f(g), update the global best: g = x_i(t + 1).
(3) Assign the best solution to g at the end of the iterative process.

3.4. Classification. After the optimization of the extracted feature vectors, classification models are built to address the face recognition challenge. There are myriads of predefined models for this task given the feature set, among them SVM, KNN, K-means, Euclidean distance, VGGNet, and CNN. Other pretrained face classifiers, such as VGG-Face, also exist, which estimate the similarity between the face image of a subject and relevant features selected from the face images in the database. In this study, the Euclidean distance (EUD), K-nearest neighbor (KNN), and support vector machine (SVM) classifiers were used.
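The pseudocode above can be sketched directly. This is an illustrative Python version on a toy fitness function, not the study's MATLAB code; the inertia weight of 0.7 and the coefficients c1 = c2 = 1.5 are assumptions added here for stable convergence and are not taken from equation (16):

```python
import numpy as np

# Sketch of the Section 3.3 pseudocode: PSO maximizing f(x) = -||x||^2,
# i.e., minimizing the sphere function.
rng = np.random.default_rng(3)
N, D, ITERS = 20, 5, 150
W_INERTIA, c1, c2 = 0.7, 1.5, 1.5          # assumed, not from the paper

def f(x):
    return -np.sum(x**2)                   # higher fitness is better

X = rng.uniform(-5, 5, size=(N, D))        # step (1a): initial positions
V = np.zeros((N, D))
P = X.copy()                               # step (1b): personal bests
g = max(X, key=f).copy()                   # step (1c): initial global best

for _ in range(ITERS):
    for i in range(N):
        R1, R2 = rng.random(D), rng.random(D)   # random diagonal weights
        # eq. (16): inertia + cognitive + social terms
        V[i] = W_INERTIA * V[i] + c1 * R1 * (P[i] - X[i]) + c2 * R2 * (g - X[i])
        X[i] = X[i] + V[i]                 # eq. (17): position update
        if f(X[i]) >= f(P[i]):             # step (2d): personal best
            P[i] = X[i].copy()
        if f(X[i]) >= f(g):                # step (2e): global best
            g = X[i].copy()
```

After the loop, g holds the best solution found, as in step (3). In the study's pipeline this fitness function would instead score a candidate feature subset by its classification accuracy.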
Finally, R and image undergoes a series of preprocessing and subsequent1 R2 are the two diagonal matrices randomly generated from a feature selection and finally features optimization. +ese uniform distribution in [0,1].+is ensures that the social and optimized features are trained for feature matching. cognitive components have a random effect on the velocity update in equation (15). Since the particles are derived from 4.2. Environmental Setup. +e face recognition system the convergence of the personal and global best solutions, implemented in this study was developed, trained, and tested the stochastic weight of the two accelerating terms and the using Matlab R2018b on an HP desktop processor Intel trajectories are semirandom. +is requires that equations Core™ i7-770T CPU @ 2.90GHz, Linux Ubuntu 20.04 LT®S (14) and (15 are iterated until a stopping criterion is met. operating system. Algorithmically, we have the following pseudocode. 4.3. Image Preprocessing. +e first step taken in image 3.3. PSO Algorithm. analysis is the preprocessing of the image for undesirable (1) N particle initialization noise. +ese components are detrimental to the examination of the image and thus are removed via preprocessing. All (a) Initialize the position xi(0)∀ i ∈ 1: N images with dimensions more than 96-by-84 pixels are (b) Initialize the particles best position to its position downsampled. +is is followed by the conversion of all Pi(0) � xi(0) colored images to grayscale. +e outputs of the images are (c) Calculate the fitness of each particle, and if separated into training and test sets. Eighty percent of the f(xj(0))≥f(xi (0)) ∀ i≠ j , initialize the global images are considered as training sets with 20 percent as the best as g � xj (0) test set. +is preprocessing is implemented so that the (2) Repeat until condition is met complexity will be reduced and the computational time improved. (a) Update the particle velocity in accordance with equation (15) 4.4. Feature Extraction. 
+is section further illuminates on vi(t + 1) � vi(t) + c1( pi − xi(t)􏼁R1 + c2( g − xi(t)􏼁R2. the feature extraction approach used in this study. Among (16) the objectives of this study is the implementation of an offline facial recognition system with an improved and (b) Update the particle position using equation (14) robust feature extraction method using optimization Applied Computational Intelligence and Soft Computing 7 Standardized ‘Mean’ face face image 2 Face Features 10 preprocessing extraction 3 20 30 Image Feature input 40vectors 50 60 Feature 1 Face optimization 4 70 dataset 80 Feature 90 Comparison vectors 10 20 30 40 50 60 70 80 optimization Figure 2: Mean face for the AT&T dataset. Output Feature Feature matching classification 5 ‘Mean’ face 6 10 Figure 1: Implementation pipeline. 20 30 techniques. +is method will be tested using the AT&T and 40 YaleB face datasets as they contain faces with varying illu- 50 mination, different poses, occluded faces, dissimilar ex- 60 pressions, or a combination of them. +e mean of the 70 features is computed, and the feature of the first principal 80 component of each image is selected. +e mean face for 90 AT&T and YaleB datasets is shown in Figures 2 and 3, respectively. 10 20 30 40 50 60 70 80 Figure 3: Mean face the YaleB dataset. 4.5. Dimensionality Reduction and Feature Selection. Given the computed mean face of the training data, the are used. For an efficient evaluation and a valid comparison binary singleton expansion function is applied as an ele- with the existing study, the accuracy metric is selected. +e ment-wise operator. +e resultant image is decomposed recognition accuracy is computed for all the classification with the single value decomposition function to reduce the methods as applied on different datasets with varying op- coefficient used to characterize the image. +e cumulative timization methods. 
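To make the preprocessing and mean-face steps of Sections 4.3 and 4.4 concrete, a minimal sketch is given below. The helper names, the strided-slicing downsampling, the grayscale weights, and the fixed random seed are illustrative assumptions for a NumPy setting, not the authors' Matlab code:

```python
import numpy as np

def preprocess(images, target=(96, 84)):
    """Grayscale-convert and downsample each face, then flatten it
    into a row vector (hypothetical helper, crude stride-based resize)."""
    out = []
    for img in images:
        if img.ndim == 3:  # color -> grayscale (standard luma weights)
            img = img @ np.array([0.299, 0.587, 0.114])
        fy = max(1, img.shape[0] // target[0])
        fx = max(1, img.shape[1] // target[1])
        img = img[::fy, ::fx][:target[0], :target[1]]
        out.append(img.ravel())
    return np.array(out)

def split_train_test(X, train_frac=0.8, seed=0):
    """80/20 split as described in Section 4.3."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    cut = int(train_frac * len(X))
    return X[idx[:cut]], X[idx[cut:]]

# Toy stand-in for a face dataset: 10 random 120x100 RGB images.
faces = [np.random.rand(120, 100, 3) for _ in range(10)]
X = preprocess(faces)
train, test = split_train_test(X)
mean_face = train.mean(axis=0)   # the "mean" face of Figures 2 and 3
centered = train - mean_face     # element-wise centering (cf. bsxfun)
```

The element-wise subtraction on the last line plays the role of Matlab's binary singleton expansion used later in Section 4.5.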
4.5. Dimensionality Reduction and Feature Selection. Given the computed mean face of the training data, the binary singleton expansion function is applied as an element-wise operator to subtract the mean from each image. The resultant matrix is decomposed with the singular value decomposition function to reduce the number of coefficients used to characterize each image. The cumulative sum of the squares of the entries of the diagonal (singular value) matrix is computed to produce the principal components, and the components corresponding to the first k eigenvalues are selected. The eigenvectors are then normalized into eigenfaces. Sample outputs of this process on the AT&T and YaleB datasets are shown in Figures 4 and 5, respectively.

Once more, the binary singleton expansion function is used to transform the test data using the mean face. These transformed train and test data are then optimized for better classification results.

5. Results and Discussion

This section describes in detail the results of the experiment and the analysis of those results. Moreover, comparisons between the optimization methods using the same databases and three different classifiers are discussed.

5.1. Numerical Results. Generally, in recording the performance of a facial recognition model, statistical metrics such as accuracy, recall, precision, and F-measure are used. For an efficient evaluation and a valid comparison with existing studies, the accuracy metric is selected. The recognition accuracy is computed for all the classification methods as applied on the different datasets with the various optimization methods. Tables 1–7 show the average, maximum, and minimum recognition accuracies for the datasets with the different classification methods. This experiment was conducted with one thousand five hundred (1500) iterations, with and without optimization of the extracted features.

5.2. Discussion. From the results shown in Section 5.1, it is observed that the model's performance on the AT&T dataset is fairly low in general. This could be attributed to the occlusion, varying pose, and expression exhibited in the face images, making the dataset naturally difficult to model. On the contrary, the model's performance on YaleB was relatively good, as that dataset contains only images with varying illumination. From Table 1, it can be seen that the accuracy is highest for KNN and SVM, at 100% each, on the YaleB dataset. Nevertheless, optimizing the features with GA saw a significant decrease of the KNN classifier to 70%, with a 9.8% reduction using the Euclidean distance method, as shown in Table 3. There is no loss, as shown in Tables 2 and 4, for KNN and SVM when ABC and PSO optimization is performed.
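The bookkeeping behind the average, maximum, and minimum accuracies reported in Tables 1–7 can be sketched as follows; the short list of per-run accuracies is a made-up stand-in for the 1500 recorded runs of one optimizer-plus-classifier pairing:

```python
# Hypothetical per-run recognition accuracies (percent) for one
# optimizer+classifier pairing; the study records 1500 such runs.
runs = [100.0, 99.123, 71.93, 100.0, 95.614, 40.351, 90.351]

average = sum(runs) / len(runs)   # "Average" row of Tables 2-7
maximum = max(runs)               # "Maximum" row
minimum = min(runs)               # "Minimum" row

print(f"Average {average:.2f}  Maximum {maximum:.2f}  Minimum {minimum:.2f}")
```

Each table cell in Tables 2–7 is one of these three statistics for a single (dataset, optimizer, classifier) combination.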
However, a 4% and 5% reduction for PSO and ABC, respectively, was noted when the EUD classifier was used. Consequently, there is a large difference in recognition accuracy on the AT&T database. Without the use of an optimization method, the AT&T database's recognition accuracies stood at 29.94%, 77.84%, and 82.05% for EUD, KNN, and SVM, respectively, as shown in Table 1. However, using the PSO optimization technique saw an improvement in the average recognition accuracy to 80.46% for KNN. There is a significant degradation when using the SVM classifier, with an average recognition accuracy of 59.85%. Again, EUD saw a 28.46% average recognition accuracy for PSO, as shown in Table 5. Conversely, the average recognition accuracy for GA and ABC reduced to 47.05% and 66.78%, respectively, when using KNN, and to 54.86% and 35.55% when using SVM, as can be separately observed in Tables 6 and 7.

Figure 4: First six eigenfaces of AT&T.

Figure 5: First three eigenfaces of YaleB.

Table 1: Recognition accuracy without the use of the optimization algorithm.

    Dataset   EUD     KNN     SVM
    AT&T      29.94   77.84   82.05
    YaleB     82.63   100     100

Table 2: Recognition accuracies for the YaleB database with the PSO algorithm.

    YaleB - particle swarm optimization (PSO)
              EUD     KNN     SVM
    Average   87.92   100     91.72
    Maximum   100     100     100
    Minimum   10.53   100     0

Table 3: Recognition accuracies for the YaleB database with the GA algorithm.

    YaleB - genetic algorithm (GA)
              EUD     KNN      SVM
    Average   70.04   70.04    99.15
    Maximum   100     100      100
    Minimum   10.53   10.523   91.23

Table 4: Recognition accuracies for the YaleB database with the ABC algorithm.

    YaleB - artificial bee colony (ABC)
              EUD     KNN     SVM
    Average   74.71   100     99.60
    Maximum   100     100     100
    Minimum   17.54   100     95.61

Table 5: Recognition accuracies for the AT&T database with the PSO algorithm.

    AT&T - particle swarm optimization (PSO)
              EUD     KNN     SVM
    Average   28.46   80.46   59.85
    Maximum   36.25   99.25   78.75
    Minimum   18.75   67.5    5

Table 6: Recognition accuracies for the AT&T database with the GA algorithm.

    AT&T - genetic algorithm (GA)
              EUD     KNN     SVM
    Average   28.78   47.05   54.86
    Maximum   40      87.5    86.25
    Minimum   17.5    7.5     15

Table 7: Recognition accuracies for the AT&T database with the ABC algorithm.

    AT&T - artificial bee colony (ABC)
              EUD     KNN     SVM
    Average   28.50   66.78   35.55
    Maximum   43.75   93.75   72.5
    Minimum   17.5    8.75    8.75

The order of experimentation is given as follows:

    PCA + EUD
    PCA + KNN
    PCA + SVM
    PCA + PSO + EUD
    PCA + PSO + KNN
    PCA + PSO + SVM
    PCA + GA + EUD
    PCA + GA + KNN
    PCA + GA + SVM
    PCA + ABC + EUD
    PCA + ABC + KNN
    PCA + ABC + SVM

By observation, PCA + PSO + EUD has 15.19% of its recognition accuracies below 79.83%, which is the recognition accuracy of PCA + EUD for the YaleB dataset. This indicates that PSO optimizes the features well, with 84.81% of the experiments producing better results. In addition, there is no change in the recognition accuracy of KNN, which points to PSO not reducing the results already achieved when the optimization technique is performed. Conversely, SVM saw less than 1% of its recognition accuracies fall below 90%. This demonstrates that over 99% of the results for PCA + PSO + SVM have recognition accuracy above 90%, with 95% of the recognition accuracies at 100%. Therefore, the 5% reduction relative to PCA + SVM can be considered insignificant. As a final point, PSO optimizes well for EUD and SVM.

Again, PCA + GA + EUD indicates a 28.78% average recognition accuracy. This is similar to the average results obtained by all three optimization methods using EUD as the classifier. However, GA and ABC achieved 47.05% and 66.78% average recognition accuracy for the KNN classifier, respectively. This illustrates an atrophy of the result from 77.84% to 47.05% for GA and to 66.78% for ABC: GA suffers a 30% degradation, while ABC saw an 11% reduction in average recognition accuracy.
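A hedged sketch of the PCA + PSO + KNN idea evaluated in these experiments is given below: PSO searches over feature weights, with 1-NN (Euclidean) accuracy as the fitness, following the velocity update of equation (16) with the random terms applied element-wise. The toy data, particle count, iteration budget, and clipping of positions to [0, 1] are assumptions for illustration, not the study's settings:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy PCA-reduced features: two classes separated along dimension 0;
# dimensions 1-2 are pure noise the optimizer should downweight.
X_train = np.vstack([rng.normal(0, 1, (20, 3)), rng.normal(4, 1, (20, 3))])
X_train[:, 1:] = rng.normal(0, 5, (40, 2))
y_train = np.array([0] * 20 + [1] * 20)
X_test = np.vstack([rng.normal(0, 1, (10, 3)), rng.normal(4, 1, (10, 3))])
X_test[:, 1:] = rng.normal(0, 5, (20, 2))
y_test = np.array([0] * 10 + [1] * 10)

def knn_accuracy(w):
    """Fitness: 1-NN (Euclidean distance) accuracy with feature weights w."""
    d = np.linalg.norm(X_test[:, None] * w - X_train[None, :] * w, axis=2)
    pred = y_train[np.argmin(d, axis=1)]
    return np.mean(pred == y_test)

# PSO over feature weights, velocity update as in eq. (16):
#   v <- v + c1*R1*(p - x) + c2*R2*(g - x)
n, dim, c1, c2 = 12, 3, 2.0, 2.0
x = rng.uniform(0, 1, (n, dim))
v = np.zeros((n, dim))
p, p_fit = x.copy(), np.array([knn_accuracy(w) for w in x])
g = p[np.argmax(p_fit)].copy()

for _ in range(30):
    R1, R2 = rng.uniform(0, 1, (2, n, dim))
    v = v + c1 * R1 * (p - x) + c2 * R2 * (g - x)
    x = np.clip(x + v, 0, 1)   # position update (eq. (14)), clipped
    fit = np.array([knn_accuracy(w) for w in x])
    better = fit > p_fit
    p[better], p_fit[better] = x[better], fit[better]
    g = p[np.argmax(p_fit)].copy()

print("best weights", g.round(2), "accuracy", knn_accuracy(g))
```

On this toy problem the best particle typically learns a large weight for the discriminative dimension, mirroring how the study's PSO step reshapes the PCA features before KNN classification.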
Similarly, PCA + GA + EUD has over 71% of its results above that of PCA + EUD. KNN and SVM, however, have 60% and 100% of their results greater than those of PCA + KNN and PCA + SVM, respectively. Yet still, PCA + GA + KNN shows a significant decay of results from its default 100%. SVM, on the other hand, displays a negligible reduction in average recognition accuracy. With this, SVM seems to produce better results than both KNN and EUD with respect to the use of the GA optimization algorithm.

Moreover, the average recognition accuracy for both GA and ABC with the SVM classifier plummeted further than that of KNN. A 27% reduction in average recognition accuracy using the SVM classifier for GA supersedes that of ABC, which has a 46.5% reduction. Thus, it is concluded that GA and ABC using SVM as the classifier are not suitable for this approach. The first 20 results are shown in Table 9.

Again, the linear kernel was used for the SVM classifier when the experiment was performed. This kernel has the propensity to improve computational time compared to other SVM kernels, and it is suitable for high-dimensional data [31]. However, the linear kernel in this experiment appears to have sacrificed accuracy for computational time; thus, the kernel chosen does not produce good results. Other kernels such as the polynomial, Gaussian, radial basis function (RBF), or ANOVA kernels could be used for SVM in future research and the results compared with the proposed method.
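The kernels named above can be written directly as Gram-matrix functions; the parameter values (gamma, degree, c) below are illustrative defaults, not the study's settings, and swapping one for another in an SVM is what the suggested future experiment would amount to:

```python
import numpy as np

def linear_kernel(A, B):
    """K(a, b) = a . b  -- the kernel used in the study's SVM runs."""
    return A @ B.T

def rbf_kernel(A, B, gamma=0.1):
    """K(a, b) = exp(-gamma * ||a - b||^2), the Gaussian/RBF alternative."""
    sq = ((A[:, None] - B[None, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * sq)

def poly_kernel(A, B, degree=3, c=1.0):
    """K(a, b) = (a . b + c)^degree, the polynomial alternative."""
    return (A @ B.T + c) ** degree

X = np.random.rand(5, 4)
for kernel in (linear_kernel, rbf_kernel, poly_kernel):
    K = kernel(X, X)
    assert K.shape == (5, 5)
    assert np.allclose(K, K.T)   # Gram matrices are symmetric
```

Only the kernel changes; the rest of the SVM machinery is untouched, which is why the comparison the authors propose is cheap to run.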
Similarly, Table 1 indicates that SVM is the better classifier when the linear kernel is used and no optimization algorithms are utilized; thus, both the AT&T and YaleB datasets produce their best default results with SVM. Now, comparing Tables 2–4, it is perceived that a perfect recognition accuracy of 100% is achieved for the maximum of all metaheuristic algorithms and classifiers. This indicates that all the optimization methods can be used for the YaleB database regardless of the classifier. Conversely, the maximum recognition accuracies for the algorithms used for augmentation give the impression that the KNN classifier is better. This means that PSO + KNN, ABC + KNN, and GA + KNN have better recognition accuracy than their SVM counterparts, which shows that the optimization algorithms have degraded the results produced by the SVM classifier. Nonetheless, GA's maximum recognition was better than that of the default SVM (PCA + SVM). Therefore, GA should be preferred when an SVM classifier with a linear kernel is chosen. Furthermore, the algorithms improved the highest recognition accuracy achieved by the EUD classifier only. With this, PSO is selected as the ideal optimization algorithm for the YaleB and AT&T datasets.

In like manner, the YaleB dataset results for PCA + ABC + EUD give rise to 28% of the data above the default 79.83% of PCA + EUD. However, the ABC-optimized recognition for KNN and SVM revealed no significant loss of results, with average recognition accuracies of 100% and 99.6% for KNN and SVM, respectively. It can be established that the results optimized by ABC and classified using KNN are appropriate for the YaleB dataset, and that ABC optimizes well for KNN and SVM on the said dataset. Table 8 shows the first 20 results of the total experiments for the PSO, ABC, and GA optimization algorithms implemented on the YaleB dataset using the EUD, KNN, and SVM classifiers.

Nevertheless, the substantial reduction of the results observed when the AT&T dataset is used stems from the increase in parameters for recognition: the AT&T dataset contains images that are occluded, and it also has varying poses and expressions.

PCA + PSO + EUD for the AT&T face dataset produced results that are, on average, below those of PCA + EUD for the database: 51% of the 1500 results obtained were lower than the default 29.94% for PCA + EUD. The overall average recognition of PCA + PSO + EUD for the AT&T database, however, was 28.46%, as shown in Table 5. It is perceived that the deterioration of the average recognition is offset by the larger values of the other recognitions; the 48% of results above the default 29.94% is not insignificant, yet it is a small percentage for consideration. KNN, on the other hand, has 31% of its results above the 77.84% default recognition. Still, the average recognition accuracy achieved was 3% higher than the default. Thus, the 80.46% average recognition accuracy for KNN with PSO-optimized features (PCA + PSO + KNN) is the best combination for the AT&T database, since none of the results for SVM was above its 82.05% baseline recognition.

Juxtaposing the proposed method with other approaches, Table 10 shows that the offered approach is more effective than the other state-of-the-art (SOTA) methods. The culmination of this research presents the proposed optimization method and classifier for the respective datasets in Table 11.

Finally, Table 12 shows the time taken for each experiment carried out. It is seen that PSO has the lowest average times for the EUD and KNN experiments, at 1.594 s and 1.592 s, respectively, while PSO + SVM, at 55.46 s, saw the highest computational cost of all the experimentation. However, PSO required less than 2 seconds for the PSO + EUD and PSO + KNN trials.

Table 8: YaleB experiment results for PSO, GA, and ABC using the EUD, KNN, and SVM classifiers.
            YaleB - PSO              YaleB - GA               YaleB - ABC
    EUD      KNN   SVM       EUD     KNN     SVM       EUD     KNN   SVM
    100      100   100       60.526  60.526  99.123    36.842  100   100
    100      100   100       99.123  99.123  100       35.088  100   100
    71.93    100   100       51.754  51.754  100       28.07   100   100
    100      100   100       100     100     98.246    71.93   100   99.123
    95.614   100   100       41.228  41.228  98.246    86.842  100   100
    99.123   100   100       40.351  40.351  100       100     100   100
    90.351   100   100       94.737  94.737  98.246    71.053  100   99.123
    99.123   100   100       99.123  99.123  99.123    47.368  100   100
    100      100   100       35.088  35.088  99.123    53.509  100   100
    100      100   100       99.123  99.123  100       54.386  100   100
    85.965   100   100       100     100     100       100     100   100
    67.544   100   98.246    100     100     100       57.895  100   100
    82.456   100   100       46.491  46.491  100       91.228  100   100
    87.719   100   100       97.368  97.368  99.123    30.702  100   98.246
    88.596   100   100       41.228  41.228  100       92.982  100   98.246
    83.333   100   98.246    28.07   28.07   96.491    100     100   100
    93.86    100   100       100     100     98.246    50      100   100
    100      100   100       50      50      97.368    72.807  100   100
    70.175   100   99.123    100     100     100       27.193  100   100

Table 9: AT&T experiment results for PSO, GA, and ABC using the EUD, KNN, and SVM classifiers.

            AT&T - PSO               AT&T - GA                AT&T - ABC
    EUD      KNN    SVM      EUD     KNN     SVM       EUD     KNN    SVM
    30       82.5   70       28.75   53.75   51.25     27.5    71.25  53.75
    26.25    80     61.25    37.5    31.25   37.5      31.25   55     33.75
    28.75    76.25  40       32.5    78.75   76.25     28.75   38.75  20
    32.5     76.25  75       36.25   20      30        23.75   76.25  41.25
    27.5     77.5   62.5     23.75   51.25   70        31.25   10     8.75
    27.5     80     75       27.5    31.25   35        26.25   71.25  27.5
    30       86.25  55       28.75   35      38.75     31.25   55     27.5
    33.75    87.5   73.75    27.5    63.75   62.5      30      42.5   23.75
    23.75    71.25  66.25    33.75   35      40        27.5    77.5   47.5
    23.75    76.25  68.75    30      56.25   71.25     28.75   42.5   16.25
    31.25    72.5   62.5     36.25   28.75   35        26.25   50     22.5
    26.25    73.75  68.75    27.5    46.25   56.25     23.75   76.25  37.5
    32.5     86.25  7.5      27.5    46.25   60        30      77.5   52.5
    30       83.75  63.75    28.75   11.25   16.25     25      45     31.25
    30       85     65       27.5    53.75   50        27.5    67.5   37.5
    22.5     83.75  50       30      28.75   33.75     30      45     40
    31.25    78.75  63.75    26.25   67.5    65        26.25   70     33.75
    28.75    68.75  56.25    35      58.75   62.5      28.75   86.25  56.25
    31.25    87.5   70       30      57.5    60        23.75   45     10

Table 10: Recognition accuracy of other methods on the YaleB and AT&T datasets.

    Author  Method                                                      Accuracy (%)  Database
    [32]    Generalized low-rank approximation of matrices (GLRAM)      82.18         YaleB
    [33]    FDDL                                                        96.2          YaleB
    [34]    Local nonlinear multilayer contrast patterns (LNLMCP)       97.50         YaleB
    [35]    Discriminative sparse representation via l2 regularization  82.61         YaleB
    [32]    GLRAM                                                       97.25         AT&T
    [33]    Fisher discriminative dictionary learning (FDDL)            96.7          AT&T
    [31]    PSO-KNN                                                     98.75         AT&T
    [31]    PCA-LDA fusion algorithm                                    98.00         AT&T
    [35]    Discriminative sparse representation via l2 regularization  95.00         AT&T

Table 11: Proposed classification and optimization techniques for both datasets.

    Proposed selection       AT&T                               YaleB
    Optimization technique   Particle swarm optimization (PSO)  Particle swarm optimization (PSO)
    Classification method    K-nearest neighbor                 K-nearest neighbor

Table 12: Average time taken for experiments.
    Time in seconds for experimentation   PSO      ABC      GA
    Euclidean distance                    1.594    2.104    1.648
    K-nearest neighbor                    1.592    2.115    1.646
    Support vector machine                55.46    4.871    4.445

Subsequently, the ABC and GA metaheuristic algorithms produced results similar to PSO's, but PSO is computationally less expensive than both.

6. Conclusion

This study looks at how to augment PCA features with a selected optimization method to improve the accuracy of face recognition models. The proposed implementation shows that the choice of PSO as the optimization method works well in the unconstrained environments of the real world, since pose, occlusion, and expression are among the dominant face recognition problems found in such environments. The default recognition accuracy on YaleB was 100% for both the SVM and KNN classifiers. However, the ORL database did not attain perfect recognition due to the inherent nature of the dataset. Nonetheless, the use of optimization algorithms on the selected features saw an increase in recognition accuracy from 82.63% to a maximum of 100% for EUD. This indicates that all three evolutionary algorithms can be used to improve the accuracy of the results. However, because the ORL database caters for three parameters (pose, occlusion, and expression), the maximum recognition did not reach 100% but rather 99.25%, which is promising, using the PSO algorithm and the KNN classifier. Last, the PCA + PSO + KNN approach is chosen for this study due to its ability to handle the increase in parameters, and it also outperforms other SOTA algorithms. These parametric increases move the recognition closer to real-world human face recognition. Moving forward, this study can be extended by looking at other recent swarm-intelligence optimization models used in other fields that have the property of being less computationally expensive. Other private datasets with stricter challenges could also be used to further validate this model; this remains a limitation of the present study.

Data Availability

The secondary data sources used to support the findings of this study are available from the AT&T database (https://www.kaggle.com/kasikrit/att-database-of-faces) and the YaleB database (https://github.com/Suchetaaa/CS663-Assignments/tree/0426d951d0212ed3dd831377a0df11551670ab87/Assignment-4/1/CroppedYale).

Conflicts of Interest

The authors declare that they have no conflicts of interest.

References

[1] S. Li and W. Deng, "Deep facial expression recognition: a survey," 2018, http://arxiv.org/abs/1804.08348.
[2] R. D. Labati, A. Genovese, E. Muñoz, V. Piuri, F. Scotti, and G. Sforza, "Biometric recognition in automated border control," ACM Computing Surveys, vol. 49, no. 2, pp. 1–39, 2016.
[3] F. Zhang, Y. Yu, Q. Mao, J. Gou, and Y. Zhan, "Pose-robust feature learning for facial expression recognition," Frontiers of Computer Science, vol. 10, no. 5, pp. 832–844, 2016.
[4] S. Zafeiriou, C. Zhang, and Z. Zhang, "A survey on face detection in the wild: past, present and future," Computer Vision and Image Understanding, vol. 138, pp. 1–24, 2015.
[5] H. Tu, K. Li, and Q. Zhao, "Robust face recognition with assistance of pose and expression normalized albedo images," ACM International Conference Proceeding Series, vol. 93, 2019.
[6] X. Chen, X. Lan, G. Liang, J. Liu, and N. Zheng, "Pose-and-illumination-invariant face representation via a triplet-loss trained deep reconstruction model," Multimedia Tools and Applications, vol. 76, no. 21, pp. 22043–22058, 2017.
[7] C. Ding, C. Xu, and D. Tao, "Multi-task pose-invariant face recognition," IEEE Transactions on Image Processing, vol. 24, no. 3, pp. 980–993, 2015.
[8] C. Ding and D. Tao, "A comprehensive survey on pose-invariant face recognition," ACM Transactions on Intelligent Systems and Technology, vol. 7, no. 3, 2016.
[9] M. Chihaoui, A. Elkefi, W. Bellil, and C. B. Amar, "A survey of 2D face recognition techniques," Computers, vol. 5, no. 4, pp. 41–68, 2016.
[10] S. K. Fazilov, N. M. Mirzaev, and G. R. Mirzaeva, "Modified recognition algorithms based on the construction of models of elementary transformations," Procedia Computer Science, vol. 150, pp. 671–678, 2019.
[11] R. He, X. Wu, Z. Sun, and T. Tan, "Wasserstein CNN: learning invariant features for NIR-VIS face recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 41, no. 7, pp. 1761–1773, 2019.
[12] S. Rahimzadeh Arashloo and J. Kittler, "Fast pose invariant face recognition using super coupled multiresolution Markov Random Fields on a GPU," Pattern Recognition Letters, vol. 48, pp. 49–59, 2014.
[13] T.-H. Chan, K. Jia, S. Gao, J. Lu, Z. Zeng, and Y. Ma, "PCANet: a simple deep learning baseline for image classification?" IEEE Transactions on Image Processing, vol. 24, no. 12, pp. 5017–5032, 2015.
[14] W. Deng, J. Hu, Z. Wu, and J. Guo, "From one to many: pose-aware metric learning for single-sample face recognition," Pattern Recognition, vol. 77, pp. 426–437, 2018.
[15] Z. Chen, W. Shen, and Y. Zeng, "Sparse representation for pose invariant face recognition," International Journal for Engineering Modelling, vol. 30, no. 1–4, pp. 37–47, 2017.
[16] T. Zhang, W. Zheng, Z. Cui, Y. Zong, J. Yan, and K. Yan, "A deep neural network-driven feature learning method for multi-view facial expression recognition," IEEE Transactions on Multimedia, vol. 18, no. 12, pp. 2528–2536, 2016.
[17] M. M. Ghazi and H. K. Ekenel, "A comprehensive analysis of deep learning based representation for face recognition," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, pp. 102–109, Las Vegas, Nevada, USA, July 2016.
[18] M. M. Ghazi and H. K. Ekenel, "Automatic emotion recognition in the wild using an ensemble of static and dynamic representations," in Proceedings of the 18th ACM International Conference on Multimodal Interaction (ICMI 2016), pp. 514–521, Tokyo, Japan, November 2016.
[19] Z. Zhang, P. Luo, C. C. Loy, and X. Tang, "Learning deep representation for face alignment with auxiliary attributes," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 5, pp. 918–930, 2016.
[20] C. Ding and D. Tao, "Pose-invariant face recognition with homography-based normalization," Pattern Recognition, vol. 66, pp. 144–152, 2017.
[21] R. Sharma and M. S. Patterh, "A new hybrid approach using PCA for pose invariant face recognition," Wireless Personal Communications, vol. 85, no. 3, pp. 1561–1571, 2015.
[22] C. N. Duong, K. Luu, K. G. Quach, and T. D. Bui, "Deep appearance models: a deep Boltzmann machine approach for face modeling," International Journal of Computer Vision, vol. 127, no. 5, pp. 437–455, 2019.
[23] X. Duan and Z.-H. Tan, "A spatial self-similarity based feature learning method for face recognition under varying poses," Pattern Recognition Letters, vol. 111, pp. 109–116, 2018.
[24] K. Singh, M. Zaveri, and M. Raghuwanshi, "Rough set based pose invariant face recognition with mug shot images," Journal of Intelligent & Fuzzy Systems, vol. 26, no. 2, pp. 523–539, 2014.
[25] Y. Zhao, L. Li, and Z. Liu, "A novel algorithm using affine-invariant features for pose-variant face recognition," Computers & Electrical Engineering, vol. 46, pp. 217–230, 2015.
[26] K. H. Abdalhamid and W. Jeberson, "Pose-invariant face recognition by means of artificial bee colony optimized KNN classifier," Journal of Advanced Research in Dynamical and Control Systems, vol. 11, no. 8, pp. 525–539, 2019.
[27] Y.-D. Zhang, Z.-J. Yang, H.-M. Lu et al., "Facial emotion recognition based on biorthogonal wavelet entropy, fuzzy support vector machine, and stratified cross validation," IEEE Access, vol. 4, pp. 8375–8385, 2016.
[28] G. Sang, J. Li, and Q. Zhao, Pose-Invariant Face Recognition via RGB-D Images, 2016.
[29] M. Haghighat, S. Zonouz, and M. Abdel-Mottaleb, "CloudID: trustworthy cloud-based and cross-enterprise biometric identification," Expert Systems with Applications, vol. 42, no. 21, pp. 7905–7916, 2015.
[30] M. Tiwari and A. K. Shukla, "An implementation of FACE recognition system (FARS) using PCA and PSO based techniques," State of the Art in Face Recognition, vol. 211007, no. 6, pp. 225–229, 2016.
[31] K. Sasirekha and K. Thangavel, "Optimization of K-nearest neighbor using particle swarm optimization for face recognition," Neural Computing and Applications, vol. 31, no. 11, pp. 7935–7944, 2019.
[32] S. Ahmadi and M. Rezghi, "Generalized low-rank approximation of matrices based on multiple transformation pairs," Pattern Recognition, vol. 108, 2020.
[33] B. B. Benuwa, B. Ghansah, and E. K. Ansah, "Kernel based locality-sensitive discriminative sparse representation for face recognition," Scientific African, vol. 7, Article ID e00249, 2020.
[34] L. Zhou, W. Li, Y. Du, B. Lei, and S. Liang, "Adaptive illumination-invariant face recognition via local nonlinear multilayer contrast feature," Journal of Visual Communication and Image Representation, vol. 64, Article ID 102641, 2019.
[35] Y. Xu, Z. Zhong, J. Yang, J. You, and D. Zhang, "A new discriminative sparse representation method for robust face recognition via l2 regularization," IEEE Transactions on Neural Networks and Learning Systems, vol. 28, no. 10, pp. 2233–2242, 2017.