Predictive Models for Identifying Critical Units for Inspection in a Regulatory Body
Abstract
Routine inspections of food establishments yield large data sets whose
attributes can be used by data mining algorithms to predict critical violations.
Critical violations in food establishments can cause serious public health
problems; they may arise from an unhygienic environment that leads to food
contamination. This study presents predictive models to detect
critical violations in food establishments by employing Logistic Regression (LR),
Support Vector Machine (SVM) and K-Nearest Neighbour (KNN). A database
from the City of Chicago data portal that contained food inspections from 2011
to 2014 was used. In the preliminary analysis, Principal Component Analysis
was utilised, and ten (10) relevant variables that are independent of each
other were selected from twenty-eight (28) to be used as inputs in the
models. Within the SVM family, several kernels were used, and the optimal
model was selected based on the performance measures Receiver Operating
Characteristic (ROC), sensitivity and specificity. The optimal KNN model was
also selected based on the same performance measures. The out-of-sample
classification accuracies for the LR, SVM and KNN classifiers were 92.7872%,
92.7873% and 92.6650%, respectively. The performances of the models showed no
large differences in classification accuracy; however, the SVM model
appears to provide better discrimination ability than the LR and KNN
models.
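
The abstract ranks models by sensitivity and specificity as well as accuracy. The sketch below illustrates, on made-up labels, why two classifiers with the same accuracy can still differ in how well they discriminate critical from non-critical cases. The function names (`confusion_counts`, `summarize`) and the label vectors are hypothetical and chosen only for illustration; they are not taken from the study and do not reproduce its data or results.

```python
# Minimal sketch: accuracy, sensitivity and specificity from predicted labels.
# All data below are hypothetical; 1 = critical violation, 0 = no critical violation.

def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for a binary classification task."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def summarize(y_true, y_pred, positive=1):
    """Compute accuracy, sensitivity (true positive rate) and specificity (true negative rate)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + tn + fn
    nan = float("nan")
    return {
        "accuracy": (tp + tn) / total if total else nan,
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,
        "specificity": tn / (tn + fp) if (tn + fp) else nan,
    }


# Hypothetical ground truth and two hypothetical classifiers.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
model_a = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]  # misses two critical cases
model_b = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]  # raises two false alarms

for name, preds in (("model_a", model_a), ("model_b", model_b)):
    print(name, summarize(y_true, preds))
```

In this toy case both classifiers are correct on eight of ten cases, yet one misses half of the critical cases while the other catches all of them at the cost of false alarms. This is the kind of difference that accuracy alone cannot show.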
Description
MPhil.
Keywords
Predictive Models, Regulatory Body, Critical Units, Inspection