Abstract:
Research has shown that current health expenditure in most countries, especially in sub-Saharan Africa, is inadequate and
unsustainable. Yet, fraud, abuse, and waste in health insurance claims by service providers and subscribers threaten the delivery of
quality healthcare. It is therefore imperative to analyze health insurance claim data to identify potentially suspicious claims.
Typically, anomaly detection can be framed as a classification problem in which statistical methods such as mixture
models, or machine learning approaches, classify data points as either normal or anomalous. Additionally, health insurance
claim data are often affected by sparsity, heteroscedasticity, multicollinearity, and missing
values. Such data are best analyzed with more robust statistical techniques. In this paper, we used a
Bayesian quantile regression model to establish the relationship between the claim outcome of interest and subject-level features and
to further classify claims as either normal or anomalous. An estimated model component is assumed to inherently capture the
behaviors of the response variable. A Bayesian mixture model, assuming a normal mixture of two components, is used to label
claims as either normal or anomalous. The model was applied to health insurance data captured on 115 people suffering from
various cardiovascular diseases across different states in the USA. Results show that 25 out of 115 claims (21.7%) were potentially
suspicious. The overall accuracy of the fitted model was assessed to be 92%. Through the methodological approach and empirical
application, we demonstrated that the Bayesian quantile regression is a viable model for anomaly detection.
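
For orientation, the two model components named above can be written in their standard textbook forms. This is a sketch under stated assumptions, not the exact specification used in the paper. The asymmetric Laplace working likelihood is the usual route to Bayesian quantile regression and is assumed here. The symbols $\tau$, $\beta_\tau$, $\sigma$, $\pi$, $\mu_k$, $\sigma_k$, and the quantity $z_i$ to which the mixture is applied are illustrative notation that the abstract does not define.

```latex
% Check (pinball) loss at quantile level \tau \in (0, 1)
\[
  \rho_\tau(u) = u \,\bigl(\tau - \mathbb{I}(u < 0)\bigr)
\]

% Linear model for the \tau-th conditional quantile of the claim outcome y_i
% given subject-level features x_i
\[
  Q_\tau(y_i \mid x_i) = x_i^\top \beta_\tau
\]

% Asymmetric Laplace working likelihood (assumed); maximizing it in \beta_\tau
% is equivalent to minimizing the summed check loss
\[
  f(y_i \mid x_i, \beta_\tau, \sigma)
    = \frac{\tau (1 - \tau)}{\sigma}
      \exp\!\left\{ -\rho_\tau\!\left( \frac{y_i - x_i^\top \beta_\tau}{\sigma} \right) \right\}
\]

% Two-component normal mixture for a model-derived quantity z_i (hypothetical notation)
\[
  p(z_i) = \pi \, \mathcal{N}(z_i \mid \mu_1, \sigma_1^2)
         + (1 - \pi) \, \mathcal{N}(z_i \mid \mu_2, \sigma_2^2)
\]

% Posterior probability that claim i belongs to component 2
\[
  w_i = \frac{(1 - \pi) \, \mathcal{N}(z_i \mid \mu_2, \sigma_2^2)}
             {\pi \, \mathcal{N}(z_i \mid \mu_1, \sigma_1^2)
              + (1 - \pi) \, \mathcal{N}(z_i \mid \mu_2, \sigma_2^2)}
\]
```

Under this sketch, a claim would be labeled anomalous when $w_i$ exceeds a chosen threshold such as 0.5, with component 2 taken to be the anomalous one. Both the threshold and the component ordering are assumptions made for illustration.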