Statistical Assessment of Imputation Algorithms for Estimation of Missing

Publisher

University of Ghana

Abstract

The validity and quality of a data analysis depend largely on the accuracy and completeness of the data matrix. Missing values are an unavoidable problem in almost every research study and, if not handled properly, may lead to misleading and biased conclusions. This study investigated the efficacy and accuracy of convergence of five imputation algorithms: expectation maximization (EM), multiple imputation by chained equations (MICE), k-nearest neighbor (KNN), mean substitution (MS) and regression substitution (RS), in estimating and replacing missing values in the cross-sectional World Population Data Sheet under the MCAR and MAR assumptions. Little's test was used to verify whether a given data matrix with missing values is MCAR or MAR. A multiple linear regression (MLR) model was first fitted to the complete World Population Data Sheet data; missing values were then artificially introduced into the complete data at rates of 5%, 10%, 20%, 30% and 40% under two missing-data mechanisms (MCAR and MAR). The imputation algorithms were assessed and compared using the average coefficient difference (ACD) of the MLR model, the mean absolute difference (MAD) and the coefficient of determination (R²). The results suggest that when data in the cross-sectional World Population Data Sheet are missing completely at random (MCAR) and normally distributed, regression substitution is the best approach. Under the MAR assumption, the MICE algorithm was found to be comparatively the best method for replacing missing values. Because this thesis concentrates on missing data imputation in a cross-sectional dataset, it is recommended that future studies consider categorical and longitudinal data.
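The evaluation procedure described in the abstract (delete values from a complete dataset, impute them, then compare against the complete-data results) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the thesis: the data are synthetic rather than the World Population Data Sheet, only MCAR deletion in a single predictor is shown, only mean substitution and regression substitution are implemented, and MAD and ACD are given plausible definitions (mean absolute difference between imputed and true values, and mean absolute difference between MLR coefficients fitted on imputed versus complete data) that may differ from those used by the author. All function names are hypothetical.

```python
import numpy as np

def make_data(n=200, seed=0):
    """Synthetic complete data (assumption): two correlated predictors, one response."""
    rng = np.random.default_rng(seed)
    x1 = rng.normal(size=n)
    x2 = 0.8 * x1 + rng.normal(scale=0.6, size=n)
    y = 1.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(scale=0.5, size=n)
    return np.column_stack([x1, x2]), y

def ols_coefs(X, y):
    """Least-squares coefficients of y on X, with an intercept as the first entry."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

def mcar_mask(n, rate, rng):
    """MCAR deletion: pick round(rate * n) rows uniformly at random."""
    mask = np.zeros(n, dtype=bool)
    mask[rng.choice(n, size=int(round(rate * n)), replace=False)] = True
    return mask

def mean_substitution(X, mask, col):
    """Replace deleted entries of column `col` with the mean of the observed entries."""
    X_imp = X.copy()
    X_imp[mask, col] = X[~mask, col].mean()
    return X_imp

def regression_substitution(X, mask, col):
    """Replace deleted entries with predictions from a regression on the other columns,
    fitted using only the rows where `col` is observed."""
    other = [j for j in range(X.shape[1]) if j != col]
    beta = ols_coefs(X[~mask][:, other], X[~mask, col])
    X_imp = X.copy()
    A = np.column_stack([np.ones(mask.sum()), X[mask][:, other]])
    X_imp[mask, col] = A @ beta
    return X_imp

def evaluate(rate, seed=1, col=1):
    """Delete values at the given rate, impute, and score each method."""
    X, y = make_data()
    mask = mcar_mask(len(X), rate, np.random.default_rng(seed))
    beta_full = ols_coefs(X, y)  # coefficients from the complete data
    results = {}
    for name, impute in [("MS", mean_substitution), ("RS", regression_substitution)]:
        X_imp = impute(X, mask, col)
        # MAD (assumed definition): imputed vs. true values at the deleted positions
        mad = float(np.mean(np.abs(X_imp[mask, col] - X[mask, col])))
        # ACD (assumed definition): MLR coefficients on imputed vs. complete data
        acd = float(np.mean(np.abs(ols_coefs(X_imp, y) - beta_full)))
        results[name] = {"MAD": mad, "ACD": acd}
    return results

if __name__ == "__main__":
    for rate in (0.05, 0.10, 0.20, 0.30, 0.40):
        print(rate, evaluate(rate))
```

Running the script prints both scores for each of the five missingness rates named in the abstract. A MAR variant would make the deletion probability depend on an observed variable (for example, the other predictor) instead of drawing rows uniformly.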

Description

MPhil.
