Statistical Assessment of Imputation Algorithms for Estimation of Missing

dc.contributor.author: Gyimah, O.
dc.date.accessioned: 2018-11-28T16:26:03Z
dc.date.available: 2018-11-28T16:26:03Z
dc.date.issued: 2018-10
dc.description: MPhil. [en_US]
dc.description.abstract: The validity and quality of data analysis rely largely on the accuracy and completeness of the data matrix. Missing values are an unavoidable problem in almost every research study and, if not handled properly, may lead to biased conclusions. This study investigated the efficacy and accuracy of convergence of five imputation algorithms, namely expectation maximization (EM), multiple imputation by chained equations (MICE), k-nearest neighbor (KNN), mean substitution (MS) and regression substitution (RS), in estimating and replacing missing values in the cross-sectional World Population Data Sheet under MCAR and MAR assumptions. Little's Test was used to verify whether a given data matrix with missing values is MCAR or MAR. A multiple linear regression (MLR) model was fitted to the complete data of the World Population Data Sheet, and missing values were then artificially introduced into the complete data sets at 5%, 10%, 20%, 30% and 40% under two missing data mechanisms (MCAR and MAR). The imputation algorithms were assessed and compared using the average coefficient difference (ACD) of the MLR model, the mean absolute difference (MAD) and the coefficient of determination (R²). The study suggested that, when data in the cross-sectional World Population Data Sheet are missing completely at random (MCAR) and normally distributed, regression substitution is the best approach. The MICE algorithm was found to be comparatively the best method for replacing missing values under the MAR assumption. Since this thesis concentrates mainly on missing data imputation in a cross-sectional dataset, it is recommended that future studies consider categorical and longitudinal data. [en_US]
dc.identifier.uri: http://ugspace.ug.edu.gh/handle/123456789/25984
dc.language.iso: en [en_US]
dc.publisher: University Of Ghana [en_US]
dc.subject: Statistical [en_US]
dc.subject: Imputation Algorithms [en_US]
dc.subject: Cross Sectional Data [en_US]
dc.title: Statistical Assessment of Imputation Algorithms for Estimation of Missing [en_US]
dc.type: Thesis [en_US]
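
The evaluation procedure summarized in the abstract (mask a complete data set under MCAR, impute, compare against the known truth) can be illustrated with a minimal sketch. This is not the thesis's code: the function names, the synthetic normal data standing in for the World Population Data Sheet, and the definition of MAD as the mean absolute difference between true and imputed values at the masked cells are all assumptions made for illustration. Only mean substitution (MS) is shown.

```python
import numpy as np

def introduce_mcar(data, rate, rng):
    """Set a random fraction of cells to NaN, independent of any values (MCAR)."""
    mask = rng.random(data.shape) < rate  # True where a value is removed
    holed = data.copy()
    holed[mask] = np.nan
    return holed, mask

def mean_substitution(holed):
    """Replace each NaN with the mean of the observed values in its column."""
    col_means = np.nanmean(holed, axis=0)
    filled = holed.copy()
    rows, cols = np.where(np.isnan(filled))
    filled[rows, cols] = col_means[cols]
    return filled

def mean_absolute_difference(truth, imputed, mask):
    """Assumed MAD: average |true - imputed| over the masked cells only."""
    return float(np.mean(np.abs(truth[mask] - imputed[mask])))

rng = np.random.default_rng(0)
# Synthetic stand-in for a complete, normally distributed data matrix.
complete = rng.normal(loc=50.0, scale=10.0, size=(200, 4))

# Missingness rates taken from the abstract.
for rate in (0.05, 0.10, 0.20, 0.30, 0.40):
    holed, mask = introduce_mcar(complete, rate, rng)
    filled = mean_substitution(holed)
    mad = mean_absolute_difference(complete, filled, mask)
    print(f"{rate:.0%} missing: MAD = {mad:.3f}")
```

Other imputers (regression substitution, KNN, EM, MICE) could be swapped in for `mean_substitution` and scored with the same loop. A MAR mechanism would need the mask to depend on observed values in other columns, which this sketch does not attempt.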
