University of Ghana http://ugspace.ug.edu.gh

DETERMINANTS OF LOW BIRTHWEIGHT IN GHANA

BY

APPIAH KUBI FELIX JUNIOR (10638150)

THIS THESIS IS SUBMITTED TO THE UNIVERSITY OF GHANA, LEGON IN PARTIAL FULFILLMENT OF THE REQUIREMENT FOR THE AWARD OF MASTER OF PHILOSOPHY STATISTICS DEGREE

JULY 2019

DECLARATION

I hereby declare that this submission is my own work towards the award of Master of Philosophy and that, to the best of my knowledge, it contains no material previously published by another person nor material which has been accepted for the award of any other degree of the University, except where due acknowledgment has been made in the text.

Signature ……………………….…… Date …………………………...
Appiah Kubi Felix Junior (Student)

We hereby certify that this thesis was prepared from the candidate’s own research work and has been submitted for examination with our approval as University supervisors.

Signature ………………… Date …………………………...
Dr. Isaac Baidoo (Principal Supervisor)

Signature ………………… Date ……………….……………
Dr. Felix O. Mettle (Co-Supervisor)

ABSTRACT

The objectives of this study are to determine the key factors that may account for the birthweight of infants, to estimate its distribution and prevalence, and to propose a statistical model that can be useful in epidemiological studies in Ghana. Both estimation (linear regression) and classification (logistic regression) analyses were conducted using data from the GDHS. The covariates found to be statistically associated with a newborn’s weight are gender, size of child at birth, region, wealth index, birth order, total number of children ever had, delivery place, preceding birth interval, and birth type. Size of child (Very Large) was the largest contributor to the explained variation in birthweight in the multiple linear regression.
In the multivariable logistic regression, for every one-unit increase in mother’s height, the log odds of low birthweight (against normal birthweight) of a child change by -0.107 (odds ratio 0.898), and for a unit decrease in the birth order of the infant, the log odds of low birthweight change by -0.714 (odds ratio 0.490). Five regions (Brong Ahafo, Central, Greater Accra, Upper East and Volta) were found to be significantly related to a newborn’s weight, implying that a mother from any of these regions is more likely to have a normal birthweight (NBW) baby than a mother from Ashanti. A male child was 0.397 times as likely to be NBW as a female child. Single-birth babies were 0.001 times as likely to be NBW as multiple-birth babies. Among the wealth classes, only the poorest class was found to be significantly related to birthweight: newborns whose mothers are in the poorest class were 1.973 times as likely to be NBW as newborns whose mothers belong to the middle class. Size of a child at birth was found to be significantly related to the birthweight of a newborn. The study has contributed to the understanding of maternal determinants associated with infant birthweight at the population level. The findings therefore provide a starting point for identifying risk factors and give health service providers clues about the maternal determinants and birth outcome factors on which to concentrate health promotion messages.

DEDICATION

I will give thanks to you, LORD, with all my heart; I will tell of all your wonderful deeds. (Psalm 9:1)

This thesis is dedicated to the Almighty God, who led me through it all to complete my course and research work successfully. Additionally, I dedicate this thesis to my lovely parents, Rev. Appiah Kusi Felix and Mrs.
Dorothy Appiah, who sacrificed their priorities, gave me invaluable educational opportunities and made efforts for me to be here today.

ACKNOWLEDGMENT

A study on the Determinants of Low Birthweight in Ghana was undertaken with the objective of analyzing the factors affecting the birthweight of a newborn. Many people helped me in every imaginable way to finish this research work, and I wish to offer them my thanks. To start with, I would like to thank the Almighty God for giving me the opportunity to pursue my graduate study at the Department of Statistics, University of Ghana.

I owe the most profound appreciation to Dr. Isaac Baidoo and Dr. Felix O. Mettle, my thesis advisors, who assisted me throughout the dissertation process. I fumbled in many ways during the early stages, but their rapid response and timely feedback, even during odd hours, were just extraordinary. I am truly amazed by the energy and the quality of the guidance they provided. Their stimulating support helped me prepare my thesis within the timeframe.

My deepest thanks go to my parents, Rev. Appiah Kusi Felix and Mrs. Dorothy Appiah, who have supported me in fulfilling my dream. I also extend my gratitude to my friend Bright Antwi Boasiako for playing a pivotal role in shaping my research methodology and data analysis. I am truly appreciative that you generously gave your time to look at my research work, regardless of how busy you were. I am also grateful to Rachael Adalevo for her valuable, constructive remarks and supportive gestures all through my study.

Finally, I extend my gratitude to the Lecturers and Colleagues of the Department of Statistics and Actuarial Science, University of Ghana, who contributed in many ways behind the scenes. Once again, to God, the giver of life and the wherewithal to dream of and execute the program, be the Glory.

TABLE OF CONTENTS

DECLARATION
ABSTRACT
DEDICATION
ACKNOWLEDGMENT
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS

CHAPTER ONE
INTRODUCTION
1.0 Introduction
1.1 Background of Study
1.2 Statement of Problem
1.3 Research Objectives
1.3.1 General Objective
1.3.2 Specific Objectives
1.4 Research Questions
1.5 Scope of the Study
1.6 Rationale for the current study
1.7 Study significance
1.8 Organization of the thesis

CHAPTER TWO
LITERATURE REVIEW
2.0 Introduction
2.1 Overview of Birthweight
2.2 Classification of birthweight
2.3 Consequences of Low birthweight
2.4 Determinants of Birthweight
2.4.1 Normal factors
2.4.2 Intermediate factors
2.4.3 Non-Normal factors
2.5 Related Literature Reviews outside Ghana
2.6 Related Literature Reviews in Ghana
2.8 Conclusion

CHAPTER THREE
METHODOLOGY
3.0 Introduction
3.1 Data description
3.1.1 Type and Source of Data
3.1.2 Sampling and Sampling Procedures
3.1.3 Instrumentation and Operationalization
3.1.4 Our Study assumptions
3.2 Application of Statistical Methods
3.2.1 Descriptive Analysis
3.2.2 Cross-tabulation for birthweight and birth size (2 x 2 Contingency table)
3.2.3 Kappa Statistics and Fleiss
3.2.4 The Chi-square (χ²) test
3.3 Statistical Model Building
3.3.1 Our study model building
3.3.2 Linear Regression Model (Multiple)
3.3.3 Logistic Regression Model
3.3.4 Variable Selection for Model Building
3.3.5 Developing the Screening tool
3.4 Conceptual framework

CHAPTER FOUR
DATA ANALYSIS AND DISCUSSION OF RESULTS
4.0 Introduction
4.1 Birthweight Data Analysis
4.1.1 Birthweight Distribution
4.1.2 Classification of Birthweight
4.2 Prevalence rate of Low birthweight (LBW)
4.2.1 WHO’s Recommendation
4.2.2 Researcher’s Adjusted Prevalence
4.3 Mother’s Reporting of Birth Size Reliability Analysis
4.3.1 How accurate is a mother at reporting child’s birthweight?
4.3.2 Assessment of Birthweight versus Size of an infant at birth
4.3.3 Calculation of Kappa Statistics
4.3.4 Reporting of Birthweight: (Mother’s Memory Recall or Health Card)
4.4.1 Factor One: Outcome of Pregnancy
4.4.2 Factor Two: Socio-Economic and Demographic Factors
4.4.3 Factor Three: Maternal Anthropometry
4.4.4 Factor Four: Maternal Reproductive Factors
4.5 Multivariate Analysis of Predictors and Birthweight
4.5.1 Towards Building a Multiple regression model
4.5.2 Towards Building a Logistic Regression Model

CHAPTER FIVE
SUMMARY, CONCLUSIONS, AND RECOMMENDATIONS
5.0 Introduction
5.1 Summary
5.1.1 Key Observations from Study Objectives
5.1.2 Final Comments on Multiple Linear Regression fit
5.1.3 Final Comments on Logistic Regression fit
5.1.4 Comparison of Two Sets of Fits
5.2 Conclusions
5.3 Recommendations

REFERENCES
APPENDICES

LIST OF TABLES

Table 1: Variables with the level of Measurement
Table 2: Standard 2 × 2 contingency table
Table 3: Interpretation of Cohen’s kappa & Fleiss Agreement
Table 4: ANOVA Table for Linear Regression (Multiple)
Table 5: Logit transformation
Table 6: Descriptive Statistics for Birthweight
Table 7: Descriptive Statistics of birthweight-based classification
Table 8: Birthweight groups against Size of child Crosstabulation
Table 9: Kappa Statistics for the contingency table
Table 10: Tabulation of Gender of child
Table 11: Descriptive Analysis for Gender of a child across birthweight classifications
Table 12: Chi-Square Tests for Gender of a child against birthweight classification
Table 13: Symmetric Measures for Gender of child and birthweight classification
Table 14: Tabulation of type of delivery by Caesarean
Table 15: Tabulation of type of delivery by Caesarean
Table 16: Chi-square tabulation for Type of delivery by cesarean and birthweight
Table 17: Descriptive analysis of Size of the child
Table 18: Chi-Square Tests for Size of child and birthweight
Table 19: Symmetric Measures for Size of child and birthweight classification
Table 20: Descriptive Statistics of Mother Age
Table 21: Descriptive statistics of Mothers Age by group and Birthweight
Table 22: Descriptive and Crosstabulation for Mothers Age and Birthweight classification
Table 23: Chi-Square Tests for Mothers age and birthweight
Table 24: Mean birthweight across Maternal Educational level
Table 25: Crosstab for mother education and birthweight
Table 26: Chi-Square tests between Mother education and birthweight
Table 27: Mean birthweight across various Regions
Table 28: Region and Birthweight Crosstabulation
Table 29: Chi-Square Tests across regions and birthweight
Table 30: Symmetric Measures for regions and birthweight
Table 31: Tabulation of Type of residence
Table 32: Crosstab of Type of residence and birthweight
Table 33: Chi-Square Tests type of residence and birthweight
Table 34: Religion and Birthweight Crosstabulation
Table 35: Chi-Square Tests for mother recode religion and birthweight
Table 36: Symmetric Measures of Religion and birthweight
Table 37: Mean weight of infants across the Wealth Index
Table 38: Crosstabulation Wealth index and Birthweight group
Table 39: Chi-Square Tests for Wealth index and Birthweight
Table 40: Symmetric Measures for Wealth index and birthweight
Table 41: Descriptive Statistics for Mothers Height
Table 42: Descriptive Statistics for Mother weight
Table 43: Crosstab for Birth order (Parity) and birthweight
Table 44: Chi-Square Tests for Birth order (Parity) and birthweight
Table 45: Symmetric Measures for Birth order (Parity) and birthweight
Table 46: Tabulation of Total children ever had against Birthweight classification
Table 47: Crosstabulation of Total Children ever born and Birthweight group
Table 48: Chi-Square Tests for total children ever born and birthweight
Table 49: Symmetric Measures for total children ever had/born and birthweight
Table 50: Mean Tabulation of delivery place
Table 51: Delivery place and Birthweight Crosstabulation
Table 52: Chi-Square Tests for Delivery place and Birthweight
Table 53: Symmetric Measures for delivery place and birthweight
Table 54: Descriptive Statistics for Preceding birth interval (Months)
Table 55: Tabulation of Preceding birth interval and birthweight
Table 56: Crosstab of Preceding birth interval and Birthweight
Table 57: Chi-Square Tests for Preceding birth interval and birthweight
Table 58: Symmetric Measures for Preceding birth interval and birthweight
Table 59: Descriptive statistics for Birth type and Birthweight
Table 60: Birth type and Birthweight group Crosstabulation
Table 61: Chi-Square Tests for Birth type and birthweight
Table 62: Symmetric Measures for Birth type and Birthweight
Table 63: Variable selection for Multiple Regression Model building
Table 64: Summary of Model: OLS Backward Selection
Table 65: Summary after resolving the assumptions behind our fitted model
Table 66: ANOVA Table for Multiple Linear Regression
Table 67: Classification Table on full low weight data
Table 68: Birthweight in train dataset after applying Sampling method
Table 69: Summary of Variable selection for Logistic Regression Model building
Table 70: Strength of Relationship of the Logistic Model
Table 71: Summary analysis of the final Logistic regression model
Table 72: Significant Variable in the Logistic model
Table 73: Classification Table on the Test data

LIST OF FIGURES

Figure 1: Data Cleaning Procedure
Figure 2: Conceptual framework for studying determinants of Birthweight
Figure 3: Histogram of Birthweight distribution (kg)
Figure 4: Birthweight classification Bar plot according to gender
Figure 5: Birthweight histogram across gender
Figure 6: Age distribution of mothers
Figure 7: Birthweight distribution across various Regions
Figure 8: Distribution of birthweight across region and type of residence
Figure 9: Simple Scatter with Fit Line of Birthweight by Mothers height
Figure 10: Simple Scatter with Fit Line of Birthweight by Mothers weight
Figure 11: Bar Plot of Total Children ever born/had
Figure 12: Boxplot of Birth and Total children ever born
Figure 13: Preceding Birth Interval in Months
Figure 14: QQ-Plot for Longitudinal birthweight (kg)
Figure 15: Fitted versus Residuals for the Multiple Linear Model fit
Figure 16: ROC Curve for our Logistic Model
Figure 17: Cooks distance for the Logistic Model

LIST OF ABBREVIATIONS

ANC      Antenatal Care
ANOVA    Analysis of Variance
AOR      Adjusted Odds Ratio
CEB      Children Ever Born
CI       Confidence Interval
CM       Centimeter
GDHS     Ghana Demographic and Health Survey
GMHS     Ghana Maternal Health Survey
GM       Gram
KG       Kilogram
LRT      Likelihood Ratio Test
MICS     Multiple Indicator Cluster Survey
NBW      Normal Birthweight
OLS      Ordinary Least Squares
OR       Odds Ratio
REG      Region
Std.     Standard Deviation
SOC      Size of Child
UNICEF   United Nations International Children’s Emergency Fund
WHO      World Health Organization

CHAPTER ONE

INTRODUCTION

1.0 Introduction

When a newborn enters the world, the question of survival comes to mind.
When one thinks about the survival of human life, health becomes the first indicator. Numerous efforts, plans, and interventions are made around the world to save the life of a child. In recent years, the World Health Organization (WHO) has adopted the slogan “Children’s health is tomorrow’s wealth”.

1.1 Background of Study

Birthweight is a significant determinant of a newborn’s exposure to the risk of disease and of his or her odds of survival. The World Summit has selected the incidence of low birthweight as an essential indicator for monitoring major health goals for children (UN, 2002). Birthweight is a prospective indicator of a child’s future growth and development and a retrospective indicator of maternal nutrition and health status. Globally, birthweight is a good summary measure of a heterogeneous public health problem that reflects the undernourishment of mothers and poor healthcare during pregnancy. Birthweight is therefore a noteworthy indicator of mortality, morbidity, and disability in infancy and childhood (WHO, 2010). In addition to being an important endpoint in itself, birthweight is associated with several unfavorable health outcomes in childhood and adulthood. The challenge of birthweight is multidimensional, and addressing it needs a coordinated approach that incorporates socio-economic, medical and educational measures. However, recognizing the factors that influence birthweight has been the paramount challenge in the public health sector.

1.2 Statement of Problem

The World Health Organization defines low birthweight as a weight of less than 2.5 kg (2500 g) at birth, irrespective of community, region, culture or gestational age, with the measurement taken after birth, preferably within the first hour of life, before significant postnatal weight loss occurs. This cut-off is used worldwide and is based on epidemiological observations of newborn babies.
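As a minimal sketch of the WHO cut-off described above, the following Python snippet classifies birthweights as low (below 2.5 kg) or normal and computes the share classed as low. The function names and the sample weights are hypothetical illustrations, not GDHS values or the procedure used in this thesis, and treating every weight of 2.5 kg or more as normal is a simplifying assumption.

```python
LBW_CUTOFF_KG = 2.5  # WHO cut-off: a weight below 2.5 kg (2500 g) is low birthweight


def classify_birthweight(weight_kg):
    """Return 'LBW' for weights below the WHO cut-off, otherwise 'NBW'."""
    return "LBW" if weight_kg < LBW_CUTOFF_KG else "NBW"


def lbw_prevalence(weights_kg):
    """Return the percentage of births classified as low birthweight."""
    if not weights_kg:
        raise ValueError("weights_kg must contain at least one value")
    n_low = sum(1 for w in weights_kg if classify_birthweight(w) == "LBW")
    return 100.0 * n_low / len(weights_kg)


# Hypothetical birthweights in kilograms, for illustration only.
sample_weights = [3.2, 2.4, 2.5, 3.8, 1.9, 3.0, 2.9, 3.5]
print([classify_birthweight(w) for w in sample_weights])
print(f"LBW prevalence: {lbw_prevalence(sample_weights):.1f}%")
```

Note that a weight of exactly 2.5 kg falls in the normal class here, because the definition refers to weights strictly below the cut-off.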
In 2015, about 20.5 million babies, an estimated 14.6 percent of all newborns worldwide, had low birthweight (UNICEF, 2015). According to the World Bank collection of development indicators, the proportion of low birthweight babies in Ghana was 10.70% as of 2011. Its highest value over the past 18 years was 16.10% in 2003, while its lowest value was 9.10% in 2006. According to the latest WHO data, published in 2017, deaths from low birthweight in Ghana numbered 8,224, approximately 3.91% of total deaths. Low birthweight ranks seventh among the top 20 causes of mortality in Ghana and 29th in the world (WHO, 2017).

Low birthweight is a pressing public health problem in Ghana. Its importance arises because it is a major determinant of infant morbidity and contributes substantially to the overall burden of childhood mortality. Low-weight neonates are at a high risk of serious medical problems, lasting disabilities, and even death. Low birthweight is, hence, a significant indicator that has a lasting impact on health outcomes in adult life. Being undernourished in the womb increases the risk of death in the early months and years of life. Those who survive are prone to impaired immune function and a high likelihood of contracting diseases; they are likely to remain malnourished, with reduced muscle strength and cognitive abilities throughout their lives. As adults, they are subject to higher rates of diabetes and heart-related diseases. Although advances in newborn medical care have significantly reduced the number of deaths associated with low weight at birth, a small percentage of survivors develop hearing and vision loss, learning problems, cerebral palsy and intellectual disability. Low birthweight is viewed as the single most critical indicator of mortality, particularly of death within the first month of a baby’s life.
The present study therefore sets out “to find out the prevalence of low birthweight and assess the different factors affecting the weight of a newborn child in Ghana”.

1.3 Research Objectives

1.3.1 General Objective

To identify a method for explaining birthweight in terms of identifiable factors, so that a procedure can be defined for assessing the weight of a newborn infant in advance.

1.3.2 Specific Objectives

• To calculate prevalence estimates of birthweight (low and normal).
• To examine the association between key risk factors and birthweight.
• To assess the type of birthweight reporting system in Ghana.
• To investigate the use of the mother’s perception of child size at birth as a proxy for the birthweight of an infant when weight data are missing.
• To develop statistical prediction models for the indicators of birthweight using mothers’ characteristics and birth outcomes.

1.4 Research Questions

Various investigations have been carried out on the factors influencing birthweight in different settings. The literature review reveals a major public health burden associated with birthweight. As is evident from the available literature, very few studies have attempted to use available large-scale, quality data such as the Ghana Demographic and Health Survey (GDHS) to carry out a systematic statistical analysis of the existing burden of low birthweight and related epidemiological models. In view of this, the present study was carried out to answer some pertinent questions:

(i) What are the prevalence rates of birthweight (low and normal)?
(ii) What is the type of birthweight reporting system in Ghana?
(iii) Can the mother’s perception of child size at birth be used as a proxy for the birthweight of an infant when weight data are missing?
(iv) Which factors contribute to and influence the birthweight of newborns in Ghana?
(v) If possible, can we propose a statistical model for predicting the likelihood of a child’s birthweight falling in the low or normal class, using chosen factors such as maternal factors, birth outcomes and maternal anthropometric variables?

Conducting such studies from time to time will strengthen the epidemiological evidence on low birthweight and provide insight for planning appropriate interventions. An epidemiological model of low birthweight for the country, as well as for specific regions, may give policy planners additional insights.

1.5 Scope of the Study

Even today, the study of risk factors for birthweight remains popular among researchers. Because research carried out in various geographical settings has produced diverse outcomes, establishing a causal relationship between critical risk factors and birthweight is difficult. This study brings together evidence from past and current research with the objective of addressing a significant public health challenge as it persists in Ghana. Moreover, the emphasis of the study is to establish the association between key risk indicators and infants’ weight from a population-based dataset and to explore how these relationships are mediated by some covariates. Women aged between 15 and 49 years with a history of live delivery were included in the research. External validity is a critical challenge in research when results from a sample are generalized to the whole population. Because the GDHS data used approximate a census very closely, the results may be generalizable.

1.6 Rationale for the current study

For a statistician, using data modeling skills to provide a reasonable explanation of birthweight, which is predominantly an issue for the medical profession, may pose many challenges.
From the available literature on this problem, it may be observed that there is no single method of estimating birthweight that suits every environment. As the competing variables are many and their characteristics vary with the demographic, behavioral and prenatal care profiles of mothers, a unified methodology for estimating birthweight seems difficult to achieve. This may explain the continuing interest in the problem. In this context, this study aims to provide an explanation of birthweight using all available related maternal characteristics and demographic, behavioral and prenatal care profiles. The main issue addressed is the identification of a procedure that could be applied to a large class of situations. Many statistical tools are available to evaluate the characteristics of the variables under consideration and to generate a model that may provide a stable procedure for estimating birthweight. The study uses data obtained from the Ghana Demographic and Health Survey 2014.

1.7 Study significance

Few statistical models have so far been developed to address the issue. This study proposes to analyze these elements and build an exploratory model of the determinants of birthweight that may be valuable in epidemiological investigations. Moreover, the study is significant because it uses descriptive information from the Ghana Demographic and Health Survey 2014 report to draw conclusions about the general population of Ghanaian children from the information obtained from the sample. In this regard, the research findings will contribute to knowledge and provide a basis for further research into birthweight in Ghana and the world. The research findings will give stakeholders, obstetricians, pediatricians, public health physicians, and health policymakers greater insight into the significant contributing factors of birthweight.
Additionally, the findings will help to clarify the epidemiology of low birthweight, reduce its prevalence and associated public health consequences, and narrow the scope of intervention policies, thereby accelerating progress towards Millennium Development Goal 4. It is also worth noting that this study will serve as a guide in the design, formulation, and implementation of child survival policies in the country.

1.8 Organization of the thesis

The present investigation, entitled “Determinants of Low birthweight in Ghana”, is organized into five chapters. Chapter one gives a concise introduction and details the background of the study. It also discusses the problem statement, the study objectives (general and specific), the research questions and the significance of the study. Chapter two deals with studies done in different parts of the world. It reviews studies from developed countries, developing countries, Africa and Ghana. It also covers the theoretical background on birthweight provided by different researchers, with a comprehensive review to establish relationships between birthweight and different factors. It gives a frame of reference for the key risk determinants relevant to the study. The methodology adopted for the study is described in detail in chapter three. In chapter four, the data are explored, analyzed and discussed using R software packages and SPSS 25 to produce results. Different tools such as frequency and percentage analysis, Analysis of Variance (ANOVA), crosstabs and the Chi-square test were employed. Chapter five presents the discussion (overall findings in relation to the objectives under study) based on the results of the analysis and presents conclusions drawn from the researcher’s findings.
Concrete recommendations based on the present study are also given in this chapter to authorities, policymakers and others who have taken up the task of saving low birthweight infants. Opportunities for further study and limitations of the research work are likewise noted.

CHAPTER TWO

LITERATURE REVIEW

2.0 Introduction

This chapter sets out the theoretical foundation for our birthweight study. Its primary concern is to review literature from epidemiological settings on the significant indicators whose influence has been reported and discussed.

2.1 Overview of Birthweight

Birthweight is the first weight of the newborn baby measured following birth, preferably within the first hour of life, before significant postnatal weight loss occurs. In all countries (developing and developed), birthweight is probably the most significant single factor influencing neonatal mortality, in addition to being a significant determinant of post-neonatal mortality and child mortality. It is said to literally follow a person from the cradle to the grave, as it is related not just to the morbidity and mortality of infants but to outcomes occurring later in life, including adult mortality (Basso, 2008). Thus, birthweight has been considered a sensitive indicator of a country's wellbeing and of the viability of a newborn infant.

2.2 Classification of birthweight

Historically, birthweight has been treated as dichotomous. ‘Low birthweight’ (LBW) is the class of infants weighing under 2,500 grams (up to and including 2,499 g) at birth, and 'Normal birthweight' (NBW) is all the rest (that is, greater than or equal to 2,500 grams).

2.3 Consequences of Low birthweight

The effect of birthweight seems to extend well beyond infancy. Low birthweight infants are vulnerable to early growth retardation and are at increased risk of infectious disease and death during infancy and childhood (WHO, 2015).
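As an illustration of the dichotomous classification described in Section 2.2, the following minimal Python sketch labels a birthweight as low (LBW) or normal (NBW) using the 2,500 g cutoff and computes the share of LBW records in a list. The function names and the sample weights are hypothetical and are not taken from the GDHS data.

```python
# Minimal sketch of the dichotomous classification in Section 2.2.
# Function names and sample weights are illustrative assumptions.

LBW_CUTOFF_GRAMS = 2500  # cutoff cited in the text: LBW is strictly below 2,500 g


def classify_birthweight(weight_grams):
    """Return 'LBW' for weights under 2,500 g, otherwise 'NBW'."""
    if weight_grams <= 0:
        raise ValueError("birthweight must be a positive number of grams")
    return "LBW" if weight_grams < LBW_CUTOFF_GRAMS else "NBW"


def lbw_prevalence(weights_grams):
    """Return the percentage of weights classified as LBW."""
    if not weights_grams:
        raise ValueError("at least one birthweight is required")
    labels = [classify_birthweight(w) for w in weights_grams]
    return 100.0 * labels.count("LBW") / len(labels)


# Hypothetical weights in grams, chosen to include both sides of the cutoff.
sample_weights = [2499, 2500, 3100, 1800]
print([classify_birthweight(w) for w in sample_weights])
print(lbw_prevalence(sample_weights))
```

Note that 2,499 g falls in the LBW class while exactly 2,500 g is classed as NBW, which matches the boundary stated in the text.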
According to the fetal origins hypothesis (Barker, 2012), infant undernutrition, for which low birthweight is a marker, may permanently program the body by reducing the number of cells in specific organs, changing the distribution of cell types and affecting metabolic processes. These programmed changes are related to a variety of chronic disease outcomes in adulthood and old age, for example, cardiovascular disease, diabetes, and hypertension. Moreover, existing literature suggests that low birthweight infants are more likely than normal birthweight infants to have physiological and neurodevelopmental complications and congenital abnormalities extending from infancy through adolescence and into adulthood. Studies have found a significant relationship between birthweight and school-age disabilities (Avchen, Scott and Mason, 2001), behavior problems (Sommerfelt, Ellertsen and Markestad, 1993), cognitive function during young adulthood (Richards, Hardy, & Kuh, 2001), preterm birth (Porter, Fraser, Hunter, Ward & Varner, 1997) and gestational diabetes (Innes, 2002). It is also postulated that the most common causes of morbidity and mortality are associated with low birthweight babies (Maheswari, 2014). It is commonly recognized that being born with low weight is a disadvantage for the infant.

2.4 Determinants of Birthweight

A review of the available literature on birthweight shows that the vast majority of researchers have reported relationships between birthweight and a wide spectrum of factors. The present study classifies these factors into three groups, namely (1) Normal, (2) Intermediate and (3) Non-normal. The importance of these variables, as highlighted in the studies, is discussed in turn below.
2.4.1 Normal factors

These are indicators that are a feature of every pregnancy and where the related increase or decrease in fetal growth rate does not of itself affect the prognosis for the child or for the mother's future pregnancies.

2.4.1.1 Sex of infant

At the same gestational age, male newborns tend to be heavier than female newborns. This finding appears consistently in a wide range of population studies, from Australia (Kettle, 1960) to Canada (Love and Kinch, 1965) to Aberdeen (Thomson, 1968) and Malta (Camilleri and Cremona, 1970). Although differences do occur (Adams and Niswander, 1968), in general the reported variation is in the range of 120 g to 150 g.

2.4.2 Intermediate factors

These are factors present in every pregnancy but where the effects on growth rate may themselves affect prognosis or where the effects are confounded with the effects of ‘non-normal’ indicators.

2.4.2.1 Parity (Birth order)

Parity is the number of deliveries a woman has had after a pregnancy duration of at least 28 completed weeks. It is one of the central factors associated with birthweight and is, by definition, a feature of every pregnancy. It has been established that firstborn infants are lighter than later-born infants at all gestational ages, that is, birthweight increases with birth order (Seidman et al., 1988). Greenwood R. et al. (1994) compared singleton survivors born to multigravida and showed that the higher the gravidity, the lower the risk of a poor pregnancy outcome for the current pregnancy. Data from the Indian population show similar results. For example, on investigating 331 Bengalee mother-infant pairs, Bisai (2006) observed that the difference in mean birthweight between the first and second pregnancy was 145 grams, while that between the second and third pregnancy was 72 grams.
While the study by Thomson et al. (1968) reported no further increase after the second pregnancy, Camilleri and Cremona (1970) in Malta found that the weight of newborns increases up to the ninth or tenth pregnancy. Further research has shown that mothers who had 3-4 children were 24% more likely to have a low birthweight infant than women who had given birth to at least five children (Muula, Siziya, and Rudatsikira, 2011).

2.4.2.2 Maternal Size

2.4.2.2.1 Maternal height

Maternal height provides a measure of the mother's past nutrition and is therefore an indicator of long-term nutritional status. The positive correlation between birthweight and maternal height is a universal finding. Thomson et al. (1968) demonstrated a difference of 200 g to 300 g between the shortest women in their study (under 5 ft 1 in) and the tallest (greater than 5 ft 4 in). A further study by Thomson in 1971 showed that when newborn weight was plotted against maternal height for several populations with different mean birthweights, the lines were approximately parallel. This finding suggests that maternal height of itself has a comparable effect on birthweight in different populations. Additionally, a study conducted to examine the independent effect of maternal height on birthweight showed that increasing maternal height was significantly and positively related to newborn weight (Abrams B., Selvin S., 2000). Various attempts have also been made to identify height cutoffs for estimating the risk of low birthweight. For instance, Phaneendra Rao et al. (2001) noted that mothers who were shorter than 145 cm gave birth to children who weighed almost 600 g less than the children of mothers taller than 160 cm. According to Eltahir
Studies done on Indian mothers provide a range of short heights, from 145 cm to 156 cm, as risk cutoffs associated with low birthweight (Mohanty C. et al., 2006; Nahar S. et al., 2007). Further, Anitha C. J. (2009) showed that an increase in height of 1 cm contributed a 13 g increase in birthweight. As maternal height does not change during the short duration of pregnancy, it can be used to predict the risk of low birthweight at the time of registration.

2.4.2.2.2 Maternal weight

Weight is the simplest anthropometric measurement recorded in field studies and is an indicator of current nutritional status. Pre-pregnancy maternal weight is one of the significant indicators of birth outcome. Serial measurements of weight during gestation are useful for estimating weight gain, which reflects the progress of fetal growth and is therefore an influential parameter associated with birth outcomes. Using weight, Miller et al. (1978) demonstrated that maternal underweight is a critical risk factor for infants small for gestational age. Luke and Petrie (1980) found that, at the other extreme, among obese women, increasing weight detrimentally affected birthweight. Anderson (1989) recommended that weight in early pregnancy, up to the thirteenth week, could be used as a surrogate for pre-pregnancy weight. While the Pune Maternal Nutrition Study (PMNS) reported an average pre-pregnancy weight of 41.7 ± 5.1 kg (Rao S. et al., 2001), it was 43.7 ± 6.6 kg in Karnataka. Phaneendra Rao et al. (2001) and Agarwal K. N. et al. (2001) reported it to be 42.5 ± 3.9 kg in rural Uttar Pradesh, whereas mothers from Bangalore had an average weight of 51.2 (46.2–57.5) kg (Muthayya S., 2005). Among all indicators of low infant weight, Nahar S., et al.
(2007) observed maternal weight to be the most important indicator, and every 1 kg increase in maternal weight was shown to be associated with an increase in birthweight of approximately 260 g, with a statistically significant correlation (r = 0.4, p < 0.001).

2.4.2.3 Maternal age

While the mother’s age is a factor common to every pregnancy, the age at which mothers begin childbearing is closely related to socio-economic status (Thompson, 1982), and because of the close relationship between parity and the mother’s age, it is hard to evaluate the independent effect of the mother’s age. Leppert et al. (1986) conducted their study among adolescent and older mothers in New York and reported maternal age as a significant predictor of birthweight. Viegas et al. (1989), based on a study conducted in Singapore, validated a quadratic relationship between birthweight and maternal age. Fraser et al. (1995) found that a younger maternal age conferred an increased risk of low birthweight. Gage et al. (2009) reported, in a study of eight populations in New York State, that a mother’s age at birth significantly influences the birthweight distribution. Selvin and Janerich (1971) and Miller et al. (1978) contend that adolescent pregnancy is related to small infants. Meanwhile, some later researchers suggested that mothers beyond age 35 have a higher risk of giving birth to children small for gestational age. Disparities in low birthweight by maternal age are related in complex ways to socioeconomic disadvantage and current social and behavioral factors. Adolescent or teenage mothers (less than 20 years) frequently have worse reproductive, socioeconomic and perinatal outcomes than other age classes, for example, those aged 20-29 years.

2.4.2.4 Ethnic origin

Differences in, for instance, lifestyle and nutrition among various ethnic groups represent significant confounding features.
Additionally, the complex effects of different ethnic mixes among parents are hard to evaluate. In Ghana, where the population is of relatively homogeneous ethnic origin, it is not reasonable to address the issue of ethnicity in either research or clinical standards. Even where data are available and developing ethnicity-specific standards for clinical purposes is technically feasible, whether doing so is appropriate may depend on many other factors. Nevertheless, it may be useful for researchers to know the distribution of ethnic groups in the population, since differences in child weight associated with ethnicity may confound other analyses.

2.4.3 Non-Normal factors

These are factors that may occur and where there is prima facie evidence to assume that the related increase or decrease in fetal growth affects the prognosis for the child or for the mother's future pregnancies.

2.4.3.1 Socio-economic factors

Socioeconomic status is a composite measure of an individual's work experience and of an individual's or family’s socioeconomic position in relation to others, based on wealth, educational status and occupation. The prominent components of socioeconomic status that various researchers have examined with respect to low birthweight are discussed below.

2.4.3.2.1 Educational level

Maternal literacy is one of the important factors known to be associated with the infant mortality rate and the nutritional status of a child. The educational level of individuals in the family has a large influence on the social welfare of family members. Higher levels of education therefore have relatively larger and increasing benefits (Rolleston, 2011). Less educated mothers are known to have low birthweight infants (Chiavarini, Bartolucci, Gili, Pieroni, & Minelli, 2012).
Infants of women with low or intermediate education have significantly higher odds of low birthweight than those of women with higher education (Gisselmann, 2005). Additionally, Kleinman and Madans (2006) found that women with less than 12 years of education had twice the odds of low birthweight compared with mothers with more years of education. Similarly, Subramanyam M. A. et al. (2010) reported that in India, newborns were less likely to be born with low weight if their mothers had more than 12 years of education compared with 1 to 5 years of education, with a relative risk (RR) of 0.79. Some studies do not show an association between low birthweight and maternal education. Maternal education may have direct or indirect effects on pregnancy outcome, and its significance may be affected by other social variables when they are considered simultaneously.

2.4.3.2.2 Residential status

In Ghana, most inhabitants live in rural areas. Residence determines the availability of social amenities such as housing, health care and education, and rural-urban differences have been shown to give rise to these inequalities (Sahn & Stifel, 2003). Living in a rural area in Sub-Saharan Africa means living in a community deprived of social amenities, infrastructure and job opportunities, which conveys an increased risk of low birthweight. A study in Ghana has also shown that being a rural dweller increased the probability of having a low birthweight infant (Kayode et al., 2014). Hillemeier et al. (2007) subdivided rural areas into large rural city-focused areas and more rural areas and found that, compared with the urban setting, low birthweight risk is associated with some but not all types of rural setting.
Auger et al. (2009) concluded that living in a rural rather than an urban area, as well as low socioeconomic status (represented by maternal education), is associated with low birthweight.

2.4.3.3 Mode of delivery

Spontaneous vaginal delivery is the commonest mode of delivery. A study in China has shown that the cesarean section rate among low birthweight infants increased with increasing gestational weeks (Chen et al., 2015). It indicated that the cesarean section rate among low birthweight infants was 61.14%, which was higher than that among normal birthweight infants (52.947%).

2.4.3.4 Number of antenatal care (ANC) visits

This is the total number of times a pregnant woman visits the clinic to receive antenatal care before delivery. According to the GDHS (2017) report, almost all (98%) women aged 15 to 49 with a live birth or stillbirth received antenatal care from skilled providers during pregnancy. While 64% of mothers had their first visit during the first trimester of pregnancy, 89% followed the WHO recommendation of at least four visits during pregnancy. An inadequate number of ANC visits, laboratory studies and examinations has been shown to be associated with a higher risk of low birthweight newborns. A study in Nepal on the factors associated with low birthweight noted that mothers who do not attend antenatal care have more than twice the odds of having a low birthweight infant (Odds Ratio = 2.3) (Khanal, Zhao, & Sauer, 2014). Further studies have found that, even after adjusting for other differences such as socioeconomic status and maternal age, the proportion of low birthweight was highest (36.8%) among mothers who had no antenatal check-up, compared with 15.9% among those who had more than 7 check-ups (Nahar N. et al., 1998). Negi K. et al. (2006) also noted that mothers with one antenatal care visit had almost six times the risk of having a low birthweight baby compared with mothers who received more than four antenatal visits (Odds Ratio = 5.71). Raatikainen et al.
(2007) concluded that non-attendance or under-attendance for antenatal care is related to an elevated risk of low birthweight. Tayie and Lartey (2008) showed that early antenatal care is crucial to favorable pregnancy outcomes, including birthweight.

2.5 Related Literature Reviews outside Ghana

Tema (2006) conducted a cross-sectional descriptive study to determine the prevalence and key indicators of low birthweight infants in Jimma zone, South West Ethiopia, from September 1, 2002, to March 30, 2003. The sample comprised 645 newborn-mother pairs: those who delivered in four health centers and one specialized referral hospital, and home deliveries who received care in these health facilities within 24 hours of delivery during the study period. Among the 645 live births included in the study, 145 were low birthweight, indicating a prevalence of 22.5%. Mothers residing in urban areas had a higher proportion of low birthweight babies than rural mothers, with the difference statistically significant (p = 0.00). Other factors related to the socioeconomic status of the mothers, such as age, religion, ethnicity and marital status, showed no statistically significant association (p > 0.05) with low birthweight. With respect to maternal obstetric history, mothers who delivered before 37 weeks of gestation and mothers who had multiple pregnancies had a higher proportion of low birthweight babies, with the differences statistically significant (p = 0.01 and 0.00 respectively). Mothers who lost weight and those who did not receive additional nutritional supplements during pregnancy had an increased risk of delivering low birthweight babies, with the differences statistically significant (p = 0.00 and 0.00 respectively).
Other determinants, such as the number of pregnancies, history of STI, hypertension, anemia, engaging in light work during pregnancy, family size and history of chronic illness, had no association with LBW delivery.

Mehri Rejali et al. (2017) assessed factors associated with low birthweight and used decision curve analysis (DCA) to define a scale for predicting the likelihood of having a low birthweight newborn. The hospital-based case-control study included 470 mothers with normal birthweight neonates and 470 mothers with low birthweight neonates. Factors found to be significantly associated with low birthweight were previous low birthweight infants (odds ratio [OR] = 2.99 [1.510–5.932]), bleeding in the last trimester of pregnancy (OR = 2.58 [1.018–6.583]), hypertension in pregnancy (OR = 2.39 [1.429–4.019]), premature membrane rupture (OR = 3.18 [1.882–5.384]), maternal age >30 (OR = 2.17 [1.350–3.498]) and premature pain (OR = 2.70 [1.659–4.415]). With decision curve analysis, the prediction model built on 15 variables had a net benefit (NB) of 0.3110.

2.6 Related Literature Reviews in Ghana

Fosu M. Ofori et al. (2013) examined the incidence of low birthweight among newborns and its associated maternal risk indicators at Manhyia District Hospital, Kumasi, Ghana. The study was a facility-based cross-sectional analysis from the maternity ward of the hospital. A sample of 1,200 women of reproductive age (15 – 49 years) was drawn from a total of 24,025 deliveries between 2010 and 2012. Multiple logistic regression was employed to determine the relationship between maternal risk factors and low birthweight. The estimated low birthweight prevalence was reported as 21.1%.
The leading factors reported to be significantly associated with low birthweight included maternal age (p-value = 0.0160), residence (p-value = 0.0000), hemoglobin level (p-value = 0.0020), fetal infection (p-value < 0.0001) and antenatal care (p-value = 0.0040). The other variables considered (height, weight, gestational age and baby’s sex) were not significant (p-values > 0.05).

Mensah Y. Evelyn (2015) used data obtained through personal interviews with postpartum mothers at Ridge Hospital, Kumasi, together with secondary data from the mothers’ antenatal books, to identify risk factors associated with low birthweight (LBW) and low Apgar score (LAS). The study covered 330 women who delivered at the hospital between February and March 2015. The prevalence of LBW and LAS at Ridge Hospital was 18.8% and 15.2% respectively. Using logistic regression, the significant risk factors associated with LBW were found to be retroviral (HIV) status of the mother, gestational age, daily hours rested, frequency of eating and type of cooking fuel used. The factors that significantly influenced LAS were retroviral (HIV) status of the mother, gestational age, and daily expenditure. The results showed that HIV-positive mothers are more likely to give birth to a newborn with LBW and LAS. Retroviral (HIV) status of the mother was found to be the most important determinant of both LBW and LAS.

Atinuke O. Adebanji and Puurbalanta R. (2015) used a logistic regression model to identify the variables that predict LBW babies, based on the birth records of 500 mothers of singleton neonates resident in the Tamale metropolitan area of the Northern Region of Ghana from November 2010 to January 2011.
The significant model coefficients were gestation (p-value = 0.0008), household size (p-value = 0.0160), maternal food intake (p-value = 0.0002), maternal health (p-value = 0.0000), passive smoking (p-value = 0.0003) and type of fuel used for cooking (p-value = 0.0418). A test of the predictive ability of the model showed correct classification rates of 93% for normal birthweight infants and 76.8% for low birthweight infants. The likelihood ratio test and the Nagelkerke R-square indicated an association between the predictors and LBW.

Tampah-Naah M. Anthony et al. (2016) conducted a population-based cross-sectional study of selected maternal factors using data from the Ghana Multiple Indicator Cluster Survey 2011. A binary logistic regression model was fitted to assess factors associated with low birthweight among mothers. Mothers with no education (OR = 0.566, 95% C.I. = 0.349 – 0.919) were less likely to have children with low birthweight, and those not in union (OR = 1.698, 95% C.I. = 0.993 – 2.905) had a higher likelihood of giving birth to children with low birthweight. Maternal factors such as educational status and marital status were shown to influence the birthweight of a child.

2.8 Conclusion

The literature on birthweight is enormous and ever-expanding. Not only are there variables that have not been mentioned here, but most of the indicators investigated are themselves intricately related to any pregnancy. Additionally, the present study does not discuss the significant influence of the clinical management of pregnancy. The goal of this chapter has been to detail the primary factors influencing newborn weight and at least to point to the literature in which they have been examined.
While the findings presented may illuminate certain parts of the problem, the purpose of this study is not to give a universally acceptable predictive model of birthweight but rather to argue for and outline an approach to the statistical development of a suitable birthweight model for Ghana.

CHAPTER THREE

METHODOLOGY

3.0 Introduction

This chapter details the analytical methods and their mathematical foundations, and prepares the ground for the analysis presented in the next chapter.

3.1 Data description

3.1.1 Type and Source of Data

The study utilized secondary data from the 2014 Ghana Demographic and Health Survey (GDHS), a nationally representative household survey that gathered comprehensive information on 9,396 women aged 15 to 49 in the country.

3.1.2 Sampling and Sampling Procedures

Birthweight data were recorded on a measurement scale in grams. During data collection, birthweight was confirmed either from documented evidence on the child health card or verbally by the mother. The DHS data were weighted prior to analysis to reflect population representativeness. Figure 1 shows the stages by which the final dataset for analysis was arrived at. Of the 26,003 original records in the dataset, birthweight was not documented for 20,122 of the infants. A further 549 mothers did not know the birthweight, while 1,971 births were excluded because the children were not weighed at birth. This left 3,361 records available for analysis.
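The exclusion arithmetic described above can be checked with a few lines of code. The following is only an illustrative sketch: it uses the counts shown in the data-cleaning flow of Figure 1, and the names `ORIGINAL_RECORDS`, `EXCLUSIONS` and `remaining_records` are hypothetical, not part of the GDHS files.

```python
# Record counts as shown in the data-cleaning flow (Figure 1).
ORIGINAL_RECORDS = 26_003
EXCLUSIONS = {
    "no documentation on birthweight": 20_122,
    "mother does not know birthweight": 549,
    "child not weighed at birth": 1_971,
}


def remaining_records(total, exclusions):
    """Return the number of records left after removing every exclusion group."""
    return total - sum(exclusions.values())


final_n = remaining_records(ORIGINAL_RECORDS, EXCLUSIONS)
print(final_n)
```

A check of this kind is useful because the three exclusion groups must be mutually exclusive for the subtraction to be valid; that is an assumption of the sketch.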
Figure 1: Data Cleaning Procedure
[Flow chart. Original dataset: 26,003 records available in the individual recode file of the GDHS. Excluded from dataset: 20,122 with no documentation on birthweight; 549 where the mother does not know the weight of the infant; 1,971 not applicable for the study because the child was not weighed at birth. Final dataset for analysis: 3,361 mother-infant records.]

3.1.3 Instrumentation and Operationalization

Validated questionnaires, per the protocol for undertaking a GDHS, were used. Data generated using the women's questionnaire were recoded into an individual record file, which was then used to select specific variables for the study. It is important to note that the women's questionnaire contains both maternal and infant data. Understanding the measurement level of each variable helps in selecting the most appropriate statistical method for analyzing the data. The scales of measurement of the data were nominal, ordinal, and continuous. Variables measured on an ordinal scale have an intrinsic ordering, whereas on a nominal scale the variables are categorical and mutually exclusive with no ordering. Data measured on a continuous scale allow for the use of advanced statistical analysis (see Table 1).

Table 1: Variables with the level of Measurement

Variable                              Level of measurement
Total number of children ever had     Ordinal
Wealth index                          Ordinal
Education level                       Ordinal
Birth order (Parity)                  Ordinal
Size of the child                     Ordinal
Marital status                        Nominal
Place of residence                    Nominal
Region                                Nominal
Ethnicity                             Nominal
Fuel type for cooking                 Nominal
Toilet facility type                  Nominal
Gender of child                       Nominal
Place of delivery                     Nominal
Birth type                            Nominal
Birthweight                           Continuous
Mother's weight                       Continuous
Mother's height                       Continuous
Maternal age                          Continuous

3.1.4 Our Study assumptions

Four key assumptions underpinned this study. These assumptions are necessary considering that data collection was cross-sectional.
• First, the sensitivity of the instruments used to weigh infants at birth was assumed to be the same across all health facilities in the country.
• Second, as individuals from households either agreed, declined, or were unavailable at the time of the survey, it was assumed that the characteristics of all individuals, and hence birth outcomes, were randomly distributed regardless of participation in the study.
• Third, the socio-economic situation of the mother as reported during data collection was assumed to be the same as throughout the pregnancy up to delivery. For example, if at the time of the survey the mother was using firewood or LPG as fuel for cooking in the household, the study assumed that the same form of energy was also used throughout the pregnancy.
• Finally, the underlying influence of the covariates was assumed to be additive, with each additional covariate contributing additional risk independently of the others.

3.2 Application of Statistical Methods

3.2.1 Descriptive Analysis

The study data need to be analyzed systematically so that patterns of association can be identified. Here, we are concerned with describing the numerical distributions that characterize the birthweight variables. The study considered a dataset containing 18 variables with 3,361 observations each. The analysis was done in three phases: descriptive (univariate) analysis, followed by inferential (bivariate and multivariable) statistical approaches.

• Univariate analysis
Two approaches were used in the univariate analysis, depending on the type of variable being analyzed. Firstly, for continuous variables, including age and weight, measures of central tendency (mean and median) were calculated along with measures of dispersion (standard deviation and range) to describe the distribution of the variables. These descriptive statistics were summarized and presented.
Secondly, for each categorical variable, the univariate analysis was presented as a frequency distribution table containing both the frequency and the corresponding percentage.

• Bivariate analysis
The bivariate analysis involved cross-tabulating each independent variable against the birthweight classification and comparing the proportion of mothers in the different levels of each independent factor who had a low birthweight delivery. The chi-square test was performed to test for significant associations between risk characteristics and birthweight classification. An alpha cut-off level of 0.05 was used to determine statistically significant associations.

• Multivariable analysis
The multivariable analysis was conducted using regression models (multiple linear and logistic) with birthweight as the dependent variable.

3.2.2 Cross-tabulation for birthweight and birth size (2 x 2 Contingency table)

The 2 × 2 contingency table is a simple and useful technique for assessing the statistical significance of the association between two binary (dichotomous) characteristics, usually coded as 1 for "Yes" or "Success" and 0 for "No" or "Failure." With the help of this contingency table, we can easily follow the steps of the comparison between birthweight (per health card) and birth size (per mother's recall).

Table 2: Standard 2 × 2 contingency table

Gold Standard                              Birthweight reported in terms of birth size
(Birthweight reported)                     Low (Small size)        Normal (Large size)     Total
Low birthweight (<2500g)                   tp (True Positive)      fp (False Positive)     tp+fp
Normal birthweight (>2500g)                fn (False Negative)     tn (True Negative)      fn+tn
Total                                      tp+fn                   fp+tn                   N = tp+fp+fn+tn

From Table 2 above, the following quantities are defined:
• True Positive (TP): Proportion of children having a birthweight below 2500 grams and reported as of small size. It is denoted by 'tp'.
• False Positive (FP): Proportion of children having a birthweight below 2500 grams but reported as of large size. It is denoted by 'fp'.
• False Negative (FN): Proportion of children having a birthweight above 2500 grams but reported as of small size. It is denoted by 'fn'.
• True Negative (TN): Proportion of children having a birthweight above 2500 grams and reported as of large size. It is denoted by 'tn'.
• Sensitivity is the ability of the test to correctly identify those who have the disease (tp) from all individuals with the disease (tp+fn). That is, the ability of birth size to correctly identify children with low birthweight (LBW, small size) as truly low under the recorded birthweight. It is calculated as:

$$\text{Sensitivity} = \frac{tp}{tp + fn} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Negative}}$$

• Specificity is the ability of the test to correctly identify those who do not have the disease (tn) from all individuals free from the disease (fp+tn). That is, the ability of birth size to correctly identify children with normal birthweight as truly normal under the recorded birthweight. It is calculated as:

$$\text{Specificity} = \frac{tn}{fp + tn} = \frac{\text{True Negative}}{\text{False Positive} + \text{True Negative}}$$

• Positive predictive value (PPV): The proportion of children of small size who remained low birthweight as per the recorded birthweight. It is calculated as:

$$PPV = \frac{tp}{tp + fp} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Positive}}$$

• Negative predictive value (NPV): The proportion of children of normal size who remained normal as per the recorded birthweight. It is calculated as:

$$NPV = \frac{tn}{fn + tn} = \frac{\text{True Negative}}{\text{False Negative} + \text{True Negative}}$$

3.2.3 Kappa Statistics and Fleiss

A measure of agreement between two tests is required when a diagnostic test is being compared with a gold standard. The question to be answered is how much the positive and negative results of the two tests agree.
It is essential to evaluate the reproducibility of the test across observers. At least two observers ought to independently assess the test results without accessing the data. Kappa lies between −1 and +1, as most correlation statistics do. Mathematically, the kappa statistic is defined and calculated as

$$\text{Kappa} = \kappa = \frac{p_0 - p_e}{1 - p_e} = \frac{\text{Observed Agreement }(O) - \text{Expected Agreement }(E)}{1 - \text{Expected Agreement }(E)}$$

where the relative observed agreement is

$$O = \frac{tp + tn}{n}, \qquad n = tp + fp + fn + tn$$

and the proportion of chance agreement is

$$E = \left(\frac{tp + fp}{n}\right)\left(\frac{tp + fn}{n}\right) + \left(\frac{fp + tn}{n}\right)\left(\frac{fn + tn}{n}\right)$$

Table 3: Interpretation of Cohen's kappa & Fleiss Agreement

Kappa statistic    Agreement level        Fleiss       Agreement level
0 – 0.20           Slight                 < 0.4        Poor
0.21 – 0.39        Fair                   0.4 – 0.75   Fair to Good
0.40 – 0.59        Moderate               > 0.75       Excellent
0.60 – 0.79        Substantial
Above 0.80         Almost Perfect

The Fleiss' kappa statistic (after Joseph L. Fleiss, 1981) has been used as a similar measure of agreement in categorical rating when there are more than two raters (Avijit Hazra, 2013).

3.2.4 The Chi-square $\chi^2$ test

The chi-square test is frequently used for testing relationships between categorical variables. The null hypothesis states that no relationship exists between the categorical factors in the observations; that is, they are independent. Here, the values of $\chi^2$ are calculated to check the association between birthweight and the different predictors. The test statistic for the chi-square test of independence is computed as:

$$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - e_i)^2}{e_i}$$

where $O_i$ denotes the observed frequencies and $e_i$ the expected frequencies.

3.3 Statistical Model Building

In general, by statistical models we mean models of real systems obtained through empirical methods. Statistical models summarize the relationship between a dependent and an independent factor.
Mathematically, the way a response variable depends on the values of certain explanatory variables may be expressed in the general form

$$\text{Response variable} = \text{Systematic component} + \text{Random component}$$

where the systematic component summarizes how variability in the response variable is accounted for by the explanatory variables, and the random component summarizes the deviation of the response values from the systematic component. Suppose there is a set of $n$ observed values of a response variable; then each $y_i$ can be written as

$$y_i = \eta_i + \xi_i, \qquad i = 1, 2, \ldots, n$$

where $\eta_i$ is the systematic component and $\xi_i$ is the random component. The component $\eta_i$ may be linear in the unknown parameters $\beta$, for example,

$$\eta_i = \beta_0 + \beta_1 x_i, \qquad \eta_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2, \qquad \eta_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i}$$

or nonlinear, as in

$$\eta_i = \beta_0 \exp(\beta_1 x_i), \qquad \eta_i = \frac{x_i}{\beta_0 + \beta_1 x_i}, \qquad \eta_i = \beta_0 \exp(\beta_1 x_{1i} + \beta_2 x_{2i})$$

3.3.1 Our study model building

All statistical learning problems are formulated to minimize expected loss. Mathematically, the problem of determining birthweight is that of finding, among a given set of functions, the one that predicts infant weight in the best possible way. We minimize the risk function in a setting where the joint distribution of the dependent and independent variables is unknown, so as to choose the best available predictors. Two main types of problems are considered in this study:
• Regression estimation
• Classification
In regression estimation, the problem is to minimize the risk function under the squared error loss function, while in classification we seek an indicator function that minimizes the misclassification error.

3.3.2 Linear Regression Model (Multiple)

Linear models are those statistical models in which a series of parameters is organized as a linear combination. That is, within the model, no parameter appears as a multiplier, divisor or exponent of any other parameter.
The linear model serves the following goals: (1) modeling the relationship between variables, (2) prediction of the target variable (forecasting) and (3) hypothesis testing. The effect of a factor on a dependent variable may be altered by the presence of other factors because of redundancies or effect interactions (modifications). Consequently, to give a comprehensive analysis, it is desirable to consider as many factors as possible and sort out which ones are most closely associated with the response variable. We write our multiple regression model as

$$Y_i = \beta_0 + \sum_{j=1}^{k} \beta_j X_{ji} + \epsilon_i$$

or

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \epsilon_i, \qquad i = 1, 2, 3, \ldots, n$$

where $Y_i$ represents the dependent variable, the $X$'s are the independent variables, and $\epsilon_i$ is the error term. We have one dependent variable and $k$ independent variables, excluding the intercept term. Inference in regression relies heavily on the following assumptions:
1. Linearity and additivity of the relationship between the dependent and independent variables:
a) The expected value of the dependent variable is a straight-line function of each independent variable, holding the others fixed.
b) The slope of that line does not depend on the values of the other variables.
c) The effects of different independent variables on the expected value of the dependent variable are additive.
2. The $X$'s are non-stochastic variables whose values are fixed.
3. The error has zero expected value: $E(\epsilon_i) = 0$.
4. Homoscedasticity (constant variance) of the errors for all observations, that is, $E(\epsilon_i^2) = \sigma^2$, $i = 1, 2, \ldots, n$.
5. Statistical independence of the errors. Thus, $E(\epsilon_i \epsilon_j) = 0$ for all $i \neq j$.
6. Normality of the error distribution.
We can present the multiple regression model in matrix form as

$$Y = X\beta + \epsilon$$

where

$$Y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \quad \beta = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{bmatrix}, \quad \epsilon = \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{bmatrix} \quad \text{and} \quad X = \begin{bmatrix} 1 & X_{1,1} & X_{1,2} & \cdots & X_{1,k} \\ 1 & X_{2,1} & X_{2,2} & \cdots & X_{2,k} \\ \vdots & \vdots & \vdots & \cdots & \vdots \\ 1 & X_{n,1} & X_{n,2} & \cdots & X_{n,k} \end{bmatrix}$$

The ordinary least squares estimates of the $k + 1$ unknown parameters are obtained by minimizing the sum of squared errors

$$\sum_{i=1}^{n} e_i^2 = \epsilon'\epsilon = (Y - X\beta)'(Y - X\beta)$$

yielding $\hat{\beta} = (X'X)^{-1}X'Y$. Moreover, we obtain $V(\hat{\beta}) = \sigma^2 (X'X)^{-1}$. For this model, the residuals are

$$\hat{\epsilon}_i = Y_i - \hat{Y}_i = Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_{1i} - \hat{\beta}_2 X_{2i} - \cdots - \hat{\beta}_k X_{ki}, \qquad i = 1, 2, \ldots, n$$

The matrix $W = X(X'X)^{-1}X'$ is known as the leverage matrix. An unbiased and consistent estimate of $\sigma^2$ is

$$S^2 = \sum_{i=1}^{n} \frac{\hat{\epsilon}_i^2}{n - k - 1}$$

The estimated standard error of $\hat{\beta}_j$ is $s_{\hat{\beta}_j} = \sqrt{S^2 V_j}$, where $V_j$ is the $j$-th diagonal element of $(X'X)^{-1}$. When the errors are normally distributed,

$$\frac{\hat{\beta}_j - \beta_j}{s_{\hat{\beta}_j}} \sim t_{n-k-1}$$

3.3.2.1 Goodness of fit check in multiple regression

We can use the $R^2$ statistic as a measure of goodness of fit for the multiple regression model, but the difficulty is that it does not account for the number of degrees of freedom. Writing RSS for the regression sum of squares, ESS for the error sum of squares and TSS for the total sum of squares, we know that

$$R^2 = \frac{RSS}{TSS} = 1 - \frac{ESS}{TSS} = 1 - \frac{\sum_{i=1}^{n} e_i^2}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}$$

A natural solution is to use variances rather than variations, which defines a corrected (adjusted) $R^2$ as

$$\bar{R}^2 = 1 - \frac{\sum_{i=1}^{n} e_i^2}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2} \cdot \frac{n - 1}{n - k - 1} = 1 - (1 - R^2)\frac{n - 1}{n - k - 1}$$

A formal test of the goodness of fit of the multiple regression line can be developed using the ANOVA of $Y$. First, we set the null hypothesis that the overall regression is not significant, in the sense that the $k$ explanatory variables considered in the model are not able to explain the response variable in a satisfactory way. We then compute the ratio $RMS/EMS$, which follows an $F$ distribution with $k$ and $n - k - 1$ degrees of freedom.
If the calculated value of this ratio is greater than $F_{k,\,n-k-1,\,0.05}$, we reject the null hypothesis and conclude that the overall regression is significant at the 5% level of significance. Hence, for a multiple regression model, we have

$$TSS = \sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \sum_{i=1}^{n} Y_i^2 - n\bar{Y}^2 = Y'Y - n\bar{Y}^2$$

$$TSS = RSS + ESS \quad \text{where} \quad RSS = \hat{\beta}'X'Y - n\bar{Y}^2$$

Table 4: ANOVA Table for Linear Regression (Multiple)

Component     Sum of Squares    Degrees of Freedom    Mean Sum of Squares       F Statistic
Regression    RSS               k                     RMS = RSS / k             RMS / EMS ~ F(k, n − k − 1)
Error         ESS               n − k − 1             EMS = ESS / (n − k − 1)
Total         TSS               n − 1

3.3.2.2 Tests for Normality

Statistical methods such as correlation, experimental design, and regression are all founded on one basic assumption, that the data follow a normal (Gaussian) distribution. As such, there is an expectation that samples drawn from the population are normally distributed. When the data are not normally distributed, the associated chi-square tests are inaccurate and, consequently, the $t$ and $F$ tests are not generally valid in finite samples.

3.3.2.2.1 Normal Q-Q Plot

This graphical tool enables us to evaluate whether a data set plausibly originates from some theoretical distribution, for example, a normal or an exponential distribution. Two sets of quantiles are plotted against each other in a scatterplot; if both sets of quantiles come from the same distribution, the points should form a line that looks roughly straight.

3.3.2.2.2 Shapiro-Wilk Test

The Shapiro-Wilk (1965) test is based on the correlation between the observed values and the expected values of normalized order statistics. The form of the test statistic is:

$$W = \frac{\left\{\sum a_i \epsilon_{(i)}\right\}^2}{\sum (\epsilon_i - \bar{\epsilon})^2}$$

where $\epsilon_{(i)}$ is the $i$-th order statistic and $a_i$ is the $i$-th expected value of normalized order statistics.
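The idea behind the normal Q-Q plot can be sketched numerically with the Python standard library. The example below is an illustration under stated assumptions, not the procedure used in the study: the plotting positions $(i - 0.5)/n$ are one common convention, the residual values are hypothetical, and the correlation it returns is a Q-Q straightness measure related in spirit to, but not the same as, the Shapiro-Wilk $W$.

```python
from statistics import NormalDist, mean


def normal_qq_points(values):
    """Pair each sorted observation with the matching standard-normal quantile."""
    ordered = sorted(values)
    n = len(ordered)
    std_normal = NormalDist()
    # Plotting positions (i - 0.5) / n: an assumed, commonly used convention.
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))


def qq_correlation(values):
    """Pearson correlation between theoretical and sample quantiles."""
    points = normal_qq_points(values)
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical residuals, used only to exercise the functions.
residuals = [-0.62, -0.31, -0.12, -0.05, 0.02, 0.08, 0.15, 0.27, 0.58]
r = qq_correlation(residuals)
print(round(r, 3))
```

A correlation close to 1 corresponds to points lying near a straight line on the Q-Q plot; a formal decision would still rely on a test such as Shapiro-Wilk from a statistical package.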
3.3.2.3 Unusual Observations in Linear Regression

3.3.2.3.1 Outlier in regression

An outlier is a data point that deviates from the linear relationship determined by the other points, or at least by the greater part of those points.

3.3.2.3.2 High Leverage Points

According to Hocking and Pendleton (1983), high leverage points are those for which the input vector $x_i$ is, in some sense, far from the rest of the data, that is, far from the center of the predictor values. To identify outliers, we should first look at the residual plot of $e_i$ versus $\hat{Y}_i$. Recall the property of residuals:

$$e_i = Y_i - \hat{Y}_i \sim N\left(0, \sigma^2 (1 - h_{ii})\right)$$

where

$$h_{ii} = \frac{1}{n} + \frac{(X_i - \bar{X})^2}{S_{xx}}$$

is called the leverage of the $i$-th case.

3.3.2.4 Detection of Outliers

• Deleted Studentized (Externally Studentized or R-Student)
To account for the different variances among residuals, we consider "studentizing" the residuals (that is, dividing each by an estimate of its standard deviation). It is defined as:

$$r_i = \frac{e_i}{\sqrt{MSE\,(1 - h_{ii})}}, \qquad i = 1, 2, \ldots, n$$

where $E(r_i) = 0$ and $Var(r_i) = 1$, so that the studentized residuals have a constant variance regardless of the location of the $X$'s. An observation whose absolute studentized residual exceeds three can be considered an outlier.

3.3.2.5 Detection of Influential Observations

• Cook's Distance: Cook (1977) proposed the distance measure known in the statistical literature as Cook's distance. We define the $i$-th Cook's distance as:

$$CD_i = \frac{(\hat{\beta} - \hat{\beta}_{(-i)})^T (X^T X)(\hat{\beta} - \hat{\beta}_{(-i)})}{(k + 1)\hat{\sigma}^2}, \qquad i = 1, 2, \ldots, n$$

where $\hat{\beta}_{(-i)}$ is the estimate of $\beta$ with the $i$-th observation deleted.

3.3.2.6 Multicollinearity (Variance Inflation Factor and Tolerance)

Linear regression assumes that there is little or no multicollinearity in the data. Multicollinearity occurs when the independent variables are too highly correlated with each other.
Tolerance and the Variance Inflation Factor (VIF) are computed to evaluate both pairwise and multiple-variable collinearity. A tolerance estimate close to zero indicates that the variable is highly collinear with the other independent variables. The variance inflation factor is the reciprocal of the tolerance estimate:

$$VIF = \frac{1}{\text{Tolerance}}$$

Larger VIF estimates (a usual threshold is 10.0, which corresponds to a tolerance of 0.10) indicate a high degree of collinearity or multicollinearity among the predictor variables.

3.3.2.6.1 Corrections for Multicollinearity

To remedy multicollinearity, the following steps can be adopted:
• Collect more data
• Drop insignificant variables
• Use ridge regression
• Use principal component regression

3.3.3 Logistic Regression Model

Logistic regression is helpful in situations where we need to predict the likelihood of an outcome (for example, yes or no) based on the values of a set of predictor variables. It can be applied to any of three types of categorical dependent variable: binary, ordinal and nominal. A $k$-variable regression model $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki}$ can easily be generalized and expressed as:

$$\lambda(X) = \frac{1}{1 + \exp\left[-\left(\beta_0 + \sum_{j=1}^{k} \beta_j X_{ji}\right)\right]}, \qquad i = 1, 2, \ldots, n$$

or, equivalently,

$$\ln \frac{\lambda(X)}{1 - \lambda(X)} = \beta_0 + \sum_{j=1}^{k} \beta_j X_{ji}$$

This leads to the likelihood function

$$L(\lambda(X)) = \prod_{i=1}^{n} \frac{\left[\exp\left(\beta_0 + \sum_{j=1}^{k} \beta_j X_{ji}\right)\right]^{y_i}}{1 + \exp\left(\beta_0 + \sum_{j=1}^{k} \beta_j X_{ji}\right)}, \qquad y_i = 0, 1$$

Computer packages such as R, STATA, and SPSS can estimate the parameters iteratively. We let

$$y_i = \begin{cases} 0, & \text{if the } i\text{-th unit does not have the characteristic} \\ 1, & \text{if the } i\text{-th unit does possess the characteristic} \end{cases}$$

Here, the logistic regression model specifies the probability of low birthweight as a function of a set of $k$ explanatory variables $X$.
Here $\lambda(X)$ represents the conditional probability that low birthweight is present, $P(Y = 1 \mid X)$, and the $\beta$'s (written as $\theta$'s in the sections on the logit transformation that follow) are the parameters representing the effects of the $X$'s on the risk (probability) of low birthweight.

3.3.3.1 Testing hypotheses of Multiple Logistic Regression

Once a multiple logistic regression model has been fitted and estimates have been obtained for the parameters of interest, we need to answer questions about the contributions of the different factors to the prediction of the binary response.

3.3.3.1.1 Overall Regression Tests

Consider an overall test for a model containing $k$ factors, say,

$$\lambda(X) = \frac{1}{1 + \exp\left[-\left(\beta_0 + \sum_{j=1}^{k} \beta_j X_{ji}\right)\right]}, \qquad i = 1, 2, \ldots, n$$

The null hypothesis is stated as: "All $k$ independent variables considered together do not explain the variation in the responses." In other words,

$$H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$$

To test this null hypothesis, two likelihood-based statistics can be used, each following an asymptotic chi-square distribution with $k$ degrees of freedom under $H_0$.

1. Likelihood Ratio Test

$$\chi^2_{LR} = 2\left[\ln L(\hat{\beta}) - \ln L(0)\right]$$

2. Score Test

$$\chi^2_S = \left[\frac{\partial \ln L(0)}{\partial \beta}\right]^T \left[-\frac{\partial^2 \ln L(0)}{\partial \beta^2}\right]^{-1} \left[\frac{\partial \ln L(0)}{\partial \beta}\right]$$

3.3.3.1.2 Tests for a Single Variable

The null hypothesis can be stated as: "Factor $X_i$ does not add any value to the prediction of the response, given that the other factors are already included in the model." In other words,

$$H_0: \beta_i = 0$$

To test such a null hypothesis, we can perform a likelihood ratio chi-square test with 1 degree of freedom:

$$\chi^2_{LR} = 2\left[\ln L(\hat{\beta};\ \text{all } X\text{'s}) - \ln L(\hat{\beta};\ \text{all other } X\text{'s with } X_i \text{ deleted})\right]$$

A much easier alternative is to use

$$z = \frac{\hat{\beta}_i}{SE(\hat{\beta}_i)}$$

where $\hat{\beta}_i$ is the corresponding estimated regression coefficient and $SE(\hat{\beta}_i)$ is the estimate of the standard error of $\hat{\beta}_i$, both of which are printed by standard computer packages.
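The two single-variable tests described above can be sketched with the Python standard library. This is a minimal illustration, not output from the study: the coefficient, standard error and log-likelihood values are hypothetical, and the one-degree-of-freedom chi-square p-value is obtained from the identity $P(\chi^2_1 > x) = 2\,[1 - \Phi(\sqrt{x})]$.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def z_test(beta_hat, std_error):
    """Return z = beta_hat / SE and its two-sided normal p-value."""
    z = beta_hat / std_error
    p_value = 2.0 * (1.0 - STD_NORMAL.cdf(abs(z)))
    return z, p_value


def lr_test_one_df(loglik_full, loglik_reduced):
    """Likelihood ratio chi-square (1 df) for dropping a single variable."""
    statistic = 2.0 * (loglik_full - loglik_reduced)
    # For 1 degree of freedom, chi-square is the square of a standard normal.
    p_value = 2.0 * (1.0 - STD_NORMAL.cdf(statistic ** 0.5))
    return statistic, p_value


# Hypothetical values, for illustration only.
z, p_z = z_test(beta_hat=0.50, std_error=0.25)
lr, p_lr = lr_test_one_df(loglik_full=-100.0, loglik_reduced=-102.0)
print(z, round(p_z, 4), lr, round(p_lr, 4))
```

With these made-up inputs the two tests happen to give the same p-value because the likelihood ratio statistic equals $z^2$; in real data the two agree only approximately, and more closely in large samples.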
3.3.3.2 Wald Test

The Wald statistic is used to evaluate the estimate of an individual regressor when other regressors are in the model. It is defined as:

$$W = \frac{\hat{\beta}_i}{s.e.(\hat{\beta}_i)} \sim N(0, 1) \quad \text{or, in another version,} \quad W = \frac{\hat{\beta}_i^2}{V(\hat{\beta}_i)} \sim \chi^2_1$$

It is based on large sample sizes, that is, on the large-sample normality of the parameter estimates. The Wald test can be used to determine a $100(1 - \alpha)\%$ confidence interval for $\beta_i$, within which the true parameter lies, with boundaries

$$\hat{\beta}_i \pm Z_{1-\frac{\alpha}{2}}\, SE(\hat{\beta}_i)$$

where $Z_{1-\frac{\alpha}{2}}$ is the critical value of the standard normal distribution for a two-sided test of size $\alpha$. In terms of the odds ratio, confidence intervals are formed by exponentiating these boundaries:

$$\exp\left\{\hat{\beta}_i \pm Z_{1-\frac{\alpha}{2}}\, SE(\hat{\beta}_i)\right\}$$

$$\text{Lower Limit} = \exp\left\{\hat{\beta}_i - Z_{1-\frac{\alpha}{2}}\, SE(\hat{\beta}_i)\right\} \quad \text{and} \quad \text{Upper Limit} = \exp\left\{\hat{\beta}_i + Z_{1-\frac{\alpha}{2}}\, SE(\hat{\beta}_i)\right\}$$

3.3.3.3 Logit Transformation

For proper interpretation of the estimated coefficients, the expression

$$\lambda(X) = \frac{1}{1 + \exp\left[-\left(\beta_0 + \sum_{j=1}^{k} \beta_j X_{ji}\right)\right]}, \qquad i = 1, 2, \ldots, n$$

may be transformed, writing $\theta$ for the coefficients, as:

$$g(X) = \ln\left[\frac{\lambda(X)}{1 - \lambda(X)}\right] = \theta_0 + \theta_1 X_1 + \theta_2 X_2 + \cdots + \theta_p X_p$$

The equation above is termed the logit transformation. The importance of this transformation is that $g(X)$ possesses many of the desirable properties of a linear regression model. The logit, $g(X)$, is linear in its parameters and may range from $-\infty$ to $+\infty$, depending on the range of values of $X$.

3.3.3.4 Interpretation of Coefficients

As in the case of linear regression, the estimated regression coefficients represent the slope, or rate of change, of the logit function of the dependent variable per unit change in the independent variable. In the logistic model, $\theta$ represents the change in the logit for a one-unit change in the covariate $X$, that is, $g(X + 1) - g(X)$, where $g(X)$ is the logit transformation defined by the expression above.
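The statement that a coefficient equals the change in the logit for a one-unit change in the covariate can be verified numerically. The sketch below assumes a single-covariate model with hypothetical coefficients `theta0` and `theta1`; it is an illustration of the identity $g(X + 1) - g(X) = \theta_1$, not a fitted model from this study.

```python
import math


def logistic(eta):
    """Inverse logit: map a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def logit(p):
    """Log-odds of a probability p."""
    return math.log(p / (1.0 - p))


# Hypothetical coefficients for a one-covariate logistic model.
theta0, theta1 = -2.0, 0.7


def prob(x):
    """Probability of the outcome at covariate value x."""
    return logistic(theta0 + theta1 * x)


x = 1.5
logit_change = logit(prob(x + 1)) - logit(prob(x))
odds = lambda p: p / (1.0 - p)
odds_ratio = odds(prob(x + 1)) / odds(prob(x))
print(round(logit_change, 6), round(odds_ratio, 6))
```

The change in the logit equals `theta1` whatever starting value of `x` is chosen, and the corresponding odds ratio equals `exp(theta1)`.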
When only one dichotomous independent variable, coded as 0 or 1, is considered, the values of the regression model may be expressed as shown in Table 5.

Table 5: Logit transformation

Outcome Variable (Y)    Independent Variable (X)
                        X = 1                                                X = 0
Y = 1                   λ(1) = exp(θ0 + θ1) / [1 + exp(θ0 + θ1)]             λ(0) = exp(θ0) / [1 + exp(θ0)]
Y = 0                   1 − λ(1) = 1 / [1 + exp(θ0 + θ1)]                    1 − λ(0) = 1 / [1 + exp(θ0)]
Total                   1.0                                                  1.0

The odds of the outcome (low birthweight) being present among individuals with $X = 1$ are defined as $\frac{\lambda(1)}{1 - \lambda(1)}$. Similarly, the odds of the outcome being present among individuals with $X = 0$ are defined as $\frac{\lambda(0)}{1 - \lambda(0)}$. The log of the odds, called the logit, is therefore defined as:

$$g(1) = \ln\left[\frac{\lambda(1)}{1 - \lambda(1)}\right] \quad \text{and} \quad g(0) = \ln\left[\frac{\lambda(0)}{1 - \lambda(0)}\right]$$

The odds ratio, denoted by $\varphi$, is defined as the ratio of the odds for $X = 1$ to the odds for $X = 0$:

$$\varphi = \frac{\lambda(1) / [1 - \lambda(1)]}{\lambda(0) / [1 - \lambda(0)]}$$

The log of the odds ratio, termed the log-odds ratio or log-odds, is given by:

$$\ln \varphi = \ln\left\{\frac{\lambda(1) / [1 - \lambda(1)]}{\lambda(0) / [1 - \lambda(0)]}\right\} = g(1) - g(0)$$

Using the expressions for the logistic regression model shown in Table 5, the odds ratio is:

$$\varphi = \frac{\left(\dfrac{e^{\theta_0 + \theta_1}}{1 + e^{\theta_0 + \theta_1}}\right)\left(\dfrac{1}{1 + e^{\theta_0}}\right)}{\left(\dfrac{e^{\theta_0}}{1 + e^{\theta_0}}\right)\left(\dfrac{1}{1 + e^{\theta_0 + \theta_1}}\right)} = \frac{e^{\theta_0 + \theta_1}}{e^{\theta_0}} = e^{\theta_1}$$

Hence, in logistic regression with a dichotomous independent variable, $\varphi = e^{\theta_1}$ and the logit difference, or log-odds, is $\ln \varphi = \ln e^{\theta_1} = \theta_1$. The odds ratio therefore approximates how much more likely (or unlikely) it is for low birthweight to be present among those with $X = 1$ than among those with $X = 0$.

3.3.3.5 Goodness-of-Fit test in Logistic Regression

We employ a goodness-of-fit test to determine how well the proposed model fits the data. A model fits poorly if either its residual variation is large or it does not follow the variability postulated by the model (Hallet, 1999).
3.3.3.5.1 $R^2$ in Logistic Regression

Standard regression theory tells us that:

$$l^{2/n} = \frac{1}{1 - R^2} \quad \text{where} \quad l = \frac{L(\hat{\beta})}{L(0)} \text{ is the likelihood ratio statistic}$$

We can rewrite the above expression as:

$$R^2 = 1 - \left[\frac{L(0)}{L(\hat{\beta})}\right]^{2/n}$$

Since the likelihood function $L(\hat{\beta})$ is a product of probabilities, its value must be less than or equal to 1. Thus, the maximum possible value of $R^2$ is given by

$$Max(R^2) = 1 - [L(0)]^{2/n}$$

In linear regression, $\hat{Y} = \bar{Y}$ for the null model. Similarly, in logistic regression, we have $\hat{y}_i = \gamma$ for the null model, with $\gamma$ denoting the proportion of 1's in the data set. It follows that

$$L(0) = \prod_{i=1}^{n} \hat{y}_i^{\,y_i} (1 - \hat{y}_i)^{1 - y_i} = \gamma^{n\gamma} (1 - \gamma)^{n - n\gamma}$$

so that we can write

$$Max(R^2) = 1 - \left[\gamma^{n\gamma} (1 - \gamma)^{n - n\gamma}\right]^{2/n}$$

The Cox-Snell $R^2$ is a pseudo-$R^2$ statistic in which the ratio of the likelihoods reflects the improvement of the full model over the intercept-only model, with a smaller ratio reflecting greater improvement. It is given by:

$$\text{Cox-Snell } R^2 = 1 - \left[\frac{L(R)}{L(F)}\right]^{2/n}$$

where
$L(R)$ = likelihood of the intercept-only model
$L(F)$ = likelihood of the specified model
$n$ = number of observations

The maximum possible value will be close to zero when the data are quite sparse. Nagelkerke (1991) therefore suggests that $\bar{R}^2$ be used, with

$$\bar{R}^2 = \frac{R^2}{Max(R^2)} = \frac{1 - \left[\dfrac{L(R)}{L(F)}\right]^{2/n}}{1 - [L(R)]^{2/n}}$$

which is known as the adjusted $R^2$.

3.3.3.5.2 Hosmer – Lemeshow Test

The Hosmer-Lemeshow goodness-of-fit test is used for grouped data and is defined as:

$$HLT = \sum_{i=1}^{g} \frac{(O_i - N_i \hat{\pi}_i)^2}{N_i \hat{\pi}_i (1 - \hat{\pi}_i)}$$

where $g$ is the number of groups, $N_i$ is the total frequency of subjects in the $i$-th group, $O_i$ is the total frequency of event outcomes in the $i$-th group, and $\hat{\pi}_i$ is the average estimated predicted probability of an event outcome for the $i$-th group.
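The pseudo-$R^2$ measures and the Hosmer-Lemeshow statistic above can be computed directly from their definitions. The following sketch works on the log-likelihood scale to avoid numerical underflow; the log-likelihoods, sample size and grouped counts are hypothetical values chosen only to exercise the functions, and the function names are illustrative rather than taken from any package.

```python
import math


def cox_snell_r2(loglik_null, loglik_full, n):
    """Cox-Snell R^2 = 1 - [L(R) / L(F)]^(2/n), from log-likelihoods."""
    return 1.0 - math.exp((2.0 / n) * (loglik_null - loglik_full))


def nagelkerke_r2(loglik_null, loglik_full, n):
    """Cox-Snell R^2 divided by its maximum, 1 - [L(R)]^(2/n)."""
    max_r2 = 1.0 - math.exp((2.0 / n) * loglik_null)
    return cox_snell_r2(loglik_null, loglik_full, n) / max_r2


def hosmer_lemeshow(groups):
    """Sum (O - N*p)^2 / (N*p*(1-p)) over groups given as (N, O, p) tuples."""
    return sum((o - n * p) ** 2 / (n * p * (1.0 - p)) for n, o, p in groups)


# Hypothetical inputs, for illustration only.
cs = cox_snell_r2(loglik_null=-60.0, loglik_full=-50.0, n=100)
nk = nagelkerke_r2(loglik_null=-60.0, loglik_full=-50.0, n=100)
hl = hosmer_lemeshow([(50, 8, 0.1), (50, 18, 0.4)])
print(round(cs, 4), round(nk, 4), round(hl, 4))
```

In practice the Hosmer-Lemeshow groups are usually formed from deciles of the fitted probabilities; how the groups are formed is left outside this sketch.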
3.3.4 Variable Selection for Model Building

Our goal is to identify, from the numerous available indicators, a small subset of factors that are significantly related to the response. In the process of identification, we wish to avoid a large type I error (false positive). Such a goal can be accomplished by using a procedure that enters or removes one factor at a time from a regression model according to a certain order of relative significance.

a) Backward Elimination
1. Start with all the predictors in the model.
2. Remove the predictor with the highest p-value greater than the alpha value.
3. Refit the model and go to step 2.
4. Stop when all p-values are less than the alpha value.

b) Forward Selection
1. Start with no variables.
2. For all predictors not in the model, check the p-value each would have if added to the model. Choose the one with the lowest p-value less than the alpha value.
3. Continue until no new predictors can be added.

3.3.5.2 Criterion-Based Procedures

$$\text{Akaike's Information Criterion } (AIC) = n \ln(SSE) - n \ln n + 2p$$

$$\text{Bayesian Information Criterion } (BIC) = n \ln(SSE) - n \ln n + p \ln n$$

$$\text{Amemiya's Prediction Criterion } (APC) = SSE\, \frac{n + p}{n(n - p)}$$

where $n$ is the sample size and $p$ is the number of regression coefficients in the model being evaluated, including the intercept.

3.3.5 Developing the Screening tool

The discrimination of a screening tool is a measure of how well a model ranks individuals by order of risk, such that individuals are correctly identified as being high-risk or low-risk. Usually, discrimination is measured by plotting the receiver operating characteristic (ROC) curve and calculating the Area Under the ROC curve (AUROC), or c-statistic.
Using the proportion of positive observations that are correctly classified as positive and the proportion of negative observations that are erroneously classified as positive, a trade-off curve is plotted that sets the rate of correct positive predictions against the rate of incorrect positive predictions. The true positive rate (TPR), or sensitivity, is displayed on the y-axis and the false positive rate (FPR), or 1 – specificity, on the x-axis; an area under the curve approaching 1 represents the best performance of the model.

3.4 Conceptual framework

To examine the relationship between various factors and birthweight in Ghana, the following conceptual framework was adopted. We hypothesized that the birthweight of infants is influenced by the following classes of factors: social, demographic, mothers' reproductive behavior, maternal healthcare, and health status, including the nutritional status of mother and newborn. These factors may influence a neonate's birthweight either directly or indirectly. Several factors that do not show direct associations with unfavorable birth outcomes contribute to these outcomes indirectly through intermediate factors. Based on the reviewed literature, the inter-relationship among the various variables is shown in the flow chart below.
Figure 2: Conceptual framework for studying determinants of Birthweight
[Flow chart linking the following groups of factors to the outcome, Birthweight of Infant (kg):
• Maternal reproductive factors: parity, delivery type and total children ever born
• Socio-economic and demographic factors: age, region, residence, religion, ethnicity, education, wealth index, delivery place and occupation
• Antenatal care factors: number of ANC visits, iron supplement intake and tetanus injections
• Maternal anthropometry: height, weight, BMI
• Inadequate fetal nutrition
• Birth type (single or multiple)
• Gender and size of child]
Source: Author (2019)

CHAPTER FOUR
DATA ANALYSIS AND DISCUSSION OF RESULTS

4.0 Introduction

This chapter reports the findings of the study in order to ascertain which variables are responsible for birthweight prediction, and to what degree. It addresses the objectives of the study through analysis of the data gathered in the survey. Univariate, bivariate and multivariate analyses were conducted to determine the proportion of births in each birthweight class (low or normal) and the influential factors associated with birthweight.

The data employed in the study relate to the birthweight of infants and come from the pregnancy-related section of the individual questionnaire. As the study focuses on determining significant risk factors associated with birthweight, the covariates were selected for consistency where possible. Further consideration was given to variables reported in the literature to be associated with birthweight. With these two constraints, a limited number of covariates entered the model building process.

4.1 Birthweight Data Analysis

Initially, we consider the birthweight dataset containing 18 variables, each with 3361 observations, except for mother's weight and height, which have missing values.
The analysis was conducted in three phases, using descriptive (univariate) statistics and inferential (bivariate and multivariable) statistical methods. Details of the analytical approaches used in the three phases are presented in chapter three.

4.1.1 Birthweight Distribution

The first variable to be considered is birthweight.

Table 6: Descriptive Statistics for Birthweight
              Obs.   Min    Max     Mean    Median   Std. Error   Std. Deviation
Birthweight   3361   .800   5.500   3.137   3.133    0.01         .5928
Valid N       3361

In Table 6, the mean birthweight for the study is 3.137kg with a standard error of 0.01kg and a standard deviation of 0.593kg. The minimum birthweight recorded is 0.8kg and the maximum is 5.5kg. The median birthweight is 3.133kg, which is approximately equal to the mean, suggesting a nearly symmetric distribution. The overall data appear approximately normal (see the histogram in Figure 3), although the boxplot indicates some extreme cases (see Appendix 3).

Figure 3: Histogram of Birthweight distribution (kg)

From Figure 3, a large cluster of data points lies in the 2.50kg to 4.0kg range of birthweight.

4.1.2 Classification of Birthweight

In the study, newborns were classified by birthweight into two groups: low birthweight (LBW), defined as a birthweight of less than 2.5 kilograms (2,500g), and normal birthweight (NBW), defined as a birthweight of 2.5 kilograms or more.

Table 7: Descriptive Statistics of birthweight-based classification
a. Low birthweight neonates
Count   Mean    Standard Deviation   Minimum   Maximum   95.0% Lower CL for Mean   95.0% Upper CL for Mean
335     2.118   .293                 .80       2.499     2.086                     2.149
b. Normal birthweight neonates
Count   Mean    Standard Deviation   Minimum   Maximum   95.0% Lower CL for Mean   95.0% Upper CL for Mean
3026    3.25    .503                 2.50      5.50      3.232                     3.27

4.2 Prevalence rate of Low birthweight (LBW)

4.2.1 WHO's Recommendation

The official definition of low birthweight is a birthweight of less than 2500g (World Health Organization, 2004a). Under this definition, 335 of the 3361 women in the study (9.97%) delivered low birthweight infants.

4.2.2 Researcher's Adjusted Prevalence

Birthweight heaping was observed in the DHS data. To address this problem, the researcher employed an adjusted calculation of low birthweight prevalence. In the data, 194 infants were reported to weigh exactly 2500g. Many of these infants may actually have weighed less than 2500g and should therefore be classified as low birthweight. Excluding them from the low birthweight group lowers the estimated proportion of infants with low birthweight. Counting these 194 infants as low birthweight gives 529 low birthweight births out of 3361, an adjusted prevalence of 15.7%, which is 5.8 percentage points higher than the estimate under the WHO definition.

4.3 Mother's Reporting of Birth Size Reliability Analysis

4.3.1 How accurate is a mother at reporting child's birthweight?

One question in the DHS asks the mother for her assessment of the size of her baby at birth. Size at birth has been used in certain studies as a proxy for birthweight (Ghosh, 2006; Magadi et al., 2007). However, the exact relationship between birthweight and perceived infant size has not yet been established in Ghana. The aim here is to investigate whether a mother's perception of her infant's size, gathered in a retrospective survey such as the Demographic and Health Survey, is reliable and valid enough to serve as a proxy for birthweight when birthweight is missing. The size of a child at birth is recorded as 'very small', 'smaller than average', 'average', 'larger than average' or 'very large'.
The categories very small and smaller than average were recoded as "Small size at birth", and the categories average, larger than average and very large as "Large birth size". These two new classifications were compared with the reported birthweight to assess the mother's perception of her child's weight.

Table 8: Birthweight groups against Size of child Crosstabulation
                                       Size of child
                                       Low (Small size)   Normal (Large size)   Total
Low birthweight      Count             178                157                   335
                     % within groups   53.1%              46.9%                 10.0%
Normal birthweight   Count             389                2637                  3026
                     % within groups   12.9%              87.1%                 90.0%
Total                Count             567                2794                  3361
                     % within groups   16.9%              83.1%                 100.0%
Pearson chi2(1) = 348.9247   Pr = 0.000

A Chi-square test was used to assess, at the 5% alpha level, whether there is an association between the size of the child at birth and the recorded birthweight. Since the calculated p-value (reported as 0.000, i.e. below 0.001) is less than the significance level of 0.05, we reject the null hypothesis and conclude that the variables are associated. This result is evidence that the birth size of neonates as reported by the mother might be a decent proxy for birthweight. (See Table 8)

4.3.2 Assessment of Birthweight versus Size of an infant at birth

To ascertain the accuracy of a mother's assessment of the size of her child, the reported size at birth must be judged against the recorded birthweight. The aim is to discover whether a child declared by the mother to be of normal size is indeed normal with reference to the child's actual birthweight. Likewise, it is important to see how many children classified as small would also be classified as low birthweight.
If there is close agreement between low birthweight and the assessment of a child as small, then size at birth may be usable in the calculation of birthweight estimates in Ghana. Taking recorded birthweight as the reference and the mother's report of a small size as the positive result, the following quantities were obtained from the contingency table:

• True Positive (TP): the proportion of children with a birthweight below 2.50 kg who were reported as small is 53.1%
• False Negative (FN): the proportion of children with a birthweight below 2.50 kg who were reported as large is 46.9%
• False Positive (FP): the proportion of children with a birthweight of 2.50 kg or above who were reported as small is 12.9%
• True Negative (TN): the proportion of children with a birthweight of 2.50 kg or above who were reported as large is 87.1%
• Sensitivity: the ability of reported birth size to correctly identify children with low recorded birthweight as small. It is calculated as:
  $\text{Sensitivity} = \dfrac{178}{178 + 157} = 0.531 \approx 53.1\%$
• Specificity: the ability of reported birth size to correctly identify children with normal recorded birthweight as large. It is calculated as:
  $\text{Specificity} = \dfrac{2637}{389 + 2637} = 0.871 \approx 87.1\%$
• Positive predictive value (PPV): the proportion of children reported as small who were low birthweight as per recorded birthweight. It is calculated as:
  $\text{PPV} = \dfrac{178}{178 + 389} = 0.314 \approx 31.4\%$
• Negative predictive value (NPV): the proportion of children reported as normal (large) size who were normal birthweight as per recorded birthweight. It is calculated as:
  $\text{NPV} = \dfrac{2637}{157 + 2637} = 0.944 \approx 94.4\%$

4.3.3 Calculation of Kappa Statistics

Table 9 below shows that, if mothers' size reports were unrelated to recorded birthweight, the two classifications would be expected to agree for 76.53% of infants by chance alone. They in fact agreed for 83.75% of infants, which is 30.80% of the way between chance agreement and perfect agreement.
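The accuracy measures and the kappa statistic discussed in this section can be reproduced directly from the cell counts of Table 8. The sketch below is a minimal illustration that takes recorded birthweight as the reference standard and the mother's report of a small baby as the positive result; the variable names are illustrative and the formulas are the standard textbook definitions rather than output from the software used in the study.

```python
# Cell counts from Table 8 (rows: recorded birthweight, columns: mother's report).
tp = 178    # low birthweight, reported as small
fn = 157    # low birthweight, reported as large
fp = 389    # normal birthweight, reported as small
tn = 2637   # normal birthweight, reported as large
n = tp + fn + fp + tn

sensitivity = tp / (tp + fn)   # share of low birthweight infants reported small
specificity = tn / (tn + fp)   # share of normal birthweight infants reported large
ppv = tp / (tp + fp)           # share of "small" reports that are low birthweight
npv = tn / (tn + fn)           # share of "large" reports that are normal birthweight

# Cohen's kappa: observed agreement corrected for agreement expected by chance.
observed = (tp + tn) / n
expected = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / n**2
kappa = (observed - expected) / (1 - expected)

print(f"sensitivity={sensitivity:.3f} specificity={specificity:.3f}")
print(f"ppv={ppv:.3f} npv={npv:.3f}")
print(f"observed={observed:.4f} expected={expected:.4f} kappa={kappa:.4f}")
```

The observed agreement, expected agreement and kappa computed this way correspond to the values reported in Table 9.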
Moreover, the test of agreement (Prob>Z = 0.000), which is less than our alpha value (0.05), indicates that the hypothesis that the agreement is due to chance alone can be rejected. Therefore, we conclude that mothers' perception of their neonate's size reflects the child's actual birthweight better than chance, although a kappa of 0.308 indicates only fair agreement.

Table 9: Kappa Statistics for the contingency table
Agreement   Expected Agreement   Kappa    Standard Err.   Z       Prob>Z
83.75%      76.53%               0.3080   0.0165          18.68   0.000

4.3.4 Reporting of Birthweight: (Mother's Memory Recall or Health Card)

For neonates with a recorded birthweight in our study, the reporting method was noted. The two reporting methods are 'mother's memory recall' and 'read directly from health card'. In Ghana, shortly after childbirth, when a newborn is weighed, a health card is issued to the mother containing all significant data on the newborn, including birthweight. The distribution of birthweights by reporting method, either from the card or from memory, can be examined to determine whether the two methods differ.

The sample size, mean birthweight, standard deviation and standard error for each reporting method are given in Appendix 1. The study showed a higher mean birthweight for memory recall (3.2304kg ± 0.7292kg) than for card reports (3.1114kg ± 0.5472kg). (See Appendix 1) A two-sample t-test was used to test whether the mean birthweights differ by reporting type; the associated p-value was 0.000. The analysis therefore showed a significant difference, at the 5% level, between the mean birthweight read from health cards and the mean birthweight recalled by mothers.
Based on the result, it might be hypothesized that birthweights extracted from a health card are more accurate than those extracted from memory recall. (See Appendix 2)

4.4 Bivariate analysis of study factors associated with birthweight

The researcher starts by looking at the epidemiological factors affecting birthweight: socio-demographic factors, physical characteristics of the mother, antenatal care availability, obstetric history and morbidities during pregnancy. Before Multiple Linear and Logistic regression are used to draw conclusions about the data, each of these characteristics is evaluated to establish relationships between the maternal indicators and birthweight. Because the key covariates are categorical, cross-tabulations provide a preliminary exploratory analysis. Chi-square tests are then performed to assess the association between each explanatory variable and the dependent variable.

4.4.1 Factor One: Outcome of Pregnancy

The outcome of the current pregnancy is studied to identify the factors that may be influencing birthweight. The important variables are the sex of the newborn, the birth size of the neonate and the mode of delivery (cesarean or not). The distribution of birthweight by outcome of pregnancy is presented below.

4.4.1.1 Gender of child

4.4.1.1.1 Analysis of Gender of a child

Gender of a child is a binary variable, either male or female, defined by the biological sex of the child.

Table 10: Tabulation of Gender of child
Gender of child   Obs.    Mean    Standard Dev.   Min   Max
Male              1,749   3.195   0.596           0.8   5.5
Female            1,612   3.074   0.583           1.0   5.5

Table 10 shows that the birthweight of male children is slightly higher than that of female children, with mean birthweights of 3.195kg ± 0.596kg and 3.074kg ± 0.583kg respectively.
Table 11: Descriptive Analysis for Gender of a child across birthweight classifications
                    Low birthweight                              Normal birthweight
Gender of child     Obs.   Mean   Stand. dev.   Birthweight %    Obs.   Mean   Stand. dev.   Birthweight %
Female              187    2.13   .28           5.6%             1425   3.20   .49           42.4%
Male                148    2.10   .31           4.4%             1601   3.30   .51           47.6%

As shown in Table 11, male infants constitute 52.0% (1749 infants) of the sample and female infants 48.0% (1612 infants). There are therefore more male than female newborns in the study. The minimum birthweight recorded for male infants is 0.8kg and the maximum is 5.5kg. Likewise, the minimum and maximum birthweights recorded for female infants are 1.0kg and 5.5kg respectively.

Figure 4: Birthweight classification Bar plot according to gender

The grouped bar chart in Figure 4 shows that 335 infants (9.97% of the total sample) were of low birthweight (less than 2.5kg at birth) and 3026 (90.03%) were of normal birthweight (2.5kg or more).

Figure 5: Birthweight histogram across gender

From the histogram in Figure 5, we can see clearly that the birthweights of male infants are on average higher than those of female infants.

4.4.1.1.2 Association between birthweight classification and gender of the child

H10: There exists no significant association between the gender of the child and newborn birthweight.
H11: There exists a significant association between the gender of the child and newborn birthweight.

Table 12: Chi-Square Tests for Gender of a child against birthweight classification
                         Value    Df   Asymptotic Significance (2-sided)   Exact Sig. (2-sided)   Exact Sig. (1-sided)
Pearson Chi-Square       9.208a   1    .002
Continuity Correctionb   8.861    1    .003
Likelihood Ratio         9.207    1    .002
Fisher's Exact Test                                                        .003                   .001
Number of Valid cases    3361

The Chi-square test presented in Table 12 was performed at the 5% level of significance. The Pearson Chi-square p-value is 0.002 with 1 degree of freedom. Therefore, the null hypothesis is rejected, and a significant association between the gender of the child and newborn birthweight exists. It may be concluded that the gender of a child and newborn birthweight are not independent of each other.

Table 13: Symmetric Measures for Gender of child and birthweight classification
                                   Value   Approximate Significance
Nominal by Nominal   Phi           .107    .002
                     Cramer's V    .107    .002
Number of Valid cases              3361

In Table 13, symmetric measures (Phi and Cramer's V) are used to evaluate the strength of the association. The measure is significant (p = 0.002), and the degree of association between the two variables is 0.107 (10.7%).

4.4.1.2 Type of delivery by cesarean

4.4.1.2.1 Analysis of the type of delivery by cesarean

Here, the "Yes" group comprises deliveries conducted through cesarean section (delivery by surgical procedure). The average birthweight of infants delivered by cesarean section was higher than that of infants delivered by normal mode (any delivery other than cesarean section), with mean birthweights of 3.225kg ± 0.710kg and 3.122kg ± 0.569kg respectively (See Table 14).

Table 14: Tabulation of type of delivery by Caesarean
                                     Obs.   Mean    Standard Deviation
Type of delivery by cesarean   No    2867   3.122   .569
                               Yes   494    3.225   .710

Table 15 shows the distribution of the sample by type of delivery.
There were 279 (8.3%) low birthweight and 2,588 (77.0%) normal birthweight infants among normal (non-cesarean) deliveries, compared with 56 (1.7%) low birthweight and 438 (13.0%) normal birthweight infants among cesarean deliveries.

Table 15: Tabulation of type of delivery by Caesarean
                                                  Low birthweight   Normal birthweight   Total
Type of delivery by cesarean   No    Obs.         279               2588                 2867
                                     % of Total   8.3%              77.0%                85.3%
                               Yes   Obs.         56                438                  494
                                     % of Total   1.7%              13.0%                14.7%
Total                                Obs.         335               3026                 3361
                                     % of Total   10.0%             90.0%                100.0%

4.4.1.2.2 Association between birthweight and type of delivery by cesarean

The hypotheses for the Chi-square test on the data shown in Table 15 are:
𝐻20: There exists no significant association between type of delivery by cesarean and newborn birthweight.
𝐻21: There exists a significant association between type of delivery by cesarean and newborn birthweight.

Table 16: Chi-square tabulation for Type of delivery by cesarean and birthweight
                         Value    Df   Asymptotic Significance (2-sided)   Exact Sig. (2-sided)   Exact Sig. (1-sided)
Pearson Chi-Square       1.209a   1    .272
Continuity Correctionb   1.037    1    .309
Likelihood Ratio         1.171    1    .279
Fisher's Exact Test                                                        .290                   .154
Number of Valid cases    3361
a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 49.24.
b. Computed only for a 2x2 table

In Table 16, the Chi-square test was performed at the 5% level of significance. The Pearson Chi-square p-value is 0.272 with 1 degree of freedom. Therefore, we fail to reject the null hypothesis: there is no significant association between type of delivery by cesarean and newborn birthweight. It may be concluded that the type of delivery by cesarean and newborn birthweight appear to be independent of each other. Since no association was found between type of delivery by cesarean and birthweight, the measure of the strength of association was not computed.
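The Pearson Chi-square statistic, its p-value and the minimum expected count in Table 16 can be checked from the counts in Table 15. The sketch below is a minimal standard-library illustration; the function name `chi_square_2x2` is hypothetical, and the closed-form p-value applies only to tests with 1 degree of freedom.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table of counts."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    # Expected count under independence: row total * column total / n.
    expected = [[r * k / n for k in col_totals] for r in row_totals]
    chi2 = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    # With 1 degree of freedom the upper-tail probability has a closed form.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    min_expected = min(min(row) for row in expected)
    return chi2, p_value, min_expected


# Counts from Table 15: rows = cesarean (No, Yes), columns = (low, normal).
table_15 = [[279, 2588], [56, 438]]
chi2, p_value, min_expected = chi_square_2x2(table_15)
print(f"chi2={chi2:.3f} p={p_value:.3f} min expected={min_expected:.2f}")
```

Because the smallest expected count is well above 5, the usual large-sample approximation for the Chi-square test is reasonable here.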
4.4.1.3 Size of the child at birth

4.4.1.3.1 Analysis of the size of the child at birth

Size of an infant at birth is a categorical variable for which the mother places the child into one of five size categories: very small, smaller than average, average, larger than average or very large. This question was asked about babies born in the five years preceding the survey in order to minimize recall bias.

Table 17: Descriptive analysis of Size of the child
                                       Obs.   Mean    Standard Deviation
Size of Child   Very small             91     2.447   .640
                Smaller than average   303    2.693   .515
                Average                749    3.010   .471
                Larger than average    819    3.256   .510
                Very large             393    3.631   .543

In Table 17, the mean birthweight was computed for each perceived size category. The means follow the expected trend across the categories. The mean birthweight in the very large category is the heaviest, at 3.631kg ± 0.543kg, and the mean in each smaller size category is lighter than in the one before. The very small category recorded the lowest mean birthweight, at 2.447kg ± 0.640kg.

4.4.1.3.2 Association between Size of child and birthweight

𝐻30: There exists no significant association between the size of newborn and newborn birthweight.
𝐻31: There exists a significant association between the size of newborn and newborn birthweight.

Table 18: Chi-Square Tests for Size of child and birthweight
                        Value      df   Asymptotic Significance (2-sided)
Pearson Chi-Square      450.737a   4    .000
Likelihood Ratio        367.062    4    .000
Number of Valid cases   3361
a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 12.66.

The Chi-square test presented in Table 18 was performed at the 5% level of significance. The Pearson Chi-square p-value is 0.000 with 4 degrees of freedom.
Therefore, the null hypothesis is rejected, and a significant association between size of child and newborn birthweight exists. It may be concluded that the size of a child and newborn birthweight are not independent of each other.

Table 19: Symmetric Measures for Size of child and birthweight classification
                                   Value   Approximate Significance
Nominal by Nominal   Phi           .366    .000
                     Cramer's V    .366    .000
Number of Valid cases              3361

In Table 19, symmetric measures (Phi and Cramer's V) are used to evaluate the strength of the association. The measure is significant (p = 0.000), and the degree of association between the two variables is 0.366 (36.6%).

4.4.2 Factor Two: Socio-Economic and Demographic Factors

4.4.2.1 Mothers Age

4.4.2.1.1 Analysis of Mothers Age

Mother's age is a continuous measure of age in years at the time of interview. It was calculated from the century month code (CMC) using the birth date and the interview date. To facilitate the analysis, it was later recoded from its continuous form into a categorical variable using five-year age intervals: 15 to 19 years, 20 to 24 years, 25 to 29 years, 30 to 34 years, 35 to 39 years, 40 to 44 years and ≥ 45 years.

Table 20: Descriptive Statistics of Mother Age
                     N      Minimum   Maximum   Mean    Median   Std. Deviation
Mother Age           3361   15        49        30.36   30.00    6.551
Valid N (listwise)   3361

From Table 20, the mean age of the women whose infants were studied is 30.36 (approximately 30) years, ranging from 15 to 49 years, with a median age of 30 years.

Figure 6: Age distribution of mothers

Figure 6 shows that most of the births occurred to mothers aged between 18 and 40 years.

Table 21: Descriptive statistics of Mothers Age by group and Birthweight
                     Obs.   Mean    Standard Deviation
Mother Age   15-19   127    2.978   .596
             20-24   545    3.051   .593
             25-29   901    3.123   .592
             30-34   837    3.159   .591
             35-39   621    3.204   .588
             40-44   268    3.200   .554
             45-49   62     3.180   .691

Table 21 gives the mean birthweight of infants for each age group. The 15-19 years group recorded the lowest mean birthweight, 2.978kg, while the 35-39 years group recorded the highest, 3.204kg.

Table 22: Descriptive and Crosstabulation for Mothers Age and Birthweight classification
                                        Low birthweight   Normal birthweight   Total
Mother Age Group   15-19   Obs.         19                108                  127
                           % of Total   0.6%              3.2%                 3.8%
                   20-24   Obs.         67                478                  545
                           % of Total   2.0%              14.2%                16.2%
                   25-29   Obs.         87                814                  901
                           % of Total   2.6%              24.2%                26.8%
                   30-34   Obs.         84                753                  837
                           % of Total   2.5%              22.4%                24.9%
                   35-39   Obs.         53                568                  621
                           % of Total   1.6%              16.9%                18.5%
                   40-44   Obs.         20                248                  268
                           % of Total   0.6%              7.4%                 8.0%
                   45-49   Obs.         5                 57                   62
                           % of Total   0.1%              1.7%                 1.8%
Total                      Observation  335               3026                 3361
                           % of Total   10.0%             90.0%                100.0%

The largest group of mothers, 901 (26.8%), were aged 25-29 years. There were also 127 (3.8%), 545 (16.2%), 837 (24.9%), 621 (18.5%), 268 (8.0%) and 62 (1.8%) mothers in the age groups 15-19, 20-24, 30-34, 35-39, 40-44 and ≥ 45 respectively (Table 22). The mean age in the study was 30.36 ± 6.55 years.
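The percentages quoted for the age groups come in two forms: each group's share of the whole sample (the "% of Total" column of Table 22) and each group's share of all low birthweight infants. A minimal sketch of both calculations from the cell counts of Table 22 follows; the variable names are illustrative.

```python
# (low birthweight, normal birthweight) counts by mother's age group, from Table 22.
counts = {
    "15-19": (19, 108),
    "20-24": (67, 478),
    "25-29": (87, 814),
    "30-34": (84, 753),
    "35-39": (53, 568),
    "40-44": (20, 248),
    "45-49": (5, 57),
}

total = sum(low + normal for low, normal in counts.values())
total_low = sum(low for low, _ in counts.values())

# Share of the whole sample falling in each age group.
pct_of_sample = {
    group: round(100 * (low + normal) / total, 1)
    for group, (low, normal) in counts.items()
}

# Share of all low birthweight infants delivered by mothers in each age group.
share_of_lbw = {
    group: round(100 * low / total_low, 1)
    for group, (low, _) in counts.items()
}

for group in counts:
    print(group, pct_of_sample[group], share_of_lbw[group])
```

Keeping the two denominators separate (3361 infants in total versus 335 low birthweight infants) avoids reading a share of the sample as a within-group rate.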
Of the low birthweight infants recorded, 19 (5.7%) were delivered by mothers aged 15-19 years, 67 (20.0%) by mothers aged 20-24 years, 87 (26.0%) by mothers aged 25-29 years, 84 (25.1%) by mothers aged 30-34 years, 53 (15.8%) by mothers aged 35-39 years, 20 (6.0%) by mothers aged 40-44 years and 5 (1.5%) by mothers aged above 44 years. (See Table 22)

4.4.2.1.2 Association between newborn birthweight and maternal Age

The hypotheses for the Chi-square test on the data shown in Table 22 are:
𝐻40: There exists no significant association between mothers age and newborn birthweight.
𝐻41: There exists a significant association between mothers age and newborn birthweight.

Table 23: Chi-Square Tests for Mothers age and birthweight
                        Value     Df   Asymptotic Significance (2-sided)
Pearson Chi-Square      10.461a   6    .107
Likelihood Ratio        10.079    6    .121
Number of Valid cases   3361
a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 6.18.

In Table 23, the Pearson Chi-square test, performed at the 5% level of significance, gives a p-value of 0.107 with 6 degrees of freedom. Therefore, we fail to reject the null hypothesis: there is no significant association between mother's age and newborn birthweight. It may be concluded that maternal age and newborn birthweight appear to be independent of each other.

4.4.2.2 Maternal Education

4.4.2.2.1 Analysis of maternal education

Mother's education is an ordinal categorical variable defining the highest educational level attained. The categories are no education, primary, secondary and higher.

Table 24: Mean birthweight across Maternal Educational level
                                   Birthweight
                                   Obs.   Mean    Standard Deviation
Educational Level   No education   853    3.124   .563
                    Primary        622    3.132   .603
                    Secondary      1672   3.142   .605
                    Higher         214    3.165   .587

The descriptive analysis in Table 24 shows that mean birthweight rises with the mother's educational level. By maternal educational level, the largest group of mothers, 1,672 (49.7%), had secondary education, while 18.5% and 6.4% of mothers had attained primary and higher (tertiary) education respectively, and 25.4% of respondents had no education. (See Table 25 below)

Table 25: Crosstab for mother education and birthweight
                                                Low birthweight   Normal birthweight   Total
Educational Level   No education   Obs.         81                772                  853
                                   % of Total   2.4%              23.0%                25.4%
                    Primary        Obs.         59                563                  622
                                   % of Total   1.8%              16.8%                18.5%
                    Secondary      Obs.         174               1498                 1672
                                   % of Total   5.2%              44.6%                49.7%
                    Higher         Obs.         21                193                  214
                                   % of Total   0.6%              5.7%                 6.4%
Total                              Observation  335               3026                 3361
                                   % of Total   10.0%             90.0%                100.0%

In Table 25, of the low birthweight infants reported, 81 (24.2%) were delivered by mothers with no education and 59 (17.6%) by mothers with primary education. The majority, 174 (51.9%), were delivered by women educated to secondary level, while 21 (6.3%) were delivered by women with tertiary education.

4.4.2.2.2 Association between newborn birthweight and mother education

The hypotheses for the Chi-square test on the data shown in Table 25 are:
𝐻80: There exists no significant association between maternal education and newborn birthweight.
𝐻81: There exists a significant association between maternal education and newborn birthweight.
Table 26: Chi-Square tests between Mother education and birthweight
                        Value   df   Asymptotic Significance (2-sided)
Pearson Chi-Square      .738a   3    .864
Likelihood Ratio        .738    3    .864
Number of Valid cases   3361
a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 21.33.

The Chi-square test presented in Table 26 was performed at the 5% level of significance and gave a p-value of 0.864 with 3 degrees of freedom. Therefore, we fail to reject the null hypothesis: there is no significant association between mother's education and newborn birthweight. It may be concluded that mother's education and newborn birthweight appear to be independent of each other.

4.4.2.3 Maternal Region

4.4.2.3.1 Analysis of Region of respondent

Region is a categorical variable with 10 categories, which align with the administrative regions of Ghana.

Table 27: Mean birthweight across various Regions
                         Obs.   Mean    Standard Deviation
Region   Ashanti         436    3.158   .586
         Brong Ahafo     406    3.123   .555
         Central         325    3.123   .607
         Eastern         261    3.153   .716
         Greater Accra   358    3.213   .594
         Northern        274    3.012   .572
         Upper East      387    3.064   .546
         Upper West      297    2.951   .463
         Volta           302    3.325   .598
         Western         315    3.231   .618

From Table 27, the region with the highest mean birthweight was Volta, with 3.325kg ± 0.598kg, while the region with the lowest was Upper West, with 2.951kg ± 0.463kg. Brong Ahafo and Central had the same mean birthweight of 3.123kg but different standard deviations, 0.555kg and 0.607kg respectively.

Table 28: Region and Birthweight Crosstabulation
                                      Low birthweight   Normal birthweight   Total
Region   Ashanti         Obs.         44                392                  436
                         % of Total   1.3%              11.7%                13.0%
         Brong Ahafo     Obs.         44                362                  406
                         % of Total   1.3%              10.8%                12.1%
         Central         Obs.         29                296                  325
                         % of Total   0.9%              8.8%                 9.7%
         Eastern         Obs.         34                227                  261
                         % of Total   1.0%              6.8%                 7.8%
         Greater Accra   Obs.         24                334                  358
                         % of Total   0.7%              9.9%                 10.7%
         Northern        Obs.         37                237                  274
                         % of Total   1.1%              7.1%                 8.2%
         Upper East      Obs.         43                344                  387
                         % of Total   1.3%              10.2%                11.5%
         Upper West      Obs.         36                261                  297
                         % of Total   1.1%              7.8%                 8.8%
         Volta           Obs.         19                283                  302
                         % of Total   0.6%              8.4%                 9.0%
         Western         Obs.         25                290                  315
                         % of Total   0.7%              8.6%                 9.4%
Total                    Obs.         335               3026                 3361
                         % of Total   10.0%             90.0%                100.0%

Figure 7: Birthweight distribution across various Regions

To illustrate birthweight across the regions graphically, the distribution of birthweight was plotted for each region. From Figure 7, we can see that, except for Western (which has a multimodal distribution), the distribution of birthweight appears similar across regions.

4.4.2.3.2 Association between Region and newborn birthweight

The hypotheses for the Chi-square test on the data shown in Table 28 are:
𝐻50: There exists no significant association between region and newborn birthweight.
𝐻51: There exists a significant association between region and newborn birthweight.

Table 29: Chi-Square Tests across regions and birthweight
                        Value     df   Asymptotic Significance (2-sided)
Pearson Chi-Square      19.629a   9    .020
Likelihood Ratio        20.183    9    .017
Number of Valid cases   3361
a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 26.01.

The Chi-square test presented in Table 29 was performed at the 5% level of significance. It gave a p-value of 0.020 with 9 degrees of freedom. Therefore, the null hypothesis is rejected, indicating a significant association between mother's region and newborn birthweight. It may be concluded that region and newborn birthweight are not independent of each other.

Table 30: Symmetric Measures for regions and birthweight
                                   Value   Approximate Significance
Nominal by Nominal   Phi           .076    .020
                     Cramer's V    .076    .020
Number of Valid cases              3361

In Table 30, symmetric measures (Phi and Cramer's V) are used to evaluate the strength of the association.
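For reference, Cramer's V is conventionally derived from the Pearson Chi-square statistic. The worked equation below uses the standard textbook definition, which is an assumption here because the thesis does not state the formula, with the values from Table 29 ($\chi^2 = 19.629$, $n = 3361$) for a table with $r = 10$ regions and $c = 2$ birthweight classes:

```latex
V = \sqrt{\frac{\chi^2}{n \cdot \min(r - 1,\; c - 1)}}
  = \sqrt{\frac{19.629}{3361 \times 1}}
  \approx 0.076
```

Because one of the two variables has only two categories, $\min(r - 1, c - 1) = 1$, and Phi and Cramer's V coincide, as they do in Table 30.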
The measure is significant at 0.020, in line with the Chi-square test, and the degree of association between the two variables is 0.076 (7.60%), which is weak.

4.4.2.4 Type of Residence

4.4.2.4.1 Analysis of Type of residence

Type of residence is a binary variable that defines the place of residence as either a rural or an urban setting. Table 31 shows that the average birthweight of infants delivered in urban areas was higher than that of infants delivered in rural areas, with respective mean weights of 3.145kg ± 0.595kg and 3.128kg ± 0.590kg.

Table 31: Tabulation of Type of residence
Type of Residence   Obs.   Mean    Standard Deviation
Rural               1634   3.128   .590
Urban               1727   3.145   .595

The distribution of subjects by residence indicates that 1634 (48.6%) of respondents were living in rural areas while 1727 (51.4%) were living in urban areas. Low-birthweight infants of rural and urban mothers made up 4.9% and 5.0% of all births respectively, and normal-birthweight infants 43.7% and 46.4% (Table 32).

Table 32: Crosstab of Type of residence and birthweight
Type of Residence                Low birthweight   Normal birthweight   Total
Rural               Obs.         166               1468                 1634
                    % of Total   4.9%              43.7%                48.6%
Urban               Obs.         169               1558                 1727
                    % of Total   5.0%              46.4%                51.4%
Total               Obs.         335               3026                 3361
                    % of Total   9.9%              90.1%                100.0%

Table 32 shows that low-birthweight infants of rural and urban mothers account for 4.9% and 5.0% of all births. Within each type of residence, the prevalence of low birthweight is 10.2% (166 of 1634) among rural mothers and 9.8% (169 of 1727) among urban mothers.

4.4.2.4.2 Association between newborn birthweight and type of residence

The hypotheses for the Chi-square test on the data shown in Table 32 are:

𝐻70: There exists no significant association between the type of residence and newborn birthweight.
𝐻71: There exists a significant association between type of residence and newborn birthweight.

Table 33: Chi-Square Tests type of residence and birthweight
                        Value    Df   Asymptotic Sig. (2-sided)   Exact Sig. (2-sided)   Exact Sig. (1-sided)
Pearson Chi-Square      .130a    1    .718
Continuity Correction   .092     1    .761
Likelihood Ratio        .130     1    .718
Fisher's Exact Test                                               .730                   .381
Number of Valid cases   3361

The Chi-square test was performed at the 5% level of significance. The Pearson Chi-square significance value is 0.718 with 1 degree of freedom. We therefore fail to reject the null hypothesis, which indicates that there is no significant association between maternal residence and newborn birthweight. It may be concluded that mother's residence and newborn birthweight are independent of each other (see Table 33).

4.4.2.4.3 Finding distribution of Birthweight across Region and Type of Residence

To establish the distribution of birthweight across regions and type of residence, Figure 8 shows several distributions, in which some regions have higher mean birthweights than others. Furthermore, it shows that, across regions, respondents living in rural areas have higher mean birthweights than those living in urban areas.

Figure 8: Distribution of birthweight across region and type of residence

4.4.2.5 Mother Religion

4.4.2.5.1 Analysis for Mothers Religion

Religion is a nominal categorical variable with three (3) categories. The majority of respondents are Christians, 2494 (74.2%), followed by Islam with 717 (21.3%) and Other with 150 (4.5%), as displayed in Table 34 below.

Table 34: Religion and Birthweight Crosstabulation
Religion                     Low birthweight   Normal birthweight   Total
Christianity    Obs.         234               2260                 2494
                % of Total   7.0%              67.2%                74.2%
Islam           Obs.         87                630                  717
                % of Total   2.6%              18.7%                21.3%
Other           Obs.         14                136                  150
                % of Total   0.4%              4.0%                 4.5%
Total           Obs.         335               3026                 3361
                % of Total   10.0%             90.0%                100.0%

Further, Table 34 indicates that low-birthweight infants born to Christian women make up 7.0% of all births, compared with 2.6% for women of Islam and 0.4% for women of other religions. Within each group, this corresponds to a low-birthweight rate of 9.4% (234 of 2494), 12.1% (87 of 717) and 9.3% (14 of 150) respectively.

4.4.2.5.2 Association between mother religion and birthweight

The hypotheses for the Chi-square test on the data shown in Table 34 are:

𝐻90: There exists no significant association between mother's religion and birthweight.
𝐻91: There exists a significant association between mother's religion and birthweight.

Table 35: Chi-Square Tests for mother religion and birthweight
                        Value     df   Asymptotic Significance (2-sided)
Pearson Chi-Square      20.645    2    .008
Likelihood Ratio        22.091    2    .005
Number of Valid cases   3361

The Chi-square test was performed at the 5% level of significance, as presented in Table 35. The Pearson Chi-square significance value is 0.008 with 2 degrees of freedom. Therefore, the null hypothesis is rejected. Hence, a significant association exists between mother's religion and newborn birthweight. It may be concluded that mother's religion and newborn birthweight are not independent of each other.

Table 36: Symmetric Measures of Religion and birthweight
                                  Value   Approximate Significance
Nominal by Nominal   Phi          .078    .008
                     Cramer's V   .078    .008
Number of Valid cases             3361

In Table 36, measures of association (Phi and Cramer's V) are used to evaluate the strength of the association. The measure is significant at 0.008, in line with the Chi-square test, and the degree of association between the two variables is 0.078 (7.80%).

4.4.2.6 Maternal Wealth Index

4.4.2.6.1 Analysis of Wealth index

Wealth quintile is presented as five ordinal categories.
In the Demographic and Health Survey, wealth is calculated as “a composite measure of a household's cumulative living standard” based on possession of selected assets, including “televisions and bicycles; materials used for housing construction; and types of water access and sanitation facilities.” These data are used to determine a wealth index score, which is then divided into quintiles and presented as a separate variable.

Table 37: Mean weight of infants across the Wealth Index
Wealth index        Obs.   Mean    Standard Deviation
Poorest (Lowest)    783    3.042   .573
Poorer (Second)     593    3.145   .562
Middle              686    3.199   .615
Richer (Fourth)     662    3.142   .606
Richest (Highest)   637    3.174   .594

Table 37 shows how mean birthweight varies by wealth index. The group with the lowest mean birthweight is women classified as poorest (3.042kg ± 0.573kg), while women in the middle quintile had the highest mean birthweight of 3.199kg ± 0.615kg. The mean birthweight of the poorest group is thus 4.91% lower than that of the middle group. The mean birthweights recorded for the poorer, richer and richest groups were 3.145kg ± 0.562kg, 3.142kg ± 0.606kg and 3.174kg ± 0.594kg respectively.

Table 38: Crosstabulation Wealth index and Birthweight group
Wealth index                     Low birthweight   Normal birthweight   Total
Poorest (Lowest)    Obs.         99                684                  783
                    % of Total   2.9%              20.4%                23.3%
Poorer (Second)     Obs.         50                543                  593
                    % of Total   1.5%              16.2%                17.6%
Middle              Obs.         60                626                  686
                    % of Total   1.8%              18.6%                20.4%
Richer (Fourth)     Obs.         69                593                  662
                    % of Total   2.1%              17.6%                19.7%
Richest (Highest)   Obs.         57                580                  637
                    % of Total   1.7%              17.3%                19.0%
Total               Obs.         335               3026                 3361
                    % of Total   10.0%             90.0%                100.0%

With reference to Table 38, the poorer and poorest quintiles together make up 40.9% (n = 1,376) of the population, while the middle, richer and richest quintiles together make up 59.1% (n = 1,985). That is, 40.9% of the population is below the middle wealth quintile. Of all low-birthweight infants, the shares born to mothers in the poorest (lowest) and fourth wealth quintiles (each above 20%) were higher than the shares born to mothers in the second quintile (14.9%), middle quintile (17.9%) and highest quintile (17.0%).

4.4.2.6.2 Association between newborn birthweight and household wealth index

The hypotheses for the Chi-square test on the data shown in Table 38 are:

𝐻100: There exists no significant association between wealth index and newborn birthweight.
𝐻101: There exists a significant association between wealth index and newborn birthweight.

Table 39: Chi-Square Tests for Wealth index and Birthweight
                        Value     df   Asymptotic Significance (2-sided)
Pearson Chi-Square      9.838a    4    .043
Likelihood Ratio        9.542     4    .049
Number of Valid cases   3361
a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 59.11.

The Chi-square test was performed at the 5% level of significance and is presented in Table 39. The Pearson Chi-square significance value is 0.043 with 4 degrees of freedom. Therefore, the null hypothesis is rejected. Hence, a significant association exists between wealth index and newborn birthweight. It may be concluded that mother's wealth index and newborn birthweight are not independent of each other.
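The Pearson Chi-square statistic reported for wealth index can be reproduced directly from the counts in Table 38. The block below is a minimal sketch in plain Python (the thesis reports SPSS-style output; the function names here are illustrative and not taken from the study). It computes the statistic, the degrees of freedom, the minimum expected count, and Cramer's V, and compares the statistic with the standard 5% critical value for 4 degrees of freedom (9.488).

```python
import math

# Observed counts from Table 38: (low birthweight, normal birthweight) per quintile.
observed = {
    "Poorest": (99, 684),
    "Poorer": (50, 543),
    "Middle": (60, 626),
    "Richer": (69, 593),
    "Richest": (57, 580),
}


def chi_square_independence(table):
    """Return (statistic, degrees of freedom, minimum expected count)."""
    rows = list(table.values())
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    n = sum(row_totals)
    statistic = 0.0
    min_expected = float("inf")
    for row, row_total in zip(rows, row_totals):
        for obs, col_total in zip(row, col_totals):
            expected = row_total * col_total / n  # count expected under independence
            statistic += (obs - expected) ** 2 / expected
            min_expected = min(min_expected, expected)
    df = (len(rows) - 1) * (len(col_totals) - 1)
    return statistic, df, min_expected


def cramers_v(statistic, n, n_rows, n_cols):
    """Cramer's V = sqrt(chi2 / (n * (min(rows, cols) - 1)))."""
    return math.sqrt(statistic / (n * (min(n_rows, n_cols) - 1)))


chi2, df, min_exp = chi_square_independence(observed)
n_total = sum(sum(row) for row in observed.values())
v = cramers_v(chi2, n_total, len(observed), 2)

CRITICAL_5PCT_DF4 = 9.488  # chi-square critical value, alpha = 0.05, df = 4
reject_null = chi2 > CRITICAL_5PCT_DF4

print(f"chi2={chi2:.3f}, df={df}, min expected={min_exp:.2f}, V={v:.3f}, reject={reject_null}")
```

Because one dimension of the table has only two categories, Phi and Cramer's V coincide, which is why the symmetric-measures tables in this chapter report identical values for both.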
Table 40: Symmetric Measures for Wealth index and birthweight
                                  Value   Approximate Significance
Nominal by Nominal   Phi          .054    .043
                     Cramer's V   .054    .043
Number of Valid cases             3361

In Table 40, measures of association (Phi and Cramer's V) are used to evaluate the strength of the association. The measure is significant at 0.043, in line with the Chi-square test, and the degree of association between these two variables is 0.054 (5.40%).

4.4.3 Factor Three: Maternal Anthropometry

To get an initial feel for the relationships between the anthropometry variables, and in particular between mother's height or weight and birthweight, we examined the scatterplots produced by plotting each variable against the others, as well as the distribution of birthweight values within each level of the mother's anthropometric characteristics.

4.4.3.1 Mothers Height

Mother's height was taken subjectively from the mother as a continuous variable. The data were recorded in the questionnaire in centimeters.

Table 41: Descriptive Statistics for Mothers Height
                     Obs.   Minimum   Maximum   Mean     Std.
Mothers height       3361   139.20    184.60    159.30   4.16
Valid N (listwise)   3361

From the data, the minimum height was 139.2 centimeters while the maximum height was 184.6 centimeters, with a mean height of 159.30 centimeters ± 4.16 centimeters (see Table 41).

Figure 9: Simple Scatter with Fit Line of Birthweight by Mothers height

The scatterplot in Figure 9 shows that birthweight does not depend strongly on mother's height; however, there appears to be a slight association between the two.

4.4.3.2 Mothers weight

Mother's weight was likewise taken subjectively from the mother as a continuous variable. The data were recorded in the questionnaire in kilograms.

Table 42: Descriptive Statistics for Mother weight
                     Obs.   Minimum   Maximum   Mean    Standard dev.
Mothers weight       3361   38.90     159.40    63.60   9.70
Valid N (listwise)   3361

From the data, the minimum weight was 38.90kg while the maximum weight was reported as 159.40kg, with a mean weight of 63.60kg ± 9.70kg (see Table 42).

Figure 10: Simple Scatter with Fit Line of Birthweight by Mothers weight

Likewise, the scatterplot in Figure 10 shows that birthweight does not depend strongly on mother's weight; however, there appears to be a slight association between the two.

4.4.4 Factor Four: Maternal Reproductive Factors

4.4.4.1 Birth order (Parity)

4.4.4.1.1 Analysis of Birth order

Birth order is a discrete variable corresponding to the numerical order in which the infant of interest was born. To facilitate the analysis, mothers were divided into primipara (single pregnancy) and multipara (2 or more pregnancies), with the latter split into birth orders 2 to 4, 5 to 8 and above 8. In Table 43 below, low-birthweight infants of first birth order make up 3.5% (n = 118) of the population, while low-birthweight infants of all higher birth orders combined make up 6.4% (n = 217).

Table 43: Crosstab for Birth order (Parity) and birthweight
Birth order                Low birthweight   Normal birthweight   Total
1             Obs.         118               777                  895
              % of Total   3.5%              23.1%                26.6%
2-4           Obs.         161               1627                 1788
              % of Total   4.8%              48.4%                53.2%
5-8           Obs.         52                581                  633
              % of Total   1.5%              17.3%                18.8%
Above 8       Obs.         4                 41                   45
              % of Total   0.1%              1.2%                 1.3%
Total         Obs.         335               3026                 3361
              % of Total   10.0%             90.0%                100.0%

4.4.4.1.2 Association between newborn birthweight and birth order

The hypotheses for the Chi-square test on the data shown in Table 43 are:

𝐻110: There exists no significant association between birth order (parity) and newborn birthweight.
𝐻111: There exists a significant association between birth order (parity) and newborn birthweight.
Table 44: Chi-Square Tests for Birth order (Parity) and birthweight
                        Value      df   Asymptotic Significance (2-sided)
Pearson Chi-Square      14.394a    3    .002
Likelihood Ratio        13.724     3    .003
Number of Valid cases   3361

The results of the Chi-square test, performed at the 5% level of significance, are presented in Table 44. The Pearson Chi-square significance value is 0.002 with 3 degrees of freedom. Therefore, the null hypothesis is rejected. Hence, a significant association exists between the birth order of the infant and newborn birthweight. It may be concluded that birth order and newborn birthweight are not independent of each other.

Table 45: Symmetric Measures for Birth order (Parity) and birthweight
                                  Value   Approximate Significance
Nominal by Nominal   Phi          .065    .002
                     Cramer's V   .065    .002
Number of Valid cases             3361

In Table 45, measures of association (Phi and Cramer's V) are used to evaluate the strength of the association. The measure is significant at 0.002, in line with the Chi-square test, and the degree of association between the two variables is 0.065 (6.50%).

4.4.4.2 Total number of children ever had or born

4.4.4.2.1 Analysis of Total Children ever had

Total children ever born measures the number of children born alive to a woman and is an attribute of the counting unit ‘person’. In this study, the variable was applied to women 15 years of age and over, excluding mothers with no child (none). It used a standard single-level classification with thirteen categories.

Table 46: Tabulation of Total children ever had against Birthweight classification
                            N      Minimum   Maximum   Mean   Std. Deviation
Total Children ever born    3361   1         13        3.23   1.943
Valid N (listwise)          3361

Figure 11: Bar Plot of Total Children ever born/had

Because of the large disparity among the counts for the different numbers of children ever born, the above bar plot may not be very informative.
Figure 11 shows that, for categories above three (3) children, the frequencies decrease, since fewer women give birth to that many children. Hence, the study adopted a different level of granularity for total children ever born. The following categories were considered: Small size (1 to 2 children), Moderate size (3 to 5 children) and Large size (more than 5 children).

Figure 12: Boxplot of Birth and Total children ever born

With the new stratification, the boxplots show that the groups now have similar distributions but slightly different medians. Moreover, there are points at both the high and the low ends of the variable that appear to be outliers (see Figure 12).

Table 47: Crosstabulation of Total Children ever born and Birthweight group
Total Children ever born                Low birthweight   Normal birthweight   Total
Large                      Obs.         42                400                  442
                           % of Total   1.2%              11.9%                13.2%
Moderate                   Obs.         128               1342                 1470
                           % of Total   3.8%              39.9%                43.7%
Small                      Obs.         165               1284                 1449
                           % of Total   4.9%              38.2%                43.1%
Total                      Obs.         335               3026                 3361
                           % of Total   10.0%             90.0%                100.0%

4.4.4.2.2 Association between newborn birthweight and total children ever had or born

The hypotheses for the Chi-square test on the data shown in Table 47 are:

𝐻120: There exists no significant association between children ever born and newborn birthweight.
𝐻121: There exists a significant association between children ever born and newborn birthweight.

Table 48: Chi-Square Tests for total children ever born and birthweight
                        Value     df   Asymptotic Significance (2-sided)
Pearson Chi-Square      5.962a    2    .039
Likelihood Ratio        5.939     2    .039
Number of Valid cases   3361
a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 44.06.

The Chi-square test was performed at the 5% level of significance, as presented in Table 48.
The Pearson Chi-square significance value is 0.039 with 2 degrees of freedom. Therefore, the null hypothesis is rejected. Hence, a significant association exists between children ever born and newborn birthweight. It may be concluded that children ever born and newborn birthweight are not independent of each other.

Table 49: Symmetric Measures for total children ever had/born and birthweight
                                  Value   Approximate Significance
Nominal by Nominal   Phi          .065    .039
                     Cramer's V   .065    .039
Number of Valid cases             3361

Measures of association (Phi and Cramer's V) are used to evaluate the strength of the association. The measure is significant at 0.039, in line with the Chi-square test, and the degree of association between the two variables is 0.065 (6.50%) (see Table 49).

4.4.4.3 Delivery place

4.4.4.3.1 Analysis for Delivery place

The choice of delivery place may be influenced by several characteristics of the respondent, such as geographical location (region and type of residence), socio-economic status (such as wealth index) and many more.

Table 50: Mean Tabulation of delivery place
Delivery place        Obs.   Mean    Standard Deviation
Government facility   2795   3.126   .586
Respondent Home       274    3.159   .645
Maternity home        48     3.215   .532
Other                 9      2.944   .611
Private facility      235    3.237   .609

In Table 50, the highest mean birthweight was recorded at private facilities, with 3.237kg ± 0.609kg, while the lowest mean birthweight was recorded for deliveries at “Other” facilities, with 2.944kg ± 0.611kg.

Table 51: Delivery place and Birthweight Crosstabulation
Delivery place                     Low birthweight   Normal birthweight   Total
Government facility   Obs.         281               2514                 2795
                      % of Total   8.4%              74.8%                83.2%
Home                  Obs.         30                244                  274
                      % of Total   0.9%              7.3%                 8.2%
Maternity home        Obs.         2                 46                   48
                      % of Total   0.1%              1.4%                 1.4%
Other                 Obs.         1                 8                    9
                      % of Total   0.0%              0.2%                 0.3%
Private facility      Obs.         21                214                  235
                      % of Total   0.6%              6.4%                 7.0%
Total                 Obs.         335               3026                 3361
                      % of Total   10.0%             90.0%                100.0%

Table 51 shows that the majority of births occurred in government facilities, accounting for 2795 (83.2%) of the study sample, followed by 274 (8.2%) births at the respondent's home.

4.4.4.3.2 Association between Delivery place and Birthweight

The hypotheses for the Chi-square test on the data shown in Table 51 are:

𝐻130: There exists no significant association between delivery place and newborn birthweight.
𝐻131: There exists a significant association between delivery place and newborn birthweight.

Table 52: Chi-Square Tests for Delivery place and Birthweight
                        Value     Df   Asymptotic Significance (2-sided)
Pearson Chi-Square      2.409a    4    .011
Likelihood Ratio        2.865     4    .011
Number of Valid cases   3361
a. 2 cells (20.0%) have expected count less than 5. The minimum expected count is .90.

The Chi-square test was performed at the 5% level of significance, with the results presented in Table 52. The Pearson Chi-square significance value is 0.011 with 4 degrees of freedom. Therefore, the null hypothesis is rejected. Hence, a significant association exists between the delivery place and newborn birthweight. It may be concluded that the delivery place and newborn birthweight are dependent on each other.

Table 53: Symmetric Measures for delivery place and birthweight
                                  Value   Approximate Significance
Nominal by Nominal   Phi          .065    .039
                     Cramer's V   .065    .039
Number of Valid cases             3361

Measures of association (Phi and Cramer's V) are used to evaluate the strength of the association. The measure is significant at 0.011, as per the Chi-square table, and the degree of association between the two variables is 0.065 (6.50%) (see Table 53).

4.4.4.4 Preceding birth interval

4.4.4.4.1 Analysis of Preceding birth interval

Here, the preceding birth interval (inter-pregnancy interval) is taken as the interval between two consecutive pregnancies.
Table 54: Descriptive Statistics for Preceding birth interval (Months)
                                    N      Min.   Max.   Mean    Median   Std. Deviation
Preceding birth interval (Months)   2451   12     217    49.82   42.00    28.437
Valid N                             2451
Invalid N (Missing data)            910

Within the context of this study, the mean and median birth intervals are 49.82 months (approximately 50 months, or 4 years and 2 months) and 42.00 months (3 years and 6 months) respectively. The minimum birth interval was 12 months (exactly a year) while the highest birth interval was 217 months (more than 18 years) (see Table 54).

Figure 13: Preceding Birth Interval in Months

Figure 13 shows a fluctuating (rising and falling, wavelike) trend in birth interval. To facilitate the analysis, the preceding pregnancy interval is divided into 4 groups: less than 24 months, 24-72 months, 73-120 months and above 120 months. Mothers in the first parity group are treated as not applicable (included as part of the missing data).

Table 55: Tabulation of Preceding birth interval and birthweight
Preceding birth interval   Obs.   Mean    Std.
***                        910    3.035   .599
Less 24 Months             252    3.190   .570
24-72 Months               1805   3.177   .587
73-120 Months              315    3.150   .605
Above 120 Months           79     3.163   .559
*** Missing data

Comparing the mean birthweight across the categories of preceding pregnancy interval in Table 55 (excluding the missing-data group), the highest mean birthweight was recorded for an inter-pregnancy interval of less than 24 months and the lowest for an interval of 73-120 months, with respective means of 3.190kg ± 0.570kg and 3.150kg ± 0.605kg.

Table 56: Crosstab of Preceding birth interval and Birthweight
Preceding birth interval                LBW     NBW     Total
***                        Obs.         127     783     910
                           % of Total   3.8%    23.3%   27.1%
Less 24 Months             Obs.         18      234     252
                           % of Total   0.5%    7.0%    7.5%
24-72 Months               Obs.         149     1656    1805
                           % of Total   4.4%    49.3%   53.7%
73-120 Months              Obs.         37      278     315
                           % of Total   1.1%    8.3%    9.4%
Above 120 Months           Obs.         4       75      79
                           % of Total   0.1%    2.2%    2.4%
Total                      Obs.         335     3026    3361
                           % of Total   10.0%   90.0%   100.0%
*** Missing data

In Table 56, the majority of respondents, 1805 (53.7%), gave birth after an interval of 24-72 months, while intervals of less than 24 months, 73-120 months and above 120 months accounted for 252 (7.5%), 315 (9.4%) and 79 (2.4%) respectively. Low-birthweight infants born after an interval of 24-72 months made up 4.4% of all births. To facilitate the analysis, the missing values were replaced by the mean preceding birth interval of 49.8 months (approximately 50 months), which falls within the category containing the largest share of births.

4.4.4.4.2 Association between newborn birthweight and preceding birth interval

The hypotheses for the Chi-square test on the data shown in Table 56 are:

𝐻140: There exists no significant association between the preceding birth interval and newborn birthweight.
𝐻141: There exists a significant association between Preceding birth interval and newborn birthweight.

Table 57: Chi-Square Tests for Preceding birth interval and birthweight
                        Value      df   Asymptotic Significance (2-sided)
Pearson Chi-Square      27.500a    4    .000
Likelihood Ratio        26.817     4    .000
Number of Valid cases   3361
a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 7.87.

The Pearson Chi-square test was performed at the 5% level of significance, as presented in Table 57. Its significance value is 0.000 with 4 degrees of freedom. Therefore, the null hypothesis is rejected. Hence, a significant association exists between the preceding birth interval and newborn birthweight. It may be concluded that preceding birth interval and newborn birthweight are dependent on each other.
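The grouping of birth intervals and the replacement of missing values by the mean, both described above, can be sketched as follows. This is an illustration only: the function names and the toy values are hypothetical, and Python stands in for the software actually used in the study. The category boundaries follow the four groups defined in the text.

```python
from statistics import mean


def impute_with_mean(intervals):
    """Replace missing entries (None) with the rounded mean of the observed values."""
    observed = [x for x in intervals if x is not None]
    fill = round(mean(observed))
    return [fill if x is None else x for x in intervals]


def interval_group(months):
    """Map a preceding birth interval in months to the four groups used in the text."""
    if months < 24:
        return "Less 24 Months"
    if months <= 72:
        return "24-72 Months"
    if months <= 120:
        return "73-120 Months"
    return "Above 120 Months"


# Hypothetical toy intervals in months; None marks a first birth (no preceding interval).
toy_intervals = [12, 30, None, 80, 217, None, 42]
filled = impute_with_mean(toy_intervals)
groups = [interval_group(m) for m in filled]

for months, group in zip(filled, groups):
    print(months, group)
```

In the study, the imputed value of about 50 months falls in the 24-72 month group, so mean imputation moves every first-birth record into the most populated category. This is worth keeping in mind when interpreting results for that group.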
Table 58: Symmetric Measures for Preceding birth interval and birthweight
                                  Value   Approximate Significance
Nominal by Nominal   Phi          .090    .000
                     Cramer's V   .090    .000
Number of Valid cases             3361

Measures of association (Phi and Cramer's V) are used to evaluate the strength of the association. The measure is significant at 0.000, in line with the Chi-square test, and the degree of association between the two variables is 0.090 (9.0%) (see Table 58).

4.4.4.5 Birth Type

4.4.4.5.1 Analysis of Birth type

Birth type was categorized into two groups, namely single birth and multiple births (the delivery of more than one infant in a single birth event). Infants from single births are much less likely than those from multiple births to have low birthweight. Table 59 below shows that the mean birthweight of a singleton child is about 0.65kg higher than that of a multiple-birth infant.

Table 59: Descriptive statistics for Birth type and Birthweight
Birth type       Obs.   Mean    Standard Deviation
Multiple birth   167    2.521   .679
Single birth     3194   3.169   .570

In the study, 86 of the 167 infants from multiple births had low birthweight (2.6% of all births), compared with 249 of the 3194 infants from singleton births (7.4% of all births). Moreover, the likelihood of low birthweight for an infant from a multiple birth is 51.50%, against 48.50% for normal birthweight (see Table 60 below).

Table 60: Birth type and Birthweight group Crosstabulation
Birth type                     Low birthweight   Normal birthweight   Total
Multiple Births   Obs.         86                81                   167
                  % of Total   2.6%              2.4%                 5.0%
Single birth      Obs.         249               2945                 3194
                  % of Total   7.4%              87.6%                95.0%
Total             Obs.         335               3026                 3361
                  % of Total   10.0%             90.0%                100.0%

4.4.4.5.2 Association between newborn birthweight and birth type

The hypotheses for the Chi-square test on the data shown in Table 60 are:

𝐻150: There exists no significant association between birth type and newborn birthweight.
𝐻151: There exists a significant association between birth type and newborn birthweight.

Table 61: Chi-Square Tests for Birth type and birthweight
                         Value       Df   Asymptotic Sig. (2-sided)   Exact Sig. (2-sided)   Exact Sig. (1-sided)
Pearson Chi-Square       337.747a    1    .000
Continuity Correctionb   332.895     1    .000
Likelihood Ratio         200.260     1    .000
Fisher's Exact Test                                                   .000                   .000
Number of Valid cases    3361
a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 16.65.
b. Computed only for a 2x2 table

The Pearson Chi-square test was performed at the 5% level of significance, as presented in Table 61. It gave a significance value of 0.000 with 1 degree of freedom. Therefore, the null hypothesis is rejected. Hence, a significant association exists between birth type and newborn birthweight. It may be concluded that birth type and newborn birthweight are dependent on each other.

Table 62: Symmetric Measures for Birth type and Birthweight
                                  Value   Approximate Significance
Nominal by Nominal   Phi          .317    .000
                     Cramer's V   .317    .000
Number of Valid cases             3361

In Table 62, Cramer's V, one of the measures of association, is used to evaluate the strength of the association. The measure is significant at 0.000, in line with the Chi-square test, and the degree of association between the two variables is 0.317 (31.7%).

4.5 Multivariate Analysis of Predictors and Birthweight

The present study has several strengths compared with past research on the roles of maternal characteristics and newborn outcomes in birthweight. It is a population-based study that uses 2014 GDHS survey data, which include important information on women who use both traditional and modern healthcare. The study is therefore expected to identify better indicators related to birthweight than past research that depended on clinical data.
4.5.1 Towards Building a Multiple regression model

4.5.1.1 Step Zero: Understanding the Problem

Understanding the factors that can influence a baby's weight is an important question. To answer such a broad question, we begin by looking at the shape of the distribution of the babies' weights. This allows us to argue whether a parametric or a non-parametric model is best for this dataset. From the histogram alone it is quite hard to judge whether the data are truly normally distributed, so we draw the corresponding Q-Q plot. Here, the observations lie close to the line, which means that the sample quantiles correspond to the quantiles of a theoretical normal distribution (see Figure 14).

Figure 14: QQ-Plot for Longitudinal birthweight (kg)

4.5.1.2 Step One: Data Splitting

Next, we split the data into a training set and a test set, so that models built on the training set can be compared on their test error. A major objective of this procedure is to identify an algorithm f(x) that most precisely predicts future values (y) from a set of inputs (x). In other words, we need an algorithm that not only fits our past data well but, more critically, predicts future results accurately. This is known as the generalizability of the algorithm.

• Training set: these data are used to train our algorithms.
• Test set: these data are used to estimate our prediction error (generalization error) after the final model has been chosen.

To provide an accurate understanding of the generalizability of the final model, the study split the dataset into 70% training and 30% test data using randomized sampling from the caTools package in R.
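The 70/30 split can be sketched as below. The study performed this step in R; the Python function here is a hypothetical stand-in that uses simple random sampling, and the seed and record stand-ins are assumptions made for illustration. The resulting set sizes need not match those obtained in the study.

```python
import random


def train_test_split(rows, train_fraction=0.7, seed=2014):
    """Randomly partition rows into a training list and a test list."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    indices = list(range(len(rows)))
    rng.shuffle(indices)
    cut = round(train_fraction * len(rows))
    train = [rows[i] for i in indices[:cut]]
    test = [rows[i] for i in indices[cut:]]
    return train, test


# Stand-ins for the 3361 survey records (hypothetical; real rows would hold covariates).
records = list(range(3361))
train_set, test_set = train_test_split(records)
print(len(train_set), len(test_set))
```

Every record lands in exactly one of the two sets, so the test error is computed only on observations that the model never saw during fitting.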
4.5.1.3 Step Two: Variables Selection Approaches

In multiple linear regression modeling, there are several selection methods that determine how many independent variables enter the analysis. By applying different selection methods, several regression models can be formulated from the same set of indicators. We would like to try different linear models to predict the weight of a newborn. To decide which predictors to choose and to assess the relative importance of the independent variables, three variable selection methods were employed to yield the most appropriate regression equation with birthweight as the dependent variable.

• The Ordinary Least Squares (Backward selection)
• The Ridge regression
• The Lasso regression

The point of this selection is to reduce our set of predictor variables to those that are important and that account for nearly as much of the variance as the full set. This helps us determine the degree of significance of every predictor variable.

Step Three: Multiple Regression model development

(1) RegModel 0: Baseline model

The baseline model uses the mean birthweight of an infant as its prediction. It is the solution to the problem without applying any machine learning techniques. This means that any model we build must improve upon this baseline. Since we are predicting birthweight (a quantitative variable), the study took the average of all birthweights in the dataset. The prediction for child weight according to the baseline model is 3.137kg. This resulted in an RMSE of 0.5873, so any model we build should have an RMSE lower than 0.5873.

(2) Advanced Models

We evaluated the advanced models by fitting the training data to a multiple regression model. We considered both Ordinary Least Squares (OLS) and Regularized regression approaches: Ridge and Least Absolute Shrinkage and Selection Operator (LASSO).
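The baseline described in (1) predicts the same value, the mean birthweight, for every infant, and its RMSE is the yardstick for the later models. A minimal sketch follows, with hypothetical toy weights in kilograms rather than the study data:

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical toy birthweights (kg), not taken from the GDHS data.
train_weights = [2.4, 3.0, 3.2, 3.5, 3.9]
test_weights = [2.8, 3.1, 3.6]

# Baseline model: predict the training mean for every test observation.
baseline_prediction = sum(train_weights) / len(train_weights)
baseline_rmse = rmse(test_weights, [baseline_prediction] * len(test_weights))

print(f"baseline prediction = {baseline_prediction:.3f} kg, RMSE = {baseline_rmse:.4f}")
```

A model that uses covariates is only worth keeping if its test RMSE falls below this value, which is the role the figure of 0.5873 plays in the study.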
RegModel 1: Ordinary Least Squares Model (Backward Selection)

First, a model of birthweight regressed on all other variables was fitted; then, an iterative pruning procedure was applied, removing terms from the model and evaluating the impact on the AIC (Akaike's Information Criterion). We applied the backward selection method to the training set and compared the AIC values. The AIC indicated that only 10 predictors should be retained in the model. We then assessed the quality of a linear regression using the predictors chosen by the backward selection method. Based on the summary, the linear regression model is decent, as almost all predictors have significance values below 0.05. However, the R-Square is small, with a value of 0.36; that is, only 36% of the variance of the data is explained by the model. We then applied the model to the test set in order to test its performance. This linear model has modest accuracy on the test set, with an RMSE of 0.4772.

Application of Shrinkage methods (Regularized Regression)

We further tried two shrinkage methods to verify the selected predictors. In both cases (Ridge and Lasso), we tested 100 different values of the tuning parameter lambda to find the one that minimizes the error under 10-fold cross-validation.

RegModel 2: Ridge Regression

We applied ridge regression to the training set to select predictors using their coefficients. The coefficients indicated that only 10 predictors should be considered in the analysis. For the ridge regression, the RMSE was minimized at a tuning parameter of 0.0321. We then fitted the ridge regression on the full training set to compute the optimal coefficients. Finally, we tested the model on the test dataset and obtained an RMSE of 0.47726.
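The tuning procedure described above (100 candidate lambdas, 10-fold cross-validation, then a refit on the full training set) can be illustrated with a one-predictor ridge regression, for which the penalized slope has a closed form. This is a sketch under stated assumptions, not the procedure used in the study: the data are simulated, the relationship between height and birthweight is made up, the lambda grid is arbitrary, and the helper names are hypothetical.

```python
import math
import random


def fit_ridge(xs, ys, lam):
    """One-predictor ridge: minimise sum((y - a - b*x)^2) + lam * b^2."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / (sxx + lam)  # the penalty shrinks the slope towards zero
    return y_bar - slope * x_bar, slope


def cv_rmse(xs, ys, lam, k=10):
    """RMSE pooled over k held-out folds (fold i holds out every k-th point)."""
    squared_errors = []
    for fold in range(k):
        train = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i % k != fold]
        held = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i % k == fold]
        a, b = fit_ridge([x for x, _ in train], [y for _, y in train], lam)
        squared_errors += [(y - (a + b * x)) ** 2 for x, y in held]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Simulated data: heights in cm and birthweights in kg (made-up relationship).
rng = random.Random(0)
heights = [rng.gauss(159.3, 4.16) for _ in range(200)]
weights = [1.35 + 0.011 * h + rng.gauss(0, 0.5) for h in heights]

# 100 candidate tuning parameters, log-spaced from 1e-3 to 1e3 (arbitrary grid).
lambdas = [10 ** (-3 + 6 * i / 99) for i in range(100)]
best_lambda = min(lambdas, key=lambda lam: cv_rmse(heights, weights, lam))

# Refit on all training data with the selected tuning parameter.
intercept, slope = fit_ridge(heights, weights, best_lambda)
ols_intercept, ols_slope = fit_ridge(heights, weights, 0.0)
print(f"best lambda = {best_lambda:.4g}, ridge slope = {slope:.5f}, OLS slope = {ols_slope:.5f}")
```

Setting lambda to zero recovers ordinary least squares, so the ridge slope is never larger in magnitude than the OLS slope. With several predictors the same idea applies, but the coefficients are obtained by solving a penalized system rather than a single ratio.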
RegModel 3: Lasso Regression
A Lasso regression was applied on the training set and the coefficients of the predictors were compared. The coefficients likewise indicated that only 10 of the 18 predictors should be retained. The cross-validated RMSE was minimized at a tuning parameter of 0.02924. We then refitted the Lasso regression on the full training set to compute the optimal coefficients. Finally, we tested the model on the test dataset and obtained an RMSE of 0.4837441.

Step Four: Model Selection Evaluation
All the above models were evaluated using a standard regression error metric, the Root Mean Square Error (RMSE).

Table 63: Variable selection for Multiple Regression Model building
| | Model 0: Baseline Model | Model 1: OLS Regression (Backward Selection) | Model 2: Ridge Regression | Model 3: Lasso Regression |
|---|---|---|---|---|
| Number of variables entered | Null | 10 | 10 | 10 |
| Variables selected | None | Region; Gender of child; Delivery place; Children ever born; Birth type; Birth order; Mothers weight; Mothers height; Wealth index; Size of child | Same 10 variables as Model 1 | Same 10 variables as Model 1 |
| RMSE | 0.5873 | 0.47720 | 0.47726 | 0.48374 |

From Table 63, all three models (OLS, Ridge and Lasso) have smaller Root Mean Squared Errors than the baseline model. The RMSEs of the three selection approaches differ very little from one another, which makes it preferable to choose the most interpretable model. The OLS backward selection model also has the lowest RMSE and therefore appears to be the "best model". Considering this, the study adopted the predictors from OLS backward selection in the model building.
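As a small worked check of the comparison in Table 63, the sketch below takes the four reported RMSE values, expresses each as a percentage reduction relative to the baseline, and picks the model with the lowest error. The dictionary keys are labels chosen for this illustration; the numbers are those reported in Table 63.

```python
# RMSE values as reported in Table 63.
rmse_by_model = {
    "Baseline (mean)": 0.5873,
    "OLS (backward selection)": 0.47720,
    "Ridge": 0.47726,
    "Lasso": 0.48374,
}

baseline = rmse_by_model["Baseline (mean)"]

# Percentage reduction in RMSE relative to the baseline model.
reduction = {
    name: 100 * (baseline - value) / baseline
    for name, value in rmse_by_model.items()
}

best = min(rmse_by_model, key=rmse_by_model.get)

for name, value in rmse_by_model.items():
    print(f"{name:26s} RMSE = {value:.5f}  reduction = {reduction[name]:5.2f}%")
print("lowest RMSE:", best)
```

The three fitted models each cut the baseline error by roughly 18 to 19 percent and differ from one another only in the third or fourth decimal place, which is why interpretability is a reasonable tie-breaker.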
Table 64: Summary of Model: OLS Backward Selection

| Model | Likelihood Ratio Test | Discrimination Indexes |
|---|---|---|
| Obs. 2355 | LR chi2 1066.72 | R-Square 0.364 |
| Sigma 0.4474 | Df 29 | R-Square Adj. 0.356 |
| Df 2325 | Pr (> chi2) 0.0000 | G 0.403 |

Residuals Analysis
| Minimum | 1st Quartile | Median | 3rd Quartile | Maximum |
|---|---|---|---|---|
| -1.60931 | -0.29305 | -0.03294 | 0.25982 | 2.17720 |

Coefficients of the Multiple Regression Model
| Term | Estimate | Std. Error | t value | Pr(>\|t\|) | Sig. |
|---|---|---|---|---|---|
| (Intercept) | 1.353231 | 0.394974 | 3.426 | 0.000623 | *** |
| RegionBrong Ahafo | 0.027643 | 0.041144 | 0.672 | 0.501739 | NS |
| RegionCentral | 0.026571 | 0.043117 | 0.616 | 0.537791 | NS |
| RegionEastern | 0.106046 | 0.046011 | 2.305 | 0.021266 | * |
| RegionGreater Accra | 0.188214 | 0.042338 | 4.446 | 0.0001 | *** |
| RegionNorthern | -0.085306 | 0.046320 | -1.842 | 0.065653 | NS |
| RegionUpper East | 0.122830 | 0.044967 | 2.732 | 0.006352 | ** |
| RegionUpper West | -0.048152 | 0.046670 | -1.032 | 0.302299 | NS |
| RegionVolta | 0.289806 | 0.044773 | 6.473 | <0.0001 | *** |
| RegionWestern | 0.067039 | 0.043183 | 1.552 | 0.120691 | NS |
| Gender_of_childMale | 0.069121 | 0.019835 | 3.485 | 0.000502 | *** |
| Delivery_placeGovernment hospital | 0.027651 | 0.025106 | 1.101 | 0.270840 | NS |
| Delivery_placeMaternity home | 0.151639 | 0.082981 | 1.827 | 0.067769 | NS |
| Delivery_placeOther | 0.069177 | 0.100287 | 0.690 | 0.490394 | NS |
| Delivery_placePrivate hospital, clinic | 0.120785 | 0.044236 | 2.730 | 0.006372 | ** |
| Delivery_placeRespondent's home | 0.023122 | 0.039968 | 0.579 | 0.562977 | NS |
| Delivery_typeYes | 0.098489 | 0.030092 | 3.273 | 0.001080 | ** |
| Children_ever_bornModerate | -0.038881 | 0.031816 | -1.222 | 0.221808 | NS |
| Children_ever_bornSmall | -0.145925 | 0.032875 | -4.439 | <0.0001 | *** |
| Birth_typeSingle birth | 0.576971 | 0.045826 | 12.590 | <0.0001 | *** |
| Mothers_weight | 0.002764 | 0.001129 | 2.448 | 0.014425 | * |
| Mothers_height | 0.005810 | 0.002553 | 2.276 | 0.022953 | * |
| Wealth_indexPoorer | -0.077610 | 0.033197 | -2.338 | 0.019481 | * |
| Wealth_indexPoorest | -0.047433 | 0.035224 | -1.347 | 0.178241 | NS |
| Wealth_indexRicher | -0.047822 | 0.031996 | -1.495 | 0.135149 | NS |
| Wealth_indexRichest | -0.089551 | 0.034818 | -2.572 | 0.010174 | * |
| Size_of_childLarger than average | 0.234356 | 0.024583 | 9.533 | <0.0001 | *** |
| Size_of_childSmaller than average | -0.288376 | 0.032863 | -8.775 | <0.0001 | *** |
| Size_of_childVery large | 0.584341 | 0.031019 | 18.838 | <0.0001 | *** |
| Size_of_childVery small | -0.548516 | 0.053502 | -10.252 | <0.0001 | *** |
NS = Not significant; * p-value < 0.05; ** p-value < 0.01; *** p-value < 0.001

Step Five: Checking Model Assumptions

a. Checking the Linearity and Constant Variance Assumptions

1. Fitted versus Residuals Plot
The most useful tool for checking both the linearity and the constant variance assumptions is the fitted versus residuals plot. We look for two things in the plot. First, at any fitted value, the mean of the residuals should be approximately 0; if so, the linearity assumption holds. Second, at every fitted value, the spread of the residuals should be roughly the same; if so, the constant variance assumption holds.

Figure 15: Fitted versus Residuals for the Multiple Linear Model fit

Figure 15 shows the scatterplot of the standardized residuals against the fitted values. The plot shows no obvious problem with linearity: for any fitted value, the residuals appear to be roughly centered around 0 and no systematic pattern is visible, which is a good sign that the linearity assumption is not violated. However, the spread (dispersion) of the residuals is clearly larger for larger fitted values, which points to a violation of the constant variance assumption.

2. Breusch-Pagan Test
While the fitted versus residuals plot gives an impression of whether homoscedasticity holds, a more formal test is sometimes preferable.
There are numerous tests of constant variance; the study employed the Breusch-Pagan test, with the null and alternative hypotheses:
• H0: Homoscedasticity (the errors have constant variance about the true model)
• H1: Heteroscedasticity (the errors have non-constant variance about the true model)

Studentized Breusch-Pagan test
| Data | Breusch-Pagan | df | p-value |
|---|---|---|---|
| Model | 82.533 | 29 | <0.00001 |

For our model the p-value is small, so we reject the null hypothesis of homoscedasticity, which indicates a violation of the constant variance assumption. This matches the finding from the fitted versus residuals plot.

b. Test for Independence Assumption
The Durbin-Watson test for autocorrelation was computed. If the residuals are autocorrelated, the regression results can be inaccurate and unreliable. The Durbin-Watson statistic lies between 0 and 4:
1. A value of 2 means no autocorrelation
2. Values above 2 and up to 4 indicate negative autocorrelation
3. Values from 0 to below 2 indicate positive autocorrelation
In the Durbin-Watson test, the null hypothesis (H0) is that there is no first-order autocorrelation, and the alternative hypothesis (H1) is that first-order autocorrelation exists.

| Lag | Autocorrelation | D-W Statistic | p-value |
|---|---|---|---|
| 1 | -0.0371087 | 2.074037 | 0.084 |
Alternative hypothesis: rho ≠ 0

Here p > 0.05 (0.084), so we fail to reject the null hypothesis. There is no indication of autocorrelation in the residuals, and the independence assumption is satisfied.

c. Normality Assumption using Q-Q Plot
To evaluate the normality assumption, the normal Q-Q plot of the standardized regression residuals was produced. If the points follow the reference line closely, we can accept that the errors could in fact be normal. The Q-Q plot in the figure above looks suspect, so we would most likely not accept that the errors come from a normal distribution.
Therefore, a more formal test, the Shapiro–Wilk test, was computed.

Shapiro-Wilk normality test
| Data | W | p-value |
|---|---|---|
| Model residuals | 0.97436 | <0.0001 |

The table above gives a test statistic of 0.97436 with a p-value below 0.0001. The null hypothesis is that the data were sampled from a normal distribution; a p-value this small means that residuals like ours would be very unlikely if that were the case. Both the Q-Q plot and the Shapiro-Wilk test suggest that the normality assumption is violated.

d. Checking Multicollinearity of the Predictors

Multicollinearity assumption check
| Predictor | Tolerance | VIF |
|---|---|---|
| Region | .977 | 1.024 |
| Gender of child | .998 | 1.002 |
| Delivery place | .997 | 1.003 |
| Delivery type | .935 | 1.070 |
| Children ever born | .955 | 1.047 |
| Birth type | .969 | 1.032 |
| Wealth index | .937 | 1.067 |
| Mothers weight | .850 | 1.177 |
| Mothers height | .876 | 1.142 |
| Size of child | .991 | 1.009 |

None of the variance inflation factors is greater than 10.0, and the tolerance values show that the remaining predictors explain no more than 15 percent of any single predictor's variance (the lowest tolerance is 0.850). The birthweight model therefore shows no evidence of significant collinearity.

Step Six: Unusual Observations
In addition to checking the multiple linear regression assumptions, the study examined unusual data points in the training dataset. A few data points can sometimes have an extreme effect on a regression model, occasionally to the extent that the model assumptions are violated because of them.

1. Leverage
An observation with high leverage is a data point that could have a large influence on the model fit. The leverage analysis flagged data points 2145 and 3066.

2. Outliers
To identify outliers, the study looked for observations with large residuals. For each data point, the study calculated the residual and the standardized residual.
Since our sample size (n=2353) is large, standardized residuals greater than 2 in magnitude were used to identify “large” residuals. The analysis of the standardized residuals showed that 128 data points have large standardized residuals. We therefore removed these points and rechecked the model for improvement. After removing these influential data points (leverage points and outliers), the normal Q-Q plot in the figure above looks much better, and the Shapiro-Wilk test no longer rejects normality (p = 0.08). See the table below.

Shapiro-Wilk normality test
| Data | W | p-value |
|---|---|---|
| Model residuals | 0.99782 | 0.08 |

After the removal of the influential points, the homoscedasticity assumption was also rechecked using the Breusch-Pagan test. For the fixed model (after the influential data points were removed), the p-value is large, so we fail to reject the null hypothesis of homoscedasticity; the constant variance assumption is therefore not violated.

Studentized Breusch-Pagan test
| Data | Breusch-Pagan | df | p-value |
|---|---|---|---|
| Model after fix | 22.591 | 24 | 0.544 |

Summary after resolving the assumptions behind our fitted model
After these diagnostic checks, the results interpreted below can be considered reliable.

Table 65: Summary after resolving the assumptions behind our fitted model

| Model | Likelihood Ratio Test | Discrimination Indexes |
|---|---|---|
| Obs. 2209 | LR chi2 1373.25 | R-Square 0.463 |
| Sigma 0.3760 | Df 24 | R-Squ. Adj. 0.457 |
| Df 2184 | Pr (> chi2) 0.0000 | G 0.386 |

Residuals Analysis
| Min | 1st Quartile | Median | 3rd Quartile | Max |
|---|---|---|---|---|
| -1.04266 | -0.24925 | -0.01725 | 0.24779 | 1.24011 |

Coefficients of the Fixed Model
| Term | Estimate | Std. Error | t value | Pr(>\|t\|) | Sig. |
|---|---|---|---|---|---|
| (Intercept) | 1.1805372 | 0.3249786 | 3.633 | 0.000287 | *** |
| RegionBrong Ahafo | 0.0260511 | 0.0331938 | 0.785 | 0.432644 | NS |
| RegionCentral | 0.0165153 | 0.0349598 | 0.472 | 0.636682 | NS |
| RegionEastern | 0.0160550 | 0.0383110 | 0.419 | 0.675206 | NS |
| RegionGreater Accra | 0.1854645 | 0.0344450 | 5.384 | <0.0001 | *** |
| RegionNorthern | -0.0536515 | 0.0371116 | -1.446 | 0.148410 | NS |
| RegionUpper East | 0.1108757 | 0.0361808 | 3.064 | 0.002207 | ** |
| RegionUpper West | -0.0298572 | 0.0373381 | -0.800 | 0.424004 | NS |
| RegionVolta | 0.2783656 | 0.0360307 | 7.726 | <0.0001 | *** |
| RegionWestern | 0.0538071 | 0.0350977 | 1.533 | 0.125404 | NS |
| Gender_of_childMale | 0.0708896 | 0.0160964 | 4.404 | <0.0001 | *** |
| Delivery_typeYes | 0.0606169 | 0.0245646 | 2.468 | 0.013676 | * |
| Children_ever_bornModerate | -0.0376362 | 0.0260730 | -1.443 | 0.149026 | NS |
| Children_ever_bornSmall | -0.1238582 | 0.0268070 | -4.620 | <0.0001 | *** |
| Birth_typeSingle birth | 0.6049093 | 0.0392759 | 15.402 | <0.0001 | *** |
| Mothers_weight | 0.0023190 | 0.0009271 | 2.501 | 0.012445 | * |
| Mothers_height | 0.0068369 | 0.0021008 | 3.254 | 0.001154 | ** |
| Wealth_indexPoorer | -0.0720377 | 0.0268109 | -2.687 | 0.007267 | ** |
| Wealth_indexPoorest | -0.0472767 | 0.0280248 | -1.687 | 0.091754 | NS |
| Wealth_indexRicher | -0.0138321 | 0.0261286 | -0.529 | 0.596592 | NS |
| Wealth_indexRichest | -0.0227756 | 0.0280711 | -0.811 | 0.417250 | NS |
| Size_of_childLarger than average | 0.2087758 | 0.0198437 | 10.521 | <0.0001 | *** |
| Size_of_childSmaller than average | -0.3053589 | 0.0266587 | -11.454 | <0.0001 | *** |
| Size_of_childVery large | 0.5621383 | 0.0252864 | 22.231 | <0.0001 | *** |
| Size_of_childVery small | -0.5746811 | 0.0469995 | -12.227 | <0.0001 | *** |
NS = Not significant; * p-value < 0.05; ** p-value < 0.01; *** p-value < 0.001
Note: Delivery place was removed after model fixing

Step Seven: Results Interpretation of Model
Before a multiple regression equation (MRE) is used, it is desirable to determine whether it is in fact worth using. For this purpose, the coefficient of multiple determination (R-Square) is used together with an assessment of its significance.
An adjusted R-Square is needed to make the coefficient comparable across models with different numbers of predictors. In the present analysis, only four of the seven maternal variables (mother’s weight, height, region and wealth index), two of the three birth outcome variables (gender and size of child) and three of the seven obstetric variables (delivery type, children ever born and birth type) met the criterion (p < 0.050) to enter the models for developing the multiple regression equation (MRE) (Table 8.1). In our final model, the R-Square is 0.463, meaning that a substantial portion (above 46%) of the variation in newborn birthweights is explained by these nine predictors. The multiple correlation coefficient R was 0.68, which is not high and not very close to the ideal value of 1. This indicates that the MRE might not be of much use for estimating the birthweight of a newborn from given values of predictors such as the mother’s weight in the segment of the population under study. The adjusted R-Square (0.457) is not very different from the unadjusted value (0.463) because the number of estimated regression terms (k = 24) is much smaller than the number of observations (n = 2209).

Table 66: ANOVA Table for Multiple Linear Regression
| Model | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| 1 Regression | 266.137 | 24 | 11.089 | 78.65 | .000b |
| Residual | 308.737 | 2184 | 0.141 | | |
| Total | 574.874 | 2208 | | | |
a. Dependent Variable: Birthweight
b. Predictors: (Constant), Region, Mothers height, Mothers weight, Gender of a child, Size of a child, Delivery type, Wealth index, Birth type, Children ever born

Table 66 shows the ANOVA table associated with the multiple linear regression. The sum of squares due to the regression model is 266.137 and the sum of squares due to error (the residual sum of squares) is 308.737. The residual mean square, 0.141, is an estimate of the error variance.
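The summary measures discussed above follow directly from the sums of squares and degrees of freedom in Table 66. The short sketch below recomputes them using the standard definitions; the variable names are chosen for this illustration. Note that the F ratio computed from unrounded mean squares comes out slightly below the reported 78.65, which corresponds to the ratio of the rounded mean squares (11.089 / 0.141).

```python
import math

# Values taken from Table 66.
ss_regression, ss_residual = 266.137, 308.737
df_regression, df_residual = 24, 2184
n = df_regression + df_residual + 1          # 2209 observations

ss_total = ss_regression + ss_residual       # 574.874

# R-Square: share of the total variation explained by the regression.
r_square = ss_regression / ss_total

# Adjusted R-Square penalizes for the number of estimated terms.
adj_r_square = 1 - (1 - r_square) * (n - 1) / df_residual

# Multiple correlation coefficient R.
multiple_r = math.sqrt(r_square)

# Mean squares and the F ratio.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_stat = ms_regression / ms_residual

print(f"R-Square          = {r_square:.3f}")
print(f"Adjusted R-Square = {adj_r_square:.3f}")
print(f"Multiple R        = {multiple_r:.2f}")
print(f"Residual MS       = {ms_residual:.3f}")
print(f"F                 = {f_stat:.2f}")
```

The adjustment factor (n - 1) / (n - k - 1) is 2208 / 2184, only about 1.011, which is why the adjusted and unadjusted R-Square values are so close.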
The F statistic is 78.65 with a corresponding p-value below 0.001 (reported as 0.000), which provides very strong evidence of the utility of the model. We now turn to the part of the output that gives the estimates of the regression parameters. According to our model, the estimated regression equation of birthweight on the nine predictors is:

$$
\begin{aligned}
\mu\{BWT\} ={} & 1.181\,(\text{Intercept}) + 0.026\,(\text{Reg} = \text{B. Ahafo}) + 0.017\,(\text{Reg} = \text{Central}) + 0.016\,(\text{Reg} = \text{Eastern}) \\
& + 0.185\,(\text{Reg} = \text{G. Accra}) - 0.054\,(\text{Reg} = \text{Northern}) + 0.111\,(\text{Reg} = \text{U. East}) - 0.030\,(\text{Reg} = \text{U. West}) \\
& + 0.278\,(\text{Reg} = \text{Volta}) + 0.054\,(\text{Reg} = \text{Western}) + 0.071\,(\text{GoC} = \text{Male}) + 0.061\,(\text{Del.type} = \text{Yes}) \\
& - 0.038\,(\text{CeB} = \text{Moderate}) - 0.124\,(\text{CeB} = \text{Small}) + 0.605\,(\text{B.type} = \text{Single}) \\
& + 0.002\,(\text{MWeight}) + 0.007\,(\text{MHeight}) \\
& - 0.072\,(\text{W.index} = \text{Poorer}) - 0.047\,(\text{W.index} = \text{Poorest}) - 0.014\,(\text{W.index} = \text{Richer}) - 0.023\,(\text{W.index} = \text{Richest}) \\
& + 0.209\,(\text{SoC} = \text{Larger than average}) - 0.305\,(\text{SoC} = \text{Smaller than average}) \\
& + 0.562\,(\text{SoC} = \text{Very large}) - 0.575\,(\text{SoC} = \text{Very small})
\end{aligned}
$$

4.5.2 Towards Building a Logistic Regression Model
The study also used logistic regression, with birthweight converted into a binary dependent variable indicating low or normal weight. This modeling approach uses the predictors available in the dataset to predict the weight class of the newborn baby. The multiple regression models did not give strong results in terms of R-Square, which suggests that these predictors are not sufficient to explain all the variance observed in the birthweight response variable. However, the major goal of the study is to identify the risk factors that account for low weight at birth.
For that, we reformulate the modeling problem as a classification problem: we apply a threshold to the birthweights in the dataset that separates healthy infants from infants at risk, and fit a logistic regression on this binary outcome, coded *no* for normal infants and *yes* for infants with low weight.

4.5.2.1 Fitting Logistic Regression
The study began by modeling the probability of low birthweight. The birthweight dataset contains information on 3,361 neonates. Birthweight classification is the dependent variable, where 1 denotes the presence of low weight and 0 denotes its absence (normal weight). In this data, about 90.0% of infants are of normal weight and about 10.0% are of low weight (3026 and 335 of 3,361 respectively). This dataset is used to develop a logistic regression that predicts the likelihood of low birthweight.

Step Zero: Creating a Classification Table with the Full Data
First, a “cut-off” value c (0.5) is chosen. For each subject in the low birthweight dataset, the study “predicts” the baby’s birthweight status as 0 (normal) if the fitted probability of normal birthweight is greater than c; otherwise it is predicted as 1 (low). A table is then constructed to show how many of the observations were correctly predicted.

Table 67: Classification Table on full low weight data
| Observed birthweight | Predicted NB | Predicted LB | Percentage Correct |
|---|---|---|---|
| NBW | 3026 | 0 | 100.0 |
| LBW | 335 | 0 | 0 |
| Overall Percentage | | | 90.0 |

There are 3026 normal birthweights and 335 low birthweights. Thus, the odds of low birthweight are 335/3026 = 0.1107. These odds are confirmed by the Exp(B) value in the output for the model with a constant only.

Variables in the Equation
| | Est. | Standard Error | Wald Statistic | Degrees of freedom | Sig. value | Exp(B) |
|---|---|---|---|---|---|---|
| Constant | -2.201 | .058 | 1440.072 | 1 | .000 | .111 |

Step One: Data Splitting for the Logistic Model Building
As before, we load the cleaned data and split it into train and test subsets. The train set is used for model selection and the test set for backtesting the performance of the chosen model. To give an accurate picture of the generalizability of the final model, a 70% train and 30% test split of the data was made using the same randomized sampling from the caTools package in R.

Step Two: Applying Sampling Methods to the Imbalanced Data
Because low birthweight cases make up only about 10% of the data, we applied sampling methods to balance the data. The study used the over-, under-, mixed (both) and ROSE sampling methods from the ROSE package and the SMOTE sampling method from the DMwR package in R.
• Oversampling: because low birthweight (1) occurs less often, this method increases the number of low birthweight records until the classes are roughly balanced. Here N = 2352*2 = 4704.
• Undersampling: because low birthweight (1) occurs less often, this method decreases the number of normal birthweight records until they match the low birthweight records. Its limitation is that significant information can be lost from the sample.
• Mixed sampling applies both undersampling and oversampling to the imbalanced data.
• ROSE sampling generates data synthetically; it creates artificial observations instead of duplicating existing ones.
• SMOTE (Synthetic Minority Over-sampling Technique) is based on nearest neighbors, judged by the Euclidean distance between data points in feature space. An over-sampling percentage sets the number of synthetic samples to be created, and this percentage is always a multiple of 100.
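The two simplest schemes in the list above, random oversampling and random undersampling, can be sketched as follows. This is an illustrative Python sketch, not the ROSE or DMwR implementation used in the study: the helper names are hypothetical, the rows are placeholder tuples rather than real records, and the undersampling here balances the classes exactly, which is a simplification. The class counts (234 low, 2118 normal) are the training-set counts reported in Table 68.

```python
import random

def label_of(row):
    """Return the class label stored in a (row_id, label) tuple."""
    return row[1]

def count(rows, label):
    """Number of rows carrying the given label."""
    return sum(1 for r in rows if label_of(r) == label)

def oversample(rows, minority, target_total, rng):
    """Duplicate randomly chosen minority rows until target_total rows exist."""
    minority_rows = [r for r in rows if label_of(r) == minority]
    extra = [rng.choice(minority_rows) for _ in range(target_total - len(rows))]
    return rows + extra

def undersample(rows, minority, rng):
    """Keep every minority row plus an equally sized random subset of the rest."""
    minority_rows = [r for r in rows if label_of(r) == minority]
    majority_rows = [r for r in rows if label_of(r) != minority]
    return minority_rows + rng.sample(majority_rows, len(minority_rows))

rng = random.Random(42)

# Placeholder training rows: 1 = low birthweight, 0 = normal birthweight.
train = [(i, 1) for i in range(234)] + [(i, 0) for i in range(234, 2352)]

over = oversample(train, minority=1, target_total=2 * len(train), rng=rng)
under = undersample(train, minority=1, rng=rng)

print("oversampled :", count(over, 1), "low,", count(over, 0), "normal")
print("undersampled:", count(under, 1), "low,", count(under, 0), "normal")
```

With a target size of N = 4704, only minority rows are added, so the normal-weight count stays at 2118 while the low-weight count grows to 2586. Synthetic methods such as ROSE and SMOTE differ in that they create new artificial rows rather than duplicating existing ones.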
Table 68: Birthweight in train dataset after applying Sampling method
| Sampling Method | LBW | NBW | Area Under Curve (ROC) on test data |
|---|---|---|---|
| Train dataset | 234 | 2118 | 0.652 |
| Over Sampling | 2586 | 2118 | 0.736 |
| Under Sampling | 234 | 766 | 0.721 |
| Mixed Sampling | 1707 | 1654 | 0.752 |
| ROSE Sampling | 1173 | 1179 | 0.813 |
| SMOTE Sampling | 468 | 468 | 0.793 |

We now have five resampled versions of the training data that are balanced and ready for prediction. We applied the logistic classifier to each of these five datasets and calculated its prediction performance on the test set. The highest AUC (0.813) is obtained with the ROSE method, although the sampling methods do not differ greatly. The study therefore used the ROSE sampling technique to balance the data for the logistic model fitting (see Table 68).

Step Three: Variable Selection for Logistic Model Building

LogModel 1: Stepwise variable selection using AIC (Backward Selection)
Next, we applied backward selection on the training set and compared the candidate models using the AIC. The AIC indicated that only nine predictors should be kept in the analysis. We selected the predictors with p-values less than 0.05.

LogModel 2: Variable significance using the Chi-square test
In LogModel 2, a chi-square test of the significance of each variable was conducted on the training set to validate the selection made by stepwise backward selection. The p-values likewise indicated that nine variables should be considered.
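The kind of chi-square screening described for LogModel 2 can be illustrated for the simplest case, a 2x2 table that cross-classifies a binary predictor against birthweight class. The sketch below is an assumption-laden illustration rather than the study's procedure: the function name and the counts are hypothetical, no continuity correction is applied, and the p-value formula is valid only for one degree of freedom.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value (1 df) for a 2x2 count table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    stat = 0.0
    for i, observed_row in enumerate(table):
        for j, observed in enumerate(observed_row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical counts: rows = (male, female), columns = (low, normal).
gender_table = [[520, 660], [653, 519]]
stat, p_value = chi_square_2x2(gender_table)
print(f"chi-square = {stat:.2f}, p-value = {p_value:.3g}")
```

A predictor would be retained under this screening when its p-value falls below 0.05. Predictors with more than two categories need the chi-square distribution with the corresponding degrees of freedom, which this sketch does not cover.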
Table 69: Summary of Variable selection for Logistic Regression Model building
| | LogModel 0: Full Model | LogModel 1: Stepwise variable selection using AIC (Backward Selection) | LogModel 2: Chi-square test for significance of variables |
|---|---|---|---|
| Number of variables entered | 17 | 9 | 9 |
| Variables entered | All predictors | Region; Gender of child; Delivery place; Birth type; Birth order; Mothers height; Preceding birth interval; Wealth index; Size of child | Region; Gender of child; Delivery place; Birth type; Birth order; Mothers height; Preceding birth interval; Wealth index; Size of child |

Since both variable selection methods yielded the same nine predictors, the study used these significant predictors in the logistic model building and dropped the eight variables considered insignificant. The model was then rerun with the selected predictors to build the final logistic regression model.

Step Four: Model Evaluation
The logistic regression model has now been formulated, and its coefficients are analyzed later. First, a few critical questions need to be answered. Is the model any good? How well does it fit the training data? Are its predictions accurate? Which independent variables are most significant?

(1) Goodness of Fit test on the model
• Likelihood Ratio Test (Drop-in-Deviance)
A model is said to fit the data better if it shows an improvement over a reduced model (one with fewer covariates). The null hypothesis (H0) is that the reduced model is true; a p-value below 0.05 for the overall model fit statistic would lead us to reject the null hypothesis. This would provide evidence against the reduced model in favor of the full model.
Likelihood ratio test
| Model | #df | Log-likelihood | Df difference | Chisq | Pr(>Chisq) |
|---|---|---|---|---|---|
| LogModel 0 | 44 | -1019.44 | | | |
| LogModel 1 | 30 | -992.55 | -14 | 53.777 | 0.1083 |

In the likelihood ratio test table above, the log-likelihood reported for the reduced model is higher than that of the full model (-992.55 > -1019.44), and the p-value (0.1083) is greater than the significance level of 0.05. We therefore fail to reject the null hypothesis that the reduced model is adequate.

• Hosmer-Lemeshow Test
To validate the goodness of fit, the Hosmer-Lemeshow statistic was also computed. It investigates, using a Pearson chi-square test, whether the observed proportions of events are similar to the predicted probabilities of occurrence in subgroups of the dataset. The null and alternative hypotheses are:
H0: The model is a good fit for the data
H1: The model does not fit the data well

Hosmer and Lemeshow Test
| Step | X-squared | Chi-square | Df | Sig. |
|---|---|---|---|---|
| 1 | 2352 | 8.110 | 8 | 0.43 |

From the Hosmer and Lemeshow test table, the chi-square statistic is 8.110 with 8 degrees of freedom and a corresponding p-value of 0.43. A p-value greater than 0.05 means that we fail to reject the null hypothesis that there is no difference between the observed and model-predicted values. This suggests that the model provides an adequate fit to the training data.

(2) Measures of the Proportion of Variation explained by our logistic model
After assessing the model fit, it was necessary to establish the strength of the association between the significant predictors and birthweight. Several statistics have been proposed for logistic models that are broadly analogous in interpretation to the coefficient of determination (R-Square) in linear regression. The study used the Pseudo R-Square measures.
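The three pseudo R-Square measures used in this section can all be computed from two log-likelihoods: that of the intercept-only model and that of the fitted model. The sketch below uses the standard textbook definitions. As assumptions for illustration, it takes the fitted log-likelihood to be the -992.55 reported for LogModel 1 in the likelihood ratio table, and it derives the intercept-only log-likelihood from the ROSE-balanced class counts in Table 68 (1173 low, 1179 normal); the function names are hypothetical.

```python
import math

def intercept_only_loglik(n_events, n_nonevents):
    """Log-likelihood of a logistic model that contains only a constant."""
    n = n_events + n_nonevents
    p = n_events / n
    return n_events * math.log(p) + n_nonevents * math.log(1 - p)

def pseudo_r_squares(ll_null, ll_model, n):
    """McFadden, Cox and Snell, and Nagelkerke pseudo R-Square values."""
    mcfadden = 1 - ll_model / ll_null
    cox_snell = 1 - math.exp(2 * (ll_null - ll_model) / n)
    max_cox_snell = 1 - math.exp(2 * ll_null / n)   # upper bound of Cox and Snell
    nagelkerke = cox_snell / max_cox_snell
    return mcfadden, cox_snell, nagelkerke

n_low, n_normal = 1173, 1179          # ROSE-balanced training set (Table 68)
ll_null = intercept_only_loglik(n_low, n_normal)
ll_model = -992.55                    # assumed: LogModel 1 log-likelihood

mcfadden, cox_snell, nagelkerke = pseudo_r_squares(ll_null, ll_model, n_low + n_normal)
print(f"McFadden      = {mcfadden:.4f}")
print(f"Cox and Snell = {cox_snell:.4f}")
print(f"Nagelkerke    = {nagelkerke:.4f}")
```

Nagelkerke's measure rescales Cox and Snell's by its maximum attainable value, which is why it is always the larger of the two.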
• McFadden’s R-Square
This measure lies between 0 and just under 1. Values near zero indicate that the model has no predictive power, and values above 0.2 are generally considered satisfactory. If McFadden’s R-Square is less than 0.2, the model does a poor job of explaining the target variable.
• Nagelkerke’s R-Square and Cox and Snell’s R-Square
These are based on the relative change in the log-likelihood from the intercept-only model to the full model.

Strength of Relationship of the Logistic Model
| Pseudo R-squared | Value |
|---|---|
| McFadden | 0.3911766 |
| Cox and Snell (ML) | 0.4185809 |
| Nagelkerke (Cragg and Uhler) | 0.5581091 |

The interpretation of the table above is that the model, with region, gender of child, delivery place, birth type, birth order, mother’s height, preceding birth interval, wealth index and size of child as predictors, explains about 39% of the variation in birthweight classification according to McFadden’s R-Square, and about 42% and 56% according to the Cox and Snell and Nagelkerke measures respectively.

(3) Statistical Tests for Individual Predictors
• Wald Test
The Wald test was used to examine the statistical significance of each coefficient in the model. It tests the null hypothesis that the coefficient of a predictor is not significantly different from zero. If the test fails to reject the null hypothesis, removing the variable would not substantially harm the fitted model.
| Variable | Degrees of freedom | F-Statistic | P-value |
|---|---|---|---|
| Region | 9 | 7.966189 | 0.0000* |
| Gender of child | 1 | 21.64232 | 0.0000* |
| Delivery place | 5 | 3.431877 | 0.0043* |
| Birth type | 1 | 210.4018 | 0.0000* |
| Birth order | 1 | 24.3509 | 0.0000* |
| Mothers height | 1 | 15.58437 | 0.0000* |
| Wealth index | 4 | 3.316142 | 0.0102* |
| Size of child | 4 | 114.0508 | 0.0000* |
* Significant at alpha 0.05

Where the p-value of the Wald statistic is less than 0.05, we reject the null hypothesis and conclude that the variable should be kept in the fitted model. Conversely, a p-value exceeding 0.05 suggests that the covariate can be excluded from the model.

(4) Validation of Predicted Values using ROC Curve (AUROC)
The ROC curve displays the true positive rate (TPR), or sensitivity, on the y-axis and the false positive rate (FPR), or 1 - specificity, on the x-axis; the closer the area under the curve is to 1, the better the model performs. A model with an Area Under Curve (AUC) greater than 70% is considered to have high predictive accuracy. Figure 16 shows that the AUC of our logistic regression model is around 81%, so the model is considered accurate.

Figure 16: ROC Curve for our Logistic Model

Step Five: Diagnostic Test of the Model

1. Variance Inflation Factor for Multicollinearity
The variance inflation factor (VIF) was calculated for each predictor in the model. While there is disagreement on the appropriate cutoff for deciding whether a model suffers from multicollinearity, the following rules provide a rough guideline.
• If VIF > 1 and VIF < 2.5, the explanatory variables are moderately correlated.
• If VIF > 2.5, the variables are highly correlated.
| Predictor | GVIF | Df | GVIF^(1/(2*Df)) |
|---|---|---|---|
| Region | 2.185337 | 9 | 1.058560 |
| Gender of child | 1.050815 | 1 | 1.025093 |
| Delivery place | 1.568413 | 5 | 1.046035 |
| Birth type | 1.219042 | 1 | 1.104102 |
| Birth order | 1.216167 | 1 | 1.102800 |
| Mothers height | 1.089146 | 1 | 1.043622 |
| Wealth index | 2.069814 | 4 | 1.125221 |
| Size of child | 1.398831 | 4 | 1.042847 |

Judged by the rule of thumb above, there is no evidence of multicollinearity among the predictor variables.

2. Influential Observations
After developing the model, we also examined whether certain observations in the dataset had a significant influence on the estimates.

Figure 17: Cook's distance for the Logistic Model

The analysis identified 18 data points that were markedly different from the others and might have affected the parameter estimates. These influential data points were examined and removed, and the model was rerun. The table below shows an improvement in model performance.

Table 70: Strength of Relationship of the Logistic Model
| | Old Pseudo R-Squared | New Pseudo R-Squared |
|---|---|---|
| McFadden | 0.3911766 | 0.595460 |
| Cox and Snell (ML) | 0.4185809 | 0.561877 |
| Nagelkerke (Cragg and Uhler) | 0.5581091 | 0.749265 |

Step Six: Model Summary and Interpretation
The parameter estimates in Table 71 have several important components. The Wald statistic and its associated probability provide an index of the importance of each predictor in the equation. In the table below, the second column, ‘B’, is the estimated coefficient and the third column is its standard error (S.E.). The ratio of ‘B’ to its standard error, squared, equals the Wald statistic. The simplest way to assess the Wald statistic is to look at its significance value (the Pr(>|z|) column of the same table): if it is less than 0.05, we reject the null hypothesis and conclude that the variable makes a significant contribution.
Further, it is concluded that the parameter is useful to the model. The “Exp (B)” column in the table below is the odds ratio, that is, the factor by which the odds of the outcome are multiplied for a one-unit increase in the corresponding predictor. “Exp” refers to the exponential of B. When Exp (B) is less than 1, an increase in the predictor corresponds to decreasing odds of the event’s occurrence. When Exp (B) is greater than 1, an increase in the predictor corresponds to increasing odds of the event’s occurrence.

Table 71: Summary analysis of the final Logistic regression model

Model          Likelihood Ratio Test    Pseudo R-Squared           Rank Discrimination Indexes
Obs.   2352    LR chi2       1256.51    McFadden        0.595460   C   0.885
NB     1179    df            29         Cox and Snell   0.561877
LB     1173    Pr (> chi2)   <0.0001    Nagelkerke      0.749265

Where:
• Obs. is the number of observations in our model fit (NB = normal birthweight, LB = low birthweight)
• LR is the model likelihood ratio χ2
• df is the degrees of freedom
• Pr (> chi2) is the P-value
• C index is the area under the ROC curve
• McFadden, Cox and Snell, and Nagelkerke are the pseudo R-squared measures

Parameter Estimates

                                          Estimate  Std.    Wald      z                    Odds ratio   CI for the Odds Ratio
Parameter                                 (B)       Error   Test      value     Pr(>|z|)   = Exp (B)    Lower      Upper
(Intercept)                               12.014*   2.090   33.043    5.749     <0.0001    164999.800   2856.718   10365548.000
Base(RegionAshanti)
RegionBrong Ahafo                         -1.375*   0.350   15.439    -3.929    <0.00001   0.253        0.127      0.500
RegionCentral                             -1.975*   0.380   27.009    -5.197    <0.00001   0.139        0.065      0.289
RegionEastern                             0.373     0.354   1.112     1.055     0.292      1.452        0.726      2.913
RegionGreater Accra                       -2.274*   0.370   37.826    -6.150    <0.00001   0.103        0.049      0.211
RegionNorthern                            -0.318    0.430   0.547     -0.740    0.460      0.727        0.313      1.692
RegionUpper East                          -1.109*   0.362   9.406     -3.067    0.002      0.330        0.162      0.668
RegionUpper West                          0.265     0.350   0.573     0.757     0.449      1.303        0.656      2.590
RegionVolta                               -2.216*   0.410   29.167    -5.401    <0.00001   0.109        0.048      0.240
RegionWestern                             -0.271    0.344   0.618     -0.786    0.432      0.763        0.387      1.494
Base(Gender_of_childFeMale)
Gender_of_childMale                       -0.925*   0.155   35.710    -5.976    <0.00001   0.397        0.292      0.536
Base(Delivery_place_Gov. clinic)
Delivery_placeGovernment hospital         0.221     0.198   1.246     1.116     0.264      1.248        0.847      1.843
Delivery_placeMaternity home              -1.200    0.664   3.266     -1.806    0.709      0.301        0.072      1.019
Delivery_placeOther                       1.074     0.863   1.549     1.245     0.213      2.927        0.581      16.176
Delivery_placePrivate hospital, clinic    0.463     0.353   1.717     1.310     0.190      1.588        0.797      3.185
Delivery_placeRespondent's home           1.516     0.304   24.937    4.994     0.124      4.553        2.526      8.314
Base(Birth_typeMultiple birth)
Birth_typeSingle birth                    -6.848*   0.647   111.986   -10.582   <0.00001   0.001        0.000      0.003
Birth_order                               -0.714*   0.161   19.687    -4.437    <0.00001   0.490        0.356      0.669
Mothers_height                            -0.107*   0.019   32.086    -5.664    <0.00001   0.898        0.865      0.932
Base(Wealth_indexAverage)
Wealth_indexPoorer                        0.447     0.282   2.513     1.585     0.113      1.564        0.900      2.722
Wealth_indexPoorest                       0.680*    0.305   4.964     2.228     0.026      1.973        1.086      3.596
Wealth_indexRicher                        0.291     0.270   1.156     1.075     0.282      1.337        0.787      2.274
Wealth_indexRichest                       -0.179    0.323   0.307     -0.554    0.580      0.836        0.444      1.575
Base(Size_of_childAverage)
Size_of_childVery small                   2.277*    0.224   103.331   10.189    <0.0001    9.752        6.371      15.332
Size_of_childSmaller than average         1.540*    0.149   106.824   10.322    <0.0001    4.666        3.494      6.274
Size_of_childLarger than average          -1.486*   0.151   96.846    -9.844    <0.0001    0.226        0.168      0.303
Size_of_childVery large                   -3.404*   0.403   71.346    -8.443    <0.0001    0.033        0.014      0.069

Note: Base is the reference category. * means p<0.05

4.5.2.3 Summary of our Logistic Model Fit

Considering low weight (< 2.5kg) against normal weight (≥ 2.5kg), the results in Table 71 show that the following variables are significantly related to the birthweight of a newborn:

Table 72: Significant Variables in the Logistic model

Variable Entered                     Estimated (B)   Odds ratio   Sig.      Overall*
Intercept                            12.014          164999.800   <0.0001
Region=Brong Ahafo                   -1.375          0.253        <0.0001   2.0484976
Region=Central                       -1.975          0.139        <0.0001   3.5554126
Region=Greater Accra                 -2.274          0.103        <0.0001   4.01998
Region=Upper East                    -1.109          0.330        0.0022    1.6830045
Region=Volta                         -2.216          0.109        <0.0001   3.3761294
Gender of child=Male                 -0.925          0.397        <0.0001   4.5074859
Birth type=Single birth              -6.848          0.001        <0.0001   14.130663
Birth order                          -0.714          0.490        <0.0001   5.6868232
Mothers height                       -0.107          0.898        <0.0001   4.0588195
Wealth index=Poorest                 0.680           1.973        0.0258    3.60798
Size of child=Very small             2.277           9.752        <0.0001   10.1894
Size of child=Smaller than average   1.540           4.666        <0.0001   10.322
Size of child=Larger than average    -1.486          0.226        <0.0001   9.84358
Size of child=Very large             -3.404          0.033        <0.0001   8.44257

*Overall = Variable Importance

Variables (Entered and Removed) Interpretation

For every one-unit increase in mother's height, the log odds of low birthweight (against normal birthweight) of a child decrease by 0.107, which corresponds to an odds ratio of 0.898. Likewise, for a one-unit increase in the birth order of the infant, the log odds of low birthweight decrease by 0.714, which corresponds to an odds ratio of 0.490.
The indicator variables for Region show that Brong Ahafo, Central, Greater Accra, Upper East and Volta are significantly related to the birthweight of a newborn, with respective estimates (B) of [-1.375, -1.975, -2.274, -1.109, -2.216]. This implies that the odds of LBW for a newborn whose mother comes from Brong Ahafo are 0.253 times the odds for a newborn whose mother comes from Ashanti. Similarly, the odds of LBW for a newborn whose mother resides in Central are 0.139 times the odds for a newborn whose mother is from Ashanti. For a newborn whose mother is from Volta, the odds of LBW are 0.109 times those for Ashanti. For infants of mothers from Greater Accra, the odds of LBW are 0.103 times those for Ashanti. Finally, for a newborn whose mother resides in Upper East, the odds of LBW are 0.330 times those for Ashanti.

In terms of odds, the result can be summarized as follows. Newborns whose mothers reside in Brong Ahafo, Central, Greater Accra, Upper East or Volta are less likely to be LBW than newborns whose mothers reside in Ashanti (the reference category). The odds of LBW for a newborn whose mother's region is Brong Ahafo change by [0.253 - 1] * 100% = -74.7%, i.e. they are 74.7% lower than the odds of LBW for a newborn whose mother's region is Ashanti.

The indicator variable for the gender of the child shows that being male is significantly related to the birthweight of a newborn, with an estimated (B) value of -0.925. This implies that the odds of LBW for a male child are 0.397 times the odds for a female child. The odds of LBW for a male newborn change by [0.397 - 1] * 100% = -60.3%, i.e. they are 60.3% lower than the odds of LBW for a female newborn.
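The conversion from a coefficient to an odds ratio and a percentage change in odds, as used in the interpretation above, can be sketched as follows. This is a minimal Python illustration under stated assumptions: the helper name `odds_ratio_summary` is hypothetical, the inputs are the rounded B and standard error for Gender of child = Male from Table 71, and the interval shown is a simple Wald interval, which need not coincide exactly with the confidence limits printed in the table.

```python
import math

def odds_ratio_summary(b, se, z_crit=1.96):
    """Convert a logit coefficient into an odds ratio summary.

    Hypothetical helper for illustration. Returns the odds ratio Exp(B),
    an approximate 95% Wald interval exp(B -/+ z_crit * S.E.), and the
    percentage change in odds, (Exp(B) - 1) * 100.
    """
    odds_ratio = math.exp(b)
    lower = math.exp(b - z_crit * se)
    upper = math.exp(b + z_crit * se)
    pct_change = (odds_ratio - 1.0) * 100.0
    return odds_ratio, lower, upper, pct_change

# Rounded B and S.E. for Gender of child = Male, as printed in Table 71.
odds_ratio, lower, upper, pct_change = odds_ratio_summary(-0.925, 0.155)
print(f"Exp(B) = {odds_ratio:.3f}, Wald CI = ({lower:.3f}, {upper:.3f}), "
      f"change in odds = {pct_change:.1f}%")
```

A negative percentage change means lower odds of low birthweight relative to the reference category, and an interval that excludes 1 is consistent with a significant effect at the 5% level.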
The indicator variable for birth type shows that single birth is significantly related to the birthweight of a newborn, with an estimated (B) value of -6.848. This implies that the odds of LBW for a singleton are 0.001 times the odds for a child from a multiple birth (a birth with more than one child).

The indicator variables for maternal wealth index show that only the poorest class is significantly related to the birthweight of a newborn, with an estimated (B) value of 0.680. This implies that the odds of LBW for a newborn whose mother is classified under the poorest class are 1.973 times the odds for a newborn whose mother belongs to the middle class of the wealth index.

Finally, the indicator variables for the size of a child show that all sizes [Very small, Smaller than average, Larger than average and Very large] are significantly related to the birthweight of a newborn, with respective estimates (B) of [2.277, 1.540, -1.486, -3.404]. This implies that the odds of LBW for a newborn whose size at birth is Very small are 9.752 times the odds for a child whose size at birth is average. Similarly, the odds of LBW for a newborn whose size at birth is Smaller than average are 4.666 times the odds for a child of average size. For a newborn whose size at birth is Larger than average, the odds of LBW are 0.226 times those for a child of average size. Finally, for a newborn whose birth size is Very large, the odds of LBW are 0.033 times those for infants of average birth size.

4.5.2.4 Constructing the Logistic Model fit line

Each predictor in our logistic regression model provides a coefficient ‘B’ which measures its individual contribution to the variation in our dependent variable.
The study noted that our dependent variable can take only one of two values, 0 or 1; therefore, using the estimates (B) reported in Table 71, our logistic regression model is as follows.

\[
\begin{aligned}
\ln\left[\frac{\pi}{1-\pi}\right] ={} & 12.014 - 1.375\,(\text{Reg} = \text{B. Ahafo}) - 1.975\,(\text{Reg} = \text{Central}) + 0.373\,(\text{Reg} = \text{Eastern}) \\
& - 2.274\,(\text{Reg} = \text{G. Accra}) - 0.318\,(\text{Reg} = \text{Northern}) - 1.109\,(\text{Reg} = \text{U. East}) \\
& + 0.265\,(\text{Reg} = \text{U. West}) - 2.216\,(\text{Reg} = \text{Volta}) - 0.271\,(\text{Reg} = \text{Western}) \\
& - 0.925\,(\text{GoC} = \text{Male}) + 0.221\,(\text{Del.place} = \text{Government hospital}) \\
& - 1.200\,(\text{Del.place} = \text{Maternity home}) + 1.074\,(\text{Del.place} = \text{Other}) \\
& + 0.463\,(\text{Del.place} = \text{Private hospital, clinic}) + 1.516\,(\text{Del.place} = \text{Respondent's home}) \\
& - 6.848\,(\text{B.type} = \text{Single}) - 0.714\,(\text{B.order}) - 0.107\,(\text{MHeight}) \\
& + 0.447\,(\text{W.index} = \text{Poorer}) + 0.680\,(\text{W.index} = \text{Poorest}) \\
& + 0.291\,(\text{W.index} = \text{Richer}) - 0.179\,(\text{W.index} = \text{Richest}) \\
& + 2.277\,(\text{SoC} = \text{Very small}) + 1.540\,(\text{SoC} = \text{Smaller than average}) \\
& - 1.486\,(\text{SoC} = \text{Larger than average}) - 3.404\,(\text{SoC} = \text{Very large})
\end{aligned}
\]

Where 𝜋 is the probability of low birthweight, Reg = Region, GoC = Gender of child, Del.place = Delivery place, B.type = Birth type, B.order = Birth order, MHeight = Mothers height, W.index = Wealth index and SoC = Size of child.

4.5.2.5 Classification Rate of our Logistic Model

How good is the classification model after including all the independent variables? The answer to this question is given in a classification table shown in Table 67. One way to assess how well our model fits the observed data is to obtain a classification table on our test dataset. This simple technique indicates how good our model is at predicting the outcome variable. First, a “cut-off” value c = 0.5 is chosen. For each newborn in the sample, the birthweight status is predicted as 0 (coded as normal weight) if the fitted probability of being normal birthweight is greater than c; otherwise it is predicted as 1 (coded as low weight). The table constructed below shows how many of our test data had a correct prediction.
Table 73: Classification Table on the Test data

                                            Predicted Birthweight group
Observed                                    NB     LB     Percentage Correct
Final Step   Birthweight group      NB      822    62     93.0
                                    LB      28     97     77.6
             Overall Percentage                           91.08
a. The cut value is .500

We should note that seven explanatory variables were used in the fitted model above for our test data. The percentage of correct predictions reported in Table 73 is 91.08% of the observations in the testing set, indicating an improvement on the full data reported earlier in Table 69. Table 73 shows that, out of 884 cases observed in the NBW group, 822 are correctly predicted as NBW, while 62 are predicted as LBW. Similarly, out of 125 cases observed in the LBW group, 97 are correctly classified as LBW, while 28 are predicted as NBW. So out of 1009 cases, 919 (822 + 97) are correctly classified and 90 (62 + 28) are misclassified. From this, it can be said that (1009 - 90) / 1009, or 91.08% of the cases, are correctly classified with this model. The researcher can predict with 91.08% accuracy on the test data.

Our proportional by chance accuracy rate was calculated by computing the proportion of cases for each group based on the number of cases in each group in the classification table. The proportion in the "LBW" classification is 97/1009 = 0.0961. The proportion in the "NBW" classification is 822/1009 = 0.815. Then, we square and sum the proportion of cases in each group (0.0961² + 0.815² = 0.673). That is, 67.3% is the proportional by chance accuracy rate. Our accuracy rate was reported as 91.08%, which is greater than the proportional by chance accuracy criterion of 84.13% (1.25 * 67.3% = 84.13%). Hence, the criterion for classification accuracy is satisfied.
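The percentages in Table 73 can be reproduced directly from the four cell counts. The sketch below is a minimal Python illustration, not the software used in the study; the function name `classification_summary` is hypothetical, and it assumes that low birthweight (coded 1) is treated as the positive class.

```python
def classification_summary(tn, fp, fn, tp):
    """Summarize a 2x2 classification table.

    Hypothetical helper for illustration, with LBW as the positive class:
    tn = observed NB predicted NB, fp = observed NB predicted LB,
    fn = observed LB predicted NB, tp = observed LB predicted LB.
    """
    total = tn + fp + fn + tp
    accuracy = (tn + tp) / total       # overall percentage correct
    specificity = tn / (tn + fp)       # share of observed NB classified correctly
    sensitivity = tp / (tp + fn)       # share of observed LB classified correctly
    return total, accuracy, specificity, sensitivity

# Cell counts taken from Table 73 (test data, cut value 0.5).
total, accuracy, specificity, sensitivity = classification_summary(
    tn=822, fp=62, fn=28, tp=97
)
print(f"n = {total}, accuracy = {accuracy:.2%}, "
      f"NB correct = {specificity:.1%}, LB correct = {sensitivity:.1%}")
```

Reporting the two row percentages alongside the overall accuracy matters here because the groups are unbalanced in the test data: a high overall rate can hide weaker performance on the smaller low birthweight group.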
CHAPTER FIVE

SUMMARY, CONCLUSIONS, AND RECOMMENDATIONS

5.0 Introduction

Results of both the longitudinal and low birthweight data are summarized in relation to other findings, together with the limitations and strengths of the study. The conclusions, recommendations, and future research are also discussed.

5.1 Summary

There are social and economic costs attached to managing infants born with a birthweight of less than 2.5kg. The 10%-15% prevalence of low birthweight in the Ghanaian population is a cause for alarm. However, establishing maternal-level characteristics associated with low-weight infants provides a starting point for identifying other key risk factors and modifying them to improve the birth outcomes of neonates.

The purpose of this research was to establish indicators of birthweight through both estimation and classification, using mother-child data from the GDHS. A very wide spectrum of factors, including physical, demographic, social, economic and previous obstetric factors, was considered in order to identify their relationship with the birthweight of the child. The study started with an explanation of the different types of factors considered and an exploration of their place in the determination of birthweight. The problem was treated in its complete form by considering all possible types of factors having a bearing on birthweight.

The research questions addressed in the present investigation are:
(i) What are the prevalence rates of birthweight (low and normal)?
(ii) What is the type of birthweight reporting system in Ghana?
(iii) Can the mother’s perception of child size at birth be used as a proxy for the birthweight of an infant if weight data is missing?
(iv) Which factors contribute to and influence the birthweight of newborns in Ghana?
(v) If possible, can we hypothesize a statistical model for predicting the likelihood of a child’s birthweight belonging to the low or normal class using chosen factors such as maternal factors, outcomes of birth and maternal anthropometric variables?

5.1.1 Key Observations from Study Objectives

1. The mean birthweight for the study was 3.137kg and the median birthweight was 3.130kg, with a standard error of 0.01kg and a standard deviation of 0.593kg. The minimum birthweight recorded is 0.8kg while the maximum is 5.5kg. Male infants recorded a higher average birthweight (3.195kg) than female infants (3.074kg).

2. In answer to the first research question on the prevalence rate of birthweight, the prevalence of low birthweight was 9.9% using the WHO’s definition of birthweight classification, while the Researcher’s Adjusted Prevalence was 15.7%, which is 5.8 percentage points higher than the figure under the WHO’s definition.

3. The second research question revealed that almost two-fifths of births in Ghana had their birthweight reported through the mother's memory recall. This finding suggests that the reporting of birth weight in Ghana is still challenging. There is a need for a formal procedure for recording birth weight, even for births outside a health facility. These findings corroborate the study by Channon et al., which suggested that health systems in poor countries should initiate efforts to systematically monitor the recording of birth weight data, ensuring both quality and comparability at the international level (Channon et al., 2011). Based on the result on the type of birthweight reporting system, it might be hypothesized that birthweights extracted from a health card are more accurate than those acquired from memory recall.

4. Moreover, in developing countries the majority of births occur outside health facilities.
The estimates of birth weight are prone to biases: they can be inaccurate in measurement, imprecise in the method of reporting, and affected by varying background characteristics that influence reporting on the part of the mother. These measurement issues can substantially distort the actual prevalence of LBW, and hence interventions formulated on this empirical information remain ineffective in a practical sense. Therefore, accurate reporting of the prevalence of LBW is important for monitoring the health of a population. The study found some evidence that the size of the neonate at birth, as reported by the mother, may serve as a good proxy for birthweight where birth weight information is missing or unrecorded, even for deliveries in institutional settings.

5. In answer to the fourth research question, on which factors contribute to and influence the birthweight of newborns in Ghana, the study identified numerous risk factors for LBW. Covariates found to be statistically associated with the newborn’s weight are Gender (0.02), Size at birth (0.000), Region (0.02), Wealth index (0.043), Birth order (0.02), Children ever born (0.039), Delivery place (0.011), Preceding birth interval (0.000), and Birth type (0.000). The significance level was set at p < 0.05, corresponding to a 95% confidence level.

5.1.2 Final Comments on Multiple Linear Regression fit

The objective of the research was to build a model that allows us to estimate the separate effects of some maternal characteristics and birth outcomes on an infant's birthweight.
The following seventeen independent variables were considered: Gender of child, Type of delivery by caesarean, Size of child at birth, Mothers age, Children ever born, Birth type, Maternal anthropometry (Weight and height), Maternal education, Mother’s region of residence, Maternal religion, Type of residence, Birth order, Delivery Place, Preceding birth interval and Wealth index.

We examined the relationship between the neonate’s birthweight and the seventeen independent variables in the multiple linear regression modeling. Thereafter, variable selection was conducted to pick out the significant predictors for model building. It showed that, of the seventeen variables, nine had p-values less than 0.05. We found that over 46% of the variation in newborn birthweight is explained by these nine covariates. The value of the coefficient of determination (0.46) indicates that the model can be improved by adding new significant variables.

The above model demonstrated a very strong effect of the size of a child at birth on infant birthweight after accounting for both maternal and birth outcome factors. Region, Mothers height, Mothers weight, Gender of a child, Delivery type, Wealth index, Children ever born, and Birth type are likewise important contributors. However, the remaining eight variables: Mothers age, Maternal education, Type of residence, Maternal Religion, Birth order, Delivery Place and Preceding birth interval were found to be nonsignificant contributors.

The comparison of our nine predictor variables by means of individual t-statistics shows the relative magnitude of the unique contribution of each variable to the overall variability in birthweight. According to the above output, Size of child=Very Large is the largest contributor to the explained variation in birthweight.
The regression coefficient associated with Size of child=Very Large is 0.5621383, with a corresponding t-ratio of 22.231, indicating a very strong effect on infants’ weight after accounting for the effect of maternal and birth outcome variables. Birth type=Single (15.402), Size of child=Larger than average (10.521) and Region=Volta (7.726) are the next three most important contributors.

The linear fit was backed by a normality test. Our model was free from multicollinearity. The few unusual data points that affected our initial fit were treated, and hence we can depend on the inferences made from the linear fit.

5.1.3 Final Comments on Logistic Regression fit

Likewise, the study examined the impact of the seventeen predictors (maternal characteristics, birth outcomes of the child and obstetric factors) on low birthweight in Ghana using chi-square analysis, binary logistic regression, and likelihood and odds ratio tests with their corresponding levels of significance. Out of these seventeen predictors, only seven were significantly related to low birthweight. We found that between 56% and 75% of the statistical variation in infant low birthweight is explained by the seven covariates, based on the three pseudo R-squared measures reported in Table 71. These values indicate that the model can still be improved by adding new significant variables.

The findings suggest that the birth type of the child is the most significant obstetric predictor of a newborn’s weight. That is, birthweight decreases with multiple births. Further, some maternal characteristics have a considerable influence on newborn birthweight. Among these variables, the region of residence and the maternal wealth classification were found to contribute to a child’s weight.
That is, children whose mothers are from Greater Accra, Volta, Brong Ahafo, Upper East or Central, or from richer homes, have a lower risk of low weight at birth. The mother’s height and the birth order of the child are likewise found to be significant determinants of infants’ weight.

In conclusion, the findings of the research demonstrate that factors such as Region, Mothers height, Gender of a child, Wealth index, Size of a child, Children ever born, and Birth type have statistically significant effects on infants’ weight in Ghana. The assumptions of the logistic fit were checked. Our model was free from multicollinearity. The few unusual data points that affected our initial fit were treated, and hence we can depend on the inferences produced using the logistic fit.

5.1.4 Comparison of Two Sets of Fits

Nearly the same factors appear to be significant in both fits, with their directions of influence in agreement. A few factors were common to the longitudinal and low birthweight datasets. Among the explanatory factors, Size at birth, Region, Mothers height, Gender of child, Delivery type, Wealth index, Children ever born and Birth type emerge as significant, while Mothers age, Maternal education, Type of residence, Maternal Religion, Birth order, Delivery Place and Preceding birth interval were not statistically significant. The study found only one discrepancy, for mother’s weight: it was significant in the longitudinal data but insignificant in the low birthweight data.

5.2 Conclusions

In the Ghanaian context, this study has contributed to the understanding of maternal determinants associated with infant birthweight at the population level.
Findings from this study have, therefore, provided a starting point towards identifying risk factors and providing clues to health service providers on the maternal determinants and birth outcome factors on which to concentrate health promotion messages. Almost all of the risk factors for low birthweight identified in the study are modifiable and preventable. The problem is multidimensional and hence needs an integrated approach incorporating medical, social, economic, and educational measures. In conclusion, a comprehensive approach which institutes a combination of interventions to improve the overall health of women is needed. Such approaches are likely to be most effective in reducing low birthweight among infants in Ghana.

5.3 Recommendations

The study closes by pointing out some significant limitations to the extent of our study and by proposing areas for further study.
• Better pregnancy outcomes can be expected by providing adequate antenatal care and nutrition, effective management of complications and proper family planning services for proper spacing and family size.
• Regional and community educational programmes should be organized to educate women on the importance of prenatal care.
• Quality antenatal and health facilities should be accessible in all regions.
• More micro-level analyses (both qualitative and quantitative) establishing the associations between maternal health and its socio-economic, cultural and programmatic antecedents would go a long way in tailoring services in a locally relevant way.
• While we have focused on three key influences on the birthweight distribution (maternal factors, the birth outcome of the child and obstetric history), there are without a doubt other indicators that affect birthweight. Future researchers have scope to include more psychological and genetic factors and study their effect on birthweight.
• The present study looked at data from the 2014 GDHS only. To generate trends and gain a better understanding of the nature of the determinants across time periods, a similar analysis needs to be done with previous and future GDHS data.
• Given the small number of low birthweight cases, no attempt was made to explore whether an association existed between regional categories and low birthweight. Future studies should consider larger sample sizes so that urban and rural dynamics can be explored in greater detail and depth.
• Finally, our results center on the status of the newborn child at birth, or not long after birth. Consequently, we cannot draw any conclusions about the connection between low birthweight (or birthweight) and longer-run outcomes such as cognitive development, educational attainment, and adult health. A more direct investigation of these outcomes appears to be a valuable direction for future studies.

REFERENCES

Abrams, B., Hoggatt, K., Kang, M., & Selvin, S. (2008). History of weight cycling and weight changes during and after pregnancy. Paediatric and Perinatal Epidemiology, 15(4), A1-A1. doi:10.1111/j.1365-3016.2001.381-1.x
Antwi-Boasiako Ishmael (2011). Assessing the Risk Factors Associated with Low Birthweight (LWB) And Mean Actual Birthweight of Neonates: A Case Study of St. Martin’s Hospital, Agroyesum. Master of Science. Kwame Nkrumah University of Science and Technology, Kumasi.
Atinuke O. Adebanji, A., & Puurbalanta R, A. (2015). Determinants of Low Birth Weight Neonates: A Case Study of Tamale Metropolis in Ghana. Journal for Studies in Management and Planning, 90-102.
Avchen, R. N. (2001). Birth Weight and School-age Disabilities: A Population-based Study. American Journal of Epidemiology, 154(10), 895-901. doi:10.1093/aje/154.10.895
Babaei, Z., Rejali, M., Mansourian, M., & Eshrati, B. (2017).
Prediction of low birth weight delivery by maternal status and its validation: Decision curve analysis. International Journal of Preventive Medicine, 8(1), 53. doi:10.4103/ijpvm.ijpvm_146_16
Balakrishnan, N., Barnett, V., & Lewis, T. (1995). Outliers in Statistical Data. Biometrics, 51(1), 381. doi:10.2307/2533352
Barker, D. (2002). EDITORIAL: The developmental origins of adult disease. European Journal of Epidemiology, 18(8), 733-736. doi:10.1023/a:1025388901248
Barker, D. J. (2011). Fetal Origins of Adult Disease. Fetal and Neonatal Physiology, 192-197. doi:10.1016/b978-1-4160-3479-7.10018-7
Basso, O., Olsen, J., Johansen, A. M., & Christensen, K. (1997). Change in social status and risk of low birth weight in Denmark: Population-based cohort study. BMJ, 315(7121), 1498-1502. doi:10.1136/bmj.315.7121.1498
Carr-Hill, R., & Pritchard, C. (1985). The Development and Exploitation of Empirical Birthweight Standards. doi:10.1007/978-1-349-07434-1
Chatterjee, S., & Hadi, A. S. (1986). Influential Observations, High Leverage Points, and Outliers in Linear Regression. Statistical Science, 1(3), 379-393. doi:10.1214/ss/1177013622
David W. Hosmer, J., Lemeshow, S., & Sturdivant, R. X. (2013). Applied Logistic Regression. Hoboken, NJ: John Wiley & Sons.
Evelyn Yaa Dufie Mensah (2015). Statistical Analysis of Retroviral (HIV) Status and Other Maternal Risk Factors Associated with Low Birthweight and Low Apgar Score of Infants: Evidence from the Greater Accra Regional Hospital. Master of Philosophy. University of Ghana.
Fonseca, W., Kirkwood, B. R., Barros, A. J., Misago, C., Correia, L. L., Flores, J. A., … Victora, C. G. (1996). Attendance at day care centers increases the risk of childhood pneumonia among the urban poor in Fortaleza, Brazil. Cadernos de Saúde Pública, 12(2), 133-140. doi:10.1590/s0102-311x1996000200002
Hilbe, J. M. (2016). Practical Guide to Logistic Regression.
doi:10.1201/b18678
Kanade, A., Rao, S., Yajnik, C., Margetts, B., & Fall, C. (2005). Rapid assessment of maternal activity among rural Indian mothers (Pune Maternal Nutrition Study). Public Health Nutrition, 8(6), 588-595. doi:10.1079/phn2004714
Kayode, G. A., Amoakoh-Coleman, M., Agyepong, I. A., Ansah, E., Grobbee, D. E., & Klipstein-Grobusch, K. (2014). Contextual Risk Factors for Low Birth Weight: A Multilevel Analysis. PLoS ONE, 9(10), e109333. doi:10.1371/journal.pone.0109333
Kramer, M. S. (1987). Determinants of Low Birthweight: Methodological Assessment and Meta-Analysis. Bull World Health Organization, 65, 663–737.
Leppert, P. C., Namerow, P. B., & Barker, D. (1986). Pregnancy outcomes among adolescent and older women receiving comprehensive prenatal care. Journal of Adolescent Health Care, 7(2), 112-117. doi:10.1016/s0197-0070(86)80006-7
Logan, M. (2010). Biostatistical Design and Analysis Using R. doi:10.1002/9781444319620
MacLeod, S., & Kiely, J. (1988). The effects of maternal age and parity on birthweight: a population-based study in New York City. International Journal of Gynecology & Obstetrics, 26(1), 11-19. doi:10.1016/0020-7292(88)90191-9
Malhotra, N., & Dash, S. (2013). Future of research in marketing in emerging economies. Marketing Intelligence & Planning, 31(2). doi:10.1108/mip.2013.02031baa.001
Murray, Barbara A. (1999). A Statistical Analysis of Low Birthweight in Glasgow. Ph.D. Thesis.
Noora Nidhal Saleh (2016). Nature and Causes of Low Birthweight of Babies: A Statistical Analysis. Master of Science. Ball State University, Muncie, Indiana.
Paul Nesara (2018). Determinants of Low Birthweight in A Population-Based Sample of Zimbabwe. (Doctoral Studies). Walden University, USA.
Porter, T., Fraser, A., Hunter, C., Ward, R., & Varner, M. (1997). The risk of preterm birth across generations. Obstetrics & Gynecology, 90(1), 63-67.
doi:10.1016/s0029-7844(97)00215-9
Sareer Badshah (2007). Exploratory Analysis of Low Birthweight Data from a Survey of Births Delivered During 2003 at Four Main Public-Hospitals in Peshawar.
Shea Oscar Rutstein (2006). Guide to DHS Statistic, Demographic and Health Surveys Methodology, ORC Macro, Calverton, Maryland.
Sommerfelt, K., Troland, K., Ellertsen, B., & Markestad, T. (2008). Behavioral Problems in Low-Birthweight Preschoolers. Developmental Medicine & Child Neurology, 38(10), 927-940. doi:10.1111/j.1469-8749.1996.tb15049.x
Tampah-Naah, A., Anzagra, L., & Yendaw, E. (2016). Factors Correlate with Low Birth Weight in Ghana. British Journal of Medicine and Medical Research, 16(4), 1-8. doi:10.9734/bjmmr/2016/24881
Tema, T. (2006). Prevalence and determinants of low birth weight in Jimma zone, Southwest Ethiopia. East African Medical Journal, 83(7). doi:10.4314/eamj.v83i7.9448
Wardlaw, T. M. (2004). Low Birthweight: Country, Regional and Global Estimates. UNICEF.
UNICEF & WHO: Reduction of Low Birthweight: A South Asia Priority, 2002. United Nations Children’s Fund - Regional Office for South Asia.
UNICEF (2008). Global Database on Low Birthweight. Low Birthweight Incidence by Country (1999-2006).
UNICEF-WHO (2004). United Nations Children’s Fund and World Health Organization, (2004) Low Birthweight; Country and Global Estimates. UNICEF, New York.
Wilcox, A. J. (2001). On the importance—and the unimportance—of birthweight. International Journal of Epidemiology, 30(6), 1233-1241. doi:10.1093/ije/30.6.1233
Wilcox, A. J., & Skjaerven, R. (1992). Birth weight and perinatal mortality: the effect of gestational age. American Journal of Public Health, 82(3), 378-382. doi:10.2105/ajph.82.3.378
Zupan, J., Åhman, E., & World Health Organization. (2006). Neonatal and Perinatal Mortality: Country, Regional and Global Estimates. WHO.
APPENDICES

Appendix 1: Group Statistics for Birthweight Reporting Type

              Reporting Type         N      Mean     Std. Deviation   Std. Error Mean
Birthweight   From mother's recall   719    3.2304   .7292            .0272
              From written card      2642   3.1114   .5472            .0107

Appendix 2: Independent Samples Test for Birthweight Reporting Type

Levene's Test for Equality of Variances (Birthweight)

                          F        Sig.
Equal variances assumed   86.810   .000

t-test for Equality of Means (Birthweight)

                                                                                                95% CI of the Difference
                              t       df        Sig. (2-tailed)   Mean Difference   Std. Error Difference   Lower   Upper
Equal variances assumed       4.790   3359      .000              .1190             .0249                   .0703   .1678
Equal variances not assumed   4.076   948.861   .000              .1190             .0292                   .0617   .1764

Appendix 3: Boxplot of Birthweight data