UNIVERSITY OF GHANA, LEGON

Survival Analysis Among Tuberculosis Patients: A Case Study of Adults in Kano State in Nigeria

BY

IBRAHIM ADAMU (10754228)

A THESIS SUBMITTED TO THE DEPARTMENT OF STATISTICS AND ACTUARIAL SCIENCE, UNIVERSITY OF GHANA IN PARTIAL FULFILLMENT OF THE REQUIREMENT FOR THE DEGREE OF MASTER OF PHILOSOPHY, ACTUARIAL SCIENCE

May, 2022

University of Ghana http://ugspace.ug.edu.gh

Declaration

I hereby declare that this submission is my own work towards the award of the Master of Philosophy degree and that, to the best of my knowledge, it contains no material previously published by another person nor material which has been accepted for the award of any other degree of the university, except where due acknowledgement has been made in the text.

Name of Student: Student (107514228)    Signature    Date

Certified by: Dr. Louis Asiedu, Principal Supervisor    Signature    Date

Certified by: Dr. Samuel Iddi, Co-supervisor    Signature    Date

11-05-2022    11/05/2022

Dedication

This research project is dedicated to Almighty Allah and His Prophet Muhammad (SAW) for showering me with knowledge, wisdom, understanding, kindness, protection and provision throughout the course of this study. It is also dedicated to my Mum (Hajiya Aisha Aliyu), my late Father (Alh Adamu Maikifi), my late Sister (Hauwa Adamu) and my beloved Niece (Nana Aisha Hamisu), who returned to the Almighty Allah after a brief illness.

Abstract

Tuberculosis (TB) is an infectious disease and a significant cause of ill health.
Globally, it is among the top 10 causes of death and ranks above HIV/AIDS as the leading cause of death from a single infectious agent. Many studies in Nigeria have used semi-parametric and non-parametric models to analyze survival data, but there is a dearth of studies applying parametric models to tuberculosis survival data. Parametric models such as the Weibull, Exponential, Log-logistic and Gompertz models have been used in various studies, and the Weibull model was most often found to be suitable. The most popular non-parametric and semi-parametric methods used in these studies are the Kaplan-Meier (K-M) estimator, the log-rank test and the Cox proportional hazards model. However, the necessary diagnostic checks on model fit and on the assumptions of the models were mostly ignored, which reduces the reliability of the results and increases the chance of estimation error. This study assessed semi-parametric (Cox) and parametric (Weibull, Exponential and Gompertz) survival models. A retrospective cohort analysis was conducted on tuberculosis patients receiving treatment under the Tuberculosis & Leprosy Control Program in Kano, Nigeria. The risk factors for death were assessed using the Cox proportional hazards model. The parametric models were compared, and the Gompertz model was found to fit the data best on the basis of its AIC (the minimum among the models compared) and log-likelihood values. Among the 2,555 TB cases, the treatment success rate was 97.06% and the mortality rate was 2.94%. Multivariate analysis showed that HIV status, age and weight were significant factors associated with mortality in TB patients during therapy. The study recommends the use of diagnostic checks, such as Martingale and deviance residuals, to assess model fit, and the comparison of parametric models to determine the model that best fits the tuberculosis data of patients.
Key words: Survival Analysis, Kaplan Meier, Cox Proportional Hazard Model, Parametric Models, Tuberculosis.

Acknowledgment

The successful completion of any research work depends on the contribution and assistance of many people, to whom I owe my warmest gratitude and appreciation.

Firstly, I am grateful to Federal University Dutse and the National Insurance Commission (NAICOM) for finding me worthy of a scholarship to undertake this programme. My sincere gratitude goes to my former Vice Chancellor, Prof. Fatima Batul Mukhtar, Dr Fatima Abdulkarim (former HOD, Actuarial Science Department), Dr Aminu Nass Ma’aruf (HOD, Actuarial Science Department), Dr Sanusi Sa’ad (former Dean, Faculty of SMS), and my colleagues at the Department of Actuarial Science for their support towards the achievement of this goal.

My sincere appreciation goes to my thesis supervisors, Dr Louis Asiedu and Dr Samuel Iddi, for their constructive criticism, invaluable advice and thorough supervision of this research work. Further gratitude goes to Prof K. Doku-Amponsah (HOD, Department of Statistics & Actuarial Science, University of Ghana), who has guided, mentored and assisted me towards the successful completion of my studies. I also appreciate the moral support of Madam Charlotte Chapman-Wardy and the other lecturers of the Department of Statistics and Actuarial Science.

My unreserved and wholehearted gratitude goes to my family for their guidance, advice, prayers and financial support, especially my siblings (Hamisu, Usman, Jamila, Tijjani & Jubril). Thanks to my lovely wife (Halima) and son (Faheem), whom I cherish. I also thank my friends Mahmud Nura Ringim, Wilfred Nettey, Samuella Adams, Anasthasia Kukah, Isaac Essel and the rest of my course mates for their support during my studies. Special appreciation goes to Fred Mawuli Amenu.

Contents
Declaration
Dedication
Abstract
Acknowledgement
Abbreviation
List of Abbreviation
List of Tables
List of Figures
1 Introduction
1.1 Background of the Study
1.2 Problem Statement
1.3 Objectives
1.4 Significance of the Study
1.5 Scope of the study
1.6 Organization of Study
2 Literature Review
2.1 Introduction
2.2 Tuberculosis Disease (TB) Epidemiology
2.3 Survival Analysis and TB related deaths
2.4 Related Literature on Parametric Models
2.5 Summary of Literature Review
3 Methodology
3.1 Introduction
3.2 Description of Data and Variables of the Study
3.3 Research Design
3.4 Data Collection and Technique of Sampling
3.5 Data Analysis
3.6 Analytical Tools
3.7 Ethical Consideration
3.8 Concept of Survival Analysis
3.9 Non-Parametric Survival Model: Kaplan–Meier (K-M) Survival Estimate
3.9.1 Assumptions for the Kaplan-Meier (K-M) method
3.9.2 The Kaplan-Meier Product-Limit formula
3.10 Hazard Function
3.10.1 Relationship between Cumulative Hazard and Survival Function
3.11 Cox Proportional Hazard
3.12 The Log Rank Test
3.13 Model Diagnosis
3.13.1 Martingale Residuals
3.13.2 Deviance Residuals
3.13.3 Schoenfeld Residuals
3.13.4 Cox Snell Residuals
3.14 Parametric Models for Estimating Survival
3.14.1 Exponential Distribution
3.14.2 Weibull Distribution
3.14.3 Gompertz Distribution
3.15 Parameter Estimation for the Proportional Hazard models
3.16 Model Selection Criterion
3.17 Summary of Statistical Modelling Technique and Analytical Tools
4 Result & Analysis
4.1 Introduction
4.2 Classification of Variables
4.3 Descriptive Analysis
4.4 Comparing Survival of Categorical Variables using K-M Curves
4.5 Log Rank Tests to Compare Survival among groups
4.6 Cox Proportional Hazard Model
4.6.1 Proportionality of Hazard Assumption in the Cox Model
4.7 The Schoenfeld Residual Plots for the Categorical Variables
4.8 Assessment of linearity for Continuous Variables in the Model
4.9 Model Diagnostics using Residual Plots
4.10 Comparing the Hazard Ratios of Various Models
4.11 Analysis of the Models Using AIC and Log-Likelihood Values
5 Summary, Conclusion & Recommendation
5.1 Introduction
5.2 Summary & Conclusion
5.3 Recommendation
5.4 Limitations
References
Appendix A
Appendix B
Appendix C

List of Abbreviation
PHM: Proportional Hazard Model
K-M: Kaplan Meier
WHO: World Health Organization
TB: Tuberculosis
HIV: Human Immunodeficiency Virus
AIDS: Acquired Immune Deficiency Syndrome
FoMH: Federal Ministry of Health
DOTS: Directly Observed Treatment Short Course
AFT: Accelerated Failure Time
ART: Antiretroviral Therapy
CPT: Cotrimoxazole Preventive Therapy
AIC: Akaike Information Criterion
CI: Confidence Interval
NTLCP: National Tuberculosis and Leprosy Control Program
HR: Hazard Ratio
TB/HIV: Tuberculosis and Human Immunodeficiency Virus Infection
MDR: Multi-Drug Resistant Tuberculosis
HAART: Highly Active Antiretroviral Therapy

List of Tables
4.1 Description of Variables Used in the Data
4.2 Demographic and Clinical data of TB patients
4.3 Summary of Survival Data
4.4 Test statistic of the Log rank for Tuberculosis patients
4.5 Cox Proportional Hazard Model for Univariate Analysis of Variables
4.6 Multivariate Cox Proportional Hazard Model
4.7 Multivariate Cox PHM for the significant variables
4.8 Test of Proportionality Assumption of the Cox Model
4.9 Cox and Parametric Models of the Proportional Hazard
4.10 Akaike Information Criterion & Log-likelihood of Parametric Models
4.11 Multivariate Gompertz Proportional Hazard Model
5.1 Multivariate Weibull Proportional Hazard Model
5.2 Multivariate Exponential Proportional Hazard Model

List of Figures
4.1 Kaplan Meier Curve of Survival for TB Patients
4.2 Kaplan Meier Curve of Survival for Gender Category of Patients
4.3 Kaplan Meier Plot for Age Category of Patients
4.4 Kaplan Meier Curve for Pre-treatment Weight of Patients
4.5 Kaplan Meier Curve for HIV Status of Patients
4.6 Kaplan Meier Curve for Type of TB
4.7 Schoenfeld Residual Plots for Type of TB Variable
4.8 Schoenfeld Residual Plots for Gender Variable
4.9 Schoenfeld Residual Plots for HIV Variable
4.10 Martingale Residuals Plots for Weight Linearity
4.11 Martingale Residuals Plots for Age Linearity
4.12 Martingale Residuals Plots
4.13 Deviance Residuals Plot
4.14 Plot of Estimated Cumulative Hazard against Cox Snell Residuals

Chapter 1 Introduction

1.1 Background of the Study

Tuberculosis (TB) is an infectious disease and a significant cause of ill health. Globally, it is among the top 10 causes of death and ranks above HIV/AIDS as the leading cause of death from a single infectious agent (WHO, 2017, cited in Asgedom et al., 2018). The disease is caused by the contagious bacillus Mycobacterium tuberculosis. TB can easily be contracted when a person is exposed to bacteria expelled by the coughing of an infected person. Based on the site affected, TB can be broadly categorized as pulmonary or extrapulmonary: TB of the lungs is called pulmonary TB, while TB that affects other parts of the body, such as the abdomen, bones or spinal cord, is referred to as extrapulmonary TB. Many people face the risk of developing TB disease, as about 25% of the global population has been found to be infected with Mycobacterium tuberculosis (WHO, 2019).

Tuberculosis is still a major public health issue around the world. Although attempts to control the pandemic have lowered mortality and incidence, a number of predisposing factors, such as illicit drug abuse, alcoholism and smoking, need to be addressed to reduce the burden of the disease. This can be achieved by evaluating the behavior of illicit drug users and establishing policies that help control the spread of the disease (Silva et al., 2018).

Globally, tuberculosis remains a significant problem that has bedeviled public health systems.
However, chemotherapy has been effective in controlling the worsening effects of the disease. Studies have shown that a large number of the deaths recorded each year are still attributable to tuberculosis. Early diagnosis and notification of suspected cases remain a major challenge, especially in less developed countries, and even countries with advanced medical facilities still face this challenge (Smith et al., 2006). Despite scientific advances aimed at reducing the disease’s negative effects, millions of people contract tuberculosis each year (WHO, 2018). Of the deaths recorded worldwide in 2017, about 1.6 million were attributed to TB. TB-HIV co-infected patients accounted for only 18.75% of these 1.6 million TB-related deaths, while HIV-negative patients accounted for 81.25% (WHO, 2018).

Tuberculosis affects both men and women and occurs in both adults and children. Most studies have found TB to be predominant among adults, in most cases men, especially those aged 15 years and above. In 2018, men aged 15 years and above accounted for 57% of all TB cases, while women accounted for 32%. Owing to advances in research and to the early detection and treatment of HIV/AIDS, deaths among patients co-infected with TB and HIV accounted for less than 9% of all global TB-related deaths (WHO, 2019). In 2014, a total of 1.5 million deaths worldwide were TB-related, most of which occurred in sub-Saharan Africa and South-East Asia (WHO, 2015). According to McNerney (2012), cited in Adamu et al. (2017), HIV infection, rising drug resistance, co-morbidities such as diabetes and social deprivation (such as poverty and illiteracy), further amplified by outdated diagnostics and treatment, are significant risk factors for increased TB infection. TB cases are still widely under-reported globally.
The gap has been estimated at 3 million TB cases that were not reported, 80% of which are found in ten countries. Nigeria accounts for 12% of this gap; the other countries in the top four are India, Indonesia and the Philippines, with 25%, 10% and 8% respectively. Nigeria therefore ranks second, after India, among the countries with the largest gaps in reported TB cases. Nigeria also has Africa’s highest tuberculosis burden and one of the world’s largest disparities between estimated and reported cases (WHO, 2019).

The year 2020 witnessed the outbreak of the novel coronavirus disease (COVID-19), a respiratory infection caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) (Lai et al., 2020, cited in Mustapha et al., 2020). The virus mostly affects the lungs and can cause illness resembling the common cold. The disease is highly contagious: like tuberculosis, it is easily transmitted when an infected person coughs. The WHO declared it a pandemic in the first quarter of 2020 after a surge in cases across the globe. Limited access to safe, effective, high-quality and inexpensive medicines exacerbates these disease burdens (WHO, 2020).

A model developed by the WHO estimated that about 1.8 million people would die from TB alone in 2020, with an additional 200,000 to 400,000 deaths projected if TB detection and treatment declined by 25% over a three-month period, largely because of government actions and inactions in prioritizing the control of COVID-19 (WHO, 2020). The Stop TB Partnership projected 1.3 million additional deaths among TB patients between 2020 and 2025 as a result of the stringent lockdown measures implemented in 2020.
Case notifications declined in 14 countries rated as high-burden between January and June 2020. India, Indonesia, the Philippines and South Africa, which together accounted for 44% of tuberculosis infections worldwide in 2020, recorded a decrease of more than 25% in TB notifications and an 80% decline in daily notifications (especially in the periods immediately after lockdowns were imposed) compared with 2019 (Ravelo, 2020). India recorded its major decline in TB notifications in 2020 during the lockdown between March and April before notifications eventually picked up, while Indonesia's decline occurred between March and May of the same year. In the Philippines, TB notifications declined from January until they eventually picked up in April 2020, whereas South Africa’s drop in notifications was observed from March to June 2020 (Ravelo, 2020).

According to a report released by Devex (2020), possible reasons for the decrease in TB case notifications include people’s avoidance of health facilities, an insufficient number of health facilities for TB cases, disruptions in the acquisition and transportation of medical supplies, movement restrictions combined with partial and full lockdowns imposed by governments, and loss of livelihood due to a decline in economic activity. COVID-19 has the potential to increase the number of people developing tuberculosis by more than one million per year between 2020 and 2025. The report further stated that “Although physical distancing policies may help to reduce TB transmission, this effect could be offset by longer durations of infectiousness, increased household exposure to TB infection, worsening treatment outcomes, and higher levels of poverty” (Ravelo, 2020).
Inadequate infrastructure and weak healthcare systems have hampered responses to African epidemics, including a lack of proper monitoring to establish the scope of outbreaks and insufficient mechanisms to prevent, diagnose and treat infections (Mustapha et al., 2020). The various lockdown measures taken to curb the spread of the disease in Nigeria affected economic activities and created fear and anxiety among people. Several mysterious deaths were recorded in Kano during the lockdown period, and their exact cause is yet to be established. Many attributed these deaths to COVID-19, but this was not scientifically proven, largely because autopsies could not be conducted on the dead. The relationship between the “mystery” disease and COVID-19 was met with a slew of ambiguity, equivocation and denial in Kano, with state officials dismissing any link between deaths from the strange disease and COVID-19 (Nwozor et al., 2020). Other observers blamed the deaths on a shortage of medical care due to the state’s shutdown of health facilities (Kanabe, 2020). In the same vein, the inability to access health facilities during this period may have slowed down efforts to detect TB patients and affected their ability to access drugs, and the number of patients lost to follow-up may have increased compared with previous periods. The coronavirus outbreak and the measures taken to curb it may therefore have affected the ability of TB patients to access hospital facilities for regular drugs or for diagnosis to confirm TB infection.

In epidemiology, survival analysis is used to analyze data on participants who have been followed up to a particular time, whether or not the event of interest has occurred by then. The statistical method most commonly applied in this type of analysis is the Kaplan-Meier estimate.
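The Kaplan-Meier product-limit estimate, and the log-rank test commonly used alongside it to compare Kaplan-Meier curves between groups, can be sketched in plain Python as follows. This is a minimal illustration, not the software or data used in this study: the function names `kaplan_meier` and `log_rank` and the two small groups of follow-up times are hypothetical. At each distinct death time t, the survival estimate is multiplied by (1 - d/n), where d is the number of deaths at t and n is the number still at risk; censored subjects leave the risk set without contributing a death, which is how the estimator uses their partial follow-up.

```python
import math
from collections import Counter

# Hypothetical follow-up times (months) for two groups of patients;
# event = 1 means death was observed, 0 means the record was censored.
A_TIMES, A_EVENTS = [2, 3, 3, 5, 6, 8, 9, 12], [1, 1, 0, 1, 0, 1, 0, 0]
B_TIMES, B_EVENTS = [4, 6, 7, 9, 10, 11, 12, 12], [0, 1, 0, 0, 1, 0, 0, 0]


def kaplan_meier(times, events):
    """Product-limit estimate: list of (t, S(t)) at each distinct death time."""
    deaths = Counter(t for t, e in zip(times, events) if e)
    surv, curve = 1.0, []
    for t in sorted(deaths):
        # Subjects still under observation just before t; a subject censored
        # at t counts as at risk (deaths are assumed to precede censoring).
        at_risk = sum(1 for u in times if u >= t)
        surv *= 1.0 - deaths[t] / at_risk
        curve.append((t, surv))
    return curve


def log_rank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value) on 1 df."""
    event_times = sorted({t for t, e in zip(times_a + times_b, events_a + events_b) if e})
    obs_minus_exp, variance = 0.0, 0.0
    for t in event_times:
        n_a = sum(1 for u in times_a if u >= t)
        n_b = sum(1 for u in times_b if u >= t)
        d_a = sum(1 for u, e in zip(times_a, events_a) if u == t and e)
        d_b = sum(1 for u, e in zip(times_b, events_b) if u == t and e)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n          # observed minus expected deaths in A
        if n > 1:                                    # hypergeometric variance at time t
            variance += d * n_a * n_b * (n - d) / (n * n * (n - 1))
    if variance == 0:
        return 0.0, 1.0
    chi2 = obs_minus_exp ** 2 / variance
    return chi2, math.erfc(math.sqrt(chi2 / 2))      # upper tail of chi-square(1)


if __name__ == "__main__":
    for t, s in kaplan_meier(A_TIMES, A_EVENTS):
        print(f"t = {t:>2}  S(t) = {s:.3f}")         # 0.875, 0.750, 0.600, 0.400
    chi2, p = log_rank(A_TIMES, A_EVENTS, B_TIMES, B_EVENTS)
    print(f"log-rank chi-square = {chi2:.3f}, p = {p:.3f}")
```

With tied times the sketch follows the usual convention that deaths occur before censorings. The p-value uses the chi-square distribution with one degree of freedom, which is appropriate when exactly two groups are compared.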
The Kaplan-Meier method is used to analyze such data and to compare survival among independent groups, such as control and treatment groups. To compare survival probabilities among two or more independent groups, the log-rank test statistic is used; it tests the null hypothesis that survival is equal across the groups (IIker, Sulaiman & Rukayya, 2017).

Survival times are time intervals measured from a certain starting point to the occurrence of a specific event, such as from the time of diagnosis of a disease to the time of death (Bewick, Cheek & Ball, 2004; Goel, Khanna & Kishore, 2010). In survival (time-to-event) studies, the event of interest (e.g. death) may not have occurred for some subjects during the follow-up time, either because they were lost to follow-up or because they had not experienced the event by the end of the study period. Such subjects are considered censored, and censoring is a unique feature of survival studies.

Survival analysis can also be applied in a business setting, where risk factors that affect the survival of a business can be assessed using the multivariate Cox regression model. A study conducted to predict the risk factors associated with bank failures in Nigeria adopted the Cox proportional hazards model, using data obtained from the financial statements of banks over a period of 9 years. The researchers identified 12 risk factors that can lead to bank failure and recommended that regulators design specific policies to address these factors, to avoid bank collapses that can adversely affect the economy (Babajide, Olokoyo & Adegboye, 2013).

To compare survival among two or more independent groups, the Kaplan-Meier survival estimate is used. Kaplan-Meier analyses can be applied in a variety of fields, including medicine, engineering, economics, physics and demography. An example can be seen in cancer studies, where patients are monitored for a set period of time until they die, relapse or drop out of the study, depending on the event of interest. Patients who drop out of the study are considered censored. The Kaplan-Meier approach, which is non-parametric, can be used to determine the percentage of patients who lived beyond a certain time period (IIker, Sulaiman & Rukayya, 2017). Several statistical tests have been developed to compare survival among groups, such as those of Peto & Peto (1972), Tarone & Ware (1977), Kalbfleisch & Prentice (1980) and Cox & Oakes (1984). The proportion of subjects surviving beyond a given time is commonly estimated using the Kaplan-Meier method. The effect of an intervention or medication administered to patients is usually assessed by measuring the number of survivors during the period of the intervention (Armitage, Berry and Mathews, 2002).

1.2 Problem Statement

Over the years, there have been strenuous efforts by both local and international organisations, in collaboration with various government agencies, to curb the menace of TB. However, the disease continues to constitute a major health challenge that has led to many deaths worldwide, and more effort is required to combat it. The need to equip health facilities cannot be overemphasized, as this can help in the accurate diagnosis of presumptive cases, as seen in the use of GeneXpert machines to detect traces of TB. Qualified medical practitioners with proper training are also needed to increase the number of clinically confirmed cases placed on immediate treatment (Murray, 2018).

In Nigeria, the incidence of TB increased by almost 3% in 2018 compared with 2017, in contrast with the improvement reported worldwide. Consequently, TB deaths also increased, to 157,000 in 2018 from 155,000 in the previous year.
Treatment of TB-infected patients was insufficient, covering less than 25% of all confirmed cases during these periods (Adepoju, 2020). Effective management of TB patients is vital to improving survival among patients, especially those with HIV/AIDS co-infection. Lack of proper health care for such patients can increase the spread of the disease, which can hinder economic growth, as infected patients may be unable to engage in productive activities. Identifying health risk factors can go a long way in guiding drug administration and in prioritizing patients identified as having severe risk factors that can easily lead to death or suffering if not urgently attended to. Identifying major risk factors through research can help medical practitioners curb the menace of the disease and save lives.

Several studies have been conducted on TB treatment outcomes in Nigeria (Alobu et al., 2014; Adamu et al., 2017; Dauda, 2010; Fatiregun et al., 2009; Ifebunandu et al., 2012; Ige & Akindele, 2011; Ukwaja et al., 2014; Peters et al., 2004; Salami & Oluboyo, 2003; Michael & Bolarinwa, 2020). Most of these studies used non-parametric and semi-parametric models for survival analysis; only a few applied parametric models, and these did not carry out model diagnostics such as checking the proportional hazards assumption of the Cox model, checking the linearity of continuous variables or assessing the goodness of fit of the model. Failure to carry out model diagnostic checks and to verify assumptions makes the results less reliable and increases the probability of errors. This study will also compare the survival of patients using multivariate analysis with both parametric (Gompertz, Weibull and Exponential) and semi-parametric (Cox) proportional hazards models.
Necessary statistical tools will be employed to select the model that best fits the data of TB patients.

1.3 Objectives

The general objective of this study is to estimate the survival probability of TB patients receiving treatment in various health facilities in Kano State Metropolis, Nigeria. The specific objectives of the study are to:

• Estimate the survival times of TB patients using a non-parametric survival model.
• Compare the survival functions of TB patients in Kano State.
• Assess the effect of risk factors for mortality of TB patients in Kano State using the Cox proportional hazards model.
• Compare parametric proportional hazards models by their log-likelihood and AIC values to select the model that best fits the tuberculosis data.

1.4 Significance of the Study

The study will assist policy makers, health care professionals and the public in creating further awareness of the risks associated with TB that can lead to premature death. It will assist in using the appropriate survival model to assess risk factors associated with death in patients and in making the policies necessary to improve patient survival. It can also help policy makers close the widening gap in reported TB cases, so as to increase activities that can curb the menace of this contagious disease.

The results of this study may provide useful information to health care professionals, clinicians, policy makers and health educators, and enlighten the public on TB health risk factors that can adversely affect the survival of patients. The appropriate model for analyzing tuberculosis data will be recommended from among the parametric and semi-parametric models compared, which will help policy makers, healthcare professionals and researchers adopt the best model for analyzing tuberculosis data.
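The last objective compares parametric proportional hazards models by log-likelihood and AIC, where AIC = 2p - 2 ln L for a model with p estimated parameters and maximized log-likelihood ln L, and the model with the smallest AIC is preferred. The sketch below illustrates only that selection step, under simplifying assumptions: it fits intercept-only exponential and Weibull models (no covariates) to hypothetical right-censored data by maximum likelihood using only the Python standard library, and it omits the Gompertz model for brevity. The function names and data are illustrative and are not taken from this study.

```python
import math

# Hypothetical right-censored follow-up data (months); 1 = death, 0 = censored.
TIMES = [2, 3, 3, 5, 6, 8, 9, 12, 4, 6, 7, 9, 10, 11, 12, 12]
EVENTS = [1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0]


def aic(loglik, n_params):
    """Akaike Information Criterion; the smaller value is preferred."""
    return 2 * n_params - 2 * loglik


def fit_exponential(times, events):
    """Closed-form MLE of a constant hazard lam: lam = deaths / total time at risk."""
    deaths, exposure = sum(events), sum(times)
    lam = deaths / exposure
    loglik = deaths * math.log(lam) - lam * exposure
    return {"lam": lam}, loglik


def weibull_profile_loglik(k, times, events):
    """Weibull log-likelihood with lam set to its MLE for a given shape k.

    Hazard h(t) = lam * k * t**(k - 1), cumulative hazard H(t) = lam * t**k,
    so the MLE is lam = deaths / sum(t**k). Requires all times > 0.
    """
    deaths = sum(events)
    lam = deaths / sum(t ** k for t in times)
    sum_log_t = sum(math.log(t) for t, e in zip(times, events) if e)
    return deaths * math.log(lam) + deaths * math.log(k) + (k - 1) * sum_log_t - deaths


def golden_section_max(f, lo, hi, tol=1e-9):
    """Maximize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) > f(d):
            b = d
        else:
            a = c
    return (a + b) / 2


def fit_weibull(times, events, k_lo=0.05, k_hi=20.0):
    """MLE of the Weibull shape k; the profile log-likelihood is concave in k."""
    k = golden_section_max(lambda s: weibull_profile_loglik(s, times, events), k_lo, k_hi)
    lam = sum(events) / sum(t ** k for t in times)
    return {"lam": lam, "k": k}, weibull_profile_loglik(k, times, events)


if __name__ == "__main__":
    scores = {}
    for name, fit, n_params in [("Exponential", fit_exponential, 1),
                                ("Weibull", fit_weibull, 2)]:
        params, loglik = fit(TIMES, EVENTS)
        scores[name] = aic(loglik, n_params)
        print(f"{name:<12} logL = {loglik:8.3f}  AIC = {scores[name]:8.3f}  {params}")
    print("Preferred by AIC:", min(scores, key=scores.get))
```

Because the exponential model is the Weibull model with shape k = 1, the fitted Weibull log-likelihood can never be lower; the 2p penalty in AIC is what decides whether the extra shape parameter is worth keeping. A Gompertz candidate would be added to the same loop with its own log-likelihood.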
1.5 Scope of the Study
This is a retrospective cohort study of adult TB patients registered for treatment at the DOTS (Directly Observed Treatment, Short-Course) Units of licensed TB treatment facilities within the metropolitan area of Kano State, Nigeria, from January 1, 2019 to December 31, 2020. Patient records were followed up to July 31, 2021. The event of interest is time to death among the infected patients during the study period, as recorded in the TB register.
1.6 Organization of the Study
The study is organized into five chapters. Chapter one covers the background of the study, the problem statement, the objectives, the significance of the study and the scope of the investigation. Chapter two reviews existing scholarly literature on the subject (conceptual and empirical reviews) and discusses the underlying theory. Chapter three describes the methods used to obtain the study's findings, including the research design, sampling technique, source of data, statistical techniques for data analysis and analytical tools. Chapter four presents the results, analysis and discussion of findings. Chapter five summarizes the findings, draws pertinent conclusions and provides recommendations. At that point the researcher assesses the data collected and, where necessary, consults supplementary literature to support or refute the findings.
Chapter 2 Literature Review
2.1 Introduction
This chapter discusses tuberculosis infection from a conceptual standpoint and reviews empirical studies of the risk factors associated with TB deaths among patients. It also explains the survival models to be employed in the study and other test statistics relevant to analysing the data.
The review also summarizes the findings of previous studies on this topic.
2.2 Tuberculosis Disease (TB) Epidemiology
Tuberculosis is a highly contagious and potentially fatal disease. It ranks above HIV/AIDS as a cause of death from a single infectious agent worldwide. Once a person contracts the infection, it weakens the immune system and causes illness. According to WHO (2017), cited in Asgedom et al. (2018), tuberculosis is one of the diseases that contribute most to global death rates. Tuberculosis is caused by the bacterium Mycobacterium tuberculosis. It is an airborne disease that is easily contracted through close contact with an infected person, especially when that person releases droplets containing the bacteria into the air by coughing. It can affect several parts of the body, such as the lungs, bones, spinal cord and neck. Most reported cases of tuberculosis are pulmonary (TB of the lungs), with fewer cases of extrapulmonary TB (TB of parts of the body other than the lungs).
About 25% of the world's population is estimated to be infected with Mycobacterium tuberculosis and is therefore at risk of developing tuberculosis (WHO, 2019).
According to the WHO (2019) Global Tuberculosis Report, transmission of TB can be curbed when infected patients are diagnosed and treated early. Treatment usually requires the administration of drugs to TB patients for a period of 6 months. First-line antibiotics have been found to be effective: most patients are cured after completing treatment, and few deaths or relapses are reported. Tobacco smoking, diabetes and HIV are significant health risk factors that increase TB infection among patients. Other determinants of TB infection include poverty and malnutrition.
TB-related deaths can be reduced by preventing latent TB infection, improving quality of life and reducing risk factors that affect health. According to the Nigerian 2019 Annual TB Report issued by the Federal Ministry of Health (FMoH), "Nigeria has the highest burden of TB in Africa and is among the eight countries that accounted for two thirds of the Global TB burden". Nigeria is ranked 6th of the 30 countries with high TB burdens and first in Africa. In addition, the country is among the global high burden countries for TB, TB/HIV co-infection and MDR-TB (multidrug-resistant TB), with high mortality rates. The report further shows that TB is a major public health challenge in Nigeria that needs to be dealt with decisively (FMoH, 2019).
The disease can be pulmonary or extrapulmonary. Around 80% of TB cases are pulmonary, affecting mainly the lungs, while about 20% are extrapulmonary. Extrapulmonary tuberculosis is the infection of other organs via the circulation or the lymphatic vessels, or by direct spread from one organ to an adjacent one. Management of tuberculosis usually follows the guidelines and recommendations of the WHO, which are regularly updated in line with recent findings. The guidelines clearly state the treatment regimens, the approved anti-tuberculosis drugs and their dosages (WHO, 2019).
2.3 Survival Analysis and TB Related Deaths
Ajagbe, Kabir and O'Connor (2014) conducted a study of adult TB patients in Ireland using survival analysis tools. It was a retrospective cohort study in which data on 647 confirmed TB cases were reviewed. Medical records of adult TB patients with bacteriologically confirmed disease were obtained from two teaching hospitals.
Health risk factors likely to affect survival were also obtained. These were used as predictor variables in a Cox regression model to determine their impact on survival and to compute hazard ratios. In the univariate analysis, survival among independent groups was estimated and compared using the Kaplan-Meier method. The study found no significant difference between the survival curves of males and females, and none among the age groups. Nevertheless, men had shorter survival times than women, which is consistent with the higher hazard ratio for men. Proper medication for identified risk factors, such as anti-diabetic medication and other immunosuppressive drugs, increased survival time. Alcohol consumption and tobacco smoking were associated with decreased survival time (Ajagbe, Kabir & O'Connor, 2014).
Adamu et al. (2017) investigated the causes of death among TB patients in Nigeria. The researchers analysed five years of records of TB patients receiving treatment at a large tertiary hospital. The study found that 16.6% of TB patients died after commencing TB treatment, and most of the deaths occurred within the first month of treatment initiation. Risk factors associated with death in TB patients were evaluated using the Cox proportional hazards model. TB/HIV co-infected patients had a higher risk of death than patients without HIV (aHR 1.39, CI: 1.04–1.85). Patients residing outside the city (aHR 3.18, CI: 2.28–4.45) were at higher risk than those living close to the health facility in the city. Patients with previous tuberculosis therapy (aHR 3.48, CI: 2.54–4.77) had lower survival than new patients.
Patients treated on the basis of clinically suggestive TB had a higher risk of death than those who were bacteriologically confirmed (aHR 4.96, CI: 2.69–9.17). Patients with both pulmonary and extrapulmonary TB had a 1.45 times higher risk of death than those with only one of the two types (aHR 1.45, CI: 1.03–2.02). Patients referred from health facilities not monitored by the TB programme also had a higher risk of death than those treated at a facility coordinated by the programme (aHR 3.02, CI: 2.01–4.53). The study concluded that patients diagnosed early and placed on ART had a lower risk of death than those not receiving ART. It also concluded that the distance from place of residence to the treatment centre hinders patients from accessing medication, possibly because of a lack of funds for transport.
In high burden settings, TB-related deaths tend to occur early in therapy. Studies have shown that deaths are mostly recorded during the first 2 months of treatment and are prevalent among adult patients (<49 years). Poverty, HIV co-infection and low body mass index (BMI) have also been found to contribute to increased death rates among TB patients. The survival of TB patients is further affected by prolonged illness due to advanced disease, such as advanced HIV (CD4 count below 200 cells/mm³) in adults, and by young age (children below 5) and acute malnutrition. Most early deaths among TB-HIV co-infected patients receiving therapy are caused by TB rather than HIV (Bhargava and Bhargava, 2020).
A prospective cohort study was conducted among patients from various treatment centres in Tanzania to assess risk variables linked to death in patients with pulmonary tuberculosis.
Survival among HIV co-infected and non-HIV-infected patients was analyzed. Nagu et al. (2017) followed up patients in Tanzanian health facilities from the start of treatment to treatment completion. Of the 1,696 patients enrolled, 58 died (3.4%). Two thirds of these deaths (67%) occurred during the first two months of treatment (the intensive phase). The mortality risk among HIV co-infected patients who commenced antiretroviral therapy (ART) within two weeks of initiating TB treatment was reported as lower than that of patients not co-infected with HIV (RR = 3.55; 95% CI: 1.44, 8.73; p < 0.0001). The risk was higher in patients who commenced ART within 3 months before anti-TB therapy (RR = 10; 95% CI: 3.28, 30.54; p < 0.0001).
A prospective cohort study in Nigeria examined tuberculosis patients with HIV co-infection receiving treatment at an Infectious Diseases Centre. It found that treatment efficacy was low: only 40% of patients in the cohort were cured at the end of the 8-month treatment course. Most of the cured patients had a negative sputum AFB smear after the 2-month intensive phase and maintained this status thereafter. Only a few additional patients (not more than 6%) converted to a negative sputum during the continuation phase or at treatment completion. Patients whose sputum AFB remained positive after the intensive phase had a low chance of being cured at the continuation or completion stage. Hence, more effective drug administration is needed to cure TB-HIV patients (Dauda, 2010).
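Several of the studies reviewed above (for example, Adamu et al., 2017) report adjusted hazard ratios from the Cox proportional hazards model. That model is fitted by maximising a partial likelihood in which the baseline hazard cancels out. The following is a minimal Python sketch assuming a single covariate. It maximises the log partial likelihood by Newton-Raphson, using the Breslow approximation for tied death times, and the hazard ratio is exp(beta). The function name `cox_fit_1d` is hypothetical, and the code is an illustration rather than the estimation routine of any cited study.

```python
import math
from typing import List


def cox_fit_1d(times: List[float], events: List[int], x: List[float],
               max_iter: int = 50, tol: float = 1e-10) -> float:
    """Maximise the Cox log partial likelihood for one covariate by Newton-Raphson.

    Ties use the Breslow approximation: each death is compared with the full
    risk set {j : t_j >= t_i}. No baseline hazard is estimated at any point.
    """
    beta = 0.0
    for _ in range(max_iter):
        score = info = 0.0
        for ti, ei, xi in zip(times, events, x):
            if ei != 1:
                continue  # censored patients contribute only through risk sets
            s0 = s1 = s2 = 0.0
            for tj, xj in zip(times, x):
                if tj >= ti:  # patient j is still at risk when patient i dies
                    w = math.exp(beta * xj)
                    s0 += w
                    s1 += w * xj
                    s2 += w * xj * xj
            mean = s1 / s0                 # risk-weighted covariate mean
            score += xi - mean             # first derivative of log PL
            info += s2 / s0 - mean ** 2    # minus the second derivative
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta
```

Only the ordering of event times enters the risk sets. Replacing the follow-up times by any increasing transformation of them (for example, squaring them) therefore leaves the estimate unchanged. This is one way of seeing that the Cox model does not estimate the baseline hazard.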
A study found that mortality among TB patients who do not smoke tobacco is lower than among TB patients who smoke. The hazard of death among TB patients with a history of smoking was 9 times that of non-smokers. The ability to quit smoking is therefore an important factor in reducing mortality among TB patients. Quitting during treatment significantly reduced the risk of TB-related death, with a 65% drop in TB-related deaths among patients who quit (Mollel & Chilongola, 2017).
A previous history of TB (especially among men) has been found to be associated with an increased risk of death among TB patients. Lung damage has been reported after TB patients complete treatment or are cured. In Brazil, impaired lung function causing difficulty in breathing after completion of TB therapy was found among men. Most patients who reported difficulty in breathing had a previous history of TB treatment (Menezes et al., 2007). The Brazilian study found that the risk of airway obstruction among men with previous TB treatment was 4.1 times that of men with no prior history of tuberculosis treatment. The result was unchanged after adjusting for several factors associated with airway obstruction, such as literacy level, smoking, exposure to smoke or dust, history of childhood respiratory disease, gender and age. The study concluded that airway obstruction is prevalent in older adults with a prior history of tuberculosis (Menezes et al., 2007).
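Ratios such as the 9-fold hazard among smokers and the 65% drop in deaths after quitting express the same kind of quantity. A ratio r corresponds to a (r - 1) x 100% change in risk, so a 65% drop corresponds to a ratio of 0.35. For a continuous covariate in a proportional hazards model, the log hazard is linear in the covariate. The per-unit ratio therefore compounds multiplicatively over k units. The small Python helpers below make these conversions explicit; their names are hypothetical. Note that odds ratios approximate risk ratios only when the outcome is rare.

```python
import math


def percent_change(ratio: float) -> float:
    """Percentage change in hazard (or risk) implied by a hazard/risk ratio."""
    return (ratio - 1.0) * 100.0


def ratio_for_k_units(ratio_per_unit: float, k: float) -> float:
    """Ratio for a k-unit increase in a continuous covariate.

    In a proportional hazards model log(HR) is linear in the covariate,
    so HR for k units = (HR for one unit) ** k.
    """
    return math.exp(k * math.log(ratio_per_unit))
```

For instance, assume the age ratio of 1.059 reported by Yi et al. (2020) is per year of age. Then `ratio_for_k_units(1.059, 10)` gives about 1.77, that is, roughly a 77% higher risk for a patient ten years older.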
Several studies have shown a correlation between the risk of tuberculosis infection and diabetes mellitus (DM). Patients diagnosed with DM are at higher risk of TB infection and are more likely to progress from latent to active tuberculosis (WHO, 2016). According to case-control studies, the risk of contracting tuberculosis is 2.44 to 8.33 times higher in individuals with DM than in those without (Shetty et al., 2006; Jabbar et al., 2006). Jeon & Murray (2008), cited in Silva et al. (2018), conducted a systematic review of 13 observational studies. They found that patients with DM had about three times the risk of TB infection or disease of patients without DM (relative risk = 3.11; CI: 2.27–4.26). DM is thus an important risk factor for TB infection. In a related study, individuals with a haemoglobin A1c (HbA1c) level of 7% or higher had three times the risk of TB infection of those with an HbA1c level below 7% (HR = 3.11; CI: 1.63–5.92). The risk of developing tuberculosis is also higher among patients who use more insulin in diabetes treatment. It is twice as high among patients taking more than 40 units of insulin as among those taking less (Dooley & Chaisson, 2009, cited in Silva et al., 2018).
Shimazaki et al. (2013) examined the factors related to death among HIV-negative hospitalized tuberculosis patients in the Philippines during a 3-month study conducted in 2009. Of 403 patients, 151 (37.5%) died of tuberculosis while receiving treatment in hospital.
Bacterial pneumonia, anorexia, anaemia and older age were found to be risk factors significantly associated with death among admitted patients. Poor urban areas of the Philippines recorded high TB mortality. Complications of bacterial pneumonia, which mostly affects the lungs, were the factor most strongly associated with death in hospitalized patients (aOR 4.53, 95% CI: 2.65–7.72). The next most significant factors were anorexia (aOR 3.01, 95% CI: 1.55–5.84), anaemia with a haemoglobin level below 10 g/dl (aOR 2.35, 95% CI: 1.34–4.13) and older age (aOR 1.85, 95% CI: 1.08–3.17). Haemoptysis was linked to a higher chance of survival (aOR 0.44, 95% CI: 0.25–0.80). Because of recall bias, the exact onset of disease could not be confirmed. The study therefore could not establish whether the time from detection of TB infection to hospitalization was significantly associated with mortality.
Lack of proper diagnosis leads to poor treatment outcomes among TB patients. A study of TB treatment and the determinants of outcomes in Nigeria found that patients who were not properly diagnosed had a higher risk of poor treatment outcome. Male patients also had a higher risk of poor treatment outcome than their female counterparts (Fatiregun et al., 2009). Similarly, Hayibor, Bandoh, Asante-Poku and Kenu (2020) conducted a study in Accra. They found HIV/TB co-infection, older age, previous TB treatment and patient category to be significant determinants of TB treatment outcome. Type of TB was not significantly related to adverse treatment outcome.
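Most ratios in this review are reported with 95% confidence intervals. Suppose an interval was computed as exp(log(ratio) ± 1.96 × SE). The standard error and an approximate two-sided Wald p-value can then be recovered from the interval alone, which is useful for checking reported results for consistency. The sketch below assumes such a log-symmetric Wald interval; the function name `from_reported_ci` is hypothetical.

```python
import math
from typing import Tuple

Z_975 = 1.959963984540054  # 97.5th percentile of the standard normal distribution


def from_reported_ci(lower: float, upper: float) -> Tuple[float, float, float]:
    """Recover (ratio, SE of log ratio, two-sided p-value) from a reported 95% CI.

    Assumes a Wald interval symmetric on the log scale:
    exp(log(ratio) +/- 1.96 * SE).
    """
    log_lo, log_hi = math.log(lower), math.log(upper)
    se = (log_hi - log_lo) / (2.0 * Z_975)   # half-width / 1.96 on the log scale
    log_ratio = (log_lo + log_hi) / 2.0      # centre of the interval on the log scale
    z = log_ratio / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail
    return math.exp(log_ratio), se, p_value
```

Take the bacterial pneumonia estimate of Shimazaki et al. (aOR 4.53, 95% CI: 2.65–7.72). `from_reported_ci(2.65, 7.72)` returns a point estimate of about 4.52, which matches the reported 4.53 up to rounding, with p well below 0.001.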
According to Subramani et al. (2008), cited in Bhargava and Bhargava (2020), TB burden and related mortality are higher in low- and middle-income countries. In these countries, deaths in younger age groups are higher than among older adult patients (> 44 years). In rural South India, TB-related deaths among patients aged < 44 years were 12 times higher than expected in the general population.
Asgedom et al. (2018) conducted a study in Ethiopia to assess survival and the risk factors affecting survival among tuberculosis (TB) patients. Death was the outcome of interest. The predictor variables were place of residence, age, gender, pre-treatment weight, type of therapy (antiretroviral or cotrimoxazole preventive therapy), type of TB, TB/HIV co-infection, year of anti-TB therapy and TB category at presentation. Kaplan-Meier and Cox proportional hazards models were used in the univariate and multivariate analyses respectively. Survival among groups was compared using the Log Rank, Tarone-Ware and Generalized Wilcoxon (Breslow) tests. In the Kaplan-Meier analysis, two factors were significantly associated with survival (p < 0.05): ART (antiretroviral therapy) or CPT (cotrimoxazole preventive therapy) treatment among HIV co-infected patients, and type of TB. The Cox proportional hazards model showed that site of TB and CPT were significant predictors of survival. Extrapulmonary TB patients had about 17 times the hazard of death of pulmonary TB patients (HR = 17.38, 95% CI: 3.88–77.86, p < 0.001). Furthermore, TB/HIV co-infected patients receiving CPT had an 85% lower hazard of death than those not on CPT (HR = 0.15, 95% CI: 0.03–0.74, p = 0.02).
A study on the timing and causes of death among TB patients in South Africa by Field et al.
(2014) found that TB-related deaths were highest during the first month after treatment initiation. HIV-positive patients not on ART had a higher mortality risk than TB-HIV patients who had initiated ART. Risk factors for death in TB patients included older age, previous history of TB, HIV-positive status, pulmonary TB and uncertainty in TB diagnosis (Field et al., 2014).
Age, gender, type of tuberculosis and drug susceptibility profile affect the risk of death from tuberculosis. However, the effect of age on the risk of death varies between countries. In some European and Asian countries, mortality has been associated with older age, with patients above 45 years recording more deaths (Lee et al., 2017; Shuldiner et al., 2014).
Bajehson et al. (2019) studied the factors related to death among patients with multidrug-resistant tuberculosis in Northern Nigeria. Survival data were analyzed using the non-parametric Kaplan-Meier method. The semi-parametric Cox proportional hazards model was used to assess the impact of covariates on survival. Data on 147 confirmed drug-resistant TB (DR-TB) patients were analyzed, of whom 25% died. The probability of death increased among patients with HIV co-infection. Delays in starting treatment after diagnosis reduced survival probability, and there was a significant negative association between survival and delay in treatment initiation. Survival probability was not significantly influenced by the mode of care (facility or community based).
Many studies have considered HIV as a predictor variable in assessing the incidence of tuberculosis infection (Akesa et al., 2015; Ifebunadu & Ukwaja, 2012; Adamu et al., 2017). According to a study by Pathmanathan et al.
(2017), the use of ART (antiretroviral therapy) by HIV-infected patients reduces mortality and morbidity from tuberculosis. However, ART does not confer immunity against TB, and HIV patients on ART can still become co-infected with TB. The study found that the rate of TB infection among HIV patients on ART was low. This may be related to patients' awareness of such risks after ART initiation. Advanced HIV, a history of previous TB infection and non-treatment of suspected TB cases were found to be predictors of TB incidence among HIV patients (Pathmanathan et al., 2017).
Lin et al. (2014) conducted a 5-year retrospective study to ascertain deaths among TB patients and to classify them as TB-related or non-TB-related. Most TB-related deaths occurred within three weeks and were caused by septic shock. Other predictors of death included co-morbidities such as malignancy, liver cirrhosis and renal failure, as well as cavitary and radiographic patterns (miliary and pneumonic). The study concluded that most deaths among tuberculosis patients were non-TB-related and that deaths were driven by septic shock rather than by TB itself. Timely recognition of the clinical manifestations of septic shock can therefore help prevent deaths among TB patients (Lin et al., 2014).
Patients with AIDS-TB co-infection have a higher risk of death. In a study conducted in Brazil, the risk among co-infected patients was 1.65 times that of patients without co-infection. Factors associated with increased survival were female gender, years of education (at least 8 years) and CD4 diagnostic criteria.
On the other hand, several factors were associated with lower survival (Melo, Donalisio and Cordeiro, 2017):
• non-use of HAART (highly active antiretroviral therapy) by AIDS patients;
• no investigation of hepatitis B status at enrolment;
• age above 60 years;
• more than two opportunistic infections (such as candidiasis or pneumonia).
Aung et al. (2019) conducted a 12-year retrospective cohort study in Myanmar. They found that the survival rate of TB/HIV co-infected patients declined over time (82.0% at 5 years and 58.1% at 10 years). Some co-infected patients receiving ART died within ten years of commencing ART, and about 40% of such TB/HIV patients died.
In India, Pardeshi (2009) studied survival trends of patients on Directly Observed Treatment, Short-Course (DOTS) by treatment category, age and sex. Survival across the three categories was compared using Kaplan-Meier plots and log rank tests. The Cox proportional hazards model was used in multivariate analysis to assess the effect of risk factors on survival. The study reported a significant difference in survival curves across the three DOTS categories (i.e. new smear-positive PTB; sputum smear-positive relapsed TB; and new smear-negative PTB and less severe EPTB). It also reported a difference between pulmonary and extrapulmonary TB (HR = 0.61; p = 0.45). There was no difference between males and females (HR = 0.87; p = 0.69) or between new and retreatment patients (HR = 0.75; p = 0.94). Patients aged above 40 years had a higher risk of death (HR = 7.81; p = 0.012).
Ghazal et al. (2014) investigated factors associated with mortality among hospitalized tuberculosis patients in Pakistan. It was a retrospective cohort study of 120 patients.
The patients were divided into two equal groups of 60: cases (those who did not survive hospitalization) and controls (those discharged after treatment). Patients with a positive tuberculosis diagnosis who were hospitalized in the same month of the year were included in the study. Several factors were significant risk factors for mortality in hospitalized patients at the 5% level of significance: late presentation for treatment, which led to more early deaths; lack of compliance with treatment; a prolonged period of illness before treatment initiation; and low body weight. Leukocytosis and low serum protein were also significantly related to death among hospitalized TB patients.
Lefebvre & Falzon (2008) assessed the risk factors associated with tuberculosis-related deaths in Europe. The study used case-based data on patients from 15 European Union countries for the period 2002–2004. The risk of death varied among the countries, but the most significant factors of death were old age and drug resistance. In bivariate analysis, age above 19 years, male sex, pulmonary TB and a previous history of TB were significantly related to death. Multidrug-resistant TB (MDR) was associated with death. This association was nearly twice as strong in patients with a past history of tuberculosis (secondary MDR) as in those without such a history (primary MDR). The length of time between notification and death had no bearing on the outcome. Patients with tuberculosis in Europe had a two to five times higher risk of dying than those in other parts of the world.
In multivariate analysis, male gender, age, European origin, pulmonary TB and MDR were all significantly associated with patient death. The risk of death was higher for secondary MDR (OR 3.6, 95% CI 3.0–4.3) than for primary MDR (OR 2.5, 95% CI 2.0–3.1). Portugal and the Czech Republic had odds ratios (ORs) much lower than the reference country (Lefebvre & Falzon, 2008).
A retrospective analysis of 7,032 TB patients in China examined mortality among pulmonary tuberculosis patients. It identified advanced age, male gender, TB/HIV co-infection, initial sputum positivity, TB retreatment and delayed visits as risk factors for mortality. A forward stepwise Cox model based on maximum partial likelihood was used to examine the risk factors. The Kaplan-Meier method was used to estimate survival probability. A visit delay of more than 14 days after initial diagnosis was a risk factor for TB-related death (RR = 1.386, 95% CI: 1.096–1.753). In contrast, a delay in diagnosis before therapy did not affect the survival probability of patients with pulmonary TB. The risk of death among males was 1.8 times that of females (RR = 1.847, 95% CI: 1.387–2.459). Each additional year of age (RR = 1.059, 95% CI: 1.051–1.067) was associated with a 5.9% increase in the risk of death (Yi et al., 2020).
2.4 Related Literature on Parametric Models
Saroj (2019) studied under-five child mortality in India. The objective was to identify the risk factors affecting under-five child mortality using both parametric and semi-parametric survival models. The Cox proportional hazards model and parametric models were compared to find the best parametric model for under-five child mortality. Accelerated failure time (AFT) models based on the Weibull, exponential, log-normal and log-logistic distributions were used.
The Weibull distribution was found to be the best of the four parametric models, as it had the minimum AIC value.
Michael & Bolarinwa (2020) used data on tuberculosis patients in Nigeria to model survival with parametric models. It was a retrospective cohort study of TB patients from 2010 to 2016, with time to recovery from tuberculosis as the outcome variable. The study used Weibull, exponential and log-logistic AFT models and selected the best-fitting model by AIC. The effect of age, gender, type of TB and occupation on survival (time to recovery) was assessed. Age, gender and occupation were identified as key factors in the recovery of TB patients. The Weibull AFT model, which had the lowest AIC value, was found to be the best model for assessing TB patients' survival.
Parametric models were also adopted in a study of breast cancer in Malaysia. The researchers used data on breast cancer patients from December 2008 to February 2017. Age and type of treatment were the covariates modelled against survival time (t). The performance of the three models was compared using the log-likelihood, AIC and BIC criteria. The Weibull distribution was the best-fitting model, having the highest log-likelihood and the lowest AIC and BIC values compared with the exponential and log-logistic models (Amra et al., 2017).
Daniel, Lasisi & Banister (2020) studied tuberculosis patients' data from 2015 to 2017. They analyzed the data using parametric models, a generalized gamma frailty model with a mixture of Gompertz distribution, and the Cox proportional hazards model. The aim was to determine the model that best fits the data. Data were obtained from a hospital in Bauchi State, Nigeria.
To estimate the survival probability of patients, predictor variables such as pre-treatment weight, drug usage, smoking, level of education, age, marital status and type of TB were included in the model. Smoking and drug administration (treatment therapy) were significantly associated with survival, while the other covariates were not significant predictors of survival. Several fitted models were compared using AIC to determine which fits the data best: Weibull PH, exponential PH, Gompertz PH, Cox PH, lognormal AFT, log-logistic AFT, and generalized gamma frailty with a mixture of Gompertz. The gamma frailty model combined with the Gompertz distribution fitted the data better than the other models, as it had the lowest AIC value.
Survival analysis has also been used to assess the determinants of tourists' length of stay in Turkey. Primary data were collected by administering questionnaires to tourists during the summer vacation of 2005. The study found 16 variables to be significant determinants of length of stay. The Weibull, exponential and Gompertz parametric models and the semi-parametric Cox proportional hazards model were used to analyze the data. Among the fitted parametric models, the Weibull distribution fitted best, with the minimum AIC value (Gokovali, Bahar & Kozak, 2007).
Weibull, log-logistic and exponential models were also used in a study of the survival of smokers. Elketroussi & Fan (1991) studied the time from initially quitting smoking to resuming the habit. The study covered a period of 44 months, and most smokers were found to have resumed smoking within 8 months of quitting.
Using maximum likelihood estimates to compare the three models, the Weibull and Log-logistic models were found to fit the data better than the Exponential model. Royston (2001) compared the lognormal model with the Cox proportional hazards model in modelling the survival of ovarian and breast cancer patients. The researcher argued that, using the lognormal model, the estimation of median survival may improve by 25% compared with the results obtainable from the Cox model on the same data. According to the author, the Cox model is weak when a model is to be validated on new data. This is because the Cox model uses a partial likelihood, which does not specify the complete probability of an event since the baseline hazard is not estimated in the model; this makes it less efficient in estimating survival than parametric models. The study also fitted other parametric models, namely the Weibull, Gompertz, Exponential, Log-logistic and Gamma distributions, to the breast cancer data. Based on the AIC values, the gamma and lognormal models were very close, and the difference between them was found to be insignificant at the 5% level of significance using a chi-square test statistic. This is expected, as the lognormal is a limiting case of the generalized gamma distribution. However, the logistic model was a better fit for the ovarian cancer data, as it performed better in terms of AIC. When the lognormal model was compared with the Cox model, the lognormal performed better in predicting survival.

2.5 Summary of Literature Review

This chapter reviewed empirical studies on tuberculosis and discussed its epidemiology and the risk factors associated with survival among TB patients. The non-parametric Kaplan-Meier method and the semi-parametric Cox proportional hazards model, along with other parametric survival models, were used in several studies.
The parametric models, such as the Weibull, Exponential and Log-normal, were usually compared using AIC, BIC or the log-likelihood to determine which model best fits the data when the data are assumed to follow a particular distribution. Researchers have justified the use of parametric models as a superior tool to the Cox model in survival analysis because of the semi-parametric nature of the Cox model: only a partial likelihood is obtainable from it, since a non-parametric baseline hazard is used in the model. For the purpose of this study, both parametric and semi-parametric survival models available in the literature will be applied.

Chapter 3 Methodology

3.1 Introduction

The methods and statistical programs used in the study are discussed in detail in this chapter, including a description of the data employed, the models used to analyze the survival data, and tests of goodness of fit.

3.2 Description of Data and Variables of the Study

Consecutive patients with confirmed tuberculosis managed between January 2019 and December 2020 at the Directly Observed Treatment, Short-course (DOTS) Unit of each selected health facility were enrolled for the study. The goal of this study was to determine the survival probability of TB patients receiving treatment at DOTS Units within the Kano Metropolitan Area. Four health care facilities were selected based on the high number of TB cases recorded in each facility during the period. The Infectious Diseases Hospital (IDH), Murtala Muhammed Specialist Hospital, Umma Zaria Healthcare Centre and Gwagwarwa Health Centre accounted for most TB cases in Kano from January 1, 2019 to December 31, 2020.
Patients entered the cohort on the date of enrollment for TB treatment and remained in the cohort for a minimum period of six months unless they were cured, completed treatment, were lost to follow-up, were transferred out or died while treatment was being administered. Patients were followed up to July 31, 2021, in particular those who commenced the six-month therapy in December 2020. The records of TB patients who were either clinically confirmed (e.g. by x-ray or biopsy results) or bacteriologically confirmed (sputum results from the GeneXpert machine) were obtained from the tuberculosis register located at each health facility. The survival time of TB patients constitutes the dependent variable of the study. The independent variables include pre-treatment weight, gender, HIV status, age and type of TB disease (pulmonary or extrapulmonary), based on the patients' medical information recorded during treatment.

3.3 Research Design

This is a retrospective cohort study of adult TB patients registered for treatment at the DOTS (Directly Observed Treatment, Short-course) Units of licensed TB treatment facilities within the metropolitan area of Kano State, Nigeria, for the period January 1, 2019 to December 31, 2020. Because the data come from a secondary source, patient records for the period of therapy can be obtained objectively and accurately. The medical records in the TB register are easily obtained, and the information is used retrospectively for data analysis.

3.4 Data Collection and Technique of Sampling

A secondary source of data was used to obtain clinical and demographic information on tuberculosis patients from the TB register at the DOTS unit of each facility. The age and sex of patients, type of TB disease, HIV status and initial body weight were the information obtained from the register. The register also includes details of when a patient commenced treatment and when he/she died or was lost to follow-up. Patient status, such as treatment completion, cure, relapse and transfer out/in, was also available in the register. A convenience sampling technique was used, based on data availability and the personal judgment of the researcher. Kano Metropolis, being a city, has a concentration of health facilities that provide treatment to patients within and outside the city. As a result of its population and urban migration, the majority of TB cases reported in the state occur among people living within the city. Of the available health facilities that provide TB therapy in Kano, four were identified as having the largest number of reported cases during the period under study and were selected for this study.

3.5 Data Analysis

Descriptive statistics, in terms of means and proportions, will be employed to describe the properties of the survival data. In the bivariate analysis, the chi-square test will be employed to test for an association between the independent factors and the dependent variable (death). Kaplan-Meier estimates and the log-rank test will be used to assess the average failure time, the associated level of risk, and differences in average survival time between groups of patients. The log-rank test is a popular test of the null hypothesis that there is no difference in survival between two or more independent groups. The log-rank test, which is closely related to the chi-square test statistic, compares the observed and expected numbers of events at each event time during the follow-up period. The Gehan-Breslow-Wilcoxon test is another method for comparing survival functions among independent groups; however, it places greater weight on deaths that occur early in the follow-up period.
The log-rank test, on the other hand, gives all time points equal weight and is the more powerful of the two tests when the proportional hazards assumption is not violated (Avijit and Gogtay, 2017). Other variants of the generalized log-rank test include the Fleming-Harrington and Peto-Peto-Prentice tests for comparing survival among independent groups. For the purpose of this study, the log-rank test statistic was used. Cox and parametric proportional hazards models were used in the multivariate analysis to assess the effect of predictor variables on death. Statistical significance was determined using a nominal p-value of less than 5% (p < 0.05), corresponding to a 95% confidence level, and less than 10% (p < 0.10), corresponding to a 90% confidence level.

3.6 Analytical Tools

Data were analyzed using both the STATA and R statistical software packages. Data collected from the hospitals were initially entered into a Microsoft Excel spreadsheet, and variables were coded appropriately for reference during analysis. The Excel file was subsequently exported to STATA and R for further analysis.

3.7 Ethical Consideration

The Health Research Ethics Committee of the Kano State Ministry of Health approved the application to conduct the study using the TB data (NHREC Approval Number: NHREC/17/03//2018). Because written informed consent of participants could not be obtained during the course of the study, the requirement for informed consent was waived by the committee. Accordingly, patient records were restricted and anonymized prior to the study.

3.8 Concept of Survival Analysis

Satagopan et al. (2004) described the survival of individuals at a particular time as "the conditional probability of surviving to a specific time given that the individual is at risk for the event (such as mortality) at that time" (Satagopan et al., 2004).
The conditional survival at a given time is estimated as the number of individuals who have not experienced the event (e.g. death) at that time divided by the number of individuals who were still at risk, i.e. had not experienced the event, just before that time (Satagopan et al., 2004). Survival analysis is the analysis of data measured from a given moment of inception until an event of interest or a predetermined endpoint (Collett, 1994); in other words, it is the study of time-to-event data (e.g. death, relapse after treatment, or cure). In most clinical trials, the aim is to evaluate the chance of experiencing an event by a certain time. When the data consist of patients who experience an event and censored individuals, the Kaplan-Meier approach can be used to produce a nonparametric estimate of the cumulative incidence, or probability of experiencing the event of interest (Satagopan et al., 2004). To estimate the survival probability of patients in clinical studies, the status (dead, alive or censored) and the length of follow-up (time to event) are used. Survival analysis is a set of approaches for studying data in which the outcome variable is the time until an event of interest occurs (Viv et al., 2004). In most clinical studies, the event of interest is death, or the time from response to treatment until relapse (the relapse-free period, as in cancer studies). Some individuals may not have experienced the event of interest by the time the study ends, so their survival times are not known. This is known as censoring, and it is a major difficulty in survival analysis. Censoring can happen in a number of ways: a patient has not experienced the outcome of interest, such as death or relapse, by the end of the study; a patient is lost to follow-up during the study period; or a patient experiences a different event that prevents further follow-up (Clark et al., 2003).
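In this study the event of interest is death, and the other treatment outcomes recorded in the TB register end follow-up without a death being observed. A minimal sketch of how register entries might be converted into the (time, status) pairs that survival methods require is given below, assuming the dates of enrollment and of the final recorded outcome are available. The outcome labels and the function `to_survival_record` are hypothetical, and treating cure, treatment completion, loss to follow-up, transfer out and being still on treatment at study close as right-censored is an assumption of this sketch.

```python
from datetime import date

# Hypothetical outcome labels; only death counts as the event in this sketch.
EVENT_OUTCOMES = {"died"}
# Outcomes assumed to right-censor follow-up (death not observed).
CENSORED_OUTCOMES = {
    "cured",
    "treatment completed",
    "lost to follow-up",
    "transferred out",
    "on treatment at study close",
}


def to_survival_record(enrolled, last_seen, outcome):
    """Return (time in days, status) with status 1 = death and 0 = censored."""
    if outcome in EVENT_OUTCOMES:
        status = 1
    elif outcome in CENSORED_OUTCOMES:
        status = 0
    else:
        raise ValueError(f"unknown outcome: {outcome!r}")
    days = (last_seen - enrolled).days
    if days < 0:
        raise ValueError("last_seen precedes enrollment")
    return days, status


print(to_survival_record(date(2020, 1, 1), date(2020, 1, 31), "died"))
print(to_survival_record(date(2020, 12, 1), date(2021, 5, 31), "lost to follow-up"))
```

Keeping the status separate from the time is what allows censored patients to contribute their follow-up up to the point of censoring, rather than being dropped or wrongly counted as deaths.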
The time at which the event of interest occurs is the major concern in survival analysis (Ajagbe et al., 2014). The survival function S(t) is defined as:

$S(t) = \Pr(T > t) = 1 - \Pr(T \le t)$ (3.1)

Characteristics of the survival function S(t):

(a.) $S(t) = 1$ if $t = 0$
(b.) $S(\infty) = \lim_{t \to \infty} S(t) = 0$
(c.) S(t) is non-increasing in t

In general, the survival function S(t) provides useful summary information such as the median survival time, the t-year survival rate, etc.

Density function for S(t):

a. If T is a discrete random variable,

$f(t) = \Pr(T = t)$ (3.2)

b. If T is (absolutely) continuous, the density function is:

$f(t) = \lim_{\Delta t \to 0^+} \frac{\Pr(\text{failure occurring in } [t, t + \Delta t))}{\Delta t}$ (3.3)

which is the rate of occurrence of failure at t, so that

$f(t) = \lim_{\Delta t \to 0^+} \frac{F(t + \Delta t) - F(t)}{\Delta t} = \frac{dF(t)}{dt} = \frac{d\Pr(T \le t)}{dt} = \frac{d[1 - \Pr(T > t)]}{dt} = \frac{d[1 - S(t)]}{dt} = -\frac{dS(t)}{dt}$ (3.4)

3.9 Non-Parametric Survival Model: Kaplan-Meier (K-M) Survival Estimate

In survival analysis, censoring is central to the difference between Kaplan-Meier (non-parametric) analysis and the usual parametric analysis (e.g. using the Weibull, Gompertz or other probability distributions): the K-M method incorporates censored observations without assuming any probability distribution for the survival times. To avoid bias, researchers therefore use the K-M method, which takes censored data into account. To account for censored cases during the study, the data must be adjusted at each point where patients are lost (Rich et al., 2010). When it cannot be confirmed that a participant has died, or the participant drops out during or at the end of the study, he/she is considered censored. When the participant has not encountered the event of interest during or at the end of the study, and no more data can be gathered, right censoring is said to have occurred. According to Jager et al.
(2008), cited in Dudley (2016), "patients who are censored must meet the following critical assumptions: censored patients have the same chance of survival as those who continue in the study, and survival odds are the same whether participants enroll early or late in the study". Censored patients are included in the probability estimates up to the evaluation point prior to their censoring, but they are omitted from subsequent analysis (Blagoev, Wilkerson, & Fojo, 2012). Censoring is an inherent feature of survival (lifetime) data, and as a result, applying parametric models to survival data is a difficult task. In investigations with censored data, the non-parametric Kaplan-Meier or Nelson-Aalen estimators are used (Saranya & Karthikeyan, 2015). The success of any clinical or community-based study depends on the number of participants or patients who are prevented from having an adverse event (e.g. death) and remain alive. Some participants may be lost to follow-up or drop out of the study, so not all participants will remain in the study until its completion. To arrive at an unbiased analysis and valid conclusions concerning patients' probability of survival, the Kaplan-Meier estimate (also called the "product-limit method") serves as a simple, reliable measure (Kalra, 2017). According to Rich et al. (2010), the Kaplan-Meier method is the most commonly used survival analysis method in randomized medical clinical trials, where patients are allocated to the various arms at random, do not enter the study at the same time, and drop out or are lost to follow-up at different times after they begin. The outcome variable of interest may or may not arise within the study observation period (Rich et al., 2010). The Kaplan-Meier estimate is a useful non-parametric estimator when the data used in the analysis contain incomplete observations.
Kaplan and Meier (1958) used the term "death" generically for the event of interest in random samples in which the observation of some members is incomplete. The Kaplan-Meier method is a nonparametric technique typically used for estimating the survival distribution, that is, to compute the fraction of participants who survived for a specified period after an intervention or treatment. Even when individuals drop out or are observed for varying lengths of time, K-M allows survival to be estimated over time. The inclusion of censored observations makes the K-M a superior non-parametric estimate of survival compared with other measures. Generally, the estimated survival proportion decreases when participants experience the event (Kalra, 2017); censored participants reduce the number at risk but do not by themselves lower the estimate.

3.9.1 Assumptions for the Kaplan-Meier (K-M) Method

The following assumptions of the Kaplan-Meier method were provided by Koletsi & Pandis (2017):

(1.) The K-M method presupposes that the likelihood of censoring is unrelated to the event of interest.
(2.) Irrespective of when individuals enter the study, they are all assumed to have the same risk of experiencing the event. The survival probabilities are the same for all study participants, and no circumstances are assumed to alter the baseline survival risk of participants.
(3.) Lastly, the events are assumed to have occurred within a specific period of time. The exact time may not be known in some instances, but the status of the participant at the last follow-up time prior to the event is known.

3.9.2 The Kaplan-Meier Product-Limit Formula

The survival probability can be estimated non-parametrically from observed survival times, both censored and uncensored, using the K-M (or product-limit) approach introduced by Kaplan and Meier in 1958. Saranya & Karthikeyan (2015) define the K-M formula as follows: let $t_1, t_2, t_3, \dots$ denote the actual times of death of the n individuals in the cohort.
Let $d_1, d_2, d_3, \dots$ represent the number of deaths that occurred at each of these times, and let $n_1, n_2, n_3, \dots$ be the number of people at risk within the cohort just before each of these times (Saranya & Karthikeyan, 2015). The Kaplan-Meier estimator of the survivor function at time t, for $t_{(k)} \le t < t_{(k+1)}$, is

$\hat{S}(t) = \left(1 - \frac{d_1}{n_1}\right)\left(1 - \frac{d_2}{n_2}\right) \cdots \left(1 - \frac{d_k}{n_k}\right) = \prod_{i:\, t_i \le t} \left(1 - \frac{d_i}{n_i}\right)$ (3.5)

$h(t) = \lim_{\Delta t \to 0^+} \frac{\Pr(t \le T < t + \Delta t \mid T \ge t)}{\Delta t} = \frac{f(t)}{S(t)}$ (3.9)

When comparing the hazards of two independent groups (e.g. treatment and control), the ratio of the hazard of events in the treatment group to that in the control group is referred to as the hazard ratio. A hazard ratio is dimensionless and merely tells you about the data's consistency and reliability (Blagoev et al., 2012). Spruance et al. (2004), cited in Dudley (2016), argued that a hazard ratio is only meaningful if two basic assumptions, constancy and proportionality, are met, i.e. the difference between the groups in a survival analysis is constant over time and the hazards are proportional. A hazard ratio greater than or less than 1 indicates that one of the groups fared better than the other in terms of survival (Dudley, 2016). The hazard function can also be expressed in terms of the cumulative distribution function F(t) and the probability density function f(t). Recall from Equation (3.1) that

$S(t) = \Pr(T > t) = 1 - \Pr(T \le t) = 1 - F(t)$ (3.10)

Therefore,

$h(t) = \frac{f(t)}{1 - F(t)} = \frac{f(t)}{S(t)}$ (3.11)

The hazard h(t), also written λ(t), is the instantaneous event rate at time t: the probability density of the event at time t divided by the probability that the event has not occurred by time t.
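The product-limit formula in Equation (3.5) can be sketched in a few lines of code. This is a minimal illustration, not the software used in the study (the analysis itself was carried out in STATA and R): the function name `kaplan_meier` is hypothetical, and, following the usual convention, subjects censored at a death time are treated as still at risk at that time.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of S(t) at each distinct death time (Equation 3.5).

    times  -- follow-up time for each subject
    events -- 1 if the subject died at that time, 0 if censored
    Returns a list of (time, S(time)) pairs.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)  # n_i: subjects still under observation
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all subjects sharing time t; censored ones still count as at risk at t
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / n_at_risk  # factor (1 - d_i / n_i)
            curve.append((t, surv))
        n_at_risk -= removed  # deaths and censorings leave the risk set after t
    return curve


print(kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]))
```

For this toy data the sketch gives survival estimates of 0.8, 0.6 and 0.3 at times 1, 2 and 3 (up to floating-point rounding). The subject censored at time 2 lowers the number at risk for time 3 without itself producing a drop in the curve.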
There is a relationship between the survival function S(t) and the hazard function h(t), which can be expressed by the calculus formula below:

$h(t) = -\frac{d}{dt} \log S(t)$ (3.12)

That is, substituting Equation (3.4) into Equation (3.11) shows that the hazard function is the negative derivative of the survival function divided by the survival function, $h(t) = -S'(t)/S(t)$, which is exactly the derivative of $-\log S(t)$.

3.10.1 Relationship between Cumulative Hazard and Survival Function

The hazard is closely linked to the cumulative hazard, denoted H(t) or Λ(t). When T is (absolutely) continuous, H(t) is the integral of the hazard, i.e. the area under the hazard function between times 0 and t:

$H(t) = \int_0^t \lambda(u)\, du$ (3.13)

Using Equation (3.11),

$H(t) = \int_0^t \frac{f(u)}{S(u)}\, du = \int_0^t \frac{-\frac{d}{du} S(u)}{S(u)}\, du = -\int_0^t \frac{\frac{d}{du} S(u)}{S(u)}\, du$

and evaluating the integral between the limits gives

$H(t) = -\log S(t) + \log S(0)$

Since S(0) = 1 and log(1) = 0, we have the relations below:

$H(t) = -\log S(t)$ (3.14)

$S(t) = \exp[-H(t)]$ (3.15)

$S(t) = \exp\left[-\int_0^t \frac{f(u)}{S(u)}\, du\right]$ (3.16)

From the above, the survival function can be used to obtain the hazard function through Equation (3.11), and the hazard can in turn be integrated to give the cumulative hazard through Equation (3.13). Conversely, the survival function can be derived from the hazard by integrating the hazard function and exponentiating the negative of the result, as in Equation (3.16).

3.11 Cox Proportional Hazard

In the study of survival data, Cox's proportional hazards regression model and the log-rank test statistic have become standard tools among statisticians (Armitage, 1987, cited in Kelvin, 2003). The Cox model can be used for both univariate and multivariate analysis of multiple predictor variables. It can also be used to compare survival between two independent groups, as is done with the log-rank test statistic.
The Cox proportional hazards model is very important in survival analysis; its advantage is that it includes both a non-parametric and a parametric element (Geachew and Bekele, 2016, cited in Rakesh, 2019). As cited in Bradburn et al. (2003), Cox (1979) proposed a model for estimating the hazard based on order statistics. Mathematically, the Cox model is written as:

$h(t) = h_0(t) \exp[\beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p]$, or equivalently $\log\left[\frac{h(t)}{h_0(t)}\right] = \sum_{i=1}^{p} \beta_i x_i$ (3.17)

From the above equation, the hazard function h(t) depends on a set of p predictor variables $(x_1, x_2, \dots, x_p)$, whose impact is quantified by the size of the corresponding coefficients $(\beta_1, \beta_2, \dots, \beta_p)$. The baseline hazard $h_0(t)$ is the value of the hazard when all the $x_i$ are equal to zero. From Equation (3.17), a unit increase in covariate $x_i$, with the other covariates held fixed, is associated with an increase of $\beta_i$ in the log hazard; equivalently, the hazard is multiplied by $\exp(\beta_i)$, the hazard ratio. The log hazard in a particular group (e.g. treatment) can thus be compared with that of the control group through the regression coefficient, while adjusting for the other predictors included in the model. The widespread use of the Cox proportional hazards model arises from long experience with it, as well as the fact that it is distribution-free, requiring no assumptions about the underlying distribution of survival times to make inferences about relative event rates. Relaxing this distributional assumption makes the model more flexible and easier to use in real-life situations (Kevin, 2003).

3.12 The Log Rank Test

For comparing survival among independent groups, the test most widely used by researchers is the Mantel-Haenszel test, often known as the log-rank test, which was first developed by Mantel in 1966 and then by Cox in 1972. As a result, some researchers refer to it as the Cox-Mantel test (Martinez, 2007, cited in Etikan et al., 2018). The test determines the difference between the expected
The test determines the difference between expected 42 University of Ghana http://ugspace.ug.edu.gh and observed number of events in the two groups of participants (Etikan et al., 2018). This test statistic is useful where there is an assumption of a proportional hazard. In other words, the hazard ratio between independent groups should remain constant. The null hypothesis is for a right censored data is given by: H0 : hA(t) = hB(t) (or SA(t) = SB(t)), for all t (3.18) For a right censored data, log rank test statistic is derived by: Z2 = [∑k i=1D(i) − E0[D(i)]∑k i=1 var0(D(i)) ]2 ∼ X 2 1 , where n is large (3.19) Where; E0(Di) =mD (nA N ) = nA (mD N ) and , V ar0(Di) = nAnBnAmDmD̄ N2(N − 1) Where: mD=Total number of failures. nA=Number of individuals in the risk set at i from group A. nB=Number of individuals in the risk set at i from group B. N=Number of individuals in the risk set. mD̄ = N −mD=Total Number of survivors. Di=failures at i from group A. 43 University of Ghana http://ugspace.ug.edu.gh The Log Rank Tests approximately follows a Chi-square with a 1 degree of free- dom. 3.13 Model Diagnosis Residuals have been used as a tool for diagnosing the fitness of model. Weisberg (2014) developed residuals that have been applied in theory of linear regression and it has proven to be a vital tool in analyzing model’s goodness of fit. It is crucial to analyze the cox regression model’s suitability in characterizing the TB data utilized in this investigation, taking into account the assumptions made in the cox proportional hazard’s model. Several approaches for evaluating the cox model’s appropriateness are available in the literature; four of these diagnostics will be used in this study i.e. Martingale Residuals (Linearity assumption and fitness of model), Deviance Residuals (model fitness), Schoenfeld Residuals (pro- portionality assumption of the hazard) and Cox Snell Residuals (fitness of the cox proportional hazard model). 
3.13.1 Martingale Residuals

Martingale residuals are used to determine whether the covariates and the log hazard have a linear relationship. Breslow and Prentice (1998) proposed the use of martingale residual plots to determine the functional form of continuous covariates. The martingale residual is the difference between the observed and expected number of events for each subject under the Cox model (Klein & Moeschberger, 2003). It is defined as:

$M_i = \delta_i - \hat{H}_0(t_i) \exp\left[X_i^T \hat{\beta}\right]$ (3.20)

where $\hat{H}_0(t_i)$ is the estimate of the baseline cumulative hazard at $t_i$ and $\delta_i$ is the event indicator for subject i.

Since non-linearity is not usually an issue for a categorical variable, the martingale residual is used to test for non-linearity in continuous variables (age and weight of patients in this case). Martingale residuals have a mean of 0 and range between −∞ and 1. A value close to 1 indicates that a patient died earlier than expected, while a large negative value implies that a patient lived beyond the expected time of death (lived "too long"). Some software plots the martingale residuals along with a fitted curve, known as a loess smoother, which is used to assess whether the linearity assumption of the Cox model has been satisfied.

3.13.2 Deviance Residuals

Deviance residuals are another diagnostic for the Cox model, used mainly to detect outliers. The deviance residual is a normalizing transformation of the martingale residual that is more symmetrically distributed about zero, with a standard deviation of approximately 1. Positive values indicate patients who died earlier than the model predicts, while negative values indicate patients who did not experience the event (death) or survived beyond their expected survival time. Observations with very large positive or negative deviance residuals are outliers that the model predicts poorly.
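Equations (3.20), (3.21), (3.24) and (3.25) can be illustrated with a small sketch for a Cox model with a single covariate. It assumes the coefficient `beta` is already known; in practice it comes from the fitted model (in R, for example, the `residuals()` method for `coxph` fits in the survival package returns martingale and deviance residuals). The baseline cumulative hazard is estimated with Breslow's method, and the function names and toy data are hypothetical.

```python
import math


def breslow_cumhaz(times, events, x, beta):
    """Breslow estimate of the baseline cumulative hazard H0(t_i) for each subject
    in a one-covariate Cox model with a given coefficient beta."""
    risk = [math.exp(beta * xi) for xi in x]
    increments = {}
    for tj in sorted({t for t, e in zip(times, events) if e}):
        deaths = sum(1 for t, e in zip(times, events) if t == tj and e)
        at_risk = sum(r for t, r in zip(times, risk) if t >= tj)
        increments[tj] = deaths / at_risk
    return [sum(inc for tj, inc in increments.items() if tj <= ti) for ti in times]


def cox_residuals(times, events, x, beta):
    """Return (Cox-Snell, martingale, deviance) residuals for each subject."""
    h0 = breslow_cumhaz(times, events, x, beta)
    cox_snell = [h * math.exp(beta * xi) for h, xi in zip(h0, x)]  # Eq. (3.24)
    martingale = [e - r for e, r in zip(events, cox_snell)]        # Eq. (3.20)
    deviance = []
    for m, e in zip(martingale, events):
        # delta * log(delta - M) is taken as 0 for censored subjects (delta = 0)
        inner = m + (e * math.log(e - m) if e else 0.0)
        # max() guards against tiny positive rounding errors under the square root
        deviance.append(math.copysign(math.sqrt(max(0.0, -2.0 * inner)), m))  # Eq. (3.21)
    return cox_snell, martingale, deviance


times = [2, 3, 3, 5, 7, 8]
events = [1, 1, 0, 1, 0, 1]
x = [1, 0, 1, 1, 0, 0]
cs, mart, dev = cox_residuals(times, events, x, beta=0.4)
print([round(m, 3) for m in mart])
print([round(d, 3) for d in dev])
```

With Breslow's estimator the martingale residuals sum to zero, and every Cox-Snell residual equals $\delta_i - M_i$ as in Equation (3.25), which gives two quick consistency checks on an implementation.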
The deviance residual is given as:

$D_i = \mathrm{sign}(M_i) \sqrt{-2\left[M_i + \delta_i \log(\delta_i - M_i)\right]}$ (3.21)

where $M_i$ is the martingale residual and $\delta_i$ is the event indicator for subject i.

3.13.3 Schoenfeld Residuals

The proportional hazards assumption can be assessed both graphically and by statistical testing. The scaled Schoenfeld residuals provide a graphical diagnostic of the proportional hazards assumption. Schoenfeld residuals were introduced by David Schoenfeld (1982) and are based on the expected value of each covariate, obtained by summing over all individuals in the risk set at each time at which the event of interest occurs. They are used to check whether each independent variable satisfies the Cox proportional hazards assumption. The expected value of the m-th covariate at the j-th event time is:

$E(X_{jm}) = \sum_{i \in R_j} x_{im} \, P(i \text{ dies} \mid R_j)$ (3.22)

$P(i \text{ dies} \mid R_j) = \frac{\exp(x_i^T \beta)}{\sum_{k \in R_j} \exp(x_k^T \beta)}$ (3.23)

where $x_{im}$ is the value of the m-th regression variable for the i-th individual, $R_j$ is the risk set at the j-th event time, and $E(X_{jm})$ is the expected value of the m-th regression variable in $R_j$. The Schoenfeld residual for covariate m at the j-th event time is the difference between the covariate value of the individual who fails at that time and this expected value.

Equation (3.22) can be used to estimate the expected value for the column of each covariate in the study. Equation (3.23) is the Cox model probability that individual i is the one who fails, given the risk set; because it assumes a common baseline hazard for all individuals, the baseline hazard cancels from the ratio. In a large survival data set where the Cox proportional hazards assumption has not been violated, the Schoenfeld residuals are approximately uncorrelated with each other (Schoenfeld, 1982). Also, Grambsch and Therneau (1993) proved that the mean of the scaled Schoenfeld residuals is zero if the coefficients of the Cox regression do not vary with time.

3.13.4 Cox Snell Residuals

The residuals developed by Cox and Snell (1968) can be used to evaluate the fit of a Cox proportional hazards model.
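Returning briefly to the Schoenfeld residuals of Section 3.13.3, a minimal sketch of Equations (3.22) and (3.23) for a single covariate is given below. It assumes a known coefficient `beta` and handles tied death times by giving each failure its own residual computed over the full risk set (a Breslow-type convention); the function name is hypothetical. No baseline hazard appears in the code, since it cancels from Equation (3.23).

```python
import math


def schoenfeld_residuals(times, events, x, beta):
    """Schoenfeld residuals x_(j) - E(X_j) at each death, for one covariate."""
    residuals = []
    for i, (ti, ei) in enumerate(zip(times, events)):
        if not ei:
            continue  # censored subjects contribute no residual
        risk_set = [k for k, tk in enumerate(times) if tk >= ti]  # R_j
        weights = [math.exp(beta * x[k]) for k in risk_set]       # numerators of Eq. (3.23)
        total = sum(weights)
        expected = sum(w * x[k] for w, k in zip(weights, risk_set)) / total  # Eq. (3.22)
        residuals.append((ti, x[i] - expected))
    return residuals


print(schoenfeld_residuals([1, 2, 3], [1, 1, 0], [1, 0, 1], beta=0.0))
```

Plotting these residuals (or their scaled version) against time and looking for a trend is the graphical check of proportional hazards described above; in R, the survival package's `cox.zph()` function performs this check on a fitted `coxph` model.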
Assume that fixed covariates are fitted to the data using the Cox model in Equation (3.17). If the model is properly fitted, then applying the cumulative hazard transformation to the true death time, $U_i = H(T_i \mid X_i)$, gives a random variable with a unit exponential distribution (an exponential distribution with hazard rate 1), whose cumulative hazard is linear with slope 1. The Nelson-Aalen estimate of the cumulative hazard of the Cox-Snell residuals is used to check this (Klein & Moeschberger, 2003). Hence, if the Cox PH model is correctly specified, then $r_1, r_2, \dots, r_n$ should behave like a right-censored sample from a unit exponential distribution. The residuals are estimated as:

$r_i = \hat{H}_0(T_i) \exp\left[\sum_{k=1}^{p} X_{ik} \hat{\beta}_k\right], \quad i = 1, \dots, n$ (3.24)

where $\hat{\beta}_k$ are the estimated coefficients, which are close to the true values if the model is correct; $\hat{H}_0(T_i)$ is the baseline cumulative hazard estimated by Breslow's method; and $X_{ik}$ is the k-th covariate for the i-th subject.

Equivalently, the Cox-Snell residuals can be obtained as the difference between the censoring indicator and the martingale residuals:

$r_i = \delta_i - M_i$ (3.25)

where $\delta_i$ is the censoring indicator (1 if failure occurs and 0 otherwise) and $M_i$ is the martingale residual. The plot of the Cox-Snell residuals against their cumulative hazard estimated using the Nelson-Aalen method should be a straight line passing through the origin with a slope of 1. If the model is correct, there should be no serious deviation from this 45-degree line: the model predicts well when the points lie on the line, and there is over- and under-prediction when the points lie above and below the line, respectively.

3.14 Parametric Models for Estimating Survival

When data are known to follow a certain distribution, some studies have justified the use of parametric models over non-parametric or semi-parametric models in estimating survival time.
David, Stanley & Sussane (2008) highlighted the benefits of using a parametric model in survival analysis as follows: "full maximum likelihood can be used to estimate the parameters, estimated coefficients or transformations can provide clinically meaningful estimates of effect, fitted values from the model can provide estimates of survival time, and residuals can be computed as differences between observed and predicted values" (David, Stanley & Sussane, 2008).
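As an illustration of full maximum likelihood and of the AIC comparisons described in the literature review, the sketch below fits exponential and Weibull proportional hazards models to right-censored data without covariates and computes AIC = 2k − 2 log L for each, where k is the number of parameters. It assumes the parameterization $h(t) = \lambda p t^{p-1}$, of which the exponential is the special case p = 1. For a fixed shape p the rate has the closed-form MLE $\hat{\lambda} = d / \sum_i t_i^p$, where d is the number of deaths, so only p needs a one-dimensional (golden-section) search. The Gompertz model and covariates are omitted for brevity, and the function names and toy data are hypothetical.

```python
import math


def weibull_profile_loglik(p, times, events):
    """Log-likelihood of a Weibull PH model, hazard lam * p * t**(p - 1), with
    lam replaced by its closed-form MLE lam_hat(p) = d / sum(t**p)."""
    d = sum(events)
    if d == 0:
        raise ValueError("at least one death is needed")
    lam = d / sum(t ** p for t in times)
    sum_log_t = sum(math.log(t) for t, e in zip(times, events) if e)
    return d * math.log(lam) + d * math.log(p) + (p - 1) * sum_log_t - d


def golden_section_max(f, lo, hi, tol=1e-8):
    """Maximize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2


def fit_and_compare(times, events):
    """Fit exponential (k = 1) and Weibull (k = 2) models; return log L and AIC."""
    exp_rate = sum(events) / sum(times)
    exp_ll = weibull_profile_loglik(1.0, times, events)  # exponential: p = 1
    shape = golden_section_max(
        lambda p: weibull_profile_loglik(p, times, events), 0.05, 20.0)
    wei_ll = weibull_profile_loglik(shape, times, events)
    return {
        "exponential": {"rate": exp_rate, "loglik": exp_ll, "aic": 2 * 1 - 2 * exp_ll},
        "weibull": {"shape": shape, "loglik": wei_ll, "aic": 2 * 2 - 2 * wei_ll},
    }


times = [2, 3, 3, 5, 6, 8, 9, 12, 14, 20]
events = [1, 1, 0, 1, 1, 0, 1, 1, 0, 0]
for name, res in fit_and_compare(times, events).items():
    print(name, {k: round(v, 3) for k, v in res.items()})
```

Because the exponential model is nested in the Weibull, the Weibull log-likelihood can never be lower; AIC then decides whether the extra shape parameter is worth its penalty of 2, which is the same reasoning behind the minimum-AIC comparisons reported in the studies reviewed in Chapter 2.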