UNIVERSITY OF GHANA 
COLLEGE OF BASIC AND APPLIED SCIENCES 
 
 
   
GENERALISED BERNOULLI MODEL FOR CORRELATED 
BINARY RESPONSES: APPLICATION TO THE NATIONAL 
INCOME DYNAMICS STUDY (NIDS) DATASETS 
 
 
 BY 
 MAWUTOR FLEKU 
(ID. NO. 10071460) 
 
THIS THESIS IS SUBMITTED TO THE UNIVERSITY OF GHANA, 
LEGON IN PARTIAL FULFILMENT OF THE REQUIREMENT FOR 
THE AWARD OF MPHIL STATISTICS DEGREE 
 
 
 
DEPARTMENT OF STATISTICS 
JUNE, 2016 
   
University of Ghana http://ugspace.ug.edu.gh
 i  
 
ABSTRACT 
 
The bivariate Bernoulli model was used to estimate covariate parameters for conditional as well 
as marginal models for the NIDs datasets. This is a follow- up research on one conducted by Islam 
et al. (2012) which introduced the use of  the bivariate Bernoulli model to properly specify the 
dependence among bivariate binary responses. The covariate parameters were estimated by first 
expressing the proposed model in the exponential family form, finding the log-likelihood function 
and then the corresponding estimating equations. The Newton Rahpson and the Nelder Mead 
methods of iteration were used to estimate the covariate parameters. The research revealed that the 
bivariate Bernoulli model fitted bivariate binary response data significantly better than the 
conditional logistic and the Generalized Estimating Equation (GEE) logistic models-marginal 
model. The result was same for both artificial and real-life data. It is worth mentioning that to aim 
at more efficient covariate estimates, the bivariate Bernoulli model is highly recommended for 
bivariate binary response data. However, further research is needed to choose an initial value when 
using the Newton Rahpson and the Nelder Mead methods of iteration, because it posed a serious 
challenge in this study. Again further research is recommended to probably use the multivariate 
Bernoulli distribution to fit multivariate binary response data and estimate covariate coefficients. 
 
 
 
 
 
University of Ghana http://ugspace.ug.edu.gh
 ii  
 
DECLARATION 
 
Candidate’s Declaration  
I hereby declare that this submission is my work in partial fulfilment of  MPhil and that, to the best 
of my knowledge, it contains no material previously published by another person nor material 
which has been accepted for the award of any other degree of the University, except where due 
acknowledgement has been made in the text.  
 
  
MAWUTOR    FLEKU 
  (10071460) 
 
 
SIGNATURE 
 
   DATE 
 
 
                              
Certified by:  
DR. ANANI LOTSI 
   (SUPERVISOR) 
 
 
SIGNATURE 
 
DATE 
           
             
Certified by:  
DR. KWABENA DOKU-AMPONSAH 
  (SUPERVISOR) 
 
 
SIGNATURE 
 
DATE 
          
                                                                         
 
 
 
 
 
 
 
 
University of Ghana http://ugspace.ug.edu.gh
 iii  
 
ACKNOWLEDGEMENTS 
I would like to express my profound gratitude to Dr. Nortey, who helped me to settle on the thesis 
topic and provided valuable corrections after reading through the final work. 
I would also like to thank my supervisors; Dr.  Anani Lotsi and Dr. Kwabena Doku-Amponsah   
who provided guidance and explanation whenever I got confused at various stages of this project. 
Additional, I am very grateful for the encouragement they offered when I felt like giving up.  
Special thanks to the Acting Commissioner of the Commission of Human Rights and 
Administrative Justice (CHRAJ), Mr. Richard Quayson, who granted me two-year study leave. 
Finally, I thank my parents for their support and concern throughout the period of my study. 
 
 
 
 
 
 
 
 
 
 
 
 
 
University of Ghana http://ugspace.ug.edu.gh
 iv  
 
TABLE OF CONTENTS 
ABSTRACT ..................................................................................................................................... i 
DECLARATION ............................................................................................................................ ii 
ACKNOWLEDGEMENTS ........................................................................................................... iii 
LIST OF TABLES ........................................................................................................................ vii 
LIST OF ABBREVIATIONS ...................................................................................................... viii 
CHAPTER ONE: INTRODUCTION ............................................................................................. 1 
1.1 Preliminary remarks .............................................................................................................. 1 
1.2 Statement of problem ............................................................................................................ 2 
1.3 Objectives of the study .......................................................................................................... 2 
1.4 Relevance of the study .......................................................................................................... 3 
1.5 Scope of the study ................................................................................................................. 3 
1.6 Sources of data ...................................................................................................................... 3 
1.7 Limitation of the study .......................................................................................................... 4 
1.8 Approach ............................................................................................................................... 4 
1.8.1 Simulation ....................................................................................................................... 4 
1.8.2 Application to real life data: ......................................................................................... 10 
1.9 Definition of some relevant terms ....................................................................................... 13 
CHAPTER TWO:LITERATURE REVIEW ................................................................................ 15 
2.1 Background ......................................................................................................................... 15 
2.2 Marginal models .................................................................................................................. 16 
2.2.1 Response probabilities and correlations ....................................................................... 16 
2.2.2 Marginal odd ratios ....................................................................................................... 17 
2.2.3 Markov chain ................................................................................................................ 18 
2.2.4 Quadratic exponential model ........................................................................................ 18 
2.2.5 Generalized multivariate logistic model ....................................................................... 19 
2.2.6 Multivariate plackett distribution ................................................................................. 19 
2.2.7 Bivariate or multivariate Bernoulli approaches ............................................................ 19 
2.3 Conditional models ............................................................................................................. 20 
2.4 Measuring dependence ........................................................................................................ 21 
University of Ghana http://ugspace.ug.edu.gh
 v  
 
2.5 Joint model .......................................................................................................................... 22 
2.6 Methodology ....................................................................................................................... 22 
2.6.1 Simulation ..................................................................................................................... 22 
2.6.2 Model checking ............................................................................................................ 30 
CHAPTER THREE:THE PROPOSED MODEL ......................................................................... 32 
3.1 The Bivariate Bernoulli model ............................................................................................ 32 
3.2 Comparison with an alternative test .................................................................................... 42 
3.3 Test for the model ............................................................................................................... 43 
CHAPTER FOUR:SIMULATION ............................................................................................... 45 
4.1 Introduction ......................................................................................................................... 45 
4.2 Exploratory analyses ........................................................................................................... 46 
4.2.1 Odds ratio ..................................................................................................................... 46 
4.2.2 Marshall–Olkin correlation ........................................................................................... 47 
4.2.3 
3Dependence value based on Ș ..................................................................................... 47 
4.3 Regressive model ................................................................................................................ 48 
4.4 Conditional (
1Y = 0 ) ........................................................................................................... 49 
4.5 Conditional 
1(Y = 1)  ........................................................................................................... 50 
4.6 The Marginal model ............................................................................................................ 51 
4.7 Bivariate Bernoulli model ................................................................................................... 52 
4.8 Overall fit of the proposed model ....................................................................................... 53 
CHAPTER FIVE:APPLICATION TO REAL -LIFE DATA ...................................................... 54 
5.1 Introduction ......................................................................................................................... 54 
5.2 Odds ratio ............................................................................................................................ 54 
5.3 Marshall–Olkin correlation ................................................................................................. 55 
5.4
3Dependence value based on Ș ............................................................................................. 55 
5.5 Traditional models............................................................................................................... 56 
5.5.1 Marginal model-GEE ................................................................................................... 56 
5.5.2 Conditional model 
1(Y = 1)  .......................................................................................... 58 
5.5.3 Conditional model 
1(Y = 0)  .......................................................................................... 59 
University of Ghana http://ugspace.ug.edu.gh
 vi  
 
5.6 Generalized bivariate Bernoulli model ............................................................................... 59 
5.6.1 Marginal........................................................................................................................ 59 
5.6.2 Conditional model 
1(Y = 0)  ....................................................................................... 60 
5.6.3 Conditional model 
1(Y = 1)  .......................................................................................... 60 
5.7 Overall fit of the model ....................................................................................................... 61 
CHAPTER SIX:RESULTS AND DISCUSSION ........................................................................ 63 
6.1 Introduction ......................................................................................................................... 63 
6.2 Correlation ........................................................................................................................... 63 
6.3 Dependence ......................................................................................................................... 63 
6.4 Investigation of the traditional models ................................................................................ 64 
6.5 The Bivariate Bernoulli model ............................................................................................ 64 
6.5.1 Deriving parameter estimates ....................................................................................... 65 
6.5.2 Parameter estimates ...................................................................................................... 65 
6.6 Comparison of results.......................................................................................................... 65 
6.7 Structure of models ............................................................................................................. 66 
6.8 Missing values ..................................................................................................................... 66 
CHAPTER SEVEN:CONCLUSION AND RECOMMENDATIONS ........................................ 67 
7.1 Conclusion ........................................................................................................................... 67 
7.2 Recommendations ............................................................................................................... 67 
REFERENCES ............................................................................................................................. 69 
APPENDIX A: Simulation tables ................................................................................................. 72 
APPENDIX B: Codes used in data analysis ................................................................................. 89 
B1:Codes used for simulation (Sample size  50 only) .............................................................. 89 
B2:Codes used for real-life-application .................................................................................. 118 
 
 
 
 
 
University of Ghana http://ugspace.ug.edu.gh
 vii  
 
LIST OF TABLES 
Table 1.1a: Summary of test statistics generated: Exploratory 7 
Table 1.1b: Summary of test statistics generated: Regression model 8 
Table 1.1c: Summary of test statistics generated: Conditional model 9 
Table 1.1d: Summary of test statistics generated: Marginal Model 9 
Table 1.1e: Summary of test statistics generated: Bivariate Bernoulli Model 10 
Table 1.2: Transition counts and respective probabilities on the employment status for the 
two periods    
11 
Table 1.3: Summary table for testing the usefulness of the models 12 
Table 3.1: 2 x 2 contingency table for the bivariate Bernoulli distribution  33 
Table 5.1: Transition counts and respective probabilities on the employment status for the      
two periods 
54 
Table 5.2a: Fitted models using data from NIDS: Traditional models 55 
Table 5.2b: Fitted models using data from NIDS: Generalized bivariate Bernoulli model
  
56 
Table 5.3: Comparison of results: probability of a covariate being employed at the time of 
second visit 
61 
Table A1a: Results of estimated models from 1000 simulations with sample of size of 50: 
Exploratory statistics 
72 
Table A1b: Results of estimated models from 1000 simulations with sample of size of 50: 
Regression model 
73 
Table A1c: Results of estimated models from 1000 simulations with sample of size of 50:        
Conditional models 
73 
Table A1d: Results of estimated models from 1000 simulations with sample of size of 50: 
Marginal model 
74 
Table A1e: Results of estimated models from 1000 simulations with sample of size of 50: 
Bivariate Bernoulli model 
74 
Table A2a: Results of estimated models from 1000 simulations with sample of size of 100: 
Exploratory statistics 
75 
Table A2b: Results  of estimated models from 1000 simulations with sample of size of 100: 
Regression model 
76 
Table A2c : Results  of estimated models from 1000 simulations with sample of size of 100: 
Conditional models 
76 
Table A2d: Results of estimated models from 1000 simulations with sample of size of 100: 
Marginal model 
77 
Table A2e: Results of estimated models from 1000 simulations with sample of size of 100: 
Bivariate Bernoulli model 
77 
University of Ghana http://ugspace.ug.edu.gh
 viii  
 
Table A3a: Results of estimated models from 1000 simulations with sample of size of 200: 
Exploratory statistics 
78 
Table A3b: Results of estimated models from 1000 simulations with sample of size of 200: 
Regressive model 
79 
Table A3c: Results of estimated models from 1000 simulations with sample of size of 200- 
Conditional model 
80 
Table A3d: Results of estimated models from 1000 simulations with sample of size of 200: 
Marginal model 
80 
Table A3e: Results of estimated models from 1000 simulations with sample of size of 200: 
Bivariate Bernoulli  model 
81 
Table A4a: Results of estimated models from 1000 simulations with sample of size of 500: 
Exploratory statistics 
82 
Table A4b: Results of estimated models from 1000 simulations with sample of size of 500: 
Regressive model 
83 
Table A4c: Results of estimated models from 1000 simulations with sample of size of 500: 
Conditional models 
83 
Table A4d: Results of estimated models from 1000 simulations with sample of size of 500: 
Marginal model 
84 
Table A4e: Results of estimated models from 1000 simulations with sample of size of 500: 
Bivariate Bernoulli  model 
84 
Table A5a: Results of estimated models from 1000 simulations with sample of size of 
1000: Exploratory statistics 
85 
Table A5b: Results of estimated models from 1000 simulations with sample of size of 
1000: Regressive model 
86 
Table A5c: Results of estimated models from 1000 simulations with sample of size of 
1000: Conditional models 
86 
Table A5d: Results of estimated models from 1000 simulations with sample of size of 
1000: Marginal model 
87 
Table A5e: Results of estimated models from 1000 simulations with sample of size of 
1000: Bivariate Bernoulli model 
87 
 
 
 
 
 
 
University of Ghana http://ugspace.ug.edu.gh
 ix  
 
LIST OF ABBREVIATIONS 
GEE     Generalized Estimating Equations 
NIDS    National Income Dynamics Study 
University of Ghana http://ugspace.ug.edu.gh
 1  
 
CHAPTER ONE 
INTRODUCTION 
1.1 Preliminary remarks 
In longitudinal studies, outcomes are normally taken from the same subjects over a period of time. 
In such situations, there is likely to be correlation between the outcomes. For example, in pre and 
post tests, outcomes are initially taken from subjects treatment is then given to the subjects and 
outcomes taken thereafter from the same subjects. In such cases, there is the likelihood that the 
initial outcomes and post outcomes will be correlated.  Similarly, in tests where records are taken 
over a period of time, such as examining the blood pressure level of patients, these records are 
likely to be correlated.  
 The possibility of correlations between repeated outcomes needs to be taken into account when 
analyzing such data. Using standard statistical models and assuming independence for correlated 
responses may lead to misleading results, particularly in the estimation of regression parameters. 
One therefore needs a statistical model that takes the dependence in response variables into 
consideration.  
To address this problem, most studies have employed either marginal or conditional models.  Many 
studies have employed the use of marginal models such as Liang and Zeger (1986) and 
Molenberghs and Leasaffre (1994). A few however employed the use of conditional models, 
notable among them are Bonney (1986) and Bonney (1987). 
However, in all the above studies, it is not easy to precisely specify the measures of dependence 
in response variables. Islam, Chowdhury, and Briollais (2012), proposed a joint modeling 
approach for bivariate binary response employing both the marginal and conditional models where 
University of Ghana http://ugspace.ug.edu.gh
 2  
 
the dependence in outcome variables can be measured and tested using a link function of the 
models. 
The usefulness of their proposed model was investigated thoroughly by using extensive simulation 
study and real-life-data. 
1.2 Statement of problem 
Marginal and conditional models proposed by various authors to tackle the problem of dependence 
between outcomes may fail to provide efficient population parameter estimates, because they do 
not specify the dependence of binary outcomes in the model, hence the introduction of joint 
modeling by Islam et al. (2012), for bivariate binary response using both the marginal and 
conditional models. However, Islam et al. (2012) did not investigate the performance of the 
regression coefficients and the model as a whole. In this thesis, the usefulness of their proposed 
model as well as the efficiency of the regression coefficient estimates was investigated by using 
an extensive simulation study and an application to real life data. 
1.3 Objectives of the study  
The overall objective of this study is to address dependence in binary outcomes, by investigating 
the usefulness of the proposed model by finding out whether the model fits bivariate binary 
response data significantly better.  
Specifically, the study wishes to  
• unify the marginal and conditional probabilities  
• test for dependence in outcomes and 
• test properties of the estimates of regression coefficients 
•  test the overall fit of the model 
University of Ghana http://ugspace.ug.edu.gh
 3  
 
1.4 Relevance of the study 
In various fields such as epidemiology, public health, economics and anthropology, studies 
involving repeated measures are carried out. Most clinical trials employ the use of repeated 
measures. Results from such studies need to have very little percentage of error. A case in point is 
clinical trials for testing the efficacy of drugs for the Ebola and Zika diseases. Using a statistical 
model that significantly fits available data better than existing models enhances the accuracy of 
results. The hope of the researcher therefore, is to affirm the use of the Bivariate Bernoulli Model 
for any analyses involving bivariate binary response outcomes.  
1.5 Scope of the study 
The thesis made reference to only one stage of repeated study. Specifically, populations included 
in this study only consisted of “Two Waves”, that is a study on the same population at only two 
time points.  
 Only binary response outcomes were taken into account. For instance, the presence or absence of 
a trait in a subject under study. 
Finally, the effect of any clustering was ignored. 
1.6 Sources of data 
Two sources of data were employed in this study, namely artificial and real-life data: 
1.6.1 Artificial data 
In pursuance of investigating the usefulness of the proposed model, we first make use of artificial 
correlated data. Correlated binary data were generated from a package called bindata on the R-
programming language platform. 
 
 
University of Ghana http://ugspace.ug.edu.gh
 4  
 
1.6.2 Real-life data 
Data from the National Income Dynamics Study (NIDS), a study on the personal records of   
28,000 South Africans was used to investigate the usefulness of the proposed model. NIDS is a 
programme to compile comprehensive longitudinal information on persons selected for the study 
and to find out who is moving ahead and who is falling behind. This data is also key for research 
and policy makers. 
The first study was conducted in 2008, and the data was compiled into the Wave1dataset. . The 
second, Wave 2 dataset was compiled after the second visit was made to the same group of people 
between 2010 -2011. 
1.7 Limitations of the study 
The limitations of the study are as follows: 
 The choice of rho (  ) to generate the simulated binary data was restricted. For example the 
package did not allow the choice of high values of  such as 0.9 and above. During the 
simulation exercise, especially where the sample size was 1000, the execution time run for 
more than 15 minutes and for some cases the process had to be truncated. 
 Estimating the parameters for the proposed model was very challenging. The initial choice of 
parameters to aid the running of the estimated parameter through the use of Newton-Raphson 
and the Nelder-Mead methods of iteration was time consuming. 
1.8 Approach 
Two approaches were employed:  first, simulation and then application to real life data. 
1.8.1 Simulation  
According to Kaiser et al. (2011), a typical strategy for testing the usefulness of a statistical model 
is the utilization of manufactured data, since, the dataset will have the desired set of properties. 
University of Ghana http://ugspace.ug.edu.gh
 5  
 
The models to be tested are fitted to the manufactured dataset and then checked for the presence 
or otherwise of these effects. The fitted models could also be checked to find out how they behave 
under different experimental conditions.  
In this study, simulation was undertaken to investigate the usefulness of the bivariate Bernoulli 
model. Correlated binary data were generated for simulations, using a technique suggested by 
Leisch et al. (1990, cited in Islam et al 2012, p852) known as bindata package for R. Leisch et al. 
(1990) came up with two ways of generating binary random variates namely  
(i) The inversion method and   
(ii) Direct Conversion of Real-Valued Random Variates;  (see literature review) 
However, the Direct Conversion of Real-Valued Random Variates approach, provides a faster 
approach for generating binary vector samples. Accordingly, data was initially generated from 
multivariate normal random variates and thereafter converted into binary data.  
 
Three variables were simulated; two dependent response variables 
1Y  and 2Y    and one covariate,
X . Different correlation combinations between the two response variables and pairwise 
correlations, that is their relationship with the covariate, X  were considered. Simulations were 
performed 1000 times with different sample sizes ( 20, 50 100, 500 and 1000). 
In generating the required data for simulation, this study used the following inputs:  
i) Marginal probabilities for 
1Y , 2Y  and X ,  
ii)  , this describes the correlation between the response variables. 
Specifically, five different sets of models were generated using different marginal probability 
combinations for 
1Y  and 2Y  as follows ( it is worth stating that the function rmvbin was used in 
generating correlated binary data. It required the user to provide the marginal probabilities whereas 
University of Ghana http://ugspace.ug.edu.gh
 6  
 
the function estimates its own pairwise probabilities, hence only marginal probabilities were 
provided): 
(i) Low marginal probabilities  
1 2( 0.1, 0.1)P P     with 0.2    
(ii) Average marginal probabilities 
1 2( 0.5, 0.5)P P    with 0.3    
(iii) A high and a low marginal probability 
1 2( 0.1, 0.8)P P   with 0.1   
(iv) An average and a low marginal probability 
1 2( 0.5, 0.3)P P    with 0.3   
(v) Above average and below average marginal probabilities   1 20.2, 0.6P P   with 
0   
In other words, the study came up with five (5) different tables with sample sizes 20, 50, 100, 500 
and 1000 for null, positive and negative correlations between the bivariate responses (
1Y , 2Y ) as 
stated above. 
The true pairwise correlations between (
1Y , X) and ( 2Y , X) were also shown for each model.As the 
dependence in outcome variables depends on the association between outcomes and covariates too, 
the study also examined the conditional correlations between 
2Y   and X for given 1 1Y y  
The following was extracted to form one complete table:  
(i) the average observed correlations using the Marshall–Olkin method : The Marshall-
Olkin estimator for correlation for the bivariate Bernoulli variates 
1Y  and 2Y  is 
11 00 10 01
0 1 0 1
ˆ ˆ ˆ ˆ
ˆ
ˆ ˆ ˆ ˆMO
P P P P
P P P P

   

,  where 
1 2( , )P Y u Y v   = uvP , 0,1; 0,1u v  and uP   or vP   are 
the marginal probabilities, 
(ii)  the average of the estimated  odds ratios given by 
11 00
10 01
P P
P P
, 
University of Ghana http://ugspace.ug.edu.gh
 7  
 
(iii) the average estimates of the parameters from the models,  
(iv) the average values of the proposed tests and total number of instances where p < 0.05,   
(v) the average values of the  regressive and conditional models approach and 
corresponding number of instances where p < 0.05, 
(vi) a check of the overall fit of the bivariate Bernoulli model. The likelihood ratio test was 
employed. The log-likelihood of the regressive, conditional and marginal models were 
estimated and compared to the log-likelihood of the proposed test. The formula for the 
log-likelihood test is given by : 
LR = -2In(log.likelihood(restrictive.model/ proposed.model)
2(log.likelihood(proposed.model) - log.likelihood(restrictive.model)
 
It is worth mentioning that, data for the simulation exercise is multivariate; the relationship 
between 
1Y  and 2Y , 1Y  and X as well as 2Y and  X were investigated. Also, there is alternative real 
life data which will result in similar conclusions, when analysed the same way. 
Tables 1.1a to 1.1e summarize test statistics generated. 
 
Table 1.1a: Summary of test statistics generated: Exploratory  
Models 
                  
 
Estimates 
1 
ȡ  0.2  
1 2( 0.1, 0.1)P P   
2 
ȡ  0.  
1 2( 0.5, 0.5)P P 
 
3 
ȡ  0.1  
1 2( 0.1, 0.8)P P 
 
4 
ȡ  0.  
1 2( 0.5, 0.3)P P 
 
5 
ȡ  0  
1 2( 0.2, 0.6)P P 
 
 1 2Avg.Cor ,Y Y  
 sd in   
 1Avg.Cor ,Y X  
 sd in   
     
 2Avg.Cor ,Y X  
 sd in   
     
Avg.Odds Ratio  
 1 2 ,Y Y  
     
University of Ghana http://ugspace.ug.edu.gh
 8  
 
Marshall – Olkin correlation
  1 2 ,Y Y   
     
 
Marshall – Olkin correlation   
 2 1, / 0Y X Y   
     
Marshall –Olkin correlation
2 1( , / 1)Y X Y 
  
     
Marshall – Olkin correlation
 
 1 2/ 1Y X Y ,    
     
Marshall –Olkin correlation
 
 1 2, /  0Y X Y    
     
Dependence value based on 3
 
     
Av means average values  
 
Table 1.1b: Summary of test statistics generated: Regression model 
Regressive model 
                  
 
Estimates 
1 
ȡ  0.2  
1 2( 0.1, 0.1)P P   
2 
ȡ  0.  
1 2( 0.5, 0.5)P P 
 
3 
ȡ  0.1  
1 2( 0.1, 0.8)P P 
 
4 
ȡ  0.  
1 2( 0.5, 0.3)P P 
 
5 
ȡ  0  
1 2( 0.2, 0.6)P P 
 
. 1Av   
%  0.05of p     
     
     
.Av   
%  0.05of p   
     
     
2.Av   
 :  Wald test %  0.05of p   
     
     
Av means average values  
 
 
University of Ghana http://ugspace.ug.edu.gh
 9  
 
Table 1.1c: Summary of test statistics generated: Conditional models 
Conditional  
                  
 
Estimates 
1 
ȡ  0.2  
1 2( 0.1, 0.1)P P   
2 
ȡ  0.  
1 2( 0.5, 0.5)P P 
 
3 
ȡ  0.1  
1 2( 0.1, 0.8)P P 
 
4 
ȡ  0.  
1 2( 0.5, 0.3)P P 
 
5 
ȡ  0  
1 2( 0.2, 0.6)P P 
 
Conditional 
1 0Y   
01
  
     
%  0.05of p        
:Wald test %  0.05of p       
Conditional 
1 1Y   
11
  
     
%  0.05of p         
:Wald test %  0.05of p       
Av means average values  
 
Table 1.1d: Summary of test statistics generated: Marginal model 
Marginal model 
                  
 
Estimates 
1 
ȡ  0.2  
1 2( 0.1, 0.1)P P   
2 
ȡ  0.  
1 2( 0.5, 0.5)P P 
 
3 
ȡ  0.1  
1 2( 0.1, 0.8)P P 
 
4 
ȡ  0.  
1 2( 0.5, 0.3)P P 
 
5 
ȡ  0  
1 2( 0.2, 0.6)P P 
 
. 1 ( . )Av s e  
     
:Wald test %  0.05of p       
Av means average values  
 
 
 
 
 
University of Ghana http://ugspace.ug.edu.gh
 10  
 
Table 1.1e: Summary of test statistics generated: Bivariate Bernoulli model 
Bivariate Bernoulli model 
                  
 
Estimates 
1 
ȡ  0.2  
1 2( 0.1, 0.1)P P   
2 
ȡ  0.  
1 2( 0.5, 0.5)P P 
 
3 
ȡ  0.1  
1 2( 0.1, 0.8)P P 
 
4 
ȡ  0.  
1 2( 0.5, 0.3)P P 
 
5 
ȡ  0  
1 2( 0.2, 0.6)P P 
 
11
  
     
%  0.05of p        
01
  
     
%  0.05of p        
. 1VA   
     
%  0.05of p        
                              Likelihood ratio test (proposed model and regressive model) 
%  0.05of p    
 Likelihood ratio test (proposed model and conditional 
1 0Y  ) 
%  0.05of p    
Likelihood ratio test (proposed model and conditional 
1 1Y  ) 
%  0.05of p    
                              Likelihood ratio test (proposed model and marginal model ) 
%  0.05of p    
Av means average values  
1.8.2 Application to real life data 
This study illustrated the application to real life data by using data from the National Income 
Dynamics Study (NIDS), a panel study conducted in South Africa. The panel study takes the 
personal records of 28,000 South Africans starting in 2008 to answer policy and research questions. 
The data helps to put the spotlight on who is getting ahead and who is falling behind and what 
factors might be contributing to their state. The NIDS data constitute areas such as health, 
University of Ghana http://ugspace.ug.edu.gh
 11  
 
education, labour market and birth history. Two periods were chosen for this study: 2008 
representing Wave 1 and 2011 representing Wave 2. 
This study focused on the employment status of persons selected for the exercise. The binary 
response variable measured therefore was whether an individual was employed or not at the time 
of visit. Specifically, (Not employed=0, Employed=1) 
Covariates chosen for this exercise include  
i) Sex of respondent ( Male= 0, Female = 1) 
ii) Ever been to school (No = 0,Yes = 1 ) and 
iii) Age (17-25 years = 0, 26-51years = 1) 
The covariates chosen were tested to find out whether there is significant impact for conditional, 
marginal models and joint models. 
Extracts done included finding out the number of persons who had 
i. not been employed for the two periods,  
ii. not been employed in the first period but employed in the second period , 
iii. been employed in the first period but not in the second, 
iv. been employed for the two periods, with their respective probabilities as shown in  Table 
1.2 
Table 1.2: Transition counts and respective probabilities on the employment status for the 
two periods 
  Wave 2 
 Transition count  Transition probability  
Wave 1 0 1 Total  0 1 Total  
0 No.(0,0) No.(0,1)   Prob.(0,0) Prob.(0,1)   
1 No.(1,0) No.(1,1)   Prob.(1,0) Prob.(1,1)   
No. means number and Prob. means probability 
University of Ghana http://ugspace.ug.edu.gh
 12  
 
Additionally, the study  
i. found out whether there exist dependence  between the response variables for the two 
periods;  
ii. found the estimated odds ratio for the employment status in Wave 1 and Wave 2. This 
was done to be find out  the proportion of the concordant and the discordant responses;  
iii. found the Marshall-Olkin correlation coefficient in order to find out whether there is 
correlation between  the two time points; 
iv. found out whether the covariates show significance for the traditional conditional, 
marginal models as well the proposed  joint model;  
v. Tested the overall fit of the proposed model using the likelihood ratio test. 
 Table 1.3 provides the summary for testing the usefulness of the models 
Table 1.3: Summary table for testing the usefulness of the models 
 Traditional models  
 Conditional   Marginal  
 
Model 
01 j  
 
Model 
11 j  
 
Model 
1 j -GEE 
Covariates 
01 j  
se p-value  
11 j  
se p-value  
1 j  
se p-value 
Constant            
Age            
Education            
Sex            
 
Log-
likelihood 
   
 
 
 
  
 
University of Ghana http://ugspace.ug.edu.gh
 13  
 
Generalized Bernoulli model 
 
Model 
01 j  
 
 Model 
11 j  
 
Model 
1 j  
Covariates 
01 j  
se p-value  
11 j  
se p-value  
1 j  
se p-value 
Age            
Education            
Sex            
Log-
likelihood 
     
Likelihood 
ratio test 
( proposed 
model vrs 
traditional 
methods 
     
 
1.9 Definition of some relevant terms 
Two definitions are provided; first the major difference between marginal and conditional 
models and second, longitudinal studies. 
1.9.1 Marginal and conditional models 
The major difference between marginal and conditional models depends on whether the regression 
coefficients are intended to describe the marginal response or an individual’s response to changing 
covariates, that is, one that does not attempt to control for unobserved subjects’ random effects. 
For example, a marginal gender contrast compares the mean among men to that among women, 
while a conditional gender contrast compares the mean among men to that among women holding 
the same value of a random effect (a particular value that corresponds to each individual). 
Consider the following normal models: First the conditional model 
( / )ij i ij iE Y X    
University of Ghana http://ugspace.ug.edu.gh
 14  
 
and then the marginal model  
( )ij ijE Y X                                                                                Lee & Nelder (2004).  
        
1.9.2 Longitudinal studies  
Longitudinal studies “are studies in which the outcome variable is repeatedly measured on two or 
more occasions over time” (Wilson & Lorenz, 2015). 
 
 
 
 
 
 
 
 
 
 
 
 
University of Ghana http://ugspace.ug.edu.gh
 15  
 
CHAPTER TWO 
LITERATURE REVIEW 
2.1 Background  
The generalized linear models by McCullagh and Nelder (1989) include special cases such as logit 
models, probit models and multinomial response models for quantal responses. On the subject 
matter of models for binary responses, the writers advocated that, to investigate the relationship 
between the response probability and the vector of covariates, it is expedient to come up with a 
model that can describe the effect on the response probability as a result of changes in the vector 
of covariates. They added that, in practice this formal model usually embodies assumptions such 
as correlation or independence, additivity and so on. They emphasized that the assumptions cannot 
be taken for granted and should be checked if possible. Furthermore, the behavior of the model as 
far as possible should be consistent with known mathematical, physical, or biological laws, 
especially in its limiting behavior. 
When we have a study where each subject to be investigated  has only one observation , generalized 
linear models can be used, however with repeated observations, possible correlation among the 
observations must be checked before further analysis is done. 
Liang and Zeger (1986) introduced an extension of general linear models to the analysis of 
longitudinal data .They introduced a class of estimating equations that give regression parameters 
that have consistent estimates. However, one objective of statistical analysis is to account for the 
correlation of repeated outcomes for subjects in order to arrive at the right conclusion when dealing 
with repeated outcome observations and a set of covariates. Zeger and Liang (1986) proposed a 
methodology for discrete and continuous longitudinal data that employs the quasi-likelihood 
approach because of the scarcity of multivariate distributions for non-normal distributed data. They 
University of Ghana http://ugspace.ug.edu.gh
 16  
 
specified that there should be a known function of the marginal expectation of the response 
variable that exhibits a linear function of the explanatory variables. They also assumed that the 
variance is a known function of the mean. In addition, they specify a “working correlation matrix 
for observations for each subject”. The above set up leads to generalized estimating equations 
which produces consistent estimators of the regression coefficients and of their variances.  
For example, with binary outcomes it is assummed that the logit of the probability of response p  
depends linearly on the covariates, the variance is just (1 )p p .This partial specification of the 
outcome distribution results in simple techniques for regression analyses of Gaussian, Gamma, 
Poisson, binomial and ordinal data (McCullagh and Nelder, 1983, as cited in Zeger and Liang, 
1986).   
2.2 Marginal models 
2.2.1 Response probabilities and correlations 
The Generalized Estimating Equations (GEEs) are a widely used method for estimating the 
covariate coefficients of marginal models for repeated binary responses. In particular, Liang and 
Zeger (1986) and Prentice (1988) have introduced moment based GEEs, which only required a 
specification of the form of the first two moments (the probabilities of success and their 
correlations) of the vector of binary responses (Fitzmaurice, & Lipsitz, (1995).  
The GEE models are built on probability of an event of interest and correlations as espoused by 
Liang and Zeger (1986) and Prentice (1988). 
Specifically, when regression is the primary focus, Liang and Zeger (1986) presented an extension 
of generalized linear models to the study of longitudinal data. Their approach employed a 
generalized linear model for the marginal distribution of 
ity , where ( 1, .....,  )iity t n They did not 
University of Ghana http://ugspace.ug.edu.gh
 17  
 
indicate any form of joint distribution for the repeated measures. However, they introduced 
estimating equations that resulted in consistent estimates of the regression parameters and of their 
variances .They modeled the marginal distribution, but did not consider the conditional distribution 
which takes into account previous observations. The methods they proposed reduced to maximum 
likelihood when the 
ity  are multivariate Gaussian. 
Some have argued that binary response models that condition on all or some of binary responses 
variables are useful for studying only particular types of dependencies. To estimate marginal 
response probabilities, the conditioning is not needed. In cases where the analysis involves paired 
data, parametric methods to these data appear not to be complicated, otherwise it is complicated. 
Hence, Prentice (1988) advocated a generalized estimating equation approach for inference on 
response probabilities and correlations.  
 2.2.2 Marginal odd ratios 
 Instead of using marginal correlations in modelling the association between a pair of binary 
responses ,Lipsitz et al. (1991), Liang et al. (1992) and Carey et al. (1993) proposed the use of a 
marginal odds ratio. To them, the odds ratio is a natural metric for measuring association and may 
also be easier to interpret, when dealing with binary responses.   
Fitzmaurice, & Lipsitz, (1995) also proposed a serial pattern models for the odd ratio. An attractive 
feature of these models is that they allow the repeated responses for each individual to be unevenly 
spaced in time and the times of measurement to vary between individuals. Also, the number of 
observations per individual may vary. 
 
 
University of Ghana http://ugspace.ug.edu.gh
 18  
 
2.2.3 Markov chains 
Azzalini (1994) proposed a marginal model which is founded on the binary Markov chain for a 
solitary stationary process. He proposed a stochastic model for the “study of the influence of time-
dependent covariates on the marginal distribution of the binary response in serially correlated 
binary data. Markov chains were expressed in terms of transitional rather than marginal 
probabilities”. 
2.2.4 Quadratic exponential model 
Cox (1972), Zhao and Prentice (1990) and Cox and Wermuth (1994) proposed different models 
using the quadratic exponential model approach. 
In suggesting likely distributions for multivariate binary data, Cox (1972) proposed that logistic 
models, in which we have the logarithm for the joint response probability to be “a quadratic 
function of the response vector are probably the simplest”. He however, proposed that, some 
conditional distributions from this set of models have a predominantly simple form, whereas 
marginal probabilities do not. 
Zhao and Prentice (1990),  went ahead to suggest a re-parameterized version of a probability 
distribution of this quadratic exponential form done in terms of marginal parameters and ways of  
estimating  such parameters. 
Cox and Wermuth (1994) on the other hand, proposed the joint distribution of k  binary variables 
“in the quadratic exponential form containing only main effects and two-factor interactions in the 
log probabilities. Approximate versions of marginalized forms of the distribution were studied 
based on Taylor expansion”.  
 
University of Ghana http://ugspace.ug.edu.gh
 19  
 
2.2.5 Generalized multivariate logistic model 
In another attempt, Glonek and McCullagh (1995) and Glonek (1996) proposed a generalized 
multivariate logistic model. 
Glonek and McCullagh (1995) compared the multivariate logistic regression models to that of the 
log-linear models, and argued that it seems the log-linear approach had gained acceptance largely 
because of its convenient theoretical properties and the relative ease with which models can be 
stated and its maximum likelihood found. To restore the balance they defined the class of 
multivariate logistic regression models in a way that leads to a computational scheme that is 
feasible for problems of moderate size. 
Again, Glonek (1996) on the other hand, discussed a set of “link functions that lie between the two 
extremes of the multivariate logistic transform” established by McCullagh and Nelder (1989) and 
the log-linear contingency table analysis breakdown. Models resulting from the link functions 
established from Glonek (1996) showed various desired properties of both the log-linear regression 
model and the multivariate logistic regression models. The computation for implementing these 
models were more manageable than that for the multivariate logistic regression models. 
2.2.6 Multivariate plackett distribution 
Employing a multivariate Plackett distribution for correlated data, Molenberghs and Lesaffre 
(1994) developed a marginal model. They used three link functions for association and marginal 
parameters.  
2.2.7 Bivariate or multivariate Bernoulli approaches 
 A few studies on accounts of bivariate or multivariate Bernoulli approaches such as Lin and 
Clayton (2005), Lovison (2006) and Sun et al. (2007) were introduced. 
University of Ghana http://ugspace.ug.edu.gh
 20  
 
Lin and Clayton (2005) described the use of quasi-likelihood estimating equations for spatially 
correlated binary data. In their paper, a logistic function is used to model marginal probability of 
binary outcomes in terms of regression parameters.  
Lovison (2006), on the other hand proposed a matrix-valued Bernoulli distribution, based on the 
log-linear representation introduced by Cox (1972) for “the Multivariate Bernoulli distribution 
with correlated components”. 
Sun et al. (2007) presented a new model called the “multivariate Bernoulli mixtures of normal”, 
and then compared it with “normal component mixtures-of-experts” and the “Rubin–Wu model”. 
This he did by investigating the joint distribution of observed data looking at each subject under 
each model.  
2.3 Conditional models 
 Compared to the marginal models, a relatively small number of studies have been conducted on 
the conditional approaches. Among some of the studies on conditional approaches are Muenz and 
Rubinstein (1995), Bonney (1986), Bonney (1987) and Islam and Chowdhury (2006).  
Muenz and Rubinstein (1995), assumed individuals from a heterogeneous group who are studied 
over time each having two states “ 0” or  “1”  and also  assumed  a binary Markov chain when the 
sequence of their state is taken. Through the use of two logistic regressions Muenz and Rubinstein 
(1995) modeled the transition probabilities for  “0” to “0” and “1” to “0”transitions . They showed 
how the explanatory variables relate to changes in state. Muenz and Rubinstein (1995), also 
showed “how to use transition probability estimates to test hypotheses about the probability of 
occupying state “0” at time     2, ..,  i i T   and the equilibrium probability of state 0”.  
University of Ghana http://ugspace.ug.edu.gh
 21  
 
Bonney (1986) used the regressive logistic models in the study of familial disease and other binary 
traits. According to him, regressive logistic models are formed by decomposing the probability of 
the binary response variable conditioned on all the explanatory variables. 
Bonney (1987) expressed the likelihood of a set of binary dependent outcomes that have or do not 
have covariates, as a product of conditional probabilities each assumed to be logistic; these models 
are known as regressive logistic models. They provide a simple form of the multivariate 
distribution. His paper was intended to stimulate the development and usage of regressive logistic 
models. 
For higher order Markov chains especially, Islam and Chowdhury (2006), sought to propose 
accurate approaches “for linking covariate dependence with transition probabilities” so as to 
analyze the factors associated with such transitions. It should be noted that the probability of a 
transition of a point at a time to another at another time hence involves conditional modeling. 
2.4 Measuring dependence 
For correlated binary data, it is worth noting that Le Cessie and van Houwelingen (1994) proposed 
measures of dependence using logistic regression.  
According to Le Cessie and Houwelingen (1994), the dependence between a pair of observations 
can be quantified. Prentice (1988) considered the correlation between
1Y  and 2Y . This is 
instinctively attractive but the correlation is restricted, and the restrictions depend on the marginals 
P1 and P2. Therefore, they proposed two other measures whose range does not depend on the 
marginals: the tetrachoric correlation and the odds ratio. This study however, introduces measure 
of dependence by expressing the bivariate Bernoulli distribution in the exponential family form 
and picking the link function for 
1Y  and 2Y . 
University of Ghana http://ugspace.ug.edu.gh
 22  
 
2.5 Joint models 
The marginal as well as conditional measures may fail to measure the dependence of binary 
outcomes owing to the deficiency of proper specification of the underlying model. Accordingly, a 
joint model is proposed in this study which takes into consideration both marginal and conditional 
probabilities of correlated binary events of interest. The joint function was specified by putting 
together marginal and conditional probabilities. In addition, the overall fit of the proposed model 
was tested in this study.  
2.6 Methodology  
2.6.1 Simulation 
Leisch et al. (1998) dealt with binary variables, typically encoded by the set {0,1}. In the least 
complex case the variables are scalar and, the equivalent distribution is likewise known as the 
alternative distribution. In their illustration, binary random variables were represented byA ,B  or
1A , 2A ,…….., respectively. True values of these random variables were represented by 
corresponding lower case letters. “The alternative distribution is fully determined by the single 
value ( 1)AP P A  ”, which is also the expectation of A , i.e. E(A)=PA . The variance is given by    
var( ) (1 )A AA P P ; for convenience they  wrote    A A1 A 0 .P Pq      
According to Leisch et al. (1998) ,given that “two binary random variables A  and B  are not 
necessarily  independent ,then the joint distribution of A  and B ” is completely defined by 
AP , BP  
and either
ABP , |A BP  or |B AP , where 
 
|
|
 ( 1, 1)
( 1| 1)
( 1| 1)
AB
A B
B A
P P A B
P P A B
P P A A
  
  
  
 (1) 
University of Ghana http://ugspace.ug.edu.gh
 23  
 
Bayes’ theorem may be used to express the remaining probabilities
| |AB A B B B A AP P P P P    or 
relationships like ( 1, 0) A ABP A B P P   . Leisch et al. (1998) also said, the bivariate binary 
distribution can simply be summed up to “the multivariate case, where 
1 {0,1}( ,....., ) ' ddA A A  is 
a vector with (possibly dependent) binary components. For a full description of an unrestricted 
distribution of A we need 2 -1d  parameters, e.g., the probabilities of all 2d  possible values of A 
(the last probability is determined by the condition that the sum equals 1)”. 
They came up with two ways of generating binary random variates namely  
(iii) The inversion method and   
(iv) “Direct Conversion of Real-Valued Random Variates”;  
 
(i) The Inversion method 
According to Leisch et al. (1998), “the task of generating a sample of size n of a binary random 
variable A   with given 
AP   can be accomplished if a random number generator is available which 
produces random numbers U  which are uniformly distributed in [0, 1]”. If we want to create a 
sample 
1a ,..........an   of A  we generate a sample 1u ,.........un  of  U  and set 
 ,  
,
,  
 0  
 1  
i A
i
i A
u qa u q
   
 (2) 
Referring to Leisch et al. (1998), they assigned M  to denote an arbitrary discrete random variable 
taking values in Z (the set of integers) with probabilities ( )iP P M i  . Then if we apply the 
uniform random variable U, then the transformation 
i i ii M i M
P U P
 
  
  can be used. 
University of Ghana http://ugspace.ug.edu.gh
 24  
 
However, the above relationship cannot easily be solved for U  or M , because for each 
transformation a search in the table of 
'i sP  has to be undertaken. So Leisch et al. (1998), went 
ahead “to generate a random number m for M” by first generating a random number u for U  and 
then setting m j  if   
i i ii j i j
P u P
 
  
. 
Using the above expression, d-dimensional binary data can now be generated when the 
probabilities of all d-tuples of {0,1} are specified. Leisch et al. (1998) provided that for example,  
“ 2 d  , one has to specify ( 0, 0), ( 0, 1), ( 1, 0)P A B P A B P A B     , and ( 1, 1)A B  . All 
d-tuples can be enumerated by interpreting the binary vector as a binary number, resulting in the 
index numbers {0,.........2 1}d  and corresponding probabilities (0),......., (2 1)dP P ”.They 
continued by providing another way of expressing the d-tuples. This they did by relying on the 
size of the
iP , bringing about computational gains. 
Leisch et al. (1998) continued by stating that specifying directly all 
iP  is simply impossible for 
small d , this is because of the exponential growth of the number of 'iP s . “In practice a lower-
dimensional parameterization of the 
iP   is used such that for some function f  
( , )iP f i   , 1( ,......, ) 'n     with the dimension  2dn   of the new parameter vector   much 
smaller than 2d . If pairwise combinations of components are given by their correlation, then only 
 2On d parameters have to be specified. E.g., autologistic models fall into this category”. 
 
They asserted that “the inversion method is very powerful, because it can generate random values 
from arbitrary distributions. On the other hand, it requires a search in 2d  values for each random 
number generated, which makes it rather slow”. 
 
University of Ghana http://ugspace.ug.edu.gh
 25  
 
(ii) Direct conversion of real-valued random variables 
Leisch et al. (1998) established that using a direct conversion of real-valued random variables 
provides a faster method for generating samples from a binary vector
, ...........,1( )dA A A . They 
initially assigned “
1( ,......, )dX X X  to be a d-dimensional normally distributed vector with mean 
  and covariance matrix . Here, normally distributed random variates can easily be transformed 
to binary values by componentwise threshold in 1 0i ia x   Due to the construction
( 1) ( 0)Ai i iP P A P X    and  ( 1, 1) ( 0, 0)AiAj i j i jP P A A P X X     where ( 0)iP X    
depends, for fixed variances, only on
i  whereas ( 0, 0)ji jP X X   depends on i , j  and on the 
correlation between 
iX  and jX ” . 
Generating random numbers using a normal distribution is easier and faster because only one 
additional comparison and assignment per value is undertaken. Leisch et al. (1998) established 
that “if only random numbers from the standard normal distribution (zero mean, unit covariance 
matrix) can be generated, then they have to be transformed to general multivariate normals. Let 
   be the (matrix) square root of   such that      
' = . Then ( , )Y X N     if ~ (0,1)X N ”. 
Determination of   and    
Leisch et al. (1998) proposed the determination of   and  as follows: 
“Let 
iY  be a 1-dimensional normally distributed random variable with mean i  and unit variance. 
Hence, ( 0) ( ) (( ) )i i i i i iP Y P Y P Y        
University of Ghana http://ugspace.ug.edu.gh
 26  
 
where the second equality holds, because ( )i iY   is normally distributed with zero mean. They 
choose 
i   to be the AiP -quantile of the standard normal distribution and restricted all variances to 
1, so that ( 0)i AiP Y P    . The desired marginal probabilities AiP are used to determine the mean 
vector   for the components of A . Allowing for different variances of  the iY  introduces additional 
degrees of freedom (and therefore modeling capabilities)”.  
Leisch et al. (1998) admitted that left out is “the relation between the covariance matrix  
b
 of 
the binary variables and the covariance matrix   of the normal distribution. They argued that 
using the option of specifying a covariance matrix, only pairwise relations between the 
components of the d-dimensional sample can be specified”. So, for ease of notation they restricted 
themselves to the bivariate case. 
The correlation coefficient 
ABp
 of the binary random variables A  and B   can be expressed as     
                                  
AB A B
AB
A A B B
p p pr p q p q

 (3) 
 
such that 
 
AB AB A A B B A Bp r p q p q p p
 (4) 
                                         
Leisch et al. (1998) continued by saying that, “if A   and B  are converted from two normal random 
variables X  and Y  as described above, 
then 
ABp  can be related to the normal distribution by 
University of Ghana http://ugspace.ug.edu.gh
 27  
 
 
, ,, 0) ( , )( 0 (x y x yAB YP P X P X Y L           
(5) 
 
Where 
1 1 2 2 2 2 1 1 1 1( , ) ( ; ) ( / )P Y y Y y x P Y y Y y x P Y y x     and yY Y      have a standard 
bivariate normal distribution with correlation coefficient 
xy      ; and 
 
)( , , ) ( , ; ), ( h kL h k P X h Y k x y dydx         
(6) 
 
with        
 
2 2
2 2
( )
1 2
, ; exp
1 2(1 )
x xy y
x y

 
   
  
    
 (7) 
 
being the density function of ( , )"X Y . 
Leisch et al. (1998) indicated that “the values of )( , ;L h k   are tabulated (see the references in Patel 
& Read, 1982, p. 293f). It can also be obtained by numerical integration or Monte Carlo simulation. 
For 
x = y = 0 there is the simple relation 
 
1 arcsin( )(0,0, )
4 2AB
P L

    (8) 
 
or 
 
( )1sin 2 "
4AB
P     
 (9) 
 
 
University of Ghana http://ugspace.ug.edu.gh
 28  
 
Pseudocode for direct conversion 
Leisch et al. (1998) proposed an algorithm for generating a d -dimensional binary vector A  with 
given first and second moments
,( )Ai Ai AjP P P : 
1. “ Choose the marginal probabilities 
AiP  and the covariance matrix b , 
2. Set 1( )i AiP    where   is the standard normal distribution function,  
      3.   Find an appropriate covariance matrix    for the normal distribution, 
      4.    Generate a sample of a d -dimensional normal variables X  with mean   and covariance 
           matrix  and 
      5. Set 1 0i ia x   ”. 
 
Leisch et al. (1998) continued by specifying pairwise relations: 
The pairwise relations between the variables 
, 1,....,iA i d  can be gotten, by specifying the 
covariance matrix 
b  or the pairwise probabilities  P , i jAiAj    . Leisch et al. (1998) indicated that 
one has to make sure that the values specified are valid in both options. 
They explained that a covariance matrix 
b  is valid if and only if it is symmetric and positive 
definite. In other words, all the eigenvalues of 
b  have to be positive. This property they said can 
be checked easily. However, if the matrix is not positive definite, they did not have a clear solution 
of which elements of the matrix to change in order to turn the matrix into a positive definite one. 
Ai; i = 1; : : :; d
Using the option of specifying pairwise probabilities, they were not aware of any convincing 
conditions that will lead to the validity of the specification. They however, provided some 
conditions for the pairwise probabilities: 
University of Ghana http://ugspace.ug.edu.gh
 29  
 
“ From 
 
|AB A B A AP P P P  
(10) 
              
we get 
 min( )AB A BP P P                                                                                                                                (11) 
                                                                                                                                
                                                                                
Letting  
A BP   be the probability that at least one of A  and B  equals 1. Then, we get 
 
1 A B A B A BP P P P P                                                                                                                       
(12) 
 
  which gives 
 ,max( 1 0)AB A BP P P                                                                                                                     (13) 
 
 Fulfilling conditions (11) & (13) is not sufficient as is shown by the following simple example. 
Let 1/ 2A B CP P P    and 0AB AC BCP P P  . Then, ( A ) & ( B ) are fulfilled and one can 
easily construct a 2-dimensional binary distribution which is valid for 
AP , BP , and ABP   by 
specifying that the pairs (0,1)  and (1,0)   have a probability of 1/ 2  each. However, there is no 
way to add the third variable C . 
The reason is that 
University of Ghana http://ugspace.ug.edu.gh
 30  
 
 1 A B C AB AC BC ABCP P P P P P P      
(14) 
    
A B C AB AC BCP P P P P P      
(15) 
 
which is not fulfilled in the above example ”. 
Leisch et al. (1998) concluded that the illustrated example above can be applied to d -dimensions 
by setting 1/ ( 1), 1,....,AiP d i d    and 0,AiAjP i j   . After this, for any ( 1)d    of the d  
variables, a distribution can be extracted that satisfies all pairwise conditions by giving every 
( 1)d  -tuple with exactly 1 the probability1/ ( 1)d  , however, they found no valid distribution 
for all d variables. 
Hence, to define a d-dimensional binary distribution by specifying , 1,....,AiP i d  and ,AiAjP i j  , 
they proposed that we have to ensure that for all 2 k d     and all possible choices 1 , ...., kJ J   of 
k   elements out of d   the condition 
 
1 , 1,
1
i i l i lAji AjiAjl
k n
P P
  
     is fulfilled. 
2.6.2 Model checking  
According to McCullagh and Nelder (1989), after a selection of a suitable model, the data used to 
test the model may indicate that the particular model selected is not useful. It is likely that the data 
as a whole shows some departure from the fitted values, or it may be that a few data values depart 
from the fitted values. In the view of the writers, the detection of both systematic and isolated 
inconsistencies is part of the technique of model checking. To exemplify a systematic discrepancy 
they considered a plot of the residuals r against x. 
University of Ghana http://ugspace.ug.edu.gh
 31  
 
“McCullagh and Nelder (1989) also said that “having selected a particular model, it is required to 
estimate the parameters and to assess the precision of the estimates.” This study will use various 
methods for checking the usefulness of the proposed model (as discussed under the approach 
section) as well as estimate parameters and assess its precision as advocated by McCullagh and 
Nelder. 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
University of Ghana http://ugspace.ug.edu.gh
 32  
 
CHAPTER THREE 
THE PROPOSED MODEL 
3.1 The Bivariate Bernoulli model 
Marginal or conditional approaches have been proposed as discussed under the literature review 
in addressing binary responses in repeated studies. Some joint models have been considered as 
well but the models could not be made useful due to their limitations in estimating the parameters 
in light of the practical utility of such models to real life data. In this section, we propose the model 
based on the marginal-conditional approach to obtain joint models. 
In the univariate case, some distributions such as the binomial, Poisson, negative binomial, 
hypergeometric, Gamma and normal distributions originated from the Bernoulli distribution. 
These distributions are obtained as sums or limits, thereby forming an interrelated family of 
distributions (Marshall & Olkin, 1985). 
If  ( , ) X Y  has Bernoulli marginals then ( , ) X Y   has only four possible values (0,0), (0,1), (1,0) and
(1,1) .  Let  
    
00{( , ) (0,0)} ,  P X Y P   01{( , ) (0,0)}P X Y P  
    
10{( , ) (1,0)}P X Y P  ,      11{( , ) (1,1)}P X Y P                     (Marshall & Olkin, 1985) 
Islam et al. (2013) initiated the proposed model as follows: 
The bivariate Bernoulli distribution for outcomes 
1Y  and 2Y  can be expressed as   
   1 2 1 2 1 2 1 21 (1 )(1 ) (1 )( ) (1 )1 2 2 00 0 10 1, y y y y y y y yP Y y Y y p p p p                                        (16) 
  
University of Ghana http://ugspace.ug.edu.gh
 33  
 
The bivariate Bernoulli distribution can be expressed in a 2 X 2 contingency table as follows: 
Table 3.1: 2 x 2 contingency table for the bivariate Bernoulli distribution 
 
 
1Y   
2Y  
               0                                     1   
 
Total 
0  00P  01P  1 0( 0)P Y P    
1  10P    11P  1 1( 1)P Y P   
Total  
2 0( 0)P Y P   2 1( 1)P Y P   1  
 
The joint probability can be derived from the marginal and conditional and probabilities as  
 
1 1 2 2 2 2 1 1 1 1( , ) ( ) ( )P Y y Y y P Y y Y y P Y y                                                             (17) 
 
The bivariate probabilities as a function of covariate X are as follows: 
 
1 1 2 2 2 2 1 1 1 1( , ) ( ; ) ( / )P Y y Y y x P Y y Y y x P Y y x     
                             (18) 
 
The joint probability mass function in Equation (16) can be expressed in terms of the exponential 
family for the generalized linear models as 
 
 1 1 2 2 1 2 00 1 2 01 1 1 2 10 1 2 11, exp[(1 )(1 ) log (1 ) log ( ) log log ]P Y y Y y y y P y y P y y y P y y P          
(19) 
 
University of Ghana http://ugspace.ug.edu.gh
 34  
 
 
00 2 00 1 00 1 2 00 2 01 1 2 01 1 10 1 2 10 1 2 11exp[(log log log log log log log log log )]P y P y P y y P y P y y P y P y y P y y P         
(20) 
   
 
011 10 1 00 2 01 2 00 1 2 00 1 2 1 2 10 1 2 11 00exp[( log log log log log log log log log )]y P y P y P y P y y P y y P y y P y y P P        
 
(21) 
 
 
 
10 01 00 11
1 2 1 2 001 1 2 2
00 00 10 01
( , ) exp log log log logP P P PP Y y Y y y y y y P
    
   
              
       (22) 
 
 
   
                        1 2, (0,0),(0,1),(1,0),(1,1), 1y y pijij 
 
Let us consider a sample of size n then the log likelihood function in this case is given by 
 
10 01 00 11
1 2 1 2 00
00 00 01 10
log log log log
1 1
i i i i
i i i i i
i i i i
n n P P P Pl l y y y y Pii i
                  
       
   (23) 
 
It follows that the components of the link function can be denoted as follows: 
  00 1101 10000 1 2 300 00 01 10,  ,  ,    log log log and log
P PP PP   
         
     
   
 
where 
0
is the base line link function, 
2
 is the link function for 
1 Y  , 1
 is the link function for 
2Y   and 3
is the link function for dependence between  
1Y  and 2Y . 
We have demonstrated the probabilities without function of covariates in the previous expressions. 
Now let us consider 
1 2X (1,  ,  ,. . , )pX X X  and 1 2x (1,  ,  ,. . , )px x x  where * 1 2X (  ,  ,. . , )pX X X
is the vector of covariates and 
* 1 2x (  ,  ,. . , )px x x is the vector corresponding values of the 
covariates. We can now express the conditional probabilities in terms of the logit link functions as 
follows: 
University of Ghana http://ugspace.ug.edu.gh
 35  
 
 01
2 1 01
01
( 1/ 0, ) ( )
1
xeP Y Y x xxe

   
 (24) 
 
 
  
 
 
11
2 1 11
11
( 1/ 1, ) ( )
1
xeP Y Y x xxe

   
                                                             (25) 
 
 
2 1 00
01
1( 0 / 0, ) ( )
1
P Y Y x xxe
   
                                                         (26) 
 
and 
 
2 1 10
11
1( 0 / 1, ) ( )
1
P Y Y x xxe
   
                                                          (27) 
 
where  
01 010 011 012 01, , , ...( ) 'p       and  11 110 111 112 11, , , ...( ) 'p        
The marginal probabilities are as follows 
 
  1 11/    ( )P Y X x x      and     1 10 / ) 1 ( )P Y X x x                                   (28) 
 
 
Now, we assume that 
 1
1 1
1
( 1/  )     ( )
1
xeP Y x xxe

  
   and     
1 1
1
1( 0 /  )     1 ( )
1
P Y x xxe
  
                           (29) 
 
  
University of Ghana http://ugspace.ug.edu.gh
 36  
 
where 
1 10 11 12 1, , , ...( ) 'p       
Also we can write 
 
 01
01 2 1 1
01 1
.
1( ) ( 1/ 0, ). ( 0 / )
11
xeP x P Y Y X x P Y X x x xee

        
  
00 2 1
01 1
1 .1 1( ) ( 0 / 0, ). ( 0 / )
11
P x P Y Y X x P Y X x x xee  
      

  
11 1
11 2 1 1
11 1
.( ) ( 1/ 1, ). ( 1/ )
1 1
x xe eP x P Y Y X x P Y X x x xe e
 
         
  
1
10 2 1 1
11 1
.
1( ) ( 0 / 1, ). ( 1/ )
1 1
xeP x P Y Y X x P Y X x x xe e

         
  
(30) 
            
 Now we can show that  
000 101 1 01
.
11 1 1( ) ln( ( )) ln ln ln
111 1
x P x xx x x eee e
   
     
     
         
   
 
 
             
 
1
0
01( ) ln(1 ) ln(1 );x xx e e                      
(31) 
 
             
01
01 01 1
1
01 100
. .
( ) 1( ) ln ln 1 .1
( ) 11
xP x e x xx e e
x xP x ee

 
 
 
 
 
 
 
     
  
 
 
011 ( ) ;x x 
 (32) 
 
 
University of Ghana http://ugspace.ug.edu.gh
 37  
 
1
01
2 11
. .
1( ) ln 1
1
xe xx exe

 
 
 
 
 
 
 

 
011
2 11
1( ) ln ln ln(1 )
1
xxx e exe
 
 
 
 
 
   

 
 
01 1112 ( ) ln(1 ) ln(1 );x xx x e e       
 (33) 
 
 
01 111 1
00 11
3 01 1 11 101 1 11 101 10
( ) ( ) 1 11 1( ) ln ln . . . . . .
( ) ( ) 1 1 111 1 11
x xx xP x P x e ee ex x x x xx x x x e e eee e ee
  
       
 
 
 
 
  
     
11
3 01
( ) ln
xex xe

 
 
 
  
 
 
11 013 ( ) ( )x x   
                                                                                                     (34) 
 
This indicates that if there is no association between 
1Y   and 2Y   then this is true for 01 11    
01  and 11 are the estimated covariate coefficients for the conditional logit models ,given 
covariates X  for 
1Y  = 0 and 2Y  = 1 , respectively. The assumption underlying the covariates is 
that they are time independent.  
It is noteworthy that Glonek and McCullagh 1995 ( as cited in Islam et al., 2013 ) considered a 
separate model for the measure of association instead of the dependence link function as a function 
of two conditional models as shown in equations (31),(32), (33) and (34). In addition, they 
considered marginal models for the link functions for 
1Y  and 2Y . The proposed model employs the 
conditional and marginal models for the outcome variables of interest and thus the measure of 
University of Ghana http://ugspace.ug.edu.gh
 38  
 
association can be linked with the link function as a function of conditional models, which provides 
a natural measure from the odds ratio.  
On the other hand, Muenz and Rubinstein in 1995 and Islam and Chowdhury in 2006 (as cited in 
Islam et al., 2013 ), showed the conditional models only and no attempts were made to obtain the 
joint mass function for the correlated binary data. Hence, the use of the proposed conditional and 
marginal models provide the necessary background to obtain the test for dependence in the 
repeated outcome variables based on the link functions presented in equation (34). This is a new 
formulation to measure the dependence in terms of the parameters of the conditional models 
obtained from the joint mass function.  
We can test the overall significance of a model using the likelihood ratio test and the dependence 
can be examined on the basis of 
3
.  
This study, came out with a method of estimating the parameters .First, substituting
0 ( )x
,
1 ( )x
2 ( )x
,
3 ( )x
into the likelihood equation (23) will give us the following:  
 01 01 111 01 11 011 21 1 21 1
ln(1 ) ln(1 ) ( ln(1 )) ln(1 ) ( )ii i i
n n x x x xl l e e y x e e y x y y xii i
                         
 
(35) 
 
 The estimating equations will be   
 
1 0101
2 1 2
01 01 011
exp( )exp( ) ,1 e ( ) exp( )
i i ii i
i i i i
i i
n
i
Y X Xl X X Y X Y Y XX
 
       
 (36) 
 
 
  11 1 2
11 111
exp( ) ,1 exp( )
i i
i i i
i
n
i
X Xl Y Y XX

   
 (37) 
 
  .  1 1
1 11
exp( )
1 exp( )
i i
i i
i
n
i
X Xl Y XX

    
 
(38) 
University of Ghana http://ugspace.ug.edu.gh
 39  
 
Now, because of the complexity of solving for the estimated values of the parameters, this study 
adopted the use of the Newton -Raphson and Nelder-Mead algorithms. Both methods are used to 
solve non-linear -optimization problems .The Newton-Raphson method requires the first and 
second derivatives of the function whereas the Nelder-Mead does not use derivatives but rather 
iterates on a simplex and then replaces the worst simplex. 
Finding the maximum of a function of k variables using the Newton-Raphson algorithm 
According to Quinn (2001), consider, ( ) f x a b x x Cx    , where a is a scalar, b  and x are k  
vectors, and C is a k x k symmetric, negative definite matrix. The gradient of f  at x is 
( )  2f x b Cx   . Since the gradient at the value that maximizes f will be a vector of zeros we 
 know that the maximizer  x  satisfies 0  2b C x  .  
Solving for  x  we would have 1 12 Cx b
  
.  
Since C is assumed to be negative definite we know that this is a maximum. 
Using the R language, an initial value is chosen and then, the function then iterates till the optimal 
value pops up. The results also show the number of iterations. 
The Nelder-Mead algorithm 
According to Gavin (2013), “the Nelder-Mead algorithm provides a means of minimizing an 
objective function of n design parameters,    1 2 T, ,· · ·, ,  nx x xf x x . The Nelder-Mead algorithm 
iterates on a simplex, which is a set of 1n  designs, (1) (2) ( 1)[x , x , · · · , x ]n. The Nelder-Mead 
University of Ghana http://ugspace.ug.edu.gh
 40  
 
algorithm specifies a sequence of steps for iteratively updating the worst design in the simplex
( 1)(x )n . 
The simplex may be thought of as a polygon with 1n  vertices. If n = 2, the simplex is a triangle, 
and the Nelder-Mead algorithm may be easily visualized. If n = 3, the simplex is a tetrahedron. 
Considering triangular simplexes, [u, v, w] in the 
1 2x x   plane, the triangle connecting them, and 
the objective function evaluated at the three points, f (u), f (v), and f (w).” It iteratively improve 
the vertices of the triangle in order to minimize f (x).  
Steps for one iteration of the Nelder-Mead algorithm 
Gavin (2013) provided the following 
“1. Sort the vertices such that f (u) < f (v) < f (w). Point u is the best point, point v is the next-to-
worst point, and point w is the worst point. 
2. Reflect the worst point, w, through the centroid of the remaining points (u and v) to obtain the 
reflected point, r, and evaluate f (r). If the cost at the reflected point, f (r), is between the best and 
next-to-worst cost (f (u) < f (r) < f (v)) then replace the worst point, w, with the reflected point, r, 
and go to step 5. 
3. If the cost at the reflected point, f (r), is better than f (u) (f (r) < f (u)) then extend the reflected 
point, r, further past the average of u and v, to point e, and evaluate f (e). 
(a) If the cost at the extended point, f (e), is better than the reflected point cost, f (r), then replace 
the worst point, w, with the extended point, e, and go to step 5. 
(b) Otherwise replace the worst point, w with the reflected point, r, and go to step 5. 
University of Ghana http://ugspace.ug.edu.gh
 41  
 
4. If the inequalities of steps 2 and 3 are not satisfied, then it is certain that the reflected point, r, is 
worse than the next-to-worst point, v, (f (r) > f (v)) and, a smaller value of f might be found between 
points w and r. So try to contract the worst point, w, to a point c between w and r, and evaluate f 
(w). The best distance along the line from w to r can be hard to determine, and, in general, it is not 
worth trying too hard to find this minimum. Typical values of c are one-quarter and three-quarters 
of the way from w to r. These are called inside and outside contraction points, 
         . . . 1  :,1 , 2  :,2 ,f f f f X f X and oc . 
(a) If the cost at the better of the two contraction points is better than the next-to-worst cost, min 
[ f (ci),  f (co)] < f (v), then replace w with the better contraction point, 
ic  or 0c  , and go to step 5. 
(b) Otherwise shrink the simplex into the best point, u, and go to step 5.” 
5. Check convergence  
According to Gavin (2013), “The simplex has converged if the simplex is “sufficiently small” and 
if function values at the simplex vertices are “sufficiently close.” “Sufficiently” is quantified by 
convergence tolerances Ex and Ef. These convergence tolerances are problem-dependent and can 
be user-specified. 
(a) If the largest difference between the adjacent vertices is less than Ex times the average of 
adjacent vertices, 
   
   
, ,2 max , , x
u v v w Eu v v w
 
                             for n = 2, as in this example 
University of Ghana http://ugspace.ug.edu.gh
 42  
 
(:,1: ) (:,2 : 1)2 max (:,1: ) (:,2 : 1) x
X n X n EX n X n
   
      for n ≥ 2, more generally 
and if the difference between worst and best function values is less than Ef  times the best function 
value, 
9
( ) ( ) ( ) 10 f
f w f u Ef u 
 
                     for n = 2, as in this example 
9
1 1
1
( ) ( ) ( ) 10 f
nf f Ef 
  
                   for n ≥ 2, more generally 
then the iterations have converged, and the algorithm is terminated. 
In the convergence criteria for n ≥ 2, the simplex X is a n-by-(n + 1) matrix, in which each column 
of  X is a vertex of the simplex. The objective function for each column of the simplex is in a 
vector f of dimension 1-by-(n + 1). The vector of objective functions is sorted in increasing order 
f (1) < f (2) < · · · < f (n+1). The columns of X correspond to the elements of 
         . . . 1  :,1 , 2  :,2 ,f f f f f X X etc. 
In this 2D example,  2x3, ,X u v w  and       1 3 ,  ,  .f f f    f u v w  
(b) If the number of function evaluations has exceeded a user-specified limit then the algorithm is 
terminated. If the convergence criteria are not met and if the number of function evaluations has 
not been exceeded, then the algorithm returns to step 1 for the next iteration.” 
  3.2 Comparison with an alternative test 
We have compared the proposed test for dependence in the bivariate Bernoulli outcome variables, 
University of Ghana http://ugspace.ug.edu.gh
 43  
 
1Y  and 2Y , with a test proposed by employing the regressive model (7, 8). The joint mass function 
for 
1Y  and 2Y   can be expressed as shown: 1 2 1 2 1( , ) ( ) ( | , )P y y P y P y yx x x
,where X = x is the 
vector of covariate values. It therefore follows that the regressive model includes the previous 
outcome, 
1Y , as a covariate, in addition to the explanatory variables, 1 2, ,. . ., ,px x x  as follows: 
 
 0 1 1 2
0 1 1
1
2 1 1
( .......... )
( .......... )( / , ) 1
p p
p p
x x y y
x x y
eP y y x
e
   
   
   
    
                                                                             (39) 
 
where 
0 1, ,. . ., p     and γ are the regressive model parameters. It is noteworthy that γ is the 
parameter associated with the outcome variable 
1Y  such that, H0: γ = 0 indicates a lack of 
dependence between 
1Y  and 2Y . However, one of the major limitations arises from the fact that 
dependence in 
1Y  and 2Y  relies on the dependence between the outcome variables and the 
covariates as well. Hence, in many instances, the regressive model (39) may fail to recognize the 
true nature of the relationship between 
1Y  and 2Y  in the presence of covariates 1 2, ,. . ., pX X X  in 
the model. However, in the proposed model, the generalized bivariate Bernoulli model clearly 
takes account of the association between the response variables and the covariates. 
 
3.3 Test for the model 
To test the overall fit of the joint model as function of the covariates, Islam et al. (2012) suggested 
the following: 
 _ _ _ _
01 11 10 0: , ,HH         
                  against       _
1 0: HH                                              
where 
University of Ghana http://ugspace.ug.edu.gh
 44  
 
 
 
 
011 012 0101
111 112 1111
1 1 1 2 11
, ,
, ,
, ,
_
_
_
p
p
p
   
   
     






 
 
Then  
 
   
010 110 1 0 01 11 12 ( , , ) ( , , ) ,InL InL      
         
 
 which is asymptotically distributed as
23 p
.                                                            
(40) 
 
 For testing  
0
1
_
_
: 0
: 0
H
H
H
H




 
 
Islam et al. (2012) also suggested the use of the Wald test statistic:   
 
^
^ ^
( )
H
H
W
se



                                       (41) 
 
                                                         
 
 
 
 
 
 
University of Ghana http://ugspace.ug.edu.gh
 45  
 
CHAPTER FOUR 
SIMULATION 
4.1 Introduction 
In this chapter, simulation is undertaken to verify the usefulness of the proposed model. First, 
exploratory analysis is done to examine the characteristics of the data simulated .Second, with the 
same simulated data, the regressive, conditional logistic and the GEE models are thoroughly 
examined before the proposed model. Finally, the overall-fit of the proposed model compared to 
the traditional models (i.e. the regressive, conditional logistic and the GEE models) are then 
critically examined. 
Data generation 
For the purposes of investigating the reliability of the proposed model, correlated binary data were 
initially generated. The function rmvbin was used in generating correlated binary data. This 
function creates correlated multivariate binary random variables by thresholding a normal 
distribution. In this study, the correlated binary data was generated by specifying the marginal 
probabilities and   . The Sperman’s   is a non-parametric test used to measure the strength of 
association between two variables. 
Five different sets of models were generated using different marginal probability combinations for 
1Y  and 2Y   as shown under the Approach section (see 1.8). Details of the simulation results are 
shown in Tables A1a to A5e (See Appendix A) 
 
 
 
University of Ghana http://ugspace.ug.edu.gh
 46  
 
4.2 Exploratory analyses 
4.2.1 Odds ratio  
If  
1 0Y   then the odds of  2 0Y   versus 2 1Y   are 00 01/p p . Similarly, if  1Y =1 , then the odds of 
2 0Y   versus 2 1Y   are 10 11/p p . The “odds ratio “is the ratio of the odds of 2Y  when 1 0Y   to the 
odds of  
2Y  when 1 1Y  , hence 11 00
10 01
P P
P P
. If the odds ratio is greater than 1 then concordant responses 
i.e 
1 2( 0,  0)Y Y    or  1 2( 1,  1)Y Y    are more common than the discordant responses i.e 
1 2( 0,  1)Y Y    or 1 2( 1,  0)Y Y   . In contrast, when the odds ratio is less than 1, then discordant 
responses are more common. 
The simulation exercise showed that, models with low marginal probabilities i.e. 
1 2( 0.1 , 0.1)P P   , have common concordant responses than that of the discordant. The odds 
ratio actually increased from 2.57 at sample size 50 to 2.75 at sample size 1000. 
The highest odds ratio was observed when one of the marginal probabilities is very high while the 
other is very low, i.e.
1 2( 0.1 , 0.8)P P . At sample size 50, the odds ratio was 3.87 but decreased 
to 3.77 with sample size 100 and thereafter increased to 3.93 at sample size 200. Even though this 
group of marginal probabilities produced the highest odds ratio, no pattern was formed as the 
sample size increased. 
The lowest odds ratios were identified when the marginal probabilities had average probabilities, 
i.e
1 2( 0.5 , 0.5)P P  .They were in the range of 0.98 to 1.01, indicating that the number of 
concordant responses was either slightly higher than that of the discordant or the discordant 
responses slightly higher than the concordant responses. 
University of Ghana http://ugspace.ug.edu.gh
 47  
 
4.2.2 Marshall–Olkin correlation  
The model generated using low marginal probabilities i.e. 
1 2( 0.1 , 0.1)P P  , had  similar values 
of   and the Marshall–Olkin correlation was approximately 0.2. However, all other models 
produced different Marshall–Olkin correlation and   values. For instance, the model with 
probabilities 
1 2( 0.5 , 0.3)P P  recorded 0.3 for   but 0.03 or 0.04 for the Marshall–Olkin 
correlation depending on the sample size. Also, the model with marginal probabilities 
1 2( 0.1 , 0.8)P P   produced 0.1 for rho but 0.2 for for Marshall–Olkin correlation .This might 
be due to the fact that rho on one hand measures correlation using the ranks of the ordinal variable 
whereas Marshall–Olkin correlation uses the probabilities of concordant and discordant responses. 
The Marshall–Olkin correlations of  
1Y  on X  or 2Y  on X  given that the response variable is 
either 0 or 1 produced very small probabilities, indicating that the concordant as well as discordant 
responses were almost the same.  
4.2.3 
3Dependence value based on Ș 
All but one model had a non-zero value, indicating that there exist dependence between the 
response variables. The model with marginal probability 0.5 for both response variables 
1Y  and 2Y  
had a dependence value of approximately 0. A sample size of 50 for instance, produced a value of 
-0.02 whereas a sample size of 1000 produced a value of 0.0001.This indicates independence when 
the data was simulated with marginal probabilities of 0.5 for both response variables. This is rightly 
so because the marginal probabilities chosen to model the data, add up to one (1) meaning that the 
pairwise probabilities will be zero, hence indicating independence. 
 
 
University of Ghana http://ugspace.ug.edu.gh
 48  
 
4.3 Regressive model 
The fitted model is of the form 1
1 1 1
1
logit( ) log
1
Y   


   
 
 
     
  
, where 
1  represents the 
estimated probability of recording a “1” response at the second measurement. 
11   represents the 
estimated  probability of not recording a “1” response at the second measurement,  
1   represents 
the estimated parameter coefficient for covariate X and   represents the estimated parameter 
coefficient associated with the response variable 
1Y .   If  
  equals zero, then it shows that there exist 
no dependence between 
1Y  and 2Y .   
   denotes the value of the logit when the covariates are zero. 
The regressive model produced more than 90% of parameter estimates that fitted the available data 
when the sample size was 500. Model 1
1 2( 0.1 , 0.1)P P    and Model 4 1 2( 0.5 , 0.3)P P    for 
instance, had all sampled estimates simulated fitting the available data when the sample size was 
500.  
However, lower samples did not produce high percentages of parameter estimates that fitted the 
available data. A case in point is a sample size of 50, where Model 4 
1 2( 0.5 , 0.3)P P   had only 
40% of parameter estimates fitting the available data. Model 3 
1 2( 0.1 , 0.8)P P   on the same 
hand did not have any of its parameter estimates fitting the available data. 
It is worth to note that even with a large sample size of 1000, the regressive model was not able to 
identify that Model 2
1 2( 0.5 , 0.5)P P  had an independent relationship between the response 
variables (the estimated regressive parameter   was -2.63). On the other hand, the dependence 
University of Ghana http://ugspace.ug.edu.gh
 49  
 
value based on 
3
 was 0.0001 clearly showing independence .This might be due to the fact that 
the regressive model only accounted for correlation between two responses only. It did not account 
for possible correlations between responses and covariates.   
4.4 Conditional (
1Y = 0 ) 
The fitted conditional model is of the form 
01
01 01
01
logit( ) log
1
  


  

 
    
  
, where 
01
represents the probability of recording a “0” response at the first measurement and a “1” at the 
second measurement 
011    represents the probability of not recording  a “0” response at the first 
measurement and a “1” at the second measurement. 
01  represents the estimated coefficient 
parameter for covariate X .    is the value of the logit when the covariate is zero. 
The simulation exercise revealed that, with a sample size of 50, only a small percentage of the 
parameter estimates were significant .In addition ,only a small percentage of the models fit the 
data well after running the simulation for 1000 times. Since we are interested in accurate 
predictions as much as possible, we need a model that fits the data well enough. A case in point is 
Model 1 
1 2( 0.1 , 0.1)P P   which had only 22% of parameter estimates being significant and 
33% of all the models run fitted the data well. 
As the size of the data grew larger, the percentage of significant estimated parameter also grew 
bigger as well as the percentage of models that fitted the data well. For instance, with a sample 
size of 500, Model 4 
1 2( 0.5 , 0.3)P P   had all its parameter estimates being 
significant .Moreover, all the models simulated fitted the available data well. 
University of Ghana http://ugspace.ug.edu.gh
 50  
 
In contrast, Model 2 
1 2( 0.5 , 0.5)P P  had all estimated parameters being significant as well as 
all models fitting the available data. We must be cautious not to jump to conclusion here, because 
it has been established that Model 2 
1 2( 0.5 , 0.5)P P has independent response variables; a 
possible reason for the perfect fit is as follows: 
When binary data are randomly generated, the covariance of the outcome variables will follow the 
binomial model (i.e. two possible outcomes; an occurrence of an event of interest or non-
occurrence) with constant probability. However, when binary data are not sampled randomly then 
there is the likelihood that the outcome variances will not follow the binomial model. An example 
is picking samples from clusters; this will result in different probabilities and hence increased 
variance compared to variances observed under the binomial model. This phenomenon is known 
as over dispersion and the effects are that, the standard errors and the conclusions might be affected. 
4.5 Conditional 
1(Y = 1)   
The fitted conditional model is of the form 11
1111
11
logit( ) log
1
  


  

 
    
  
, where 
11
represents the probability of recording a “1” response at the first measurement and a “1” at the 
second measurement . 
111   represents the probability of not recording  a “1” response at the first 
measurement and a “1” at the second measurement. 
11 represents the estimated coefficient 
parameter for covariate X.    denotes the value of the logit when the covariate is zero. 
Just like the previous conditional case where
1( 0)Y  , as the sample size increased, the percentage 
of significant parameter coefficient increased .On the same hand, the percentage of models that 
fitted the available data also increased. Model 3 
1 2( 0.1 , 0.8)P P   for instance, had only 19% 
University of Ghana http://ugspace.ug.edu.gh
 51  
 
of the models fitting the available data well but with a sample size of 500, 94% of all models 
simulated under the stated marginal probabilities fitted the data well. The same model had none of 
the estimated parameters attaining significance at sample size 50. However, with a sample size of 
500, 66% of the estimated parameters were significant. 
Model 5  1 20.2 ,  0.6P P  in general had very low percentage of estimated parameters and well 
fit models respectively, irrespective of sample size. At a sample size of 50, none of the estimated 
parameters was significant and only 5% of the models had a good fit to the simulated data. At a 
sample size of 1000, none of the estimated parameters was significant and none of the models had 
a good fit. 
4.6 The Marginal model 
The fitted marginal model is of the form   1
1 1
1
logit( ) log
1
  


  

 
    
  
, where 
1  represents 
the estimated probability of recording a “1” response at the second measurement. 
11   represents 
the estimated  probability of not recording a “1” response at the second measurement and 
1   
represents the estimated parameter coefficient for covariate X.     denotes the value of the logit 
when the covariate is zero. 
With sample size of 50, only Model 3 
1 2( 0.1 , 0.8)P P  had 14% of its parameter coefficients 
not equal to zero using the wald test (41).All other models had all the estimated parameter 
coefficients equal to zero, meaning that they will not serve as a good predictor model for the 
available data. 
University of Ghana http://ugspace.ug.edu.gh
 52  
 
For a sample size of 100 none of the models had a non-zero estimated parameter coefficient. 
However, with a sample size of  200, Model 1
1 2( 0.1 , 0.1)P P    had 28% of simulated models 
with a non-zero parameter coefficient, Model 5  1 20.2 , 0.6P P  had 14%,all others equaled 
zero. 
For a sample size of 500 none of the models with a non-zero estimated parameter coefficient 
whereas with a sample size of 1000, only Model 3
1 2( 0.1 , 0.8)P P  had 29% of models fitting 
the available data. 
In short, the marginal model was poor at fitting the available data compared to the conditional 
model.  
4.7 Bivariate Bernoulli model 
The process of finding parameter estimates for the bivariate Bernoulli model was very tedious, in 
that an initial estimate for a parameter needs to be provided.Iteration is then done by the Newton-
Raphson and the Nelder-Mead maximization methods to estimate the actual parameter that 
maximizes the likelihood function. To make it easier for the selection of initial parameter estimates, 
the estimates for the conditional and marginal estimates were referred to so as to give an idea of 
where the best-fit parameter might fall.  
The proportion of significant estimates were not calculated, because a lot of choices of the initial 
parameters had to be inputted before arriving at estimated parameter. However, almost all of the 
parameters settled on were significant. 
A more important test was that of the overall fit of the proposed model. 
 
University of Ghana http://ugspace.ug.edu.gh
 53  
 
 
4.8 Overall fit of the proposed model 
The log likelihoods for the regressive, conditional as well as the marginal models were calculated 
and then compared to the log likelihood of the proposed model. Results showed that the bivariate 
Bernoulli model came out as a better model than the rest of the models.  
The likelihood ratio test was employed in comparing the fit of the proposed model to that of the 
regressive, conditional and marginal models respectively. A model with more parameters is likely 
to better fit the data .However, it is important to find out whether the right model fits significantly, 
hence the use of the test. All tests produced highly significant results i.e. significant at 5% level of 
significance, meaning that the proposed model fits better. (See Tables A1a to A5e under Appendix 
A) 
 
 
 
 
 
 
 
 
 
University of Ghana http://ugspace.ug.edu.gh
 54  
 
CHAPTER FIVE 
APPLICATION TO REAL -LIFE DATA 
5.1 Introduction 
In this chapter, real-life data was used to examine the usefulness of the proposed model. First, 
exploratory analysis was conducted .All the models were examined under the same dataset and the 
overall fit of the proposed model was also examined. The study went ahead to estimate parameters 
for all the models under study and compared the results. 
Table 5.1: Transition counts and respective probabilities on the employment status for the 
two periods 
  Wave 2 
 Transition count  Transition probability  
Wave 1 0 1 Total  0 1 Total  
0 4132 707 4839  0.8539 0.1461 1.000  
1 1442 1559 3001  0.4805 0.5195 1.000  
Odds ratio = 6.32                       Marshall-Olkin correlation coefficient = 0.37 
Table 5.1 displays the transition counts and probabilities on the employment status of persons for 
the two periods. It is evident that 85.4% of persons between the ages of 17 and 51 interviewed 
remained unemployed whereas 14.6% gained employment by the second period. Moreover, 
within the same period, 48.1% moved from being employed to not being employed whereas 52% 
remained employed. 
5.2 Odds ratio 
The estimated odds ratio for the two periods was 6.40, meaning that there were more concordant 
responses than discordant responses. Specifically, as many as 4,132 respondents remained 
unemployed whereas 1,559 respondents remained employed. In contrast, 873 respondents were 
University of Ghana http://ugspace.ug.edu.gh
 55  
 
unemployed at the first visit but gained employment at the second visit whereas 576 respondents 
were employed at the first visit but had lost employment by the second visit. 
5.3 Marshall–Olkin correlation 
The Marshall-Olkin correlation coefficient estimate of 0.59 indicates that there is a substantial 
positive correlation between employment statuses between the two periods under study. 
5.4 
3Dependence value based on Ș  
The dependence value is 1.84, clearly indicating dependence between the response variables i.e 
employment status at the two periods. 
Table 5.2a: Fitted models using data from NIDS: Traditional models 
 Conditional Models  Marginal Model 
 
Model 
01 j  
 
Model 
11 j  
 
Model 
1 j -GEE 
 
Covariates 
01 j  
se p-
value 
 
11 j  
se p-value  
1 j  
se p-value 
Constant -4.115 0.436 2e-16 ***  2.517 0.287 2e-16 ***  1.012 0.200 4.44 e-07 
*** 
Age -0.069 0.081 0.386  -0.750 0.069 2e-16 ***  -1.429 0.053 2e-16 *** 
Education 0.826 0.206 6.38 e-
05*** 
 0.026 0.136 0.846  0.589 0.090 5.53 e-11 
*** 
Sex 0.043 0.085 0.615  -0.310 0.062 5.73e-
07*** 
 -0.740 0.047 2e-16 *** 
Log-
likelihood 
-2520  -3240  8755.65 
 
 
 
University of Ghana http://ugspace.ug.edu.gh
 56  
 
Table 5.2b: Fitted models using data from NIDS: Generalized bivariate Bernoulli model 
Generalized bivariate Bernoulli model 
 
Model 
01 j  
 
Model 
11 j  
 
Model 
1 j   
Covariates 
01 j  
se p-
value 
 
11 j  
se p-
value 
 
1 j  
se p-
value 
Constant 0.07 0.00 <2e-16 
*** 
 0.11 0.00 <2e-16 
*** 
 -0.55 0.0004 <2e-16 
*** 
Age -5.558 0.005 <2e-16 
*** 
 0.145 0.0008 <2e-16 
*** 
 -4.892 0.0005 <2e-16 
*** 
Education 0.679 0.002 <2e-16 
*** 
 0.309 0.001 <2e-16 
*** 
 4.663 0.0003 <2e-16 
*** 
Sex -4.092 0.004 <2e-16 
*** 
 -2.006 0.002 <2e-16 
*** 
 -3.250 0.0009 <2e-16 
*** 
Log-
likelihood 
2892940  2786759  10672276 
Likelihood 
ratio test 
( proposed 
model vrs 
traditional 
methods 
p< 0.05  p< 0.05  p< 0.05 
 
 
5.5 Traditional models 
Table 5.2a provides the following details: 
 
5.5.1 Marginal model-GEE 
Using the marginal model, all three covariates were significant, meaning that they contribute 
significantly to the state of employment status.  
The fitted marginal model is 1
1
1
logit( ) log 1.012 1.429 0.589 0.740
1
age edu sex




 
     
  
, 
University of Ghana http://ugspace.ug.edu.gh
 57  
 
 where 
1  represents the estimated probability of recording an “employed” response at the second 
measurement and 
11   represents the estimated  probability of not recording an “employed” 
response at the second measurement . The estimated intercept is 1.012 representing the estimated 
logit when age = 0, edu = 0 and sex = 0. This means that the respondent had no age group, no sex 
and no educational status which does not really make sense in this particular study. The estimated 
coefficient for the variable age is -1.429 meaning that for respondents who are in age range 17-25 
versus those in the age bracket 26-51, the expected change in the log odds is 1.429, given that 
education and sex stays constant. 
We can also claim that based on the odds ratio scale, the odds for an individual in the 17-25 year 
group to be employed is exp(1.012+0.589-0.740) = 2.366 and the odds for an individual in the 26-
51 year age group is exp(1.012-1.429+0.589-0.740 )= 0.567.   
Therefore the odds of 2.37 means that individuals in the 17-25 year group are 2.37 times more 
likely to be employed at the second visit than those in age group 26-51. 
In terms of probabilities, the probability of an individual in the 17-25 year group to be employed 
at the time of the second visit will be 2.366/(1+2.366) = 0.70 and that of an individual in the age 
group 26-51  will be 0.30. 
 All three covariates were significant at a 5% level of significance indicating that if the same 
sample were run for 100 times 95 of them will have all three estimated coefficients being 
significant.  
The interpretation for education status and sex of the respondents was left undone, since that is not 
the main objective for this study. 
University of Ghana http://ugspace.ug.edu.gh
 58  
 
5.5.2 Conditional model 
1(Y = 1)   
Given that a respondent at the first time of visit was employed, only two of the covariates produced 
significant values, i.e age and sex. In other words, only age and sex had significant impact on the 
probability of being employed at the second time of visit given that a respondent was initially 
employed. 
The fitted conditional model is  
11
11
11
logit( ) log 2.517 0.750 0.026 0.310
1
age edu sex




 
     
  
, 
where  
11   represents the estimated probability of recording an “employed” response at the second 
measurement, given that the first response recorded “employed” and
111   represents the 
estimated  probability of not recording an “employed” response at the second measurement . The 
estimated intercept is 2.517 representing the estimated logit when age = 0,edu = 0 and sex = 0. The 
estimated coefficient for the variable age is -0.750, meaning that for respondents who are in the 
age range 17-25 versus those in the age bracket 26-51 , the expected change in the log odds is 
0.750,given that education and sex is constant. 
 Interpreting based on the odds ratio scale, the odds for an individual in the 17-25 year group to be 
employed at the second time of visit is exp(2.517+0.026-0.310) = 9.327 and the odds for an 
individual in the 26-51 year age group is exp(2.517-0.750+0.026-0.310 )= 4.406. 
In probability terms,  the probability of an individual in the 17-25 year group to be employed at 
the time of the second visit will be 9.327/(1+9.327) =0.90 and that of an individual in the age group 
26-51 will be 0.10. 
University of Ghana http://ugspace.ug.edu.gh
 59  
 
 
5.5.3 Conditional model 
1(Y = 0)   
Given that the respondent was unemployed at the first time of visit, the fitted model will be   
01
01
01
logit( ) log 4.115 0.069 0.826 0.043
1
age edu sex




 
      
  
. This time, only the variable 
education turned out to be significant. 
01   represents the estimated probability of recording an “employed” response at the second 
measurement, given that the first response recorded “unemployed”. 
011   represents the estimated  
probability of not recording  “employed” response at the second measurement, given that the first 
response recorded “unemployed”.  
The estimated coefficient for the variable education is 0.826 .Also the odds of the group that had 
never being to school is exp (-4.115-0.069-0.043) = 0.014. This means that the group that had 
never been to school are 0.01 times more likely to be employed  than persons who have some level 
of education or the probability of a person in the group that had never been to school to be 
employed will be  (0.014/1+0.014) = 0.01 
5.6 Generalized bivariate Bernoulli model 
Unifying the marginal and conditional probabilities into a joint distribution as specified by the 
generalized bivariate Bernoulli model, we can estimate parameters by conditioning on the initial 
response or using the marginal model.  Table 5.2b provides the following: 
5.6.1 Marginal 
The fitted logistic model is 
University of Ghana http://ugspace.ug.edu.gh
 60  
 
1
1
1
logit( ) log 0.55 4.892 4.663 3.250
1
age edu sex




 
      
  
. The estimated population 
parameters follow the usual interpretation as stated under the traditional models. All the estimated 
population parameters were significant with very small standard errors. 
On the odds ratio scale, the odds for an individual in the 17-25 year group to be employed is exp 
(-0.55+4.663-3.250) = 2.370 which means that persons in the 17-25 year group are 2.37 times 
more likely to be employed at the second visit than those in the age group 26-51. 
The probability of an individual in the 17-25 year group to be employed at the time of the second 
visit will be 2.37/ (1+2.37) = 0.70 and that of an individual in the age group 26-51 will be 0.30. 
5.6.2 Conditional model 
1(Y = 0)  
Given that a respondent was employed at the first time of visit, the fitted model will be   
11
11
11
logit( ) log 0.11 0.851 1.186 0.954
1
age edu sex




 
     
  
. Again, all the estimated 
parameters were significant with very small standard errors and the estimated population 
parameters follow the usual interpretation. 
5.6.3 Conditional model 
1(Y = 1)  
Given that a respondent was unemployed at the first time of visit, then by the generalized Bernoulli 
model, the logistic model will be  
01
01
01
logit( ) log 0.07 0.2887 0.7805 0.4651
1
age edu sex




 
     
  
. 
University of Ghana http://ugspace.ug.edu.gh
 61  
 
All the estimated parameters were close to zero.  
 
5.7 Overall fit of the model 
Even though the proposed model produced estimated parameters with small very small standard 
errors it is necessarily to find out how good the proposed model fits the available data in 
comparison with the traditional methods. The likelihood ratio test showed that the generalized 
bivariate Bernoulli model fits the available data significantly better at 5% level of significance 
than the traditional methods.  
Table 5.3: Comparison of results: probability of a covariate being employed at the time of 
second visit 
  Conditional  Marginal 
Covariate Categories  Model 
01 j
  
Generalized 
bivariate 
Bernoulli 
  Model 
11 j
  
Generalized 
bivariate 
Bernoulli 
 Model 
1 j
 -
GEE 
Generalized 
bivariate 
Bernoulli 
Age 17-25 0.96 0.97  0.90 0.83  0.70 70 
26-51 0.04 0.03  0.10 0.17  0.30 30 
 
Education 
 
Ever been to 
school 
0.99 
 
0.9999  0.81 0.86  0.77 0.82 
Never been 
to school 
0.01 0.00001  0.19 0.14  0.23 0.18 
Sex Male 0.97 0.992  0.86 0.64  0.54 0.69 
Female 0.03 0.008  0.14 0.36  0.46 0.31 
 
Table 5.3 shows that the probabilities for the traditional as well as the proposed model were 
generally similar, but first, under the marginal models , the GEE estimated that the probability of 
a male  employed at the second time of visit will be 0.54 whereas the proposed model estimated a 
probability of 0.69. 
University of Ghana http://ugspace.ug.edu.gh
 62  
 
Second, given that a respondent was employed at the first time of study, the conditional logistic 
model estimated the probability of a male employed at the second time of visit to be 0.86 whereas 
the proposed model estimated a probability of 0.64. 
Third, the probability of a respondent who has ever gone to school under the GEE was 0.77 
whereas the bivariate Bernoulli model recorded a probability of 0.82. 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
University of Ghana http://ugspace.ug.edu.gh
 63  
 
CHAPTER SIX 
RESULTS AND DISCUSSION  
6.1 Introduction  
Key findings from the simulation as well as the application to real life exercise are discussed in 
this chapter. The challenges encountered while undertaking these exercises are also discussed. 
The existence of dependence in repeated studies and how to deal with such cases in analysis has 
been a problem for researchers. This is, because relying on the assumption of independence 
between responses may lead to inefficient results. Marginal as well conditional models have been 
used to address this problem in the past. This study however, uses the joint model (i.e. unifying 
the conditional and marginal)  known as the bivariate Bernoulli model to address the challenge of 
dependence in repeated studies and more importantly  estimate parameter coefficients. 
6.2 Correlation 
The simulation study revealed that the Marshall-Olkin correlation and   produced different 
correlation values. It is likely that the measure of correlation using the Marshall-Olkin correlation 
was better than   because the Marshall-Olkin correlation employs probabilities of concordant and 
discordant responses whereas   measures correlation using the ranks of the ordinal variable in this 
case the response variables.  
6.3 Dependence  
The simulation study also revealed that the measure of dependence (
3 ) that was derived from the 
bivariate Bernoulli model was able to identify independence between two response variables when 
independent response variables were simulated. This is an added plus to the use of the proposed 
model because it helps to confirm dependence or otherwise of response variables before further 
tests are carried out.  
University of Ghana http://ugspace.ug.edu.gh
 64  
 
It is important to state that the regressive model failed to identity independence between response 
variables. However, the dependence measure 
3
 was able to do so. The inability of the regressive 
model to account for independence may be due to the fact that the model only accounted for 
correlation between responses and responses between two periods. It did not account for possible 
correlations between responses and covariates.   
6.4 Investigation of the traditional models 
In this investigation, the marginal model, which represents the probability of obtaining an 
occurrence of interest with no conditionality to previous responses, and the conditional model 
which says that the probability of an event occurring depends on the previous state of employment 
status were examined.  
The simulation exercise revealed that the conditional models fitted the available data far better 
than the marginal. This finding is not surprising, because taking into consideration the response in 
an initial investigation before taking the second response (or saying the probability of an event is 
conditional on a past response) is likely to result in a better finding than disregarding the first 
responses. 
The measure used to find out how well a model fits available data is the Wald test. The Wald test 
determines if the estimated coefficients are simultaneously equal to zero, as against at least one 
estimated coefficient not being zero. 
6.5 The Bivariate Bernoulli model 
The model proposed for this study unified the marginal and conditional models into one model. 
This resulted in a joint model, which was then used to estimate probabilities. The good thing about 
this model is that its distribution is uniquely identified. Hence, finding the likelihood function was 
straightforward, compared to the traditional marginal model which does not employ the use of a 
University of Ghana http://ugspace.ug.edu.gh
 65  
 
known distribution but rather uses the probabilities of success and their correlations of the vector 
of binary responses.  
6.5.1 Deriving parameter estimates  
Estimating parameter coefficients from the estimating equations (derived from the likelihood 
function) was tedious, hence iterative methods of finding roots was employed. Two methods were 
used: the Newton-Raphson and the Nelder-Mead methods of iteration. This study revealed that the 
Nelder-Mead method was more efficient in arriving at the roots than the Newton-Raphson. 
Specifically, with the same set of initial values, the Nelder-Mead method was able to provide all 
the roots as against some of the roots when the Newton –Raphson method used. 
Picking the initial roots was very tedious. First of all, values close to the roots obtained using the 
traditional methods were used as initial values.Then a random selection of initial values was taken 
thereafter, when the earlier values obtained using the traditional models failed to help estimate the 
roots. It must be noted that some of the estimates were arrived at after several selection trials of 
initial values; some values were arrived at after about four hours. 
6.5.2 Parameter estimates  
All the parameter estimates produced were significant under the bivariate Bernoulli model. This 
does not prove that the model is good, but it means that the covariates selected for this study have 
an impact on the probability of the response variable. 
6.6 Comparison of results 
All three covariates had significant impact under the GEE marginal model and the bivariate 
Bernoulli model. However, only some of the covariates had significant impact when the 
conditional logistic models were used.  
Specifically, age and sex significantly impacted the employment status when the probability of 
being employed was conditioned on the respondent being employed at the first time of visit. 
University of Ghana http://ugspace.ug.edu.gh
 66  
 
Also, only education significantly impacted the employment status when the probability of being 
employed was conditioned on the respondent not being employed at the first time of visit.  
In some cases, there seem to be differences in results; for instance, in the probability of being 
employed at the second time of visit. Under the GEE marginal model, whereas the probability of 
a male getting employed at the second time of visit was at 0.54, the bivariate Bernoulli  estimated 
the probability to be 0.69. 
6.7 Structure of models 
It is also important to note that the GEE uses various correlation structures such as “independence”, 
“unstructured”, “exchangeable” and “user defined” to estimate covariate coefficients. This might 
lead to inadequate results because a joint modelling approach is not used. The joint modelling 
provides explicitly the distribution for the available data, they enhance the fit of the data and hence 
enhance the efficiency of parameter estimates.  This study has shown that with the use of the 
proposed model, all three covariates have significant impact on the probability of the response 
variable, irrespective of whether there was conditioning or not .  
6.8 Missing values 
The simulation exercise was done without taking into account cases of missing values. So, 
estimating various statistics was performed without any challenge. However, when real-life 
application was introduced, the presence of missing values made estimating impossible. All the 
missing values therefore had to be expunged from the dataset, making use of only responses given 
at each time of visit. This situation is worrying because valuable data was discarded. Because this 
study used secondary data there was no way of going back to fill the gaps in the data. 
 
 
 
University of Ghana http://ugspace.ug.edu.gh
 67  
 
CHAPTER SEVEN 
                         CONCLUSION AND RECOMMENDATIONS 
7.1 Conclusion 
This study has shown that the bivariate Bernoulli model fits bivariate binary response data 
significantly better than the GEE marginal model and the conditional logistic models. Since the 
joint distribution is explicitly known, the likelihood function was found and then the coefficient of 
the parameters estimated. 
Findings using the bivariate Bernoulli model against the GEE model were generally similar apart 
from one case where the bivariate Bernoulli model recorded a probability of 0.69, whereas the 
marginal model recorded a probability of 0.54. 
Results were also generally similar using the bivariate Bernoulli model against the conditional 
logistic models except for a few cases. For instance, the logistic model estimated the probability 
of a male to be employed at the second time of visit to be 0.86 whereas the proposed model 
estimated the probability to be 0.64.  
Even though selecting the initial value in order to estimate the covariate coefficient using the 
bivariate Bernoulli model was tedious, the bivariate Bernoulli model is highly recommended for  
analyses involving binary response data for two periods i.e. (t and t+i) because it fits bivariate 
binary data significantly better. 
7.2 Recommendations 
Investigating the usefulness of the proposed model based on simulation as well as its application 
to real-life data, the following recommendations should be considered: 
University of Ghana http://ugspace.ug.edu.gh
 68  
 
First, more research should be carried out to find out how to easily select an initial value when 
using the Newton Raphson or the Nelder Mead methods of iteration. 
Second, further research should also be conducted using the multivariate Bernoulli model to handle 
not only two periods of study, but several periods. 
Third, in developing models, the methods of estimating missing values should be incorporated so 
that missing values will be addressed while using the model. This is recommended to prevent the 
deletion of variables containing missing data. 
Fourth, in order to enhance accuracy, Health Institutions such as the School of Public Health, as 
well as other Research Institutions such as the Ghana Statistical Service and the Institute of 
Statistical Social and Economic Research (ISSER) should utilize the use of the bivariate Bernoulli 
model to estimate regression coefficients when the outcome variables involved is bivariate binary.     
 
 
 
 
 
 
 
 
 
 
 
University of Ghana http://ugspace.ug.edu.gh
 69  
 
REFERENCES 
 
Azzalini, A. (1994). Logistic regression for autocorrelated data with application to 
repeated measures. Biometrika, 81(4), 767-775.       
 
Bonney, G. E. (1987). Logistic regression for dependent binary observations. Biometrics, 
951-973. 
 
Bonney, G. E. (1986). Regressive logistic models for familial disease and other binary 
traits. Biometrics, 611-625. 
 
Carey, V., Zeger, S. L., & Diggle, P. (1993). Modelling multivariate binary data with 
alternating logistic regressions. Biometrika, 80(3), 517-526. 
 
Cox, D. R., & Wermuth, N. (1994). A note on the quadratic exponential binary 
distribution. Biometrika, 81(2), 403-408. 
 
Cox, D. R. (1972). The analysis of multivariate binary data. Applied statistics, 113-120. 
 
Fitzmaurice, G. M., & Lipsitz, S. R. (1995). A model for binary time series data with 
serial odds ratio patterns. Applied Statistics, 51-61. 
 
Glonek, G. F. (1996). A class of regression models for multivariate categorical responses. 
Biometrika, 83(1), 15-28. 
 
Glonek, G. F., & McCullagh, P. (1995). Multivariate logistic models. Journal of the royal 
statistical society. Series B (Methodological), 533-546. 
 
Gavin, H. P. (2013). The Nelder-Mead Algorithm in Two Dimensions. CEE 201L. 
Duke U. 
 
Islam, M. A., Alzaid, A. A., Chowdhury, R. I., & Sultan, K. S. (2013). A generalized 
bivariate Bernoulli model with covariate dependence. Journal of Applied Statistics, 
40(5), 1064-1075. 
 
Islam, M. A., & Chowdhury, R. I. (2006). A higher order Markov model for analyzing 
covariate dependence. Applied Mathematical Modelling, 30(6), 477-488. 
 
Islam, M. A., Chowdhury, R. I., & Briollais, L. (2012). A bivariate binary model for 
testing dependence in outcomes. Bull. Malays. Math. Sci. Soc.(2), 35(4), 845-858. 
 
Kaiser, S., Träger, D., & Leisch, F. (2011). Generating correlated ordinal random values. 
 
Le Cessie, S., & Van Houwelingen, J. C. (1994). Logistic regression for correlated binary 
data. Applied Statistics, 95-108. 
 
University of Ghana http://ugspace.ug.edu.gh
 70  
 
Lee, Y., & Nelder, J. A. (2004). Conditional and marginal models: another view. 
Statistical Science, 19(2), 219-238. 
 
Leisch, F., Weingessel, A., & Hornik, K. (1998). On the generation of correlated artificial 
binary data. 
 
Liang, K. Y., & Zeger, S. L. (1986). Longitudinal data analysis using generalized linear 
models. Biometrika, 13-22. 
 
Lin, P. S., & Clayton, M. K. (2005). Analysis of binary spatial data by quasi-likelihood 
estimating equations. The Annals of Statistics, 33(2), 542-555. 
 
Lipsitz, S. R., Laird, N. M., & Harrington, D. P. (1991). Generalized estimating equations 
for correlated binary data: using the odds ratio as a measure of association. Biometrika, 
78(1), 153-160. 
 
Lovison, G. (2006). A matrix-valued Bernoulli distribution. Journal of Multivariate 
Analysis, 97(7), 1573-1585. 
 
Marshall, A. W., & Olkin, I. (1985). A family of bivariate distributions generated by the 
bivariate Bernoulli distribution. Journal of the American Statistical Association, 80(390), 
332-338 
 
McCullagh, P., & Nelder, J. A. (1989). Generalized Linear Models, no. 37 in Monograph 
on Statistics and Applied Probability. 
 
Molenberghs, G., & Lesaffre, E. (1994). Marginal modeling of correlated ordinal data 
using a multivariate Plackett distribution. Journal of the American Statistical 
Association, 89(426), 633-644. 
 
Muenz, L. R., & Rubinstein, L. V. (1985). Markov models for covariate dependence of 
binary sequences. Biometrics, 91-101. 
 
Prentice, R. L. (1988). Correlated binary regression with covariates specific to each 
binary observation. Biometrics, 1033-1048. 
 
Quinn, K. (2001). The newton raphson algorithm for function optimization. University 
of Washington, Seattle. 
 
Southern Africa Labour and Development Research Unit. National Income Dynamics 
Study 2010-2011, Wave 2 [dataset]. Version 3.0. Cape Town: Southern Africa Labour 
and Development Research Unit [producer], 2015. Cape Town: DataFirst [distributor], 
2015 
 
University of Ghana http://ugspace.ug.edu.gh
 71  
 
Southern Africa Labour and Development Research Unit. National Income Dynamics 
Study 2008, Wave 1 [dataset]. Version 6.0. Cape Town: Southern Africa Labour and 
Development Research Unit [producer], 2015. Cape Town: DataFirst [distributor], 2015 
 
Sun, Z., Rosen, O., & Sampson, A. R. (2007). Multivariate Bernoulli mixture models 
with application to postmortem tissue studies in schizophrenia. Biometrics, 63(3), 901-
909. 
 
Wilson, J. R., & Lorenz, K. A. (2015). Modeling Binary Correlated Responses Using 
Sas, Spss and R (Vol. 9). Springer. 
 
Zeger, S. L., & Liang, K. Y. (1986). Longitudinal data analysis for discrete and 
continuous outcomes. Biometrics, 121-130. 
 
Zhao, L. P., & Prentice, R. L. (1990). Correlated binary regression using a quadratic 
exponential model. Biometrika, 77(3), 642-648. 
 
 
       
 
 
 
 
 
 
 
 
 
 
 
University of Ghana http://ugspace.ug.edu.gh
 72  
 
APPENDIX A: Simulation tables 
Table A1a: Results of estimated models from 1000 simulations with sample of size of 50: 
Exploratory statistics  
Models 
                 
 
Estimates 
1 
ȡ  0.2  
1 2( 0.1, 0.1)P P   
2 
ȡ  0.  
1 2( 0.5, 0.5)P P 
 
3 
ȡ  0.1  
1 2( 0.1, 0.8)P P 
 
4 
ȡ  0.  
1 2( 0.5, 0.3)P P 
 
5 
ȡ  0  
1 2( 0.2, 0.6)P P 
 
 1 2Avg.Cor ,Y Y  
 sd in   
0.2 -0.30(0.14) 0.1 0.3 0.01 
 1Avg.Cor ,Y X  
 sd in   
0.2 -0.29 (0.13) 0.1 0.3 0 
 2Avg.Cor ,Y X  
 sd in   
0.2 -0.30(0.14) 0.1 0.31 0 
Avg.Odds Ratio  
 1 2 ,Y Y  
2.57 0.98 3.87 1.32 1.60 
Marshall – Olkin correlation
  1 2 ,Y Y   
0.21 -0.005 0.20 0.03 0.12 
 
Marshall – Olkin correlation   
 2 1, / 0Y X Y   
0.003 -0.03 0.002 0.01 0.003 
Marshall –Olkin correlation
2 1( , / 1)Y X Y 
  
-0.003 -.0002 -0.004 -0.004 -0.04 
Marshall – Olkin correlation
 
 1 2/ 1Y X Y ,    
0.003 -0.06 -0.001 -0.003 -0.005 
Marshall –Olkin correlation
 
 1 2, /  0Y X Y    
5.25^-0.6 -0.07 0.001 1.02^-05 -0.0002 
Dependence value based on 3
 
0.94 -0.02 1.35 0.13 0.47 
 
 
 
University of Ghana http://ugspace.ug.edu.gh
 73  
 
Table A1b: Results of estimated models from 1000 simulations with sample of size of 50: 
Regression model 
Regressive model 
                 
 
Estimates 
1 
ȡ  0.2  
1 2( 0.1, 0.1)P P   
2 
ȡ  0.  
1 2( 0.5, 0.5)P P 
 
3 
ȡ  0.1  
1 2( 0.1, 0.8)P P 
 
4 
ȡ  0.  
1 2( 0.5, 0.3)P P 
 
5 
ȡ  0  
1 2( 0.2, 0.6)P P 
 
. 1Av   
%  0.05of p     
1.17 -2.39 0.40 1.01 0.86 
0.05 60 0 20 100 
.Av   
 
%  0.05of p   
0.99 -2.18 1.40 1.27 1.23 
15 40 0 40 100 
2.Av   
 :  Wald test %  0.05of p   
4.8 16.6 2.28 8.1 5.8 
20 100 0 50 40 
 
Table A1c: Results of estimated models from 1000 simulations with sample of size of 50: 
Conditional models 
Conditional models 
                 
 
Estimates 
1 
ȡ  0.2  
1 2( 0.1, 0.1)P P   
2 
ȡ  0.  
1 2( 0.5, 0.5)P P 
 
3 
ȡ  0.1  
1 2( 0.1, 0.8)P P 
 
4 
ȡ  0.  
1 2( 0.5, 0.3)P P 
 
5 
ȡ  0  
1 2( 0.2, 0.6)P P 
 
Conditional  
1Y = 0  
01
  
%  0.05of p   
1.68 -2.5 0.76 1.08 -0.29 
22 87 0 20 11 
:Wald test %  0.05of p 
 
33 100 20 46 10 
Conditional  
1Y = 1  
11
  1.61 -1.96 -0.23 1.21 -0.11 
University of Ghana http://ugspace.ug.edu.gh
 74  
 
%  0.05of p    22 58 0 30 0 
:Wald test %  0.05of p 
 
40 100 19 56 5 
 
Table A1d: Results of estimated models from 1000 simulations with sample of size of 50: 
Marginal model 
Marginal model 
                 
 
Estimates 
1 
ȡ  0.2  
1 2( 0.1, 0.1)P P   
2 
ȡ  0.  
1 2( 0.5, 0.5)P P 
 
3 
ȡ  0.1  
1 2( 0.1, 0.8)P P 
 
4 
ȡ  0.  
1 2( 0.5, 0.3)P P 
 
5 
ȡ  0  
1 2( 0.2, 0.6)P P 
 
. 1( . )VA s e  
:Wald test %  0.05of p 
 
0.47(0.87) 0.13(0.79) 0.45( 0.64) -0.04(0.43 ) -0.06 
 
0 
 
0 
 
14 
 
0 
 
0 
 
Table A1e: Results of estimated models from 1000 simulations with sample of size of 50: 
Bivariate Bernoulli model 
Bivariate Bernoulli  model 
                 
 
Estimates 
1 
ȡ  0.2  
1 2( 0.1, 0.1)P P   
2 
ȡ  0.  
1 2( 0.5, 0.5)P P 
 
3 
ȡ  0.1  
1 2( 0.1, 0.8)P P 
 
4 
ȡ  0.  
1 2( 0.5, 0.3)P P 
 
5 
ȡ  0  
1 2( 0.2, 0.6)P P 
 
11
  
%  0.05of p   
5.9 0.49 -0.48 0.90 0.68 
90 90 90 90 90 
01
  2.81 1.63 3.66 1.45 2.92 
%  0.05of p        
. 1VA   
1.4 1.19 0.57 1.16 0.97 
%  0.05of p        
 
 Likelihood ratio test (proposed model and regressive model) 
 p<0.00001  (result is significant at 5% level of significance) 
University of Ghana http://ugspace.ug.edu.gh
 75  
 
                                     Likelihood ratio test (proposed model and conditional 
1Y = 0  ) 
 p<0.00001  (result is significant at 5% level of significance) 
                  Likelihood ratio test (proposed model and conditional 
1Y = 1  ) 
 p<0.00001  (result is significant at 5% level of significance) 
      
      
 
Table A2a: Results of estimated models from 1000 simulations with sample of size of 100: 
Exploratory statistics 
Models 
             
 
Estimates 
1 
ȡ  0.2  
1 2( 0.1, 0.1)P P   
2 
ȡ  0.  
1 2( 0.5, 0.5)P P 
 
3 
ȡ  0.1  
1 2( 0.1, 0.8)P P 
 
4 
ȡ  0.  
1 2( 0.5, 0.3)P P 
 
5 
ȡ  0  
 1 20.2, 0.6P P 
 
 1 2Avg.Cor ,Y Y  
 sd in   
0.2 -0.30 0.1 0.3 0 
 1Avg.Cor ,Y X  
 sd in   
0.2 -0.30 0.1 0.3 0 
 2Avg.Cor ,Y X  
 sd in   
0.2 -0.30 0.1 0.3 0 
Avg.Odds Ratio  
 1 2 ,Y Y  
2.67 1.005 3.77 1.13 1.6 
Marshall – Olkin correlation
  1 2 ,Y Y   
0.21 0.002 0.20 0.03 0.12 
 
Marshall – Olkin correlation   
 2 1, / 0Y X Y   
0.003 -0.003 0.001 0.002 0.002 
Marshall –Olkin correlation
2 1( , / 1)Y X Y 
  
0.006 -0.0008 0.003 0.001 -1.61^-0.5 
Marshall – Olkin correlation
 
 1 2/ 1Y X Y ,    
-0.003 0.002 -0.0006 -0.005 0 
Marshall –Olkin correlation
 
 1 2, /  0Y X Y    
0.003 -0.001 0.0007 1.03^-05 0.003 
University of Ghana http://ugspace.ug.edu.gh
 76  
 
 
 
Table A2b: Results of estimated models from 1000 simulations with sample of size of 100: 
Regression model 
 
Table A2c: Results of estimated models from 1000 simulations with sample of size of 100: 
Conditional models 
 
3Dependence value based on 
 
0.98 0.005 1.33 0.12 0.47 
Regressive Model 
             
 
Estimates 
1 
ȡ  0.2  
1 2( 0.1, 0.1)P P   
2 
ȡ  0.  
1 2( 0.5, 0.5)P P 
 
3 
ȡ  0.1  
1 2( 0.1, 0.8)P P 
 
4 
ȡ  0.  
1 2( 0.5, 0.3)P P 
 
5 
ȡ  0  
 1 20.2, 0.6P P 
 
 
. 1VA   
%  0.05of p     
0.70 -2.57 0.64 0.95 1.26 
0.05 95 0 50 22 
.Av   
%  0.05of p   
1.59 -2.61 1.44 1.03 1.17 
55 95 0 50 25 
2.Av   
 :  Wald test %  0.05of p   
7.5 28.7 3.5 18.9 6.5 
53 100 20 100 70 
Conditional models 
             
 
Estimates 
1 
ȡ  0.2  
1 2( 0.1, 0.1)P P   
2 
ȡ  0.  
1 2( 0.5, 0.5)P P 
 
3 
ȡ  0.1  
1 2( 0.1, 0.8)P P 
 
4 
ȡ  0.  
1 2( 0.5, 0.3)P P 
 
5 
ȡ  0  
 1 20.2, 0.6P P 
 
Conditional 
1Y = 0  
01
  
%  0.05of p   
1.36 -2.68 1.08 1.14 -2.77 
33 100 0 50 0 
University of Ghana http://ugspace.ug.edu.gh
 77  
 
 
Table A2d: Results of estimated models from 1000 simulations with sample of size of 100: 
Marginal model 
 
Table A2e: Results of estimated models from 1000 simulations with sample of size of 100: 
Bivariate Bernoulli model 
:Wald test %  0.05of p 
 
53 100 75 86 0 
Conditional 
1Y = 1  
11
  1.03 -1.98 1.22 1.39 -0.09 
%  0.05of p    33 70 0 80 12 
:Wald test %  0.05of p 
 
72 100 16 90 3 
Marginal model 
             
 
Estimates 
1 
ȡ  0.2  
1 2( 0.1, 0.1)P P   
2 
ȡ  0.  
1 2( 0.5, 0.5)P P 
 
3 
ȡ  0.1  
1 2( 0.1, 0.8)P P 
 
4 
ȡ  0.  
1 2( 0.5, 0.3)P P 
 
5 
ȡ  0  
 1 20.2, 0.6P P 
 
. 1( . )Av s e
:Wald test %  0.05of p 
 
0.18(0.56) 0.09(0.57) 0.23(0.43) 0.009(0.29) -0.04(0.55) 
 
0 
 
0 
 
0 
 
0 
 
0 
 
 
 
Bivariate Bernoulli  model 
             
 
Estimates 
1 
ȡ  0.2  
1 2( 0.1, 0.1)P P   
2 
ȡ  0.  
1 2( 0.5, 0.5)P P 
 
3 
ȡ  0.1  
1 2( 0.1, 0.8)P P 
 
4 
ȡ  0.  
1 2( 0.5, 0.3)P P 
 
5 
ȡ  0  
 1 20.2, 0.6P P 
 
11
  
%  0.05of p   
2.10 3.9 1.6 0.10 2.5 
  
 
80 
   
University of Ghana http://ugspace.ug.edu.gh
 78  
 
 
Table A3a: Results of estimated models from 1000 simulations with sample of size of 200: 
Exploratory statistics 
01
  
%  0.05of p   
3.56 2.14 1.8 0.56 1.2 
     
. 1VA   
2.05 1.4 2.3 0.58 -0.8 
%  0.05of p        
                                 Likelihood ratio test (proposed model and regressive model) 
 p<0.00001  (result is significant at 5% level of significance) 
Likelihood ratio test (proposed model and conditional 
1Y = 0  ) 
 p<0.00001  (result is significant at 5% level of significance) 
Likelihood ratio test (proposed model and conditional 
1Y = 1  ) 
 p<0.00001  (result is significant at 5% level of significance) 
 
  
Models 
             
 
Estimates 
1 
ȡ  0.2  
1 2( 0.1, 0.1)P P   
2 
ȡ  0.  
1 2( 0.5, 0.5)P P 
 
3 
ȡ  0.1  
1 2( 0.1, 0.8)P P 
 
4 
ȡ  0.  
1 2( 0.5, 0.3)P P 
 
5 
ȡ  0  
 1 20.2, 0.6P P 
 
 1 2Avg.Cor ,Y Y  
 sd in   
0.2 -0.30(0.07) 0.1 0.3 0 
 1Avg.Cor ,Y X  
 sd in   
0.2 -0.30(0.07) 0.1 0.3 0 
 2Avg.Cor ,Y X  
 sd in   
0.2 -0.30(0.07) 0.1 0.3 0 
Avg.Odds Ratio  
 1 2 ,Y Y  
2.68 0.99 3.93 1.16 1.6 
Marshall – Olkin correlation
  1 2 ,Y Y   
0.22 0.001 0.20 0.04 0.12 
 
Marshall – Olkin correlation   
-0.003 0.002 0.001 0.003 0.002 
University of Ghana http://ugspace.ug.edu.gh
 79  
 
 
Table A3b: Results of estimated models from 1000 simulations with sample of size of 200: 
Regressive model 
 
 
 
 
 
 
 2 1, / 0Y X Y   
Marshall –Olkin correlation
2 1( , / 1)Y X Y 
  
0.0003 -0.002 -0.001 -0.001 -0.001 
Marshall – Olkin correlation
 
 1 2/ 1Y X Y ,    
-0.0009 0.0006 -0.001 0.0003 -3.9^-06 
Marshall –Olkin correlation
 
 1 2, /  0Y X Y    
-0.006 -0.001 -0.003 -0.004 -0.003 
3Dependence value based on 
 
0.99 -0.01 1.37 0.15 0.47 
Regressive model 
             
 
Estimates 
1 
ȡ  0.2  
1 2( 0.1, 0.1)P P   
2 
ȡ  0.  
1 2( 0.5, 0.5)P P 
 
3 
ȡ  0.1  
1 2( 0.1, 0.8)P P 
 
4 
ȡ  0.  
1 2( 0.5, 0.3)P P 
 
5 
ȡ  0  
 1 20.2, 0.6P P 
 
. 1VA   
%  0.05of p     
1.35 -2.60 0.77 0.96 0.13 
60 100 35 75 0 
.Av   
%  0.05of p   
1.22 -1.53 1.39 1.25 0.17 
0.55 100 0 75 0 
2.Av   
 :  Wald test %  0.05of p   
15.9 
 
55.4 8.5 32.3 2.8 
86 100 47 100 15 
University of Ghana http://ugspace.ug.edu.gh
 80  
 
Table A3c: Results of estimated models from 1000 simulations with sample of size of 200- 
Conditional model 
Conditional model 
             
 
Estimates 
1 
ȡ  0.2  
1 2( 0.1, 0.1)P P   
2 
ȡ  0.  
1 2( 0.5, 0.5)P P 
 
3 
ȡ  0.1  
1 2( 0.1, 0.8)P P 
 
4 
ȡ  0.  
1 2( 0.5, 0.3)P P 
 
5 
ȡ  0  
 1 20.2, 0.6P P 
 
 
Conditional 
1Y = 0  
  
01
  
%  0.05of p   
1.43 -2.77 1.18 1.06 0.23 
 
 
80 100 0 100 0 
:Wald test %  0.05of p 
 
69 100 50 100 25 
Conditional 
1Y = 1  
11
  
%  0.05of p    
1.19 -2.42 1.06 1.22 0.23 
 
70 
 
100 
 
0 
 
100 
 
22 
:Wald test %  0.05of p 
 
96 100 50 40 5 
 
Table A3d: Results of estimated models from 1000 simulations with sample of size of 200: 
Marginal model 
Marginal Model 
             
 
Estimates 
1 
ȡ  0.2  
1 2( 0.1, 0.1)P P   
2 
ȡ  0.  
1 2( 0.5, 0.5)P P 
 
3 
ȡ  0.1  
1 2( 0.1, 0.8)P P 
 
4 
ȡ  0.  
1 2( 0.5, 0.3)P P 
 
5 
ȡ  0  
 1 20.2, 0.6P P 
 
. 1( . )Av s e  
0.37(0.35) -0.22(0.4) 0.08( 0.30) 0.02(0.21) 0.35(0.48) 
University of Ghana http://ugspace.ug.edu.gh
 81  
 
:Wald test %  0.05of p 
 
28 0 0 0 14 
 
Table A3e: Results of estimated models from 1000 simulations with sample of size of 200: 
Bivariate Bernoulli  model 
Bivariate Bernoulli  model 
             
 
Estimates 
1 
ȡ  0.2  
1 2( 0.1, 0.1)P P   
2 
ȡ  0.  
1 2( 0.5, 0.5)P P 
 
3 
ȡ  0.1  
1 2( 0.1, 0.8)P P 
 
4 
ȡ  0.  
1 2( 0.5, 0.3)P P 
 
5 
ȡ  0  
 1 20.2, 0.6P P 
 
11
  3.31 2.4 0.6 1.20 0.17 
%  0.05of p        
01
  
%  0.05of p   
1.42 3.17 1.2 0.49 1.69 
     
. 1VA   
1.72 0.81 0.6 0.84 0.72 
     
                                   Likelihood ratio test (proposed model and regressive model) 
 
p<0.00001  (result is significant at 5% level of significance) 
Likelihood ratio test (proposed model and conditional 
1Y = 0  ) 
 
p<0.00001  (result is significant at 5% level of significance) 
Likelihood ratio test (proposed model and conditional 
1Y = 1  ) 
 
p<0.00001  (result is significant at 5% level of significance) 
 
 
 
 
 
 
University of Ghana http://ugspace.ug.edu.gh
 82  
 
Table A4a: Results of estimated models from 1000 simulations with sample of size of 500: 
Exploratory statistics  
 
 
 
Models 
             
 
Estimates 
1 
ȡ  0.2  
1 2( 0.1, 0.1)P P   
2 
ȡ  0.  
1 2( 0.5, 0.5)P P 
 
3 
ȡ  0.1  
1 2( 0.1, 0.8)P P 
 
4 
ȡ  0.  
1 2( 0.5, 0.3)P P 
 
5 
ȡ  0  
 1 20.2, 0.6P P 
 
 1 2Avg.Cor ,Y Y  
 sd in   
0.2 -0.30(0.04) 0.1 0.3 0 
 1Avg.Cor ,Y X  
 sd in   
0.2 -0.30(0.04) 0.1 0.3 0 
 2Avg.Cor ,Y X  
 sd in   
0.2 -0.30(0.04) 0.1 0.3 0 
Avg.Odds Ratio  
 1 2 ,Y Y  
2.74 1.01 3.99 1.16 1.6 
Marshall – Olkin correlation
  1 2 ,Y Y   
0.22 0.002 0.20 0.04 0.12 
 
Marshall – Olkin correlation   
 2 1, / 0Y X Y   
-0.003 0.002 0.0005 0.001 0.002 
Marshall –Olkin correlation
2 1( , / 1)Y X Y 
  
-0.002 -0.0005 0.0001 -0.0005 0.002 
Marshall – Olkin correlation
 
 1 2/ 1Y X Y ,    
-0.001 0.0009 1.61^-0.05 -0.001 0.002 
Marshall –Olkin correlation
 
 1 2, /  0Y X Y    
-0.002 -0.002 -.001 -0.001 0 
3Dependence value based on 
 
1.01 0.01 1.38 0.15 0.47 
University of Ghana http://ugspace.ug.edu.gh
 83  
 
Table A4b: Results of estimated models from 1000 simulations with sample of size of 500: 
Regressive model   
 
Table A4c: Results of estimated models from 1000 simulations with sample of size of 500: 
Conditional models   
Conditional models 
             
 
Estimates 
1 
ȡ  0.2  
1 2( 0.1, 0.1)P P   
2 
ȡ  0.  
1 2( 0.5, 0.5)P P 
 
3 
ȡ  0.1  
1 2( 0.1, 0.8)P P 
 
4 
ȡ  0.  
1 2( 0.5, 0.3)P P 
 
5 
ȡ  0  
 1 20.2, 0.6P P   
Conditional 
1Y = 0  
01
  1.09 -2.33 1.15 1.19 -0.27 
%  0.05of p   77 100 28 100 38 
:Wald test %  0.05of p 
 
100 100 85 100 20 
Conditional 
1Y = 1  
11
  1.34 -2.53 1.51 1.11 1.5 
Regressive model 
             
 
Estimates 
1 
ȡ  0.2  
1 2( 0.1, 0.1)P P   
2 
ȡ  0.  
1 2( 0.5, 0.5)P P 
 
3 
ȡ  0.1  
1 2( 0.1, 0.8)P P 
 
4 
ȡ  0.  
1 2( 0.5, 0.3)P P 
 
5 
ȡ  0  
 1 20.2, 0.6P P 
 
. 1VA   
%  0.05of p     
1.42 -2.63 0.7 0.94 -0.29 
 
90 
 
100 
 
60 
 
100 
 
11 
.Av   
%  0.05of p   
 
1.12 
 
-2.67 
 
1.18 
 
1.19 
 
-0.09 
85 100 25 100 0 
2.Av   
 :  Wald test %  0.05of p   
32.9 146.7 16.7 75.1 1.96 
100 100 93 100 15 
University of Ghana http://ugspace.ug.edu.gh
 84  
 
%  0.05of p    100 100 66 100 10 
:Wald test %  0.05of p 
 
100 100 94 100 0 
 
      
      
 
Table A4d: Results of estimated models from 1000 simulations with sample of size of 500: 
Marginal model  
Marginal Model 
             
 
Estimates 
1 
ȡ  0.2  
1 2( 0.1, 0.1)P P   
2 
ȡ  0.  
1 2( 0.5, 0.5)P P 
 
3 
ȡ  0.1  
1 2( 0.1, 0.8)P P 
 
4 
ȡ  0.  
1 2( 0.5, 0.3)P P 
 
5 
ȡ  0  
 1 20.2, 0.6P P   
. 1( . )Av s e  
-0.65(0.24) -0.02(0.25) -0.04(0.18) 0.03(0.13) 0.28(0.49) 
:Wald test %  0.05of p 
 
0 0 0 0 0 
 
 
Table A4e: Results of estimated models from 1000 simulations with sample of size of 500: 
Bivariate Bernoulli  model  
Bivariate Bernoulli  Model 
             
 
Estimates 
1 
ȡ  0.2  
1 2( 0.1, 0.1)P P   
2 
ȡ  0.  
1 2( 0.5, 0.5)P P 
 
3 
ȡ  0.1  
1 2( 0.1, 0.8)P P 
 
4 
ȡ  0.  
1 2( 0.5, 0.3)P P 
 
5 
ȡ  0  
 1 20.2, 0.6P P   
11
  
2.14 2.99 3.21 1.98 0.99 
%  0.05of p   5 5    
01
  4.2 1.36 0.24 0.60 1.53 
%  0.05of p        
. 1VA   
1.53 1.51 1.86 0.12 0.24 
University of Ghana http://ugspace.ug.edu.gh
 85  
 
%  0.05of p        
                                 Likelihood ratio test (proposed model and regressive model) 
 
p<0.00001  (result is significant at 5% level of significance) 
Likelihood ratio test (proposed model and conditional 
1Y = 0  ) 
 
p<0.00001  (result is significant at 5% level of significance) 
Likelihood ratio test (proposed model and conditional 
1Y = 1  ) 
 
p<0.00001  (result is significant at 5% level of significance) 
 
 
 
Table A5a: Results of estimated models from 1000 simulations with sample of size of 1000: 
Exploratory statistics 
Models 
             
 
Estimates 
1 
ȡ  0.2  
1 2( 0.1, 0.1)P P   
2 
ȡ  0.  
1 2( 0.5, 0.5)P P 
 
3 
ȡ  0.1  
1 2( 0.1, 0.8)P P 
 
4 
ȡ  0.  
1 2( 0.5, 0.3)P P 
 
5 
ȡ  0  
 1 20.2, 0.6P P   
 1 2Avg.Cor ,Y Y  
 sd in   
0.2 -0.30(0.03) 0.1 0.3 0 
 1Avg.Cor ,Y X  
 sd in   
0.2 -0.30(0.03) 0.1 0.3 0 
 2Avg.Cor ,Y X  
 sd in   
0.2 -0.30(0.03) 0.1 0.3 0 
Avg.Odds Ratio  
 1 2 ,Y Y  
2.75 1.00 3.97 1.16 1.62 
Marshall – Olkin correlation
  1 2 ,Y Y   
0.22 0.002 0.20 0.04 0.12 
 
Marshall – Olkin correlation   
 2 1, / 0Y X Y   
0.0008 -0.0004 -0.0008 0.001 0 
Marshall –Olkin correlation
2 1( , / 1)Y X Y 
  
-0.003 -0.0009 -0.0008 -0.001 -0.001 
University of Ghana http://ugspace.ug.edu.gh
 86  
 
Marshall – Olkin correlation
 
 1 2/ 1Y X Y ,    
0.0005 -0.002 0.0001 9.04^-05 0.001 
Marshall –Olkin correlation
 
 1 2, /  0Y X Y    
-0.0005 -0.0003 -0.001 0.0003 0 
3Dependence value based on 
 
1.01 0.0001 1.38 0.15 0.48 
 
Table A5b: Results of estimated models from 1000 simulations with sample of size of 1000: 
Regressive model 
Regressive model 
             
 
Estimates 
1 
ȡ  0.2  
1 2( 0.1, 0.1)P P   
2 
ȡ  0.  
1 2( 0.5, 0.5)P P 
 
3 
ȡ  0.1  
1 2( 0.1, 0.8)P P 
 
4 
ȡ  0.  
1 2( 0.5, 0.3)P P 
 
5 
ȡ  0  
 1 20.2, 0.6P P   
. 1Av   
%  0.05of p     
1.40 2.60 0.63 0.94 (used 800) 0.09 
 
100 
 
100 
 
80 
 
100 
 
20 
.Av   
%  0.05of p   
1.19 -2.63 1.15 1.20 0.03 
 
100 
 
100 
 
100 
 
100 
 
10 
2.Av   
 :  Wald test %  0.05of p   
56.8 Out of 15.7 Out of 2.6 
100  100  15 
 
Table A5c: Results of estimated models from 1000 simulations with sample of size of 1000: 
Conditional models 
Conditional models 
             
 
Estimates 
1 
ȡ  0.2  
1 2( 0.1, 0.1)P P   
2 
ȡ  0.  
1 2( 0.5, 0.5)P P 
 
3 
ȡ  0.1  
1 2( 0.1, 0.8)P P 
 
4 
ȡ  0.  
1 2( 0.5, 0.3)P P 
 
5 
ȡ  0  
 1 20.2, 0.6P P   
 Conditional 
1Y = 0  
  
01
  
1.18 -2.71 1.06 1.14 0.09 
University of Ghana http://ugspace.ug.edu.gh
 87  
 
%  0.05of p   100 100 100 100 20 
      
Conditional 
1Y = 1  
11
  
1.25 -2.67 1.40 1.24( used 800 0.02 
%  0.05of p    100 100 100 100 0 
:Wald test %  0.05of p 
 
 
100 100 100 Run out of 
iterations-Did 
not converge 
 
0 
 
 
 
Table A5d: Results of estimated models from 1000 simulations with sample of size of 1000: 
Marginal model 
Marginal model 
             
 
Estimates 
1 
ȡ  0.2  
1 2( 0.1, 0.1)P P   
2 
ȡ  0.  
1 2( 0.5, 0.5)P P 
 
3 
ȡ  0.1  
1 2( 0.1, 0.8)P P 
 
4 
ȡ  0.  
1 2( 0.5, 0.3)P P 
 
5 
ȡ  0  
 1 20.2, 0.6P P   
. 1( . )Av s e  
-0.2(0.18) -0.06(0.18) 0.16(0.13) 0.03(0.09) -0.01(0.1) 
:Wald test %  0.05of p 
 
0 0 29 0 0 
 
Table A5e: Results of estimated models from 1000 simulations with sample of size of 1000: 
Bivariate Bernoulli model  
Bivariate Bernoulli model 
             
 
Estimates 
1 
ȡ  0.2  
1 2( 0.1, 0.1)P P   
2 
ȡ  0.  
1 2( 0.5, 0.5)P P 
 
3 
ȡ  0.1  
1 2( 0.1, 0.8)P P 
 
4 
ȡ  0.  
1 2( 0.5, 0.3)P P 
 
5 
ȡ  0  
 1 20.2, 0.6P P   
11
  
3.69 1.32 0.38 1.07 1.91 
%  0.05of p   5     
01
  
-0.46 1.50 1..46 0.61 0.22 
%  0.05of p        
. 1VA   
2.74 1.44 -0.03 0.51 1.91 
University of Ghana http://ugspace.ug.edu.gh
 88  
 
%  0.05of p        
                                 Likelihood ratio test (proposed model and regressive model) 
  p<0.00001 (result is significant at 5% level of significance) 
Likelihood ratio test (proposed model and conditional 
1Y = 0  ) 
 p <0.00001 (result is significant at 5% level of significance) 
Likelihood ratio test (proposed model and conditional 
1Y = 1  ) 
  p<0.00001 (result is significant at 5% level of significance) 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
University of Ghana http://ugspace.ug.edu.gh
 89  
 
APPENDIX B: Codes used in data analysis 
B1:Codes used for simulation (Sample size  50 only) 
############################################# 
R CODES ---SIMULATION EXERCISE--- R CODES   
############################################ 
Y1 ~ Bernoulli (p1) 
Y2 ~ Bernoulli (p2) 
###################################### 
Y1 and Y2 are correlated with correlation \rho 
###################################### 
library (bindata) 
set.seed(1234) 
n=50 
p1=0.1 
p2=0.1 
p3=0.5 
rho=0.2 
################################################ 
Generating correlated multivariate binary random variables 
################################################ 
count=0 
corY1.Y2=0 
corY1.X=0 
corY2.X=0 
while (count<=1000) 
{ 
cmdata<-rmvbin(n, c(p1,p2,p3), bincorr=(1-rho)*diag(3)+rho) 
cmdata 
dfcmdata<-data.frame(cmdata) 
column1 <- dfcmdata[,1:1] 
column1 
column2 <- dfcmdata[,2:2] 
column2 
column3 <- dfcmdata[,3:3] 
column3 
corY1.Y2[count]<-cor(column1,column2) 
corY1.X[count]<-cor(column1,column3) 
corY2.X[count]<-cor(column2,column3) 
print(dfcmdata) 
print(corY1.Y2) 
print(corY1.X) 
print(corY2.X) 
count=count+1 
 
corY1.Y2[count]=corY1.Y2[count]+0 
corY1.X[count]=corY1.X[count]+0 
corY2.X[count]=corY1.X[count]+0 
} 
University of Ghana http://ugspace.ug.edu.gh
 90  
 
 
############################################################# 
Calculating the true correlation between Y1 and Y2 for various sample sizes 
############################################### 
corY1.Y2 
require(psych) 
describeBy(corY1.Y2) 
 
############################################################# 
 Calculating the true correlation between Y1 and X for various sample sizes 
############################################################# 
corY1.X 
describeBy(corY1.X) 
 
############################################################# 
 Calculating the true correlation between Y1 and X for various sample sizes 
############################################################# 
corY2.X 
describeBy(corY2.X) 
################################################################ 
Counting number of  (Y1=0,Y2=0),(Y1=0,Y2=1) ,(Y1=1,Y2=0),  (Y1=1,Y2=1)  
################################################################ 
library (bindata) 
set.seed(1234) 
 
n=50 
p1=0.1  
p2=0.1 
p3=0.5 
rho=0.2 
 
############################### 
Counting the number of (Y1=0,Y2=0) 
############################### 
counter=0 
N00=0 
while (counter<=1000) 
{ 
cmdata<-rmvbin(n, c(p1,p2,p3), bincorr=(1-rho)*diag(3)+rho) 
cmdata 
dfcmdata<-data.frame(cmdata) 
 
CountMydata<- function(numrow) { 
mymatrix <- matrix(c(cmdata),ncol=3,byrow=TRUE) 
mymatrix 
count = 0 
numrow <- nrow(mymatrix) 
for (i in 1:numrow) 
{ 
if(mymatrix[i,1]=="0"& mymatrix[i,2]== "0") 
{ 
University of Ghana http://ugspace.ug.edu.gh
 91  
 
count = count + 1 
} 
} 
return(count) 
} 
 
N00[counter]<-CountMydata(numrow) 
print(cmdata) 
print(N00) 
counter=counter+1 
N00[counter]=N00[counter]+0 
 
} 
 
print(N00) 
############################ 
Average number of (Y1=0,Y2=0) 
############################  
print(N00) 
require(psych) 
describeBy(N00) 
 
############################### 
Counting the number of  (Y1=0,Y2=1) 
############################### 
counter=0 
N01=0 
while (counter<=1000) 
{ 
cmdata<-rmvbin(n, c(p1,p2,p3), bincorr=(1-rho)*diag(3)+rho) 
cmdata 
dfcmdata<-data.frame(cmdata) 
 
CountMydata<- function(numrow) { 
mymatrix <- matrix(c(cmdata),ncol=3,byrow=TRUE) 
mymatrix 
count = 0 
numrow <- nrow(mymatrix) 
for (i in 1:numrow) 
{ 
if(mymatrix[i,1]=="0"& mymatrix[i,2]== "1") 
{ 
count = count + 1 
} 
} 
return(count) 
} 
 
N01[counter]<-CountMydata(numrow) 
print(cmdata) 
print(N01) 
University of Ghana http://ugspace.ug.edu.gh
 92  
 
counter=counter+1 
N01[counter]=N01[counter]+0 
 
} 
 
print(N01) 
############################ 
Average number of  (Y1=0,Y2=1) 
############################ 
print(N01) 
require(psych) 
describeBy(N01) 
 
############################### 
Counting the number of  (Y1=0,Y2=1) 
############################### 
counter=0 
N10=0 
while (counter<=1000) 
{ 
cmdata<-rmvbin(n, c(p1,p2,p3), bincorr=(1-rho)*diag(3)+rho) 
cmdata 
dfcmdata<-data.frame(cmdata) 
 
CountMydata<- function(numrow) { 
mymatrix <- matrix(c(cmdata),ncol=3,byrow=TRUE) 
mymatrix 
count = 0 
numrow <- nrow(mymatrix) 
for (i in 1:numrow) 
{ 
if(mymatrix[i,1]=="1"& mymatrix[i,2]== "0") 
{ 
count = count + 1 
} 
} 
return(count) 
} 
 
N10[counter]<-CountMydata(numrow) 
print(cmdata) 
print(N10) 
counter=counter+1 
N10[counter]=N10[counter]+0 
 
} 
 
print(N10) 
 
 
 
University of Ghana http://ugspace.ug.edu.gh
 93  
 
########################### 
Average number of (Y1=0,Y2=1) 
########################### 
print(N10) 
require(psych) 
describeBy(N10) 
 
############################### 
Counting the number of (Y1=1,Y2=1) 
############################### 
counter=0 
N11=0 
while (counter<=1000) 
{ 
cmdata<-rmvbin(n, c(p1,p2,p3), bincorr=(1-rho)*diag(3)+rho) 
cmdata 
dfcmdata<-data.frame(cmdata) 
 
CountMydata<- function(numrow) { 
mymatrix <- matrix(c(cmdata),ncol=3,byrow=TRUE) 
mymatrix 
count = 0 
numrow <- nrow(mymatrix) 
for (i in 1:numrow) 
{ 
if(mymatrix[i,1]=="1"& mymatrix[i,2]== "1") 
{ 
count = count + 1 
} 
} 
return(count) 
} 
 
N11[counter]<-CountMydata(numrow) 
print(cmdata) 
print(N11) 
counter=counter+1 
N11[counter]=N11[counter]+0 
 
} 
 
print(N11) 
 
#################### 
Average number of N11 
#################### 
print(N11) 
require(psych) 
describeBy(N11) 
describeBy(N00) 
describeBy(N01) 
University of Ghana http://ugspace.ug.edu.gh
 94  
 
describeBy(N10) 
describeBy(N11) 
 
N00av<-31.12 
N01av<-7.49 
N10av<-7.04 
N11av<-4.35 
 
N00av/(N00av+N01av) 
N01av/(N00av+N01av) 
 
######################### 
Transition Probabilities matrix 
######################### 
 
Transitionprob<-
matrix(c(N00av/(N00av+N01av),N01av/(N00av+N01av),N10av/(N10av+N11av),N11av/(N10av+N11av)
),nrow=2,byrow=TRUE)  
Transitionprob 
 
############################### 
Labelling the Transitional probabilities 
############################### 
dimnames(Transitionprob) <- list(c("0","1"),c("0","1"))   
Transitionprob 
 
##################### 
Calculating the odds ratio 
##################### 
Transitionprob.test <- prop.test(Transitionprob)  
Transitionprob.test$estimate  
 
odds <- Transitionprob.test$estimate/(1-Transitionprob.test$estimate)  
odds 
odds[1]/odds[2] 
oddsratio <-(Transitionprob[1,1]*Transitionprob[2,2])/(Transitionprob[2,1]*Transitionprob[1,2])  
oddsratio 
 
############################# 
Calculating odds ratio: Alternatively 
############################# 
myoddsratio<-(N00av*N11av)/(N01av*N10av) 
myoddsratio 
 
################################################## 
Calculating Marshall–Olkin correlation estimator (Y1 and Y2) 
################################################## 
 
MO.cor.num<-(N11av/(N10av+N11av))*(N00av/(N00av+N01av))-
(N10av/(N10av+N11av))*(N01av/(N01av+N00av)) 
  
University of Ghana http://ugspace.ug.edu.gh
 95  
 
P00<-(N00av/(N00av+N01av)) 
P01<-(N01av/(N01av+N00av)) 
P10<-(N10av/(N10av+N11av)) 
P11<-(N11av/(N10av+N11av)) 
P1<-P00+P01 
P1 
P2<-P10+P11 
P3<-P00+P10 
P4<-P01+P11 
 
M0.cor.den<-sqrt(P1*P2*P3*P4) 
M0.cor.den 
 
MO.cor<-MO.cor.num/M0.cor.den 
MO.cor 
 
################################### 
Correlation between Y2 and X given Y1=0  
################################### 
 
ww.sb<-subset(dfcmdata,X1==0) 
ww.sb 
 
column2 <- ww.sb[,2:2] 
column2 
column3 <- ww.sb[,3:3] 
column3 
cor(column2,column3) 
 
########################################################## 
Correlation between Y2 and X given Y1=0;Using MO....process begins 
########################################################## 
############################## 
Counting the number of (Y2=0,X=0)  
############################## 
counter=0  
N00=0 
while (counter<=1000) 
{ 
cmdata<-rmvbin(n, c(p1,p2,p3), bincorr=(1-rho)*diag(3)+rho) 
cmdata 
dfcmdata<-data.frame(cmdata) 
ww.sb<-subset(dfcmdata,X1==0) 
ww.sb 
 
mydata.mat<-as.matrix(ww.sb[,c(-1)]) 
mydata.mat 
 
CountMydata1<- function(numrow) { 
mymatrix <- matrix(c(mydata.mat),ncol=2,byrow=TRUE) 
University of Ghana http://ugspace.ug.edu.gh
 96  
 
mymatrix 
count = 0 
numrow <- nrow(mymatrix) 
for (i in 1:numrow) 
{ 
if(mymatrix[i,1]=="0"& mymatrix[i,2]=="0") 
{ 
count = count + 1 
} 
} 
return(count) 
} 
N00[counter]<-CountMydata(numrow) 
print(mymatrix) 
print(N00) 
counter=counter+1 
N00[counter]=N00[counter]+0 
 
} 
 
 
########################### 
Average number of (Y2=0,X=0) 
########################## 
print(N00) 
require(psych) 
describeBy(N00) 
 
 
############################### 
Counting the number of  (Y2=0,X=1)  
############################### 
counter=0 
N01=0 
while (counter<=1000) 
{ 
cmdata<-rmvbin(n, c(p1,p2,p3), bincorr=(1-rho)*diag(3)+rho) 
cmdata 
dfcmdata<-data.frame(cmdata) 
ww.sb<-subset(dfcmdata,X1==0) 
ww.sb 
 
mydata.mat<-as.matrix(ww.sb[,c(-1)]) 
mydata.mat 
 
CountMydata1<- function(numrow) { 
mymatrix <- matrix(c(mydata.mat),ncol=2,byrow=TRUE) 
mymatrix 
count = 0 
numrow <- nrow(mymatrix) 
University of Ghana http://ugspace.ug.edu.gh
 97  
 
for (i in 1:numrow) 
{ 
if(mymatrix[i,1]=="0"& mymatrix[i,2]=="1") 
{ 
count = count + 1 
} 
} 
return(count) 
} 
N01[counter]<-CountMydata(numrow) 
print(mymatrix) 
print(N01) 
counter=counter+1 
N01[counter]=N01[counter]+0 
 
} 
 
########################### 
Average number of (Y2=0,X=1) 
########################## 
print(N01) 
require(psych) 
describeBy(N01) 
 
############################### 
Counting the number of  (Y2=1,X=0)  
############################### 
counter=0 
N10=0 
while (counter<=1000) 
{ 
cmdata<-rmvbin(n, c(p1,p2,p3), bincorr=(1-rho)*diag(3)+rho) 
cmdata 
dfcmdata<-data.frame(cmdata) 
ww.sb<-subset(dfcmdata,X1==0) 
ww.sb 
 
mydata.mat<-as.matrix(ww.sb[,c(-1)]) 
mydata.mat 
 
CountMydata1<- function(numrow) { 
mymatrix <- matrix(c(mydata.mat),ncol=2,byrow=TRUE) 
mymatrix 
count = 0 
numrow <- nrow(mymatrix) 
for (i in 1:numrow) 
{ 
if(mymatrix[i,1]=="1"& mymatrix[i,2]=="0") 
{ 
count = count + 1 
University of Ghana http://ugspace.ug.edu.gh
 98  
 
} 
} 
return(count) 
} 
N10[counter]<-CountMydata(numrow) 
print(mymatrix) 
print(N10) 
counter=counter+1 
N10[counter]=N10[counter]+0 
 
} 
 
 
########################## 
Average number of (Y2=1,X=0)  
########################## 
print(N10) 
require(psych) 
describeBy(N10) 
 
############################## 
Counting the number of  (Y2=1,X=1)  
############################## 
counter=0 
N11=0 
while (counter<=1000) 
{ 
cmdata<-rmvbin(n, c(p1,p2,p3), bincorr=(1-rho)*diag(3)+rho) 
cmdata 
dfcmdata<-data.frame(cmdata) 
ww.sb<-subset(dfcmdata,X1==0) 
ww.sb 
 
mydata.mat<-as.matrix(ww.sb[,c(-1)]) 
mydata.mat 
 
CountMydata1<- function(numrow) { 
mymatrix <- matrix(c(mydata.mat),ncol=2,byrow=TRUE) 
mymatrix 
count = 0 
numrow <- nrow(mymatrix) 
for (i in 1:numrow) 
{ 
if(mymatrix[i,1]=="1"& mymatrix[i,2]=="1") 
{ 
count = count + 1 
} 
} 
return(count) 
} 
University of Ghana http://ugspace.ug.edu.gh
 99  
 
N11[counter]<-CountMydata(numrow) 
print(mymatrix) 
print(N11) 
counter=counter+1 
N11[counter]=N11[counter]+0 
 
} 
 
 
########################## 
Average number of (Y2=1,X=1) 
########################## 
print(N11) 
require(psych) 
describeBy(N11) 
 
describeBy(N00) 
describeBy(N01) 
describeBy(N10) 
describeBy(N11) 
 
N00a<-4.34 
N01a<-4.32 
N10a<-4.34 
N11a<-4.38 
 
######################################################### 
#Correlation between Y2 and X given Y1=0;Using MO....process ends 
######################################################### 
 
MO.cor.numa<-(N11a/(N10a+N11a))*(N00a/(N00a+N01a))-(N10a/(N10a+N11a))*(N01a/(N01a+N00a)) 
P00a<-(N00a/(N00a+N01a)) 
P01a<-(N01a/(N01a+N00a)) 
P10a<-(N10a/(N10a+N11a)) 
P11a<-(N11a/(N10a+N11a)) 
P1a<-P00a+P01a 
P2a<-P10a+P11a 
P3a<-P00a+P10a 
P4a<-P01a+P11a 
 
M0.cor.dena<-sqrt(P1a*P2a*P3a*P4a) 
M0.cor.dena 
 
MO.cora<-MO.cor.numa/M0.cor.dena 
MO.cora 
 
########################################################### 
#Correlation between Y2 and X given Y1=1;Using MO.... process begins 
########################################################### 
 
 
University of Ghana http://ugspace.ug.edu.gh
 100  
 
############################## 
Counting the number of (Y2=0,X=0)  
############################## 
counter=0 
N00=0 
while (counter<=1000) 
{ 
cmdata<-rmvbin(n, c(p1,p2,p3), bincorr=(1-rho)*diag(3)+rho) 
cmdata 
dfcmdata<-data.frame(cmdata) 
ww.sb<-subset(dfcmdata,X1==1) 
ww.sb 
 
mydata.mat<-as.matrix(ww.sb[,c(-1)]) 
mydata.mat 
 
CountMydata1<- function(numrow) { 
mymatrix <- matrix(c(mydata.mat),ncol=2,byrow=TRUE) 
mymatrix 
count = 0 
numrow <- nrow(mymatrix) 
for (i in 1:numrow) 
{ 
if(mymatrix[i,1]=="0"& mymatrix[i,2]=="0") 
{ 
count = count + 1 
} 
} 
return(count) 
} 
N00[counter]<-CountMydata(numrow) 
print(mymatrix) 
print(N00) 
counter=counter+1 
N00[counter]=N00[counter]+0 
 
} 
 
############################ 
#Average number of (Y2=0,X=0) 
############################ 
print(N00) 
require(psych) 
describeBy(N00) 
 
######################## 
Counting the number of N01 
####################### 
counter=0 
N01=0 
while (counter<=1000) 
University of Ghana http://ugspace.ug.edu.gh
 101  
 
{ 
cmdata<-rmvbin(n, c(p1,p2,p3), bincorr=(1-rho)*diag(3)+rho) 
cmdata 
dfcmdata<-data.frame(cmdata) 
ww.sb<-subset(dfcmdata,X1==1) 
ww.sb 
 
mydata.mat<-as.matrix(ww.sb[,c(-1)]) 
mydata.mat 
 
CountMydata1<- function(numrow) { 
mymatrix <- matrix(c(mydata.mat),ncol=2,byrow=TRUE) 
mymatrix 
count = 0 
numrow <- nrow(mymatrix) 
for (i in 1:numrow) 
{ 
if(mymatrix[i,1]=="0"& mymatrix[i,2]=="1") 
{ 
count = count + 1 
} 
} 
return(count) 
} 
N01[counter]<-CountMydata(numrow) 
print(mymatrix) 
print(N01) 
counter=counter+1 
N01[counter]=N01[counter]+0 
 
} 
 
########################### 
#Average number of (Y2=0,X=1)  
########################### 
print(N01) 
require(psych) 
describeBy(N01) 
 
############################## 
Counting the number of (Y2=1,X=0)  
############################## 
counter=0 
N10=0 
while (counter<=1000) 
{ 
cmdata<-rmvbin(n, c(p1,p2,p3), bincorr=(1-rho)*diag(3)+rho) 
cmdata 
dfcmdata<-data.frame(cmdata) 
ww.sb<-subset(dfcmdata,X1==1) 
ww.sb 
University of Ghana http://ugspace.ug.edu.gh
 102  
 
 
mydata.mat<-as.matrix(ww.sb[,c(-1)]) 
mydata.mat 
 
CountMydata1<- function(numrow) { 
mymatrix <- matrix(c(mydata.mat),ncol=2,byrow=TRUE) 
mymatrix 
count = 0 
numrow <- nrow(mymatrix) 
for (i in 1:numrow) 
{ 
if(mymatrix[i,1]=="1"& mymatrix[i,2]=="0") 
{ 
count = count + 1 
} 
} 
return(count) 
} 
N10[counter]<-CountMydata(numrow) 
print(mymatrix) 
print(N10) 
counter=counter+1 
N10[counter]=N10[counter]+0 
 
} 
 
########################## 
Average number of (Y2=1,X=0)  
########################## 
print(N10) 
require(psych) 
describeBy(N10) 
 
############################## 
Counting the number of (Y2=1,X=1)   
############################## 
counter=0 
N11=0 
while (counter<=1000) 
{ 
cmdata<-rmvbin(n, c(p1,p2,p3), bincorr=(1-rho)*diag(3)+rho) 
cmdata 
dfcmdata<-data.frame(cmdata) 
ww.sb<-subset(dfcmdata,X1==1) 
ww.sb 
 
mydata.mat<-as.matrix(ww.sb[,c(-1)]) 
mydata.mat 
 
CountMydata1<- function(numrow) { 
mymatrix <- matrix(c(mydata.mat),ncol=2,byrow=TRUE) 
University of Ghana http://ugspace.ug.edu.gh
 103  
 
mymatrix 
count = 0 
numrow <- nrow(mymatrix) 
for (i in 1:numrow) 
{ 
if(mymatrix[i,1]=="1"& mymatrix[i,2]=="1") 
{ 
count = count + 1 
} 
} 
return(count) 
} 
N11[counter]<-CountMydata(numrow) 
print(mymatrix) 
print(N11) 
counter=counter+1 
N11[counter]=N11[counter]+0 
 
} 
 
########################## 
Average number of (Y2=1,X=1)   
########################## 
print(N11) 
require(psych) 
describeBy(N11) 
 
describeBy(N00) 
describeBy(N01) 
describeBy(N10) 
describeBy(N11) 
 
N00b<-4.46 
N01b<-4.4 
N10b<-4.41 
N11b<-4.3 
 
 
######################################################### 
Correlation between Y2 and X given Y1=1;Using MO..... process ends 
######################################################### 
 
MO.cor.num.b<-(N11b/(N10b+N11b))*(N00b/(N00b+N01b))-
(N10b/(N10b+N11b))*(N01b/(N01b+N00b)) 
P00b<-(N00b/(N00b+N01b)) 
P01b<-(N01b/(N01b+N00b)) 
P10b<-(N10b/(N10b+N11b)) 
P11b<-(N11b/(N10b+N11b)) 
P1b<-P00b+P01b 
P2b<-P10b+P11b 
P3b<-P00b+P10b 
University of Ghana http://ugspace.ug.edu.gh
 104  
 
P4b<-P01b+P11b 
 
M0.cor.den.b<-sqrt(P1b*P2b*P3b*P4b) 
M0.cor.den.b 
MO.cor.b<-MO.cor.num.b/M0.cor.den.b 
MO.cor.b 
########################################################## 
Correlation between Y1 and X given Y2=1;Using MO....process begins 
########################################################## 
 
############################## 
Counting the number of (Y1=0,X=0)   
############################## 
counter=0 
N00=0 
while (counter<=1000) 
{ 
cmdata<-rmvbin(n, c(p1,p2,p3), bincorr=(1-rho)*diag(3)+rho) 
cmdata 
dfcmdata<-data.frame(cmdata) 
ww.sb<-subset(dfcmdata,X2==1) 
ww.sb 
 
mydata.mat<-as.matrix(ww.sb[,c(-2)]) 
mydata.mat 
 
CountMydata1<- function(numrow) { 
mymatrix <- matrix(c(mydata.mat),ncol=2,byrow=TRUE) 
mymatrix 
count = 0 
numrow <- nrow(mymatrix) 
for (i in 1:numrow) 
{ 
if(mymatrix[i,1]=="0"& mymatrix[i,2]=="0") 
{ 
count = count + 1 
} 
} 
return(count) 
} 
N00[counter]<-CountMydata(numrow) 
print(mymatrix) 
print(N00) 
counter=counter+1 
N00[counter]=N00[counter]+0 
 
} 
 
########################## 
Average number of(Y1=0,X=0)   
########################## 
University of Ghana http://ugspace.ug.edu.gh
 105  
 
print(N00) 
require(psych) 
describeBy(N00) 
 
 
############################## 
Counting the number of  (Y1=0,X=1)   
############################## 
counter=0 
N01=0 
while (counter<=1000) 
{ 
cmdata<-rmvbin(n, c(p1,p2,p3), bincorr=(1-rho)*diag(3)+rho) 
cmdata 
dfcmdata<-data.frame(cmdata) 
ww.sb<-subset(dfcmdata,X2==1) 
ww.sb 
 
mydata.mat<-as.matrix(ww.sb[,c(-2)]) 
mydata.mat 
 
CountMydata1<- function(numrow) { 
mymatrix <- matrix(c(mydata.mat),ncol=2,byrow=TRUE) 
mymatrix 
count = 0 
numrow <- nrow(mymatrix) 
for (i in 1:numrow) 
{ 
if(mymatrix[i,1]=="0"& mymatrix[i,2]=="1") 
{ 
count = count + 1 
} 
} 
return(count) 
} 
N01[counter]<-CountMydata(numrow) 
print(mymatrix) 
print(N01) 
counter=counter+1 
N01[counter]=N01[counter]+0 
 
} 
 
 
########################### 
Average number of  (Y1=0,X=1)   
########################### 
print(N01) 
require(psych) 
describeBy(N01) 
 
University of Ghana http://ugspace.ug.edu.gh
 106  
 
 
############################## 
Counting the number of (Y1=0,X=0)   
############################## 
counter=0 
N10=0 
while (counter<=1000) 
{ 
cmdata<-rmvbin(n, c(p1,p2,p3), bincorr=(1-rho)*diag(3)+rho) 
cmdata 
dfcmdata<-data.frame(cmdata) 
ww.sb<-subset(dfcmdata,X2==1) 
ww.sb 
 
mydata.mat<-as.matrix(ww.sb[,c(-2)]) 
mydata.mat 
 
CountMydata1<- function(numrow) { 
mymatrix <- matrix(c(mydata.mat),ncol=2,byrow=TRUE) 
mymatrix 
count = 0 
numrow <- nrow(mymatrix) 
for (i in 1:numrow) 
{ 
if(mymatrix[i,1]=="1"& mymatrix[i,2]=="0") 
{ 
count = count + 1 
} 
} 
return(count) 
} 
N10[counter]<-CountMydata(numrow) 
print(mymatrix) 
print(N10) 
counter=counter+1 
N10[counter]=N10[counter]+0 
 
} 
 
########################### 
Average number of (Y1=0,X=0)   
########################## 
print(N10) 
require(psych) 
describeBy(N10) 
 
############################## 
Counting the number of  (Y1=1,X=1)   
############################## 
counter=0 
N11=0 
University of Ghana http://ugspace.ug.edu.gh
 107  
 
while (counter<=1000) 
{ 
cmdata<-rmvbin(n, c(p1,p2,p3), bincorr=(1-rho)*diag(3)+rho) 
cmdata 
dfcmdata<-data.frame(cmdata) 
ww.sb<-subset(dfcmdata,X2==1) 
ww.sb 
 
mydata.mat<-as.matrix(ww.sb[,c(-2)]) 
mydata.mat 
 
CountMydata1<- function(numrow) { 
mymatrix <- matrix(c(mydata.mat),ncol=2,byrow=TRUE) 
mymatrix 
count = 0 
numrow <- nrow(mymatrix) 
for (i in 1:numrow) 
{ 
if(mymatrix[i,1]=="1"& mymatrix[i,2]=="1") 
{ 
count = count + 1 
} 
} 
return(count) 
} 
N11[counter]<-CountMydata(numrow) 
print(mymatrix) 
print(N11) 
counter=counter+1 
N11[counter]=N11[counter]+0 
 
} 
 
 
 
########################### 
#Average number of (Y1=1,X=1)   
########################### 
print(N11) 
require(psych) 
describeBy(N11) 
 
describeBy(N00) 
describeBy(N01) 
describeBy(N10) 
describeBy(N11) 
 
 
N00c<-4.39 
N01c<-4.38 
N10c<-4.32 
University of Ghana http://ugspace.ug.edu.gh
 108  
 
N11c<-4.37 
######################################## ################ 
Correlation between Y1 and X given Y2=1;Using MO.... process ends 
######################################################### 
 
MO.cor.num.c<-(N11c/(N10c+N11c))*(N00c/(N00c+N01c))-
(N10c/(N10c+N11c))*(N01c/(N01c+N00c)) 
P00c<-(N00c/(N00c+N01c)) 
P01c<-(N01c/(N01c+N00c)) 
P10c<-(N10c/(N10c+N11c)) 
P11c<-(N11c/(N10c+N11c)) 
P1c<-P00c+P01c 
P2c<-P10c+P11c 
P3c<-P00c+P10c 
P4c<-P01c+P11c 
 
M0.cor.den.c<-sqrt(P1c*P2c*P3c*P4c) 
M0.cor.den.c 
MO.cor.c<-MO.cor.num.c/M0.cor.den.c 
MO.cor.c 
 
 
########################################################## 
#Correlation between Y1 and X given Y2=0;Using MO....process begins 
########################################################## 
##############################   
Counting the number of (Y1=0,X=0)    
############################## 
counter=0 
N00=0 
while (counter<=1000) 
{ 
cmdata<-rmvbin(n, c(p1,p2,p3), bincorr=(1-rho)*diag(3)+rho) 
cmdata 
dfcmdata<-data.frame(cmdata) 
ww.sb<-subset(dfcmdata,X2==0) 
ww.sb 
 
mydata.mat<-as.matrix(ww.sb[,c(-2)]) 
mydata.mat 
 
CountMydata1<- function(numrow) { 
mymatrix <- matrix(c(mydata.mat),ncol=2,byrow=TRUE) 
mymatrix 
count = 0 
numrow <- nrow(mymatrix) 
for (i in 1:numrow) 
{ 
if(mymatrix[i,1]=="0"& mymatrix[i,2]=="0") 
{ 
count = count + 1 
University of Ghana http://ugspace.ug.edu.gh
 109  
 
} 
} 
return(count) 
} 
N00[counter]<-CountMydata(numrow) 
print(mymatrix) 
print(N00) 
counter=counter+1 
N00[counter]=N00[counter]+0 
 
} 
 
########################## 
Average number of (Y1=0,X=0)   
########################## 
print(N00) 
require(psych) 
describeBy(N00) 
 
############################## 
Counting the number of (Y1=0,X=1)  
############################## 
counter=0 
N01=0 
while (counter<=1000) 
{ 
cmdata<-rmvbin(n, c(p1,p2,p3), bincorr=(1-rho)*diag(3)+rho) 
cmdata 
dfcmdata<-data.frame(cmdata) 
ww.sb<-subset(dfcmdata,X2==0) 
ww.sb 
 
mydata.mat<-as.matrix(ww.sb[,c(-2)]) 
mydata.mat 
 
CountMydata1<- function(numrow) { 
mymatrix <- matrix(c(mydata.mat),ncol=2,byrow=TRUE) 
mymatrix 
count = 0 
numrow <- nrow(mymatrix) 
for (i in 1:numrow) 
{ 
if(mymatrix[i,1]=="0"& mymatrix[i,2]=="1") 
{ 
count = count + 1 
} 
} 
return(count) 
} 
N01[counter]<-CountMydata(numrow) 
print(mymatrix) 
University of Ghana http://ugspace.ug.edu.gh
 110  
 
print(N01) 
counter=counter+1 
N01[counter]=N01[counter]+0 
 
} 
 
############################ 
#Average number of (Y1=0,X=1)   
########################### 
print(N01) 
require(psych) 
describeBy(N01) 
 
 
############################## 
Counting the number of (Y1=0,X=0)   
############################## 
counter=0 
N10=0 
while (counter<=1000) 
{ 
cmdata<-rmvbin(n, c(p1,p2,p3), bincorr=(1-rho)*diag(3)+rho) 
cmdata 
dfcmdata<-data.frame(cmdata) 
ww.sb<-subset(dfcmdata,X2==0) 
ww.sb 
 
mydata.mat<-as.matrix(ww.sb[,c(-2)]) 
mydata.mat 
 
CountMydata1<- function(numrow) { 
mymatrix <- matrix(c(mydata.mat),ncol=2,byrow=TRUE) 
mymatrix 
count = 0 
numrow <- nrow(mymatrix) 
for (i in 1:numrow) 
{ 
if(mymatrix[i,1]=="1"& mymatrix[i,2]=="0") 
{ 
count = count + 1 
} 
} 
return(count) 
} 
N10[counter]<-CountMydata(numrow) 
print(mymatrix) 
print(N10) 
counter=counter+1 
N10[counter]=N10[counter]+0 
 
} 
University of Ghana http://ugspace.ug.edu.gh
 111  
 
 
############################ 
#Average number of  (Y1=0,X=0)   
############################ 
print(N10) 
require(psych) 
describeBy(N10) 
 
############################### 
Counting the number of  (Y1=1,X=1)  
############################### 
counter=0 
N11=0 
while (counter<=1000) 
{ 
cmdata<-rmvbin(n, c(p1,p2,p3), bincorr=(1-rho)*diag(3)+rho) 
cmdata 
dfcmdata<-data.frame(cmdata) 
ww.sb<-subset(dfcmdata,X2==0) 
ww.sb 
 
mydata.mat<-as.matrix(ww.sb[,c(-2)]) 
mydata.mat 
 
CountMydata1<- function(numrow) { 
mymatrix <- matrix(c(mydata.mat),ncol=2,byrow=TRUE) 
mymatrix 
count = 0 
numrow <- nrow(mymatrix) 
for (i in 1:numrow) 
{ 
if(mymatrix[i,1]=="1"& mymatrix[i,2]=="1") 
{ 
count = count + 1 
} 
} 
return(count) 
} 
N11[counter]<-CountMydata(numrow) 
print(mymatrix) 
print(N11) 
counter=counter+1 
N11[counter]=N11[counter]+0 
 
} 
 
########################## 
Average number of (Y1=1,X=1)    
########################## 
print(N11) 
require(psych) 
University of Ghana http://ugspace.ug.edu.gh
 112  
 
describeBy(N11) 
 
describeBy(N00) 
describeBy(N01) 
describeBy(N10) 
describeBy(N11) 
 
 
N00d<-4.38 
N01d<-4.34 
N10d<-4.39 
N11d<-4.35  
######################################################## 
Correlation between Y2 and X given Y1=0;Using MO.... process ends 
######################################################## 
MO.cor.numd<-(N11d/(N10d+N11d))*(N00d/(N00d+N01d))-
(N10d/(N10d+N11d))*(N01d/(N01d+N00d)) 
P00d<-(N00d/(N00d+N01d)) 
P01d<-(N01d/(N01d+N00d)) 
P10d<-(N10d/(N10d+N11d)) 
P11d<-(N11d/(N10d+N11d)) 
P1d<-P00d+P01d 
P2d<-P10d+P11d 
P3d<-P00d+P10d 
P4d<-P01d+P11d 
 
M0.cor.dend<-sqrt(P1d*P2d*P3d*P4d) 
M0.cor.dend 
MO.cor.d<-MO.cor.numd/M0.cor.dend 
MO.cor.d 
 
######################################################### 
#Checking dependence betwwen Y1 and Y2 (if 0,then no dependence) 
######################################################### 
 
n<-(P00*P11)/(P01*P10) 
dependence.value<-log(n) 
dependence.value 
 
############################################ 
#Conditional logistic regression Y2 conditioned on Y1 
############################################ 
 
library (bindata) 
n=50 
p1=0.1 
p2=0.1 
p3=0.5 
rho=0.2 
counter=0 
University of Ghana http://ugspace.ug.edu.gh
 113  
 
while (counter<=5) 
{ 
 
cmdata<-rmvbin(n, c(p1,p2,p3), bincorr=(1-rho)*diag(3)+rho) 
cmdata 
dfcmdata<-data.frame(cmdata) 
counter=counter+0 
library(survival) 
rmodel<-clogit(X1~X2+X3,data=dfcmdata) 
print(model) 
 
} 
logLik(rmodel) 
############### 
Conditional Y1=0  
############### 
library (bindata) 
n=50 
p1=0.1 
p2=0.1 
p3=0.5 
rho=0.2 
counter=0 
while (counter<=5) 
{ 
 
cmdata<-rmvbin(n, c(p1,p2,p3), bincorr=(1-rho)*diag(3)+rho) 
cmdata 
dfcmdata<-data.frame(cmdata) 
counter=counter+0 
 
numrow <- nrow(dfcmdata) 
for (i in 1:numrow) 
{ 
if(dfcmdata[i,1]=="0"& dfcmdata[i,2]== "1") 
{ 
b<-dfcmdata[dfcmdata[,1]=="0"& dfcmdata[,2]== "1",] 
library(survival) 
c1model<-clogit(X1~X2+X3,data=dfcmdata) 
print(c1model) 
} 
} 
} 
 
logLik(c1model) 
 
############## 
Conditional Y1=1 
############## 
library (bindata) 
n=50 
University of Ghana http://ugspace.ug.edu.gh
 114  
 
p1=0.1 
p2=0.1 
p3=0.5 
rho=0.2 
counter=0 
while (counter<=5) 
{ 
 
cmdata<-rmvbin(n, c(p1,p2,p3), bincorr=(1-rho)*diag(3)+rho) 
cmdata 
dfcmdata<-data.frame(cmdata) 
counter=counter+0 
 
numrow <- nrow(dfcmdata) 
for (i in 1:numrow) 
{ 
if(dfcmdata[i,1]=="1"& dfcmdata[i,2]== "1") 
{ 
b<-dfcmdata[dfcmdata[,1]=="0"& dfcmdata[,2]== "1",] 
library(survival) 
c2model<-clogit(X1~X2+X3,data=dfcmdata) 
print(c2model) 
} 
} 
} 
logLik(c2model) 
 
########################## 
Marginal Model: process begins 
########################## 
library (bindata) 
library(reshape) 
n=50 
p1=0.1 
p2=0.1 
p3=0.5 
rho=0.2 
counter=0 
while (counter<=5) 
{ 
 
cmdata<-rmvbin(n, c(p1,p2,p3), bincorr=(1-rho)*diag(3)+rho) 
cmdata 
ys<-cmdata[,-3] 
s<-data.frame(t(ys)) 
melt(s) 
counter=counter+0 
 
 
############################# 
Generate random binary variable(x) 
University of Ghana http://ugspace.ug.edu.gh
 115  
 
############################# 
rnum<-rnorm(100, mean = 0, sd = 1) 
rnum 
 
##################### 
#Convert to binary values 
##################### 
X<-ra2ba(rnum) 
X 
clmx<-melt(X) 
dfclmx<-data.frame(clmx) 
 
################## 
#combine sample data 
################## 
sampledata<-cbind(melt(s),dfclmx) 
sampledata 
 
colnames(sampledata)[1]<-"ys" 
colnames(sampledata)[2]<-"biresp" 
colnames(sampledata)[3]<-"cvr" 
sampledata 
tm<-1:2 
melt(tm) 
time<-replicate(10,tm) 
melt(time) 
t2<-melt(time)[-1] 
ftime<-t2[-1] 
colnames(ftime)<-"time" 
 
fsampledata<-cbind(sampledata,ftime) 
counter=counter+0 
 
########################### 
Marginal modelling the GEE way 
########################### 
library(MASS) 
library(geepack) 
geeglm.out<-geeglm(formula=biresp~cvr+time,data=fsampledata,family=binomial,id=ys) 
summary(geeglm.out) 
} 
p<-print(summary(geeglm.out)) 
 
logLik(geeglm.out) 
 
####################################### 
Fitting the generalized bivariate Bernoulli model  
####################################### 
library(maxLik) 
library (bindata) 
n=50 
University of Ghana http://ugspace.ug.edu.gh
 116  
 
p1=0.1 
p2=0.1 
p3=0.5 
rho=0.2 
 
cmdata<-rmvbin(n, c(p1,p2,p3), bincorr=(1-rho)*diag(3)+rho) 
cmdata 
dfcmdata<-data.frame(cmdata) 
 
colnames(dfcmdata)[1]<-"Y1" 
colnames(dfcmdata)[2]<-"Y2" 
colnames(dfcmdata)[3]<-"X" 
head(dfcmdata) 
log.likelihood <- 0 
 
log.likelihood <- function(param) 
{ 
B01<-param[1] 
B11<-param[2] 
B1<-param[3] 
N <- length(dfcmdata) 
N1 <- length("Y1") 
N2 <- length("Y2") 
 
Y1<-dfcmdata[,1] 
Y2<-dfcmdata[,2] 
X<-dfcmdata[,3] 
 
for (i in 1:length(dfcmdata)) 
{ 
log.likelihood<- -sum(log(1+(X*exp(B01))))-
sum(log(1+(X*exp(B1))))+sum((Y1*X*B1))+sum(Y1*log(1+(X*exp(B01))))-
sum(Y1*log(1+(X*exp(B11))))+sum(Y2*X*B01)+sum((Y1*Y2*X*B11)-sum(Y1*Y2*X*B01)) 
 
result<-return(log.likelihood) 
} 
} 
 
mle <- maxLik(logLik =log.likelihood , start = c(B11=0.03,B1=0.013,B01=0.006)) 
summary(mle) 
 
mleNM <- maxLik(logLik = log.likelihood , start = c(B1=0.1,B11=4.03,B01=0.6), method = "NM") 
summary(mleNM) 
 
 
############################################# 
Likelihood ratio test.....Fixing the estimated parameters 
############################################# 
library (bindata) 
n=50 
p1=0.1 
University of Ghana http://ugspace.ug.edu.gh
 117  
 
p2=0.1 
p3=0.5 
rho=0.2 
 
cmdata<-rmvbin(n, c(p1,p2,p3), bincorr=(1-rho)*diag(3)+rho) 
cmdata 
dfcmdata<-data.frame(cmdata) 
 
colnames(dfcmdata)[1]<-"Y1" 
colnames(dfcmdata)[2]<-"Y2" 
colnames(dfcmdata)[3]<-"X" 
 
N1 <- length("Y1") 
N2 <- length("Y2") 
 
Y1<-dfcmdata[,1] 
Y2<-dfcmdata[,2] 
X<-dfcmdata[,3] 
 
for (i in 1:length(dfcmdata)) 
{ 
log.likelihood<- -sum(log(1+(X*2.81)))-
sum(log(1+(X*1.4)))+sum((Y1*X*1.4))+sum(Y1*log(1+(X*2.81)))-
sum(Y1*log(1+(X*5.9)))+sum(Y2*X*2.81)+sum((Y1*Y2*X*5.9)-sum(Y1*Y2*X*2.81)) 
 
} 
log.likelihood 
 
####################################################################### 
With the regressive Model:2In(loglikelihood(proposed model)-loglikelihood(regressive) 
####################################################################### 
 
test.stat<-2*((log.likelihood)-(logLik(rmodel))) 
 
################################################################################ 
With the Conditional Model:2In(loglikelihood(proposed model)-loglikelihood(conditional model1)) 
################################################################################ 
 
test.stat<- 2*((log.likelihood)-(logLik(c1model))) 
 
################################################################################ 
With the Conditional Model:2In(loglikelihood(proposed model)-loglikelihood(conditional model2)) 
################################################################################ 
 
test.stat<- 2*((log.likelihood)-(logLik(c2model))) 
 
 
 
 
 
 
University of Ghana http://ugspace.ug.edu.gh
 118  
 
 
B2:Codes used for real-life-application  
################################## 
R CODES :REAL-LIFE-APPLICATION  
################################## 
 
N00<-4132 
N01<-707 
N10<-1442 
N11<-1559 
 
N00/(N00+N01) 
N01/(N00+N01) 
 
######################### 
Transition Probalilities matrix 
######################## 
 
Transitionprob<- 
matrix(c(N00/(N00+N01),N01/(N00+N01),N10/(N10+N11),N11/(N10+N11)),nrow=2,byrow=TRUE)  
Transitionprob 
 
################################ 
Labelling the Transitional probabilities 
############################### 
dimnames(Transitionprob) <- list(c("0","1"),c("0","1"))   
Transitionprob 
 
##################### 
Calculating the odds ratio 
##################### 
Transitionprob.test <- prop.test(Transitionprob)  
Transitionprob.test$estimate  
 
odds <- Transitionprob.test$estimate/(1-Transitionprob.test$estimate)  
odds 
 
odds[1]/odds[2] 
oddsratio <-(Transitionprob[1,1]*Transitionprob[2,2])/(Transitionprob[2,1]*Transitionprob[1,2])  
 
oddsratio 
 
#################### 
Odds ratio:Alternatively 
#################### 
myoddsratio<-(N00av*N11av)/(N01av*N10av) 
myoddsratio 
 
 
 
University of Ghana http://ugspace.ug.edu.gh
 119  
 
 
################################################## 
Calculating Marshall–Olkin correlation estimator (Y1 and Y2) 
################################################## 
MO.cor.num<-(N11/(N10+N11))*(N00/(N00+N01))-(N10/(N10+N11))*(N01/(N01+N00)) 
  
P00<-(N00/(N00+N01)) 
P01<-(N01/(N01+N00)) 
P10<-(N10/(N10+N11)) 
P11<-(N11/(N10+N11)) 
P1<-P00+P01 
P1 
P2<-P10+P11 
P3<-P00+P10 
P4<-P01+P11 
 
M0.cor.den<-sqrt(P1*P2*P3*P4) 
M0.cor.den 
 
MO.cor<-MO.cor.num/M0.cor.den 
MO.cor 
 
######################################################## 
Checking dependence betwwen Y1 and Y2 (if 0,then no dependence) 
######################################################## 
 
n<-(P00*P11)/(P01*P10) 
dependence.value<-log(n) 
dependence.value 
 
############### 
Conditional Y1=0  
############### 
library(foreign) 
file.choose() 
 
dataset = read.spss("C:\\Users\\user\\Desktop\\Dataset.Final\\B01.FINAL.FINAL..22.05.16.sav" 
 
,to.data.frame=TRUE) 
View(dataset) 
 
library(survival) 
glm.out1=glm(formula= es ~ age+edu+sex,family=binomial(link=logit),data=dataset)  
summary(glm.out1)  
 
logLik(glm.out1) 
 
############### 
Conditional Y1=1  
############### 
library(foreign) 
University of Ghana http://ugspace.ug.edu.gh
 120  
 
file.choose() 
 
dataset = read.spss("C:\\Users\\user\\Desktop\\Dataset.Final\\B11.FINAL.FINAL.22.05.16.INSTsav.sav" 
, to.data.frame=TRUE) 
View(dataset) 
 
 
library(survival) 
glm.out2=glm(formula= es ~ age+edu+sex,family=binomial(link=logit),data=dataset)  
summary(glm.out2)  
 
logLik(glm.out2) 
 
################### 
Marginal the GEE way 
################### 
 
library(foreign) 
file.choose() 
 
dataset = read.spss("C:\\Users\\user\\Desktop\\Dataset.Final\\B1.FINALFINAL.22.05.16..sav" 
 
,to.data.frame=TRUE) 
View(dataset) 
 
 
library(MASS) 
library(geepack) 
geeglm.out<-geeglm(formula=es~age+edu+sex,data=dataset,family=binomial,id=id) 
summary(geeglm.out) 
 
 
logLik(geeglm.out)...spss-quasi-likelihood:8755.65 
 
######### 
B1 estimates 
######### 
library(maxLik) 
library(foreign) 
file.choose() 
 
dataset = read.spss("C:\\Users\\user\\Desktop\\Dataset.Final\\proposed.model.22.05.16.sav" 
,to.data.frame=TRUE) 
View(dataset) 
colnames(dataset)[1]<-"Y1" 
colnames(dataset)[2]<-"Y2" 
colnames(dataset)[3]<-"X1" 
colnames(dataset)[5]<-"X2" 
colnames(dataset)[7]<-"X3" 
 
head(dataset) 
University of Ghana http://ugspace.ug.edu.gh
 121  
 
log.likelihood <- 0 
log.likelihood <- function(param) 
{ 
B1.a<-param[1] 
B1.b<-param[2] 
B1.c<-param[3] 
N <- length(dataset) 
N1 <- length("Y1") 
N2 <- length("Y2") 
Y1<-dataset[,1] 
Y2<-dataset[,2] 
X1<-dataset[,3] 
X2<-dataset[,5] 
X3<-dataset[,7] 
 
for (i in 1:length(dataset)) 
{ 
log.likelihood<- -
sum((X1+X2+X3)*exp((X1*B1.a)+(X2*B1.b)+(X3*B1.c)))/(1+exp((X1*B1.a)+(X2*B1.b)+(X3*B1.c)))
+sum((Y1*(X1+X2+X3))) 
 
result<-return(log.likelihood) 
} 
} 
 
mleNM <- maxLik(logLik = log.likelihood , start =c(B1.a=11.43,B1.b=-13.99,B1.c=10.74), method = 
"NM") 
 
summary(mleNM) 
 
 mle <- maxLik(logLik =log.likelihood , start =c(B1.a=-10.32,B1.b=14.3,B1.c=10.02)) 
 
summary(mle) 
 
############ 
B11 estimates 
############ 
 
library(maxLik) 
library(foreign) 
file.choose() 
 
dataset = read.spss("C:\\Users\\user\\Desktop\\Dataset.Final\\proposed.model.B11.NEW.sav" 
,to.data.frame=TRUE) 
View(dataset) 
colnames(dataset)[1]<-"Y1" 
colnames(dataset)[2]<-"Y2" 
colnames(dataset)[3]<-"X1" 
colnames(dataset)[5]<-"X2" 
colnames(dataset)[7]<-"X3" 
head(dataset) 
University of Ghana http://ugspace.ug.edu.gh
 122  
 
log.likelihood <- 0 
 
library(formatR) 
tidy_source() 
 
log.likelihood <- function(param) 
{ 
B11.a<-param[1] 
B11.b<-param[2] 
B11.c<-param[3] 
N <- length(dataset) 
N1 <- length("Y1") 
N2 <- length("Y2") 
Y1<-dataset[,1] 
Y2<-dataset[,2] 
X1<-dataset[,3] 
X2<-dataset[,5] 
X3<-dataset[,7] 
 
head(dataset) 
for (i in 1:length(dataset)) 
{ 
log.likelihood<- -
sum((X1+X2+X3)*exp((X1*B11.a)+(X2*B11.b)+(X3*B11.c)))/(1+exp((X1*B11.a)+(X2*B11.b)+(X3*
B11.c)))+sum((Y1*Y2*(X1+X2+X3))) 
 
result<-return(log.likelihood) 
} 
} 
  
mleNM <- maxLik(logLik = log.likelihood , start =c(B01.a=-14.11,B01.b=11.31,B01.c=10.02), method = 
"NM") 
summary(mleNM) 
 
mle <- maxLik(logLik =log.likelihood , start = c(B11.a=6.32,B11.b=10.3,B11.c=12.02)) 
 
summary(mle) 
 
########### 
B01 estimates 
########### 
 
library(maxLik) 
library(foreign) 
file.choose() 
 
dataset = read.spss("C:\\Users\\user\\Desktop\\Dataset.Final\\proposed.model.B01.NEW.sav" 
,to.data.frame=TRUE) 
View(dataset) 
 
colnames(dataset)[1]<-"Y1" 
University of Ghana http://ugspace.ug.edu.gh
 123  
 
colnames(dataset)[2]<-"Y2" 
colnames(dataset)[3]<-"X1" 
colnames(dataset)[5]<-"X2" 
colnames(dataset)[7]<-"X3" 
 
head(dataset) 
log.likelihood <- 0 
 
library(formatR) 
tidy_source() 
 
log.likelihood <- function(param) 
{ 
B01.a<-param[1] 
B01.b<-param[2] 
B01.c<-param[3] 
N <- length(dataset) 
N1 <- length("Y1") 
N2 <- length("Y2") 
Y1<-dataset[,1] 
Y2<-dataset[,2] 
X1<-dataset[,3] 
X2<-dataset[,5] 
X3<-dataset[,7] 
 
for (i in 1:length(dataset)) 
{ 
log.likelihood<- -
sum((X1+X2+X3)*exp((X1*B01.a)+(X2*B01.b)+(X3*B01.c)))/(1+exp((X1*B01.a)+(X2*B01.b)+(X3*
B01.c)))+sum(Y1*(X1+X2+X3)*exp((X1*B01.a)+(X2*B01.b)+(X3*B01.c)))/(1+exp((X1*B01.a)+(X2*
B01.b)+(X3*B01.c)))+sum((Y2*(X1+X2+X3)))+sum((Y1*Y2*(X1+X2+X3))) 
 
result<-return(log.likelihood) 
} 
} 
  
mle <- maxLik(logLik =log.likelihood , start = c(B01.a=-10.2,B01.b=11.3,B01.c=10.02)) 
 
summary(mle) 
 
mleNM <- maxLik(logLik = log.likelihood , start =c(B01.a=-11.1,B01.b=11.31,B01.c=10.02), method = 
"NM") 
summary(mleNM) 
 
########################################################################## 
Conditional Model.1:2In(loglikelihood(proposed model)-loglikelihood(conditional model1)) 
########################################################################## 
 
test.stat<-2*((-15454010)-(logLik(glm.out1))) 
 
 
University of Ghana http://ugspace.ug.edu.gh
 124  
 
 
########################################################################## 
Conditional Model.2:2In(loglikelihood(proposed model)-loglikelihood(conditional model1)) 
########################################################################## 
 
test.stat<-2*((3860420)-(logLik(glm.out2))) 
 
######################################################################### 
 the Marginal Model.2:2In(loglikelihood(proposed model)-loglikelihood(marginal.model)) 
######################################################################### 
 
test.stat<-2*((23652232)-(8755.65)) 
 
 
##################### 
Intercept for B1 estimates 
##################### 
library(maxLik) 
library(foreign) 
file.choose() 
 
dataset = read.spss("C:\\Users\\user\\Desktop\\Dataset.Final\\proposed.model.22.05.16.sav" 
,to.data.frame=TRUE) 
View(dataset) 
 
colnames(dataset)[1]<-"Y1" 
colnames(dataset)[2]<-"Y2" 
colnames(dataset)[3]<-"X1" 
colnames(dataset)[5]<-"X2" 
colnames(dataset)[7]<-"X3" 
 
head(dataset) 
log.likelihood <- 0 
count=0 
 
N <- length(dataset) 
N1 <- length("Y1") 
N2 <- length("Y2") 
Y1<-dataset[,1] 
Y2<-dataset[,2] 
X1<-dataset[,3] 
X2<-dataset[,5] 
X3<-dataset[,7] 
 
while (count<=7840) 
{ 
log.likelihood<- -
sum((X1+X2+X3)*exp((X1*0)+(X2*0)+(X3*0)))/(1+exp((X1*0)+(X2*0)+(X3*0)))+sum((Y1*(X1*0)+(
X2*0)+(X3*0))) 
log.likelihood[count]=log.likelihood[count]+0 
count=count+1 
University of Ghana http://ugspace.ug.edu.gh
 125  
 
} 
 
require(psych) 
describeBy(log.likelihood/7840) 
 
 
###################### 
Intercept for B11 estimates 
###################### 
 
library(maxLik) 
library(foreign) 
file.choose() 
 
dataset = read.spss("C:\\Users\\user\\Desktop\\Dataset.Final\\proposed.model.B11.NEW.sav" 
,to.data.frame=TRUE) 
View(dataset) 
 
colnames(dataset)[1]<-"Y1" 
colnames(dataset)[2]<-"Y2" 
colnames(dataset)[3]<-"X1" 
colnames(dataset)[5]<-"X2" 
colnames(dataset)[7]<-"X3" 
 
head(dataset) 
log.likelihood <- 0 
count=0 
 
N <- length(dataset) 
N1 <- length("Y1") 
N2 <- length("Y2") 
Y1<-dataset[,1] 
Y2<-dataset[,2] 
X1<-dataset[,3] 
X2<-dataset[,5] 
X3<-dataset[,7] 
 
while (count<=3001) 
{ 
log.likelihood<- -
sum((X1+X2+X3)*exp((X1*0)+(X2*0)+(X3*0)))/(1+exp((X1*0)+(X2*0)+(X3*0)))+sum((Y1*Y2*(X1
+X2+X3))) 
 
log.likelihood[count]=log.likelihood[count]+0 
count=count+1 
 
} 
 
require(psych) 
describeBy(log.likelihood/3001) 
 
University of Ghana http://ugspace.ug.edu.gh
 126  
 
###################### 
Intercept for B01 estimates 
###################### 
 
library(maxLik) 
library(foreign) 
file.choose() 
 
dataset = read.spss("C:\\Users\\user\\Desktop\\Dataset.Final\\proposed.model.B01.NEW.sav" 
,to.data.frame=TRUE) 
View(dataset) 
 
head(dataset) 
log.likelihood <- 0 
count=0 
 
colnames(dataset)[1]<-"Y1" 
colnames(dataset)[2]<-"Y2" 
colnames(dataset)[3]<-"X1" 
colnames(dataset)[5]<-"X2" 
colnames(dataset)[7]<-"X3" 
 
head(dataset) 
log.likelihood <- 0 
 
N <- length(dataset) 
N1 <- length("Y1") 
N2 <- length("Y2") 
Y1<-dataset[,1] 
Y2<-dataset[,2] 
X1<-dataset[,3] 
X2<-dataset[,5] 
X3<-dataset[,7] 
 
while (count<=4839) 
{ 
log.likelihood<- -
sum((X1+X2+X3)*exp((X1*0)+(X2*0)+(X3*0)))/(1+exp((X1*0)+(X2*0)+(X3*0))+sum(Y1*(X1+X2+
X3)*exp((X1*0)+(X2*0)+(X3*0)))/(1+exp((X1*0)+(X2*0)+(X3*0)))+sum((Y2*(X1+X2+X3)))+sum((
Y1*Y2*(X1+X2+X3))) 
 
log.likelihood[count]=log.likelihood[count]+0 
count=count+1 
} 
} 
require(psych) 
describeBy(log.likelihood/4839) 
 
 
University of Ghana http://ugspace.ug.edu.gh