Iddi et al. Brain Inf.             (2019) 6:6  
https://doi.org/10.1186/s40708-019-0099-0 Brain Informatics
RESEARCH Open Access
Predicting the course of Alzheimer’s 
progression
Samuel Iddi1,3,4 , Dan Li1, Paul S. Aisen1, Michael S. Rafii1, Wesley K. Thompson2, Michael C. Donohue1*  
and for the Alzheimer’s Disease Neuroimaging Initiative
Abstract 
Alzheimer’s disease is the most common neurodegenerative disease and is characterized by the accumulation of 
amyloid-beta peptides leading to the formation of plaques and tau protein tangles in the brain. These neuropathological 
features precede cognitive impairment and Alzheimer’s dementia by many years. To better understand and predict 
the course of disease from early-stage asymptomatic to late-stage dementia, it is critical to study the patterns of 
progression of multiple markers. In particular, we aim to predict the likely future course of progression for individuals 
given only a single observation of their markers. Improved individual-level prediction may lead to improved clinical 
care and clinical trials. We propose a two-stage approach to modeling and predicting measures of cognition, 
function, brain imaging, fluid biomarkers, and diagnosis of individuals using multiple domains simultaneously. In the first 
stage, joint (or multivariate) mixed-effects models are used to simultaneously model multiple markers over time. 
In the second stage, random forests are used to predict categorical diagnoses (cognitively normal, mild cognitive 
impairment, or dementia) from predictions of continuous markers based on the first-stage model. The combination 
of the two models allows one to leverage their key strengths in order to obtain improved accuracy. We characterize 
the predictive accuracy of this two-stage approach using data from the Alzheimer’s Disease Neuroimaging Initiative. 
The two-stage approach using a single joint mixed-effects model for all continuous outcomes yields better 
diagnostic classification accuracy compared to using separate univariate mixed-effects models for each of the continuous 
outcomes. Overall prediction accuracy above 80% was achieved over a period of 2.5 years. The results further indicate 
that overall accuracy is improved when markers from multiple assessment domains, such as cognition, function, and 
brain imaging, are used in the prediction algorithm as compared to the use of markers from a single domain only.
Keywords: Alzheimer’s disease, Biomarkers, Classification, Clinical diagnosis, Disease trajectories, Joint mixed-effects 
models, Latent time shift, Model averaging, Multi-level Bayesian models, Multi-cohort longitudinal data, Predictions, 
Random forest
*Correspondence: mdonohue@usc.edu
1 Alzheimer’s Therapeutic Research Institute, Keck School of Medicine, University of Southern California, San Diego, USA
Full list of author information is available at the end of the article
© The Author(s) 2019. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License 
(http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, 
provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, 
and indicate if changes were made.
1 Introduction
Prediction of future Alzheimer’s disease (AD)-related progression is extremely valuable in clinical practice
and in medical research. In clinical practice, the ability to accurately predict the diagnosis of a patient can
help physicians make more informed clinical decisions on treatment strategies [1]. Clinical trials are more
likely to be successful if the individuals selected for the trials are those most likely to benefit from the
therapy. Many researchers in the field contend that preventative strategies initiated prior to the appearance
of advanced symptoms are most likely to be successful [2–4]. Therefore, identifying candidates for therapies
while they are still cognitively normal (CN) or mildly cognitively impaired (MCI) is key for clinical trials,
and eventually clinical practice.
The pathology of AD is characterized by the accumulation of amyloid plaques and neurofibrillary tangles in
the brain beginning as early as middle age. The amyloid hypothesis posits that plaques caused by the gradual
buildup of beta-amyloid (Aβ) peptides damage brain regions responsible for cognition, thereby leading to
impairment. Recent studies have shown that the pathology of the disease occurs several years before the onset
of clinical symptoms, making the disease difficult to detect at an early stage [5, 6]. In addition, prediction
of the future diagnosis of an individual (CN, MCI, or dementia) is very challenging due to high subjectivity
and individual-level variability in cognitive assessments and levels of biomarkers, which have typically been
used for staging of AD. The assessment of an individual’s current diagnosis can vary from one clinician to the
next, or from one day to the next.
Classification and prediction based on expert knowledge, machine learning algorithms [7, 8], regression-based
prediction models [9, 10] and some combinations of these [11] have been proposed. Beheshti et al. [12]
recently developed a computer-aided diagnosis system to predict conversion from MCI to AD using magnetic
resonance imaging (MRI) data. Zheng et al. [13] surveyed other automated techniques for classifying and
predicting diagnosis with reasonable reliability using data from different imaging modalities. The reliability
of these approaches is often assessed by the sensitivity and specificity of the methods, accuracy rate, and
absolute error rates, among other criteria. Approaches with high accuracy rates and precision are desirable.
The diagnosis of CN, MCI, or mild dementia by expert clinicians has traditionally relied on cognitive
assessments such as the Mini-Mental State Examination (MMSE) [14], Logical Memory [15] and structured
clinical assessments such as the Clinical Dementia Rating (CDR) [16]. However, including multiple domains
might help explain and more accurately predict the varying rates of decline that are typical. For example, it
is common to find individuals who present with symptoms consistent with MCI or mild AD dementia, but who lack
biomarker evidence of AD pathology. Such an individual might have other pathology that will exhibit a
different rate of progression. Going beyond the cognitive domain to multi-domain analysis is therefore
appealing. Longitudinal cognitive assessments combined with neuroimaging and biomarkers can more easily
facilitate diagnosis and increase prediction accuracy [3, 17]. While multi-domain analyses are interesting,
intuitive and potentially more informative, they have been relatively uncommon due to modeling challenges.
The Alzheimer’s Disease Prediction Of Longitudinal Evolution (TADPOLE) Challenge [18] compares the performance
of algorithms at making future predictions of AD disease markers and clinical diagnosis using historical data
from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) study. Motivated by this challenge, we propose a
two-stage approach that can reliably predict an individual’s future course of disease, including transition
to MCI and dementia, using only a single assessment (i.e., “baseline”). This emphasis on subject-level
prediction from a single timepoint is distinct from much of the literature, which focuses on group-level
prediction and the relative importance of various predictors.
In the first stage, we model continuous disease markers using joint mixed-effects models. The joint
mixed-effects model allows the simultaneous modeling and prediction of multiple modalities such as cognitive
and functional assessments, brain imaging, and biofluid assays, with fixed effects for covariates like age,
sex, and genetic risk. Joint models have the advantage of modeling the correlation among outcomes to improve
prediction and precision of estimates [19, 20].
In the second stage of prediction, a random forest algorithm is used to categorize the panel of predicted
continuous markers into a diagnosis of CN, MCI, or dementia. Random forests combine many decision trees
created from random sampling of the data and predictors [21]. Each decision tree recursively partitions the
predictors to classify individuals into one of the three diagnoses. While an alternative approach might view
diagnosis as a random variable correlated with other disease markers, we view diagnosis as a deterministic
categorization of the clinical presentation of each individual. That is, diagnosis should be algorithmically
determined for a given presentation of the continuous markers. The random forest model gives us an estimate of
this algorithmic categorization. Overall performance is assessed using an independent validation set.
2 Data description
The two-stage approach is applied to data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI). ADNI
is a prospective observational cohort study, which began in 2004 and continues to this day. The study is
carried out across 55 research centers in the USA and Canada. Over 1900 volunteers with normal cognition or
impairment consistent with MCI or AD dementia were recruited for this study. The first cohort, referred to as
ADNI-1, consists of 800 individuals: 200 CN, 400 with late MCI, and 200 with mild dementia. ADNI-GO, the
second cohort, added about 200 additional individuals with early MCI. In ADNI-2, more participants at
different stages of AD were recruited to monitor AD progression. ADNI-3 is presently enrolling additional
individuals with CN, MCI, and dementia. At each new phase, prior cohorts were invited back for continued
follow-up, with the exception of individuals enrolled with dementia, who were followed for a maximum of
2 years. Some ADNI-1 individuals have now been followed in excess of 10 years. Key objectives of ADNI are to
validate the use of markers of AD for diagnosis and clinical trials, and to study rates of change
in cognitive and functional assessments, brain imaging and a number of biomarkers. The inclusion and exclusion
criteria, schedule of assessments, and other details can be found at http://adni.loni.usc.edu/. We focus on
the following assessments: Alzheimer’s Disease Assessment–Cognitive 13-item scale (ADAS13), Clinical Dementia
Rating–Sum of Boxes (CDRSB), Mini-Mental State Examination (MMSE), Montreal Cognitive Assessment (MOCA), Rey
Auditory Verbal Learning Test Immediate (RAVLT Immediate), Everyday Cognition (ECog) total by participant
(ECogPtTotal) and by study partner (ECogSPTotal), and Functional Assessment Questionnaire (FAQ). Brain imaging
measures include volumetric Magnetic Resonance Imaging (MRI) summaries of entorhinal cortical thickness, and
ventricular and hippocampal volume normalized to intracranial volume (ICV); and fluorodeoxyglucose positron
emission tomography (FDG-PET) summaries of glucose metabolism. Baseline diagnosis, age, gender, and carriage
of the APOE e4 allele were included as covariates.
We also focus on a second set of analyses among individuals where beta-amyloid data were available. The
buildup of beta-amyloid in the brain and in cerebrospinal fluid (CSF) is known to be strongly involved in AD
[22, 23]. For some patients in the ADNI study, florbetapir PET scans or CSF Aβ42 were acquired to detect
amyloid levels in the brain. We classified individuals as having elevated amyloid (“amyloid positive”) if
florbetapir PET standardized uptake value ratio (SUVR) was above 1.10 [22, 24] or if CSF Aβ was less than
909.6 pg/ml; and as amyloid negative otherwise. The CSF Aβ cutoff was determined so that it yielded the same
proportion of amyloid positives as the florbetapir cutoff. Amyloid elevation status was included as a
predictor in this second set of analyses.
3 Methodology
We propose a two-stage approach for prediction of continuous disease markers and categorical diagnosis. For
the first stage, we propose the traditional joint, or multivariate outcome, mixed-effects model; but we also
consider two alternative approaches: a latent-time joint mixed-effects model and Bayesian model averaging
combining posterior estimates of the aforementioned joint models. In the second stage, the predicted markers
are submitted to a random forest to further predict diagnosis. We next describe the first-stage model in
greater detail.
3.1 Methods for predicting continuous markers
Suppose $y_{ijk}$ represents outcome $k$ ($k = 1, \ldots, p$) observed at time $t_{ij}$
($j = 1, \ldots, q_i$) for individual $i$ ($i = 1, \ldots, n$), and $x_{ijk}$ is a set of covariates for the
$i$th individual at time $j$. The joint mixed-effects model (JMM) is defined as
$$y_{ijk} = x_{ijk}'\beta_k + \alpha_{0ik} + \alpha_{1ik} t_{ij} + \varepsilon_{ijk} \tag{1}$$
where $\beta_k$, $k = 1, 2, \ldots, p$, are sets of fixed-effect regression coefficients, and $\alpha_{0ik}$
and $\alpha_{1ik}$ are outcome- and individual-specific random intercepts and slopes, respectively. The random
intercepts and slopes are assumed to follow a multivariate normal distribution with mean vector $0$ and
variance–covariance matrix $D$ for the entire $2p$-dimensional vector of random effects for each subject. The
error term follows $\varepsilon_{ijk} \sim N(0, \sigma_k^2)$; that is, the error variance is assumed
homogeneous over time and across subjects for a given outcome. We assume that the random components
$\alpha_{ik}$ and $\varepsilon_{ijk}$ for $k = 1, 2, \ldots, p$ are independent. The random effects allow the
model to accommodate both the temporal correlation and the correlation among the markers. A special case of
this joint model is the independent mixed-effects model (IMM), which does not explicitly model the correlation
among outcomes. This is similar to fitting a separate mixed-effects model per outcome.
We also consider the latent time joint mixed-effects model (LTJMM) [25]:
$$y_{ijk} = x_{ijk}'\beta_k + \gamma_k (t_{ij} + \delta_i) + \alpha_{0ik} + \alpha_{1ik} t_{ij} + \varepsilon_{ijk}. \tag{2}$$
The model is similar to (1), but introduces individual-specific latent time shifts, $\delta_i$, representing
“long-term” disease time. The model also includes outcome-specific slopes $\gamma_k > 0$ with respect to
$\delta_i$. The $\delta_i$ are assumed to be normally distributed with zero mean and variance
$\sigma_\delta^2$. The random components $\delta_i$, $\alpha_{ik}$ and $\varepsilon_{ijk}$ for
$k = 1, 2, \ldots, p$ are also assumed to be independent. An extension of this model to allow heterogeneous
latent time (i.e., the variability of the latent time is made to vary across individuals) is described in [26].
Estimation of the joint models is by Markov Chain Monte Carlo (MCMC). Posterior draws are obtained from the
posterior distributions of the joint models given respectively by:
$$P(\theta \mid Y) \propto P(Y \mid \theta)\, P(\theta \mid \tau)$$
$$P(\beta_k, \alpha_{ik} \mid y_{ijk}) \propto P(y_{ijk} \mid \beta_k, \alpha_{ik}, D, \sigma_k^2)\, P(\beta_k)\, P(\alpha_{ik} \mid D)\, P(D)\, P(\sigma_k^2)$$
where the variance–covariance matrix $D$ is decomposed as $D = V \Omega V$. For numerical stability, the
Cholesky factorization is applied to the correlation matrix, $\Omega = LL'$, where $L$ is a lower triangular
matrix. For the latent time joint mixed-effects model, $\theta = (\beta_k, \alpha_{ik}, \gamma_k, \delta_i)'$
and $\tau = (D, \sigma_k^2)'$. The component $V$ is a diagonal matrix of
standard deviations (square roots of the diagonal entries of $D$). Furthermore, the random component
$\alpha_{ik}$ is standardized to $z \sim N(0, I)$, where $I$ is the identity matrix, and the random effects are
then calculated as $VLz$. Prior distributions are placed on the hyperparameters. A weakly informative normal
prior, $N(0, 10^2)$, is placed on $\beta_k$, and a weakly informative half-Cauchy prior, Cauchy(0, 2.5), is
assumed for the components of $V$, $\sigma_k$, $\gamma_k$ and $\sigma_\delta$. Finally, the LKJ prior is
placed on the Cholesky factors of the correlation matrix $\Omega$ [27]. MCMC sampling is done using the R
software package RStan [28]. We used 5000 iterations, and the first 2500 warmup iterations were discarded. Two
MCMC chains were used and thinned by a factor of 5. Predictions of biomarkers and their corresponding credible
intervals were based on posterior draws. We apply Bayesian model averaging to the multivariate mixed models
for the selected continuous biomarkers [29, 30]. The predictions of future values of biomarkers and the
corresponding credible intervals are obtained after combining the posterior prediction estimates of all the
models (model averaging). Suppose $y^*_{ijk}$ is the prediction of outcome $k$ for individual $i$ at future
time $j$. The posterior distribution of the prediction given the data, $D$ (here $D$ denotes the observed data,
not the variance–covariance matrix of the random effects), is the average of the posterior distributions of
the models weighted by the posterior model probabilities and is given by
$$P(y^*_{ijk} \mid D) = \sum_{s=1}^{S} P(y^*_{ijk} \mid M_s, D)\, P(M_s \mid D),$$
where $M_s$, $s = 1, 2, \ldots, S$, represents the models. The posterior distribution of the models is
expressed as
$$P(M_s \mid D) \propto P(D \mid M_s)\, P(M_s),$$
where $P(D \mid M_s) = \int P(D \mid \theta_s, M_s)\, P(\theta_s \mid M_s)\, d\theta_s$ and $\theta_s$ is the
vector of parameters under model $s$. The predicted mean and variance are obtained from the posterior
distribution of the predictions.
The JMM and LTJMM were fit to training data described in Sect. 4. To demonstrate the benefit of joint
modeling, independent mixed-effects models (IMM) were fit to the data for comparison. For the JMM and IMM
models, age, gender, APOEe4, and baseline diagnosis were included as covariates. The latent-time models did
not include baseline diagnosis since including this would make the model parameters uninterpretable due to the
presence of the latent-time component (see [25] for details). Two common model selection criteria are applied,
the widely applicable information criterion (WAIC) and the leave-one-out information criterion (LOOIC) [31].
Models with lower values of WAIC and LOOIC are preferred.
The models described above are fitted to the training dataset in order to make follow-up predictions for
subjects in the test dataset. However, in fitting these models to the training data, we propose to include
baseline data for subjects in the test data to allow for the estimation of random effects for these subjects.
The estimated outcome-specific random intercepts and slopes for each subject are required to make the
subject-level predictions. The resulting follow-up predictions are then used as inputs in the random forest
for the next stage of algorithmically predicting diagnosis status.
3.2 Method for predicting clinical diagnosis
The random forest algorithm is an ensemble learning method for classification and regression. It operates by
generating several classification or regression trees and aggregating them. Each tree in the forest is
constructed using bootstrap samples of the data. The algorithm, implemented in the R package “randomForest”
[30], is fitted to the training dataset using 100 trees. In particular, diagnosis, which was re-evaluated at
every visit by clinicians, was used as the target feature for the random forest, and predicted follow-up
continuous markers and baseline predictors of subjects were used as input features. Observation times are also
included as a continuous predictor. A number of individuals had incomplete assessments at some study visits,
which the random forest algorithm is not able to accommodate. To avoid discarding these incomplete visits
entirely when fitting the random forest, we apply an imputation method, the “MissForest” algorithm [32], to
impute the missing values. This algorithm, implemented in the R package “missForest”, imputes missing values
for mixed-type data (e.g., continuous and categorical) using a nonparametric random forest methodology. The
method can flexibly accommodate mixed-type outcomes, complex interactions and nonlinear relationships among
variables. In addition, it does not require the specification of a parametric model or distributional
assumptions. To determine variables which are important for predicting the response, we use the variable
importance plot, which depicts the influence of each variable characterized by the mean decrease in node
impurity (Gini Index [21]).
3.3 Model performance metrics
To evaluate the quality of the predictions of the continuous markers, we use two performance metrics. The
first metric, the mean absolute error (MAE), is calculated as
$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} |\hat{P}_i - P_i|,$$
where $N$ is the observation count, $\hat{P}_i$ represents the predicted or forecasted future value, and $P_i$
is the observed value of the marker for an individual $i$ in the test data. The second metric, which takes
confidence interval
widths into account, is the weighted error score (WES). It is the weighted sum of the absolute differences
between the predicted and actual values for each continuous marker in the test data at each time point. That
is,
$$\mathrm{WES} = \frac{\sum_{i=1}^{N} \hat{C}_i\, |\hat{P}_i - P_i|}{\sum_{i=1}^{N} \hat{C}_i},$$
where the weight $\hat{C}_i$ is the inverse of the width of the confidence interval of the predicted estimate
for each individual. High values of MAE and WES denote poor predictive performance of the model.
The diagnoses provided by site clinicians are used as the ‘gold standard’ in assessing the accuracy of the
predictions of diagnosis from the random forest algorithm. Performance is assessed on the basis of the overall
accuracy and balanced classification accuracy (BCA). Overall accuracy is defined as the percentage of correct
predictions out of all the predictions made. This metric tends to work better for data with balanced classes
(e.g., equal numbers of CN, MCI, and dementia) but can provide a misleading assessment of performance for data
with imbalanced classes. To account for possible class imbalance, we also use the overall BCA. The balanced
classification accuracy for class $\ell = 1, 2, \ldots, L$ is obtained from
$$\mathrm{BCA}_\ell = \frac{1}{2}\left[\frac{TP_\ell}{TP_\ell + FN_\ell} + \frac{TN_\ell}{TN_\ell + FP_\ell}\right],$$
where $TP_\ell$ is the number of true positives, $FN_\ell$ is the number of false negatives, $TN_\ell$ is the
number of true negatives, and $FP_\ell$ is the number of false positives. That is, for each class $\ell$,
$TP_\ell$ is the number of cases in class $\ell$ that are correctly predicted by the model, and $FN_\ell$ is
the number of cases in class $\ell$ which are incorrectly classified into any of the other classes. Similarly,
$TN_\ell$ is the number of cases in the other classes that are correctly labeled as not belonging to class
$\ell$, and $FP_\ell$ is the number of cases which actually belong to the other classes but are wrongly
classified to class $\ell$. These balanced accuracies are aggregated to obtain the overall BCA score as
follows:
$$\mathrm{BCA} = \frac{1}{L} \sum_{\ell=1}^{L} \mathrm{BCA}_\ell.$$
A higher value of overall accuracy or BCA is indicative of good performance.
4 Application and model validation
4.1 Descriptive statistics and data preparation
The ADNI data consist of 1737 individuals enrolled in ADNI-1, ADNI-GO and ADNI-2, 19.7% of whom have dementia,
30.1% are CN and 50.2% are MCI at baseline. About 44.9% are females, and 55.1% are males. All follow-up data
on ADNI-1 and ADNI-GO participants who did not continue into the ADNI-2 phase form part of the training
dataset. In addition, baseline data from individuals in ADNI-2 are included in the training data to allow
estimation of their random effects for individual-specific predictions. The training data consist of 273 ADs,
154 CNs and 414 MCIs. The validation dataset consisted of currently available longitudinal data for ADNI-2
(i.e., the ADNI-1 and ADNI-GO participants who continued into ADNI-2, and additional newly enrolled subjects).
This validation data consist of 7.7% ADs, 41.2% CNs and 51.1% MCIs. Figure 7a, b, in “Appendix”, shows the
number of individuals at each visit in the training and test sets, respectively. To impose a minimum standard
for visit completion, time points where CDRSB was not observed are omitted from the analysis dataset. As
expected, the number of observations decreases over time from baseline due to attrition and administrative
censoring. Summary measures of baseline outcomes for each diagnosis group are presented in Table 1.
Figure 8a depicts the individual observed trajectories per outcome and also shows the length of follow-up in
years. Figure 8b shows the individual trajectories after missing values have been imputed. It can be seen that
the imputation algorithm appears to generate plausible values of missing data. Before fitting the models to
the data, the original values of the outcomes were transformed into percentiles using a weighted empirical
cumulative distribution function so that all outcomes are on a common scale. The weights were constructed
using the inverse of the proportion of disease category for each outcome. The predicted values on the
transformed scale are then back transformed into the original scale.
Next, we apply the two-stage approach to the data. Figure 1 shows a schematic diagram depicting the inputs and
outputs at each modeling stage.
4.2 Stage 1
The joint mixed-effects models were trained on longitudinal data from ADNI-1, ADNI-GO, and only baseline data
from ADNI-2. We then assessed the ability of the proposed methodology to accurately predict follow-up
observations of individuals in ADNI-2. Table 2 summarizes WAIC and LOOIC. Based on these results, the JMM
model seems to be the best fitting model, followed closely by the LTJMM model. Figure 2 shows the correlations
between random intercepts (above anti-diagonal) and random slopes (below anti-diagonal) from the JMM.
Cognitive outcomes share strong correlations [0.7–0.9) with other cognitive measures except for Everyday
Cognition (ECog) by participant. There are generally moderate correlations [0.5–0.7) among cognitive measures
and
Table 1 Summary measures at baseline for raw and imputed data
| Diagnosis category | Outcome | Imputed n | Imputed mean | Imputed SE | Raw n | Raw mean | Raw SE |
|---|---|---|---|---|---|---|---|
| Dementia | ADAS13 | 342 | 29.91 | 0.43 | 330 | 29.87 | 0.44 |
| Dementia | CDRSB | 342 | 4.39 | 0.09 | 338 | 4.41 | 0.09 |
| Dementia | EcogPtTotal | 342 | 1.91 | 0.02 | 144 | 1.90 | 0.05 |
| Dementia | EcogSPTotal | 342 | 2.75 | 0.03 | 145 | 2.74 | 0.05 |
| Dementia | MMSE | 342 | 23.22 | 0.11 | 338 | 23.22 | 0.11 |
| Dementia | MOCA | 342 | 17.52 | 0.20 | 142 | 17.12 | 0.38 |
| Dementia | RAVLT immediate | 342 | 22.81 | 0.41 | 335 | 22.85 | 0.41 |
| Dementia | FAQ | 342 | 13.14 | 0.38 | 337 | 13.18 | 0.38 |
| Dementia | FDG | 342 | 1.07 | 0.01 | 242 | 1.07 | 0.01 |
| Dementia | Hippocampus/ICV (× 100) | 342 | 0.38 | 0.00 | 272 | 0.38 | 0.00 |
| Dementia | Ventricles/ICV | 342 | 0.03 | 0.00 | 315 | 0.03 | 0.00 |
| Dementia | Entorhinal (mm) | 342 | 2829.35 | 33.72 | 254 | 2819.26 | 42.54 |
| MCI | ADAS13 | 872 | 16.53 | 0.23 | 862 | 16.53 | 0.23 |
| MCI | CDRSB | 872 | 1.52 | 0.03 | 866 | 1.52 | 0.03 |
| MCI | EcogPtTotal | 872 | 1.84 | 0.01 | 468 | 1.79 | 0.02 |
| MCI | EcogSPTotal | 872 | 1.84 | 0.02 | 465 | 1.72 | 0.03 |
| MCI | MMSE | 872 | 27.59 | 0.06 | 866 | 27.59 | 0.06 |
| MCI | MOCA | 872 | 22.66 | 0.09 | 465 | 23.41 | 0.15 |
| MCI | RAVLT immediate | 872 | 34.24 | 0.36 | 866 | 34.24 | 0.36 |
| MCI | FAQ | 872 | 3.18 | 0.14 | 862 | 3.17 | 0.14 |
| MCI | FDG | 872 | 1.23 | 0.00 | 665 | 1.25 | 0.01 |
| MCI | Hippocampus/ICV (× 100) | 872 | 0.44 | 0.00 | 737 | 0.44 | 0.00 |
| MCI | Ventricles/ICV | 872 | 0.03 | 0.00 | 836 | 0.03 | 0.00 |
| MCI | Entorhinal (mm) | 872 | 3497.43 | 24.18 | 733 | 3497.38 | 27.67 |
| CN | ADAS13 | 523 | 9.24 | 0.19 | 520 | 9.24 | 0.19 |
| CN | CDRSB | 523 | 0.04 | 0.01 | 520 | 0.04 | 0.01 |
| CN | EcogPtTotal | 523 | 1.41 | 0.01 | 290 | 1.41 | 0.02 |
| CN | EcogSPTotal | 523 | 1.22 | 0.01 | 288 | 1.21 | 0.02 |
| CN | FAQ | 523 | 0.24 | 0.04 | 520 | 0.24 | 0.04 |
| CN | MMSE | 523 | 29.06 | 0.05 | 520 | 29.06 | 0.05 |
| CN | MOCA | 523 | 25.54 | 0.09 | 287 | 25.76 | 0.14 |
| CN | RAVLT immediate | 523 | 44.66 | 0.43 | 518 | 44.67 | 0.43 |
| CN | FDG | 523 | 1.31 | 0.00 | 391 | 1.31 | 0.01 |
| CN | Hippocampus/ICV (× 100) | 523 | 0.49 | 0.00 | 471 | 0.49 | 0.00 |
| CN | Ventricles/ICV | 523 | 0.02 | 0.00 | 494 | 0.02 | 0.00 |
| CN | Entorhinal (mm) | 523 | 3828.36 | 26.57 | 468 | 3840.29 | 29.09 |
ADAS13 Alzheimer’s Disease Assessment Scale, CDRSB Clinical Dementia Rating–Sum of Boxes, EcogPtTotal everyday
cognition participant, EcogSPTotal everyday cognition study partner, FAQ Functional Assessment Questionnaire,
FDG FluoroDeoxyGlucose, MMSE Mini-Mental State Examination, MOCA Montreal Cognitive Assessment, RAVLT Rey
Auditory Verbal Learning Test, ICV intracranial volume, CN control, MCI mild cognitive impairment, n number of
observations, SE standard error
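The percentile transformation described in Sect. 4.1 can be illustrated with a minimal sketch. It assumes that each observation is weighted by the inverse of the sample share of its diagnosis group, which is one reading of "the inverse of the proportion of disease category"; the function names and the toy data are hypothetical and are not taken from the authors' code.

```python
from bisect import bisect_left, bisect_right
from collections import Counter


def fit_weighted_ecdf(values, groups):
    """Build percentile and back-transform functions from a weighted ECDF.

    Assumption: each observation is weighted by 1 / (share of its diagnosis
    group in the sample), so every group carries the same total weight.
    """
    n = len(values)
    share = {g: c / n for g, c in Counter(groups).items()}
    pairs = sorted((v, 1.0 / share[g]) for v, g in zip(values, groups))
    xs = [v for v, _ in pairs]
    total = sum(w for _, w in pairs)

    # Cumulative weighted proportion at each sorted observation.
    cum, running = [], 0.0
    for _, w in pairs:
        running += w
        cum.append(running / total)

    def to_percentile(x):
        k = bisect_right(xs, x)  # number of observations <= x
        return cum[k - 1] if k else 0.0

    def from_percentile(p):
        k = bisect_left(cum, p)  # first observation with cumulative weight >= p
        return xs[min(k, n - 1)]

    return to_percentile, from_percentile


# Hypothetical toy data: three CN scores and one dementia score.
to_pct, from_pct = fit_weighted_ecdf([1, 2, 3, 4], ["CN", "CN", "CN", "AD"])
```

In this toy sample the unweighted percentile of the value 3 would be 0.75, whereas the weighted version gives 0.5, because the single dementia observation carries as much weight as the three CN observations combined. The second function maps a predicted percentile back to the original scale.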
FDG-PET but weaker correlation [0.3–0.5) between cognition and structural MRI measures. There are generally
moderate correlations among slopes for structural MRI measures.
We also performed Bayesian model averaging to combine predictions from the JMM, LTJMM and IMM. Furthermore,
the joint mixed-effects model was fitted to cognitive and functional outcomes (JMMCognitive), and imaging
markers (JMMImage) to demonstrate how these marker domains perform individually. Longitudinal predictions on
the validation dataset were obtained from these fitted models. Figure 3 shows the observed data and predicted
trajectories for five randomly selected individuals for each model (in Fig. 9, we show plots for subject #315
and subject #4263 where the models are all in the same panel, and subjects are
[Figure 1: Steps in the two-stage approach. Stage 1 (longitudinal models): inputs are baseline characteristics
(age, gender, baseline diagnosis, years from baseline); outputs are continuous longitudinal outcomes (ADAS13,
CDRSB, Ecog, MMSE, MOCA, RAVLT, FAQ, FDG, hippocampus, ventricles, entorhinal). Stage 2 (random forest
classification): inputs are baseline characteristics and the stage 1 outputs; the output is diagnosis at all
time points (CN, MCI, AD). Performance metrics: stage 1 uses WAIC and LOOIC on training data and MAE and WES
on test data; stage 2 uses the OOB estimate on training data and BCA and accuracy on test data.]
Fig. 1 Schematic diagram showing the inputs and outputs of the two-stage approach
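The two test-data metrics that Fig. 1 lists for stage 1, MAE and WES, follow directly from their definitions in Sect. 3.3. The sketch below is a plain-Python illustration only; the function names, the example numbers, and the use of lower and upper interval bounds as inputs are assumptions made for the example.

```python
def mean_absolute_error(pred, obs):
    """MAE: average absolute difference between predicted and observed values."""
    return sum(abs(p - o) for p, o in zip(pred, obs)) / len(obs)


def weighted_error_score(pred, obs, lower, upper):
    """WES: absolute errors weighted by the inverse width of each interval."""
    weights = [1.0 / (u - l) for l, u in zip(lower, upper)]
    weighted_errors = sum(w * abs(p - o) for w, p, o in zip(weights, pred, obs))
    return weighted_errors / sum(weights)


# Hypothetical predictions, observations and interval bounds for two visits.
pred, obs = [10.0, 20.0], [12.0, 17.0]
lower, upper = [8.0, 10.0], [12.0, 30.0]
mae = mean_absolute_error(pred, obs)
wes = weighted_error_score(pred, obs, lower, upper)
```

In this example the narrow interval receives five times the weight of the wide one, so the WES is pulled toward the first absolute error (2) and falls below the MAE.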
Table 2 Model selection criteria
| Model | WAIC | LOOIC |
|---|---|---|
| IMM | 59,687.42 | 64,136.45 |
| LTJMM | 57,953.63 | 62,535.86 |
| JMM | 56,576.79 | 59,728.43 |
WAIC widely applicable information criterion, LOOIC leave-one-out information criterion, IMM independent
mixed-effects model, JMM joint mixed-effects model, LTJMM latent-time joint mixed-effects model
in different panels for easy comparison). The graph shows that the models’ predicted profiles appear to differ
only slightly. It is worth noting that the predicted values appear nonlinear because the models were fitted to
transformed values of the outcome and back transformed to the original scale.
We evaluated the performance of our model predictions using metrics on both the continuous markers and the
multi-class diagnosis. The metrics described in Sect. 3.3 are used. From Figs. 10 and 4, we observed that
predictions from all the joint models performed quite well over 2 years, yielding lower mean absolute errors
and weighted error scores as compared to the other models. As expected, the MAE and WES increased beyond
2 years. All models yielded consistent performance over time with the JMMs occasionally out-performing the
other models. The JMM that combined both cognitive and imaging outcomes performed similarly to the JMM from
cognitive/functional outcomes (JMMCognitive) and the JMM from imaging markers (JMMImage) in terms of weighted
error scores. However, at time points where the models differed, the JMM with both cognitive and imaging
outcomes was generally more accurate than JMMCognitive and JMMImage. The IMM performed worse for MCI and
dementia subgroups.
4.3 Stage 2
Table 3 shows the confusion matrix summarizing the within-sample classification accuracy of the random forest
using observed continuous markers and baseline predictors in the training set. Predictors in the random forest
classification algorithm included all continuous markers, years from baseline, and baseline characteristics
such as age, education, marital status, APOE4 status and gender. An overall out-of-bag (OOB) estimated error
rate of 4.55% was achieved. The variable importance plot in Fig. 5 shows the influence of each variable in
predicting clinical status. The baseline diagnosis, CDR Sum of Boxes, Study Partner Everyday Cognition,
Functional
[Figure 2: heatmap of correlations (ρ, color scale from −1.0 to 1.0) among outcome-specific random effects.
Cell values, with rows and columns ordered as in the figure:]
| | ADAS13 | CDRSB | EcogPtTotal | EcogSPTotal | FAQ | MMSE | MOCA | RAVLT_immediate | FDG | Entorhinal | Hippocampus_ICV | Ventricles_ICV |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Ventricles_ICV | 0.1533 | 0.1401 | 0.0481 | 0.1339 | 0.1492 | 0.0624 | 0.1877 | 0.1385 | 0.3272 | 0.1351 | 0.4717 | 1 |
| Hippocampus_ICV | 0.2975 | 0.1967 | 0.0992 | 0.2353 | 0.2339 | 0.231 | 0.2564 | 0.2362 | 0.3324 | 0.4463 | 1 | 0.6071 |
| Entorhinal | 0.3484 | 0.2501 | −0.0315 | 0.1835 | 0.2147 | 0.2698 | 0.2841 | 0.2776 | 0.252 | 1 | 0.6449 | 0.2587 |
| FDG | 0.3901 | 0.2514 | 0.0874 | 0.2623 | 0.3065 | 0.3048 | 0.4481 | 0.3184 | 1 | 0.4296 | 0.5273 | 0.3933 |
| RAVLT_immediate | 0.8245 | 0.2603 | 0.1662 | 0.2067 | 0.2665 | 0.5074 | 0.7177 | 1 | 0.6682 | 0.3749 | 0.401 | 0.3464 |
| MOCA | 0.7679 | 0.3879 | 0.2045 | 0.298 | 0.3118 | 0.5705 | 1 | 0.7758 | 0.6498 | 0.3928 | 0.372 | 0.3076 |
| MMSE | 0.5544 | 0.2936 | 0.1812 | 0.1255 | 0.2036 | 1 | 0.7254 | 0.7698 | 0.6674 | 0.3917 | 0.3844 | 0.329 |
| FAQ | 0.341 | 0.7395 | 0.1002 | 0.5843 | 1 | 0.7154 | 0.7493 | 0.6589 | 0.6893 | 0.5322 | 0.5236 | 0.3309 |
| EcogSPTotal | 0.2579 | 0.5747 | 0.3598 | 1 | 0.7499 | 0.563 | 0.5398 | 0.5332 | 0.5796 | 0.5299 | 0.5083 | 0.3534 |
| EcogPtTotal | 0.1699 | 0.2372 | 1 | 0.2037 | 0.1453 | 0.3399 | 0.3167 | 0.369 | 0.1655 | −0.0666 | −0.1249 | 0.1383 |
| CDRSB | 0.3554 | 1 | 0.2495 | 0.6693 | 0.7545 | 0.6896 | 0.5896 | 0.5713 | 0.535 | 0.5069 | 0.4412 | 0.2479 |
| ADAS13 | 1 | 0.6466 | 0.3065 | 0.5785 | 0.7443 | 0.8164 | 0.8508 | 0.9073 | 0.7576 | 0.4307 | 0.4452 | 0.3597 |
Fig. 2 For each pair of outcomes, the correlations among random intercepts are above the anti-diagonal, and the correlations among random 
slopes are below the anti-diagonal. ADAS13 Alzheimer’s Disease Assessment Scale, CDRSB Clinical Dementia Rating—Sum of Boxes, EcogPtTotal 
everyday cognition participant, EcogSPTotal everyday cognition study partner, FAQ Functional Assessment Questionnaire, FDG FluoroDeoxyGlucose, 
MMSE Mini-Mental State Examination, MOCA Montreal Cognitive Assessment, RAVLT Rey Auditory Verbal Learning Test, ICV intracranial volume
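Because the unit entries in Fig. 2 lie on the anti-diagonal rather than the main diagonal, looking up a particular correlation takes some care. The following Python sketch illustrates the lookup rule on a three-outcome excerpt (ADAS13, MMSE, MOCA) copied from the figure. The helper name `random_effect_correlation` and the list layout are illustrative assumptions and are not taken from the authors' code.

```python
# Column order (left to right) of the excerpt. Rows run in the reverse order
# (top to bottom), which places the unit entries on the anti-diagonal.
COLUMNS = ["ADAS13", "MMSE", "MOCA"]
ROWS = list(reversed(COLUMNS))

# Values copied from Fig. 2 for these three outcomes only.
EXCERPT = [
    [0.7679, 0.5705, 1.0],   # row MOCA
    [0.5544, 1.0, 0.7254],   # row MMSE
    [1.0, 0.8164, 0.8508],   # row ADAS13
]


def random_effect_correlation(a, b, kind):
    """Return the 'intercept' or 'slope' correlation between outcomes a and b."""
    if kind not in ("intercept", "slope"):
        raise ValueError("kind must be 'intercept' or 'slope'")
    if a == b:
        return 1.0
    last = len(COLUMNS) - 1
    # Each pair of outcomes occupies two cells, one on each side of the
    # anti-diagonal; pick the cell on the requested side.
    for row_name, col_name in ((a, b), (b, a)):
        i, j = ROWS.index(row_name), COLUMNS.index(col_name)
        is_above = i + j < last          # above the anti-diagonal: intercepts
        if is_above == (kind == "intercept"):
            return EXCERPT[i][j]


print(random_effect_correlation("ADAS13", "MMSE", "intercept"))  # 0.5544
print(random_effect_correlation("ADAS13", "MMSE", "slope"))      # 0.8164
```

The same rule applies to the full 12 by 12 layout: a cell belongs to the random-intercept triangle when its row index plus its column index is smaller than the number of outcomes minus one.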
Table 3 Confusion matrix
Actual class    Predicted CN     Predicted MCI    Predicted Dementia   Row total   Overall error
CN              1218 (96.51%)    44 (3.49%)       0 (0.00%)            1262        0.035
MCI             29 (1.16%)       2382 (95.17%)    92 (3.68%)           2503        0.048
Dementia        0 (0.00%)        107 (4.84%)      2105 (95.16%)        2212        0.048
Column total    1247             2533             2197                 5977        0.046
This confusion matrix summarizes the performance of the random forest algorithm for classifying diagnoses based on contemporaneous observations. The table 
compares actual diagnoses observed in the training set with diagnoses predicted by the random forest based on observed continuous data and baseline predictors
CN control, MCI mild cognitive impairment
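The error column of Table 3 follows directly from the counts. As a minimal Python sketch (the function names are illustrative and not part of the authors' code), the per-class and overall error rates can be recomputed as follows:

```python
CLASSES = ["CN", "MCI", "Dementia"]

# Rows are actual diagnoses, columns are predicted diagnoses (counts from Table 3).
CONFUSION = [
    [1218, 44, 0],
    [29, 2382, 92],
    [0, 107, 2105],
]


def class_error_rates(matrix):
    """Share of each actual class that was assigned to a different class."""
    return [1 - row[i] / sum(row) for i, row in enumerate(matrix)]


def overall_error_rate(matrix):
    """Share of all observations that fall off the main diagonal."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return 1 - correct / total


per_class = {name: round(err, 3)
             for name, err in zip(CLASSES, class_error_rates(CONFUSION))}
overall = round(overall_error_rate(CONFUSION), 4)

print(per_class)  # {'CN': 0.035, 'MCI': 0.048, 'Dementia': 0.048}
print(overall)    # 0.0455
```

The overall value of about 4.55% agrees with the error rate quoted in the text and rounds to the 0.046 shown in the table.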
[Fig. 3 plot area, four panels: a IMM, b LTJMM, c JMM, d BMA. Each panel shows ADAS13, CDRSB, EcogPtTotal, EcogSPTotal, FAQ, MMSE, MOCA, RAVLT_immediate, FDG, Entorhinal, Hippocampus_ICV and Ventricles_ICV. x-axis: years from baseline (0 to 10); y-axis: predicted response; legend: SubjectID 315, 1261, 4036, 4263, 4275.]
Fig. 3 Observed values (points) versus predicted lines (lines) of markers for five randomly selected individuals for each of the four modeling 
approaches. IMM Independent mixed-effects model, JMM joint mixed-effects model, LTJMM latent-time joint mixed-effects model, BMA Bayesian 
model averaging, ADAS13 Alzheimer’s Disease Assessment Scale, CDRSB Clinical Dementia Rating—Sum of Boxes, EcogPtTotal everyday cognition 
participant, EcogSPTotal everyday cognition study partner, FAQ Functional Assessment Questionnaire, FDG FluoroDeoxyGlucose, MMSE Mini-Mental 
State Examination, MOCA Montreal Cognitive Assessment, RAVLT Rey Auditory Verbal Learning Test, ICV intracranial volume
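The predicted profiles can look curved even though the fitted time trend is linear, because the models were fitted to transformed outcomes and the predictions were back-transformed to the original scale. The Python sketch below illustrates the effect with a logarithmic transformation. The choice of the log transform, the function name and the intercept and slope values are all assumptions made purely for illustration; this section does not restate which transformation was actually applied.

```python
import math


def predict_original_scale(intercept, slope, years):
    """Linear prediction on the (assumed) log scale, back-transformed with exp."""
    return [math.exp(intercept + slope * t) for t in years]


years = [0, 2, 4, 6]
predicted = predict_original_scale(intercept=2.0, slope=0.1, years=years)

# Equal steps on the transformed scale give unequal steps on the original scale,
# so the back-transformed trajectory is a curve rather than a straight line.
increments = [later - earlier for earlier, later in zip(predicted, predicted[1:])]

print([round(p, 2) for p in predicted])
print([round(d, 2) for d in increments])
```

With a positive slope the increments grow over time, which is the kind of gentle curvature seen in some of the panels.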
Assessment Questionnaire, and Mini-Mental State Examination are the features with the highest importance. The random forest predictions using predicted longitudinal markers from the joint models as inputs along with time-varying age, APOE e4 status and gender, achieve overall accuracy and balanced classification accuracy above 80% for periods less than 2 years (see Fig. 6). Between 2 and 5 years, we achieve an overall accuracy
[Fig. 4 plot area, four panels: a CN, b MCI, c Dementia, d All. Each panel shows ADAS13, CDRSB, EcogPtTotal, EcogSPTotal, FAQ, MMSE, MOCA, RAVLT_immediate, FDG, Entorhinal, Hippocampus_ICV and Ventricles_ICV. x-axis: years from baseline; y-axis: weighted error score; legend (Model): JMM, JMMCognitive, JMMImage, IMM, LTJMM, BMA.]
Fig. 4 Validation set weighted error scores over time for each model by diagnosis. CN Control, MCI mild cognitive impairment, IMM independent 
mixed-effects model, JMM joint mixed-effects model, LTJMM latent-time joint mixed-effects model, BMA Bayesian model averaging, JMMCognitive 
JMM fitted to cognitive and function outcomes only, JMMImage JMM fitted to imaging markers only
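For orientation, the two continuous-marker metrics can be sketched in a few lines of Python. The MAE is the usual mean absolute error. The WES form shown here, in which each absolute error is weighted by the inverse width of the corresponding prediction interval, follows the TADPOLE challenge convention and is an assumption, since the definitions in Sect. 3.3 are not repeated in this section. All numbers are invented for illustration.

```python
def mean_absolute_error(observed, predicted):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def weighted_error_score(observed, predicted, lower, upper):
    """Absolute errors weighted by 1 / interval width (assumed TADPOLE-style form)."""
    weights = [1.0 / (u - l) for l, u in zip(lower, upper)]
    weighted_sum = sum(w * abs(o - p)
                       for w, o, p in zip(weights, observed, predicted))
    return weighted_sum / sum(weights)


# Hypothetical observations, point predictions and interval bounds.
observed = [20.0, 25.0, 31.0]
predicted = [22.0, 24.0, 27.0]
lower = [21.0, 22.0, 23.0]
upper = [23.0, 26.0, 31.0]

mae = mean_absolute_error(observed, predicted)
wes = weighted_error_score(observed, predicted, lower, upper)
print(round(mae, 3), wes)
```

In this toy example the largest error comes with the widest interval, so it is down-weighted and the WES is smaller than the MAE.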
of between 60 and 80%. To facilitate overall comparisons, we computed BCA aggregated across all the time points and weighted according to the amount of data available at each time point. These weighted aggregate BCAs were 88.9%, 85.2%, 86.6%, 87.4%, 87.7% and 85.7% for JMM, IMM, LTJMM, BMA, JMMCognitive and JMMImage, respectively. This reinforces the interpretation that the JMM with both cognitive and imaging markers performs better than the models with either cognitive or imaging markers only.
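The aggregation described above can be sketched as follows. The BCA is computed here as the mean over classes of the average of sensitivity and specificity, which is the TADPOLE challenge convention and an assumption about the definition in Sect. 3.3. The confusion matrix, the per-time-point BCA values and the subject counts are hypothetical, and the function names are illustrative.

```python
def balanced_classification_accuracy(matrix):
    """Mean over classes of (sensitivity + specificity) / 2.

    Rows are actual classes and columns are predicted classes.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    scores = []
    for k in range(n):
        tp = matrix[k][k]
        fn = sum(matrix[k]) - tp
        fp = sum(matrix[i][k] for i in range(n)) - tp
        tn = total - tp - fn - fp
        scores.append(0.5 * (tp / (tp + fn) + tn / (tn + fp)))
    return sum(scores) / n


def weighted_aggregate(values, counts):
    """Average of per-time-point values, weighted by the amount of data."""
    return sum(v * c for v, c in zip(values, counts)) / sum(counts)


# Hypothetical 3-class confusion matrix at a single time point.
example = [
    [8, 2, 0],
    [1, 8, 1],
    [0, 2, 8],
]
bca_single = balanced_classification_accuracy(example)

# Hypothetical BCA values at three time points and the number of subjects at each.
bca_by_time = [0.9, 0.8, 0.6]
subjects_by_time = [800, 500, 100]
bca_weighted = weighted_aggregate(bca_by_time, subjects_by_time)

print(round(bca_single, 4), round(bca_weighted, 4))
```

Because early time points carry more subjects, the weighted aggregate sits closer to the early BCA values than an unweighted mean would.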
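[Fig. 5 plot area. Variables listed from top to bottom of the importance plot (x-axis: MeanDecreaseGini, 0 to 800): DX_bl, CDRSB, EcogSPTotal, FAQ, MMSE, ADAS13, MOCA, RAVLT_immediate, EcogPtTotal, Hippocampus_ICV, Years_bl, Entorhinal, FDG, AGE, Ventricles_ICV, PTEDUCAT, PTMARRY, APOE4, PTGENDER.]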
Fig. 5 Random forest variable importance for categorical diagnosis (cognitively normal, mild cognitive impairment, or dementia). ADAS13 
Alzheimer’s Disease Assessment Scale, CDRSB Clinical Dementia Rating—Sum of Boxes, EcogPtTotal everyday cognition participant, EcogSPTotal 
everyday cognition study partner, FAQ Functional Assessment Questionnaire, FDG FluoroDeoxyGlucose, MMSE Mini-Mental State Examination, 
MOCA Montreal Cognitive Assessment, RAVLT Rey Auditory Verbal Learning Test, ICV intracranial volume, PTGENDER participant’s gender, PTMARRY 
participant’s marital status, PTEDUCAT participant’s education, Year_bl years from baseline, DX_bl baseline diagnosis, APOE4 APOE e4 allele
4.4 Sub-analysis for subjects with amyloid pathology information
To explore the role of amyloid pathology, we applied our approach to a subset of the original data involving only individuals with amyloid information in both the training and test datasets as described in Sect. 2. Baseline amyloid elevation status was included as a predictor in both the random forest and multivariate mixed-effects models. To highlight the role of amyloid status in the models, we compare the out-of-bag accuracy of the random forest with and without baseline amyloid status as a predictor on the subset of the training set with observed amyloid status. The OOB estimated error rates were 4.99% and 5.13% for analysis with and without amyloid information, respectively. Thus, there is a modest added benefit with the inclusion of amyloid elevation status. This is not too surprising as the diagnostic classification in ADNI is based solely on the clinical presentation done without the clinicians' knowledge of any biomarkers. Figure 11a, b shows the predictive performance of the continuous longitudinal markers under each of the joint models for groups of elevated and non-elevated amyloid individuals, respectively. We observed that the models predict follow-up biomarker outcomes better for the individuals with non-elevated amyloid, owing to the fact that these individuals are likely to be more stable over time. The joint mixed-effects model continues to outperform the other models in terms of accuracy. Classification accuracy of clinical diagnosis is also depicted in Fig. 12. The random forest based on predictions from the
[Fig. 6 plot area, two panels: a Diagnosis accuracy (y-axis: overall diagnosis accuracy, 20 to 100), b Balanced classification accuracy (y-axis: overall balanced classification accuracy, 20 to 100). x-axis: years from baseline (0 to 10); legend (Model): JMM, JMMCognitive, JMMImage, IMM, LTJMM, BMA. Subject counts printed on the curves, in decreasing order: 896, 845, 766, 734, 506, 498, 259, 181, 154, 126, 104, 68.]
Fig. 6 Comparison of performance metrics for categorical diagnosis. Note that only the LTJMM did not include baseline diagnosis as a covariate. 
The numbers on the graph represent the number of subjects at each of the time points. IMM Independent mixed-effects model, JMM joint 
mixed-effects model, LTJMM latent-time joint mixed-effects model, BMA Bayesian model averaging, JMMCognitive JMM fitted to cognitive and 
function outcomes only, JMMImage JMM fitted to imaging markers only
joint models and baseline characteristics again yields balanced classification accuracy of above 80% for the first two and a half years, declining over time. Again, the joint mixed-effects model combined with the random forest algorithm consistently outperformed the others.
5 Discussion and conclusion
In this study, we have investigated the use of a two-stage data-driven approach to modeling and predicting the progression of AD markers and clinical diagnosis. Longitudinal data were jointly modeled to take advantage of correlations among outcomes and within individuals. Random forests were used to derive an algorithm to categorize diagnoses. Predictions were assessed on an independent validation set. The approach achieved overall accuracy and balanced classification accuracy of above 80% for the first 2 years, but accuracy diminished precipitously beyond 2 years. This finding supports the utility of our two-stage method for predicting disease course over a limited time frame. The findings also support the use of machine learning methods to derive algorithms which might help avoid subjectivity in diagnostic categorization.
A number of publications have addressed diagnostic prediction at various stages of AD. For example, Tierney et al. [33] attempted to predict the onset of dementia at 5 and 10 years based on an initial neurological test battery. By using a univariate logistic regression model, their approach yielded accuracies of 82% at 5 years and 71% at 10 years. Using a survival regression approach, Tabert et al. [34] predicted conversion from MCI to AD based on neurological batteries used as inputs and adjusted for other study participants' characteristics. Their approach resulted in a 3-year predictive accuracy of 86%. Time-to-event outcomes generally have the ability to improve predictions over univariate logistic regression models. A more recent review by Rathore et al. [35] details how different classification frameworks have been used as an effective tool for making individualized diagnosis and prediction. Classification accuracies ranged from 70 to 95% for binary classification. These accuracies are impressive, but might not be comparable to the accuracies that we have reported. One reason for the incomparability is that the accuracies that we report are based on a held-out test set that was not used to fit models. The accuracies we report also blend initial diagnoses and consider all possible transitions (multinomial outcome) of disease status rather than the binary approach adopted by these authors. For example, the classification approach by Tierney et al. [33] does not include MCI patients. However, it is generally more difficult to discriminate between adjacent diagnoses (e.g., cognitively normal and MCI) compared to non-adjacent diagnoses (e.g., cognitively normal and dementia).
The different approaches we considered for the "stage one" modeling each have their own strengths and weaknesses. The independent mixed model, for example, is easier to fit than the joint mixed-effects models and is also less cumbersome to interpret. However, this model ignores the correlations among outcomes which are generally known to be mild to strong for some pairs of AD markers. The correlation matrix of the random effects estimated in this study provides evidence of these between-outcome associations. On the other hand, joint models are complex, take more computational time, and can be challenging to interpret. In the presence of baseline diagnosis, the conventional joint mixed-effects model was preferred by the model selection criteria we considered. The latent-time joint mixed-effects model, motivated by the desire to predict long-term trajectories with short-term follow-up data, may be useful when baseline diagnosis is unknown. The Bayesian model averaging, which aggregates the other models, is probably the most complex but helps to account for model uncertainty in the estimation of parameters and prediction.
Some modifications might improve the prediction accuracy of the proposed two-stage algorithm. Instead of relying on a single time point to predict future course, one could utilize run-in data from multiple time points, which would likely improve estimates of subject-specific trajectories. Also, our models only considered a simple linear time trend. And while nonlinear trends were not supported by the data at hand, it is possible that a more flexible mean structure might improve model performance. Larger datasets and/or improved disease markers might also serve to enhance the quality of predictions in the future.
The approach can be applied to sharpen clinical trial inclusion and exclusion criteria to provide target populations with desired predicted longitudinal characteristics, e.g., a cognitively normal population with increased risk of imminent progression to MCI. However, such an application might complicate and prolong the recruitment process and eventual drug labeling.
In the clinic, these methods can be applied to improve the accuracy of prognosis. Improved prognostic accuracy can help physicians, patients, and families make more informed decisions regarding therapies and care through the transitions from healthy cognition, to mild impairment, to dementia. Once effective therapies have been discovered, the proposed two-stage approach could be fit to clinical trial data to provide a more sophisticated model of treatment response. Such a treatment response model would provide personalized "theragnoses," or predictions of treatment response, and help make decisions on when, and to whom, to prescribe therapies.
Abbreviations
AD: Alzheimer's disease; ADAS13: Alzheimer's Disease Assessment, Cognitive 13-item scale; ADNI: Alzheimer's Disease Neuroimaging Initiative; APOE: apolipoprotein E gene; BCA: balanced classification accuracy; BMA: Bayesian model averaging; CN: cognitively normal; CDRSB: Clinical Dementia Rating Sum of Boxes; CSF: cerebrospinal fluid; ECog: everyday cognition; ECogPtTotal: ECog participant total; ECogSPTotal: ECog study partner total; FAQ: Functional Assessment Questionnaire; FDG: fluorodeoxyglucose; ICV: intracranial volume; IMM: independent mixed-effects model; JMM: joint mixed-effects model; JMMCognitive: JMM fitted to cognitive and function outcomes only; JMMImage: JMM fitted to imaging markers only; LTJMM: latent time joint mixed-effects model; LOOIC: leave-one-out information criterion; MAE: mean absolute error; MCMC: Markov Chain Monte Carlo; MMSE: Mini-Mental State Examination; MOCA: Montreal Cognitive Assessment; MRI: magnetic resonance imaging; PET: positron emission tomography; RAVLT Immediate: Rey Auditory Verbal Learning Test Immediate; SUVR: standardized uptake value ratio; WAIC: widely applicable information criterion; WES: weighted error score.
Acknowledgements
We are grateful to the ADNI study volunteers and their families.
The Alzheimer's Disease Neuroimaging Initiative: Data used in preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (http://adni.loni.usc.edu). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found at: http://adni.loni.usc.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf.
Authors' contributions
SI, DL, WKT, MCD conceived the methodological idea for the study. SI, DL and MCD contributed to the writing of the computer codes and performed the analysis. PSA and MSR provided expertise in the selection of markers for inclusion and the clinical interpretations of the findings. SI drafted the manuscript with contributions, comments and editing from DL, WKT, MCD, PSA and MSR. All authors read and approved the final manuscript.
Funding
This work was supported by Biomarkers Across Neurodegenerative Disease (BAND-14-338179) Grant from the Alzheimer's Association, Michael J. Fox Foundation, and Weston Brain Institute; and National Institute on Aging Grant R01-AG049750. Data collection and sharing for this project was funded by the ADNI (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH-12-2-0012). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: AbbVie; Alzheimer's Association; Alzheimer's Drug Discovery Foundation; Araclon Biotech; BioClinica, Inc.; Biogen; Bristol-Myers Squibb Company; CereSpir, Inc.; Eisai Inc.; Elan Pharmaceuticals, Inc.; Eli Lilly and Company; EuroImmun; F. Hoffmann-La Roche Ltd and its affiliated company Genentech, Inc.; Fujirebio; GE Healthcare; IXICO Ltd.; Janssen Alzheimer Immunotherapy Research & Development, LLC.; Johnson & Johnson Pharmaceutical Research & Development LLC.; Lumosity; Lundbeck; Merck & Co., Inc.; Meso Scale Diagnostics, LLC.; NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Piramal Imaging; Servier; Takeda Pharmaceutical Company; and Transition Therapeutics. The Canadian Institutes of Health Research is providing funds to support ADNI clinical sites in Canada. Private sector contributions are facilitated by the Foundation for the National Institutes of Health (www.fnih.org). The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer's Therapeutic Research Institute at the University of Southern California.
Availability of data and materials
ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of Southern California. This work used the TADPOLE data sets (https://tadpole.grand-challenge.org) constructed by the EuroPOND consortium (http://europond.eu), funded by the European Union's Horizon 2020 research and innovation programme under Grant Agreement No. 666992.
Competing interests
The authors declare that they have no competing interests.
Appendix: Supplementary appendix
See Figs. 7, 8, 9, 10, 11 and 12.
Fig. 7 Number of individuals observed at each visit by initial diagnosis. CN Control, MCI mild cognitive impairment
Fig. 8 Observed and imputed values. The MissForest algorithm was used to impute missing values which appear to be plausible when compared 
to observed values at other visits. CN Control, MCI mild cognitive impairment
Fig. 9 Observed values (points) versus predicted lines (lines) based on only their baseline data for each of the four modeling approaches for 
subject#314 and for subject#4263. IMM Independent mixed-effects model, JMM joint mixed-effects model, LTJMM latent-time joint mixed-effects 
model, BMA Bayesian model averaging, ADAS13 Alzheimer’s Disease Assessment Scale, CDRSB Clinical Dementia Rating—Sum of Boxes, EcogPtTotal 
everyday cognition participant, EcogSPTotal everyday cognition study partner, FAQ Functional Assessment Questionnaire, FDG FluoroDeoxyGlucose, 
MMSE Mini-Mental State Examination, MOCA Montreal Cognitive Assessment, RAVLT Rey Auditory Verbal Learning Test, ICV intracranial volume
Fig. 10 Mean absolute error. CN Control, MCI mild cognitive impairment
Fig. 11 Weighted error score for subset of the population with amyloid burden information. IMM Independent mixed-effects model, JMM joint 
mixed-effects model, LTJMM latent-time joint mixed-effects model
Fig. 12 Comparison of performance metrics on clinical status for subset of the population with amyloid burden information. Note that only the 
LTJMM did not include baseline diagnosis as a covariate. The numbers on the graph represent the number of subjects at each of the occasions. 
CN Control, MCI mild cognitive impairment, IMM independent mixed-effects model, JMM joint mixed-effects model, LTJMM latent-time joint 
mixed-effects model
Author details
1 Alzheimer's Therapeutic Research Institute, Keck School of Medicine, University of Southern California, San Diego, USA. 2 Department of Family Medicine and Public Health, University of California, San Diego, USA. 3 Department of Statistics and Actuarial Science, University of Ghana, Legon-Accra, Ghana. 4 African Population and Health Research Center, APHRC Campus, Manga Close, Off Kirawa Road, P.O. Box 10787-00100, Nairobi, Kenya.
Received: 9 February 2019   Accepted: 17 June 2019
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
 1. Steyerberg WE (2009) Clinical prediction models: a practical approach to development, validation and updating. Springer, New York
 2. Petersen RC (2004) Mild cognitive impairment as a diagnostic entity. J Intern Med 256(3):183–194
 3. Chong MS, Sahadevan S (2005) Preclinical Alzheimer's disease diagnosis and prediction of progression. Lancet Neurol 4:576–579
 4. Sperling RA, Rentz DM, Johnson KA, Karlawish J, Donohue M, Salmon DP, Aisen P (2014) The a4 study: stopping ad before symptoms begin? Sci Transl Med 6(228):228-1322813
 5. Rowe CC, Ellis KA, Rimajova M, Bourgeat P, Pike KE, Jones G, Fripp J, Tochon-Danguy H, Morandeau L, O'Keefe G et al (2010) Amyloid imaging results from the Australian Imaging, Biomarkers and Lifestyle (AIBL) study of aging. Neurobiol Aging 31(8):1275–1283
 6. Donohue MC, Sperling RA, Petersen R, Sun C, Weiner MW, Aisen PS (2017) Association between elevated brain amyloid and subsequent cognitive decline among cognitively normal persons. JAMA 317(22):2305–2316
 7. Gray KR, Aljabar P, Heckemann RA, Hammers A, Rueckert D, for the Alzheimer's Disease Neuroimaging Initiative (2013) Random forest-based similarity measures for multi-modal classification of Alzheimer's disease. Neuroimage 65:167–175
 8. Ortiz A, Gorriz JM, Ramirez J, Martinez-Murcia FJ, for the Alzheimer's Disease Neuroimaging Initiative (2013) LVQ-SVM based CAD tool applied to structural MRI for the diagnosis of the Alzheimer's disease. Pattern Recognit Lett 34:1725–1733
 9. Stefano FD, Epelbaum S, Coley N, Cantet C, Ousset P-J, Hampel H, Bakardjian H, Lista S, Vellas B, Dubois B, Andrieu S, for the GuidAge Study Group (2015) Prediction of Alzheimer's disease dementia: data from the guidage prevention trial. J Alzheimer's Dis 48:793–804
 10. Buckley RF, Maruff P, Ames D, Bourgeat P, Martins RN, Masters CL, Rainey-Smith S, Lautenschlager N, Rowe CC, Savage G, Villemagne VL, Ellis KA, on behalf of the AIBL Study (2016) Subjective memory decline predicts greater rates of clinical progression in preclinical Alzheimer's disease. Alzheimer's Dement 12:776–785
 11. Seixas FL, Zadrozny B, Laks J, Conci A, Saade DCM (2014) A Bayesian network decision model for supporting the diagnosis of dementia, Alzheimer's disease and mild cognitive impairment. Comput Biol Med 51:140–158
 12. Beheshti I, Demirel H, Matsuda H, for the Alzheimer's Disease Neuroimaging Initiative (2017) Classification of Alzheimer's disease and prediction of mild cognitive impairment-to-Alzheimer's conversion from structural magnetic resource imaging using feature ranking and a genetic algorithm. Comput Biol Med 83:109–119
 13. Zheng C, Xia Y, Pan Y, Chen J (2016) Automated identification of dementia using medical imaging: a survey from a pattern classification perspective. Brain Inform 3:17–27
 14. Folstein MF, Folstein SE, McHugh PR (1975) Mini-mental state: a practical method for grading the cognitive state of patients for the clinician. J Psychiatr Res 12(3):189–198
TADPOLE challenge: prediction of longitudinal evolution in Alzheimer's disease. arXiv:1805.03909
 19. Tsiatis AA, Davidian M (2004) A joint modeling of longitudinal and time-to-event data: an overview. Stat Sin 14:809–834
 20. Andrinopoulou ER, Eilers PHC, Takkenberg JJM, Rizopoulos D (2017) Improved dynamic predictions from joint models of longitudinal and survival data with time-varying effects using p-splines. Biometrics. https://doi.org/10.1111/biom.12814
 21. Breiman L (2001) Random forests. Mach Learn 45(1):5–32
 22. Johnson KA, Sperling RA, Gidicsin CM, Carmasin JS, Maye JE, Coleman RE, Reiman EM, Sabbagh MN, Sadowsky CH, Fleisher AS, Doraiswamy M, Carpenter AP, Clark CM, Joshi AD, Lu M, Grundman M, Mintun MA, Pontecorvo MJ, Skovronsky DM (2013) Florbetapir (f18-av-45) pet to assess amyloid burden in Alzheimer's disease dementia, mild cognitive impairment, and normal aging. Alzheimer's Dement 9(5):72–83
 23. Tapiola T, Alafuzoff I, Herukka S-K, Parkkinen L, Hartikainen P, Soininen H, Pirttila T (2009) Cerebrospinal fluid β-amyloid 42 and tau proteins as biomarkers of Alzheimer-type pathologic changes in the brain. Arch Neurol 66(3):382–389
 24. Joshi AD, Pontecorvo MJ, Clark CM, Carpenter AP, Jennings DL, Sadowsky CH, Adler LP, Kovnat KD, Seiby JP, Arora A, Saha K, Burns JD, Lowrey MJ, Mintun MA, Skovronsky DM, the Florbetapir F18 Study Investigators (2012) Performance characteristics of amyloid pet with florbetapir f18 in patients with Alzheimer's disease and cognitively normal subjects. J Nucl Med 53(3):378–384
 25. Li D, Iddi S, Thompson WK, Donohue MC (2017) Bayesian latent time joint mixed effect models for multicohort longitudinal data. Stat Methods Med Res 28(3):835–845
 26. Iddi S, Li D, Aisen P, Rafii M, Thompson WK, Litvan I, Donohue MC (2018) Estimating the evolution of disease in the Parkinson's Progression Markers Initiative. Neurodegener Dis (Accepted)
 27. Stan Development Team (2016) Stan modeling language users guide and reference manual, Version 2.12.0. http://mc-stan.org/
 28. Stan Development Team (2016) RStan: the R interface to Stan, Version 2.10.1. http://mc-stan.org
 29. Hoeting JA, Madigan D, Raftery AE, Volinsky CT (1999) Bayesian model averaging: a tutorial. Stat Sci 14(4):382–417
 30. Liaw A, Wiener M (2002) Classification and regression by randomforest. R News 2(3):18–22
 31. Vehtari A, Gelman A, Gabry J (2017) Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Stat Comput 27:1413–1432. https://doi.org/10.1007/s11222-016-9696-4
 32. Stekhoven DJ, Buhlmann P (2012) Missforest - nonparametric missing value imputation for mixed-type data. Bioinformatics 28(1):112–118
 33. Tierney MC, Yao C, Kiss A, McDowell I (2005) Neuropsychological tests accurately predict incident Alzheimer disease after 5 and 10 years. Neurology 64:1853–1859
 34. Tabert MH, Manly JJ, Liu X, Pelton GH, Rosenblum S, Jacobs M, Zamora D, Goodkind M, Bell K, Stern Y, Devanand DP (2006) Neuropsychological prediction of conversion to Alzheimer disease in patients with mild cognitive impairment. Arch Gen Psychiatry 63:916–924
 35. Rathore S, Habes M, Iftikhar MA, Shacklett A, Davatzikos C (2017) A review on neuroimaging-based classification studies and associated feature extraction methods for Alzheimer's disease and its prodromal stages. Neuroimage. https://doi.org/10.1016/j.neuroimage.2017.03.057
 15. Wechsler D (1987) WMS-R: Wechsler Memory Scale-revised. Psychological 
Corporation, New York
 16. Morris JC (1993) The clinical dementia rating (CDR): current version and 
scoring rules. Neurology 43(11):2412–2414
 17. Tang BL, Kumor R (2008) Biomakers of mild cognitive impairment and 
Alzheimer’s disease. Ann Acad Med Singapore 37:406–410
 18. Marinescu RV, Oxtoby NP, Young AL, Bron EE, Toga AW, Weiner MW, Bark-
hof F, Fox NC, Klein S, Alexander DC, the EuroPOND Consortium (2018)