Iddi et al. Brain Inf. (2019) 6:6
https://doi.org/10.1186/s40708-019-0099-0
Brain Informatics

RESEARCH Open Access

Predicting the course of Alzheimer’s progression

Samuel Iddi1,3,4, Dan Li1, Paul S. Aisen1, Michael S. Rafii1, Wesley K. Thompson2, Michael C. Donohue1* and for the Alzheimer’s Disease Neuroimaging Initiative

Abstract
Alzheimer’s disease is the most common neurodegenerative disease and is characterized by the accumulation of amyloid-beta peptides leading to the formation of plaques and tau protein tangles in the brain. These neuropathological features precede cognitive impairment and Alzheimer’s dementia by many years. To better understand and predict the course of disease from early-stage asymptomatic to late-stage dementia, it is critical to study the patterns of progression of multiple markers. In particular, we aim to predict the likely future course of progression for individuals given only a single observation of their markers. Improved individual-level prediction may lead to improved clinical care and clinical trials. We propose a two-stage approach to modeling and predicting measures of cognition, function, brain imaging, fluid biomarkers, and diagnosis of individuals using multiple domains simultaneously. In the first stage, joint (or multivariate) mixed-effects models are used to simultaneously model multiple markers over time. In the second stage, random forests are used to predict categorical diagnoses (cognitively normal, mild cognitive impairment, or dementia) from predictions of continuous markers based on the first-stage model. The combination of the two models allows one to leverage their key strengths in order to obtain improved accuracy. We characterize the predictive accuracy of this two-stage approach using data from the Alzheimer’s Disease Neuroimaging Initiative.
The two-stage approach using a single joint mixed-effects model for all continuous outcomes yields better diagnostic classification accuracy compared to using separate univariate mixed-effects models for each of the continuous outcomes. Overall prediction accuracy above 80% was achieved over a period of 2.5 years. The results further indicate that overall accuracy is improved when markers from multiple assessment domains, such as cognition, function, and brain imaging, are used in the prediction algorithm as compared to the use of markers from a single domain only.

Keywords: Alzheimer’s disease, Biomarkers, Classification, Clinical diagnosis, Disease trajectories, Joint mixed-effects models, Latent time shift, Model averaging, Multi-level Bayesian models, Multi-cohort longitudinal data, Predictions, Random forest

1 Introduction

Prediction of future Alzheimer’s disease (AD)-related progression is extremely valuable in clinical practice and in medical research. In clinical practice, the ability to accurately predict the diagnosis of a patient can help physicians make more informed clinical decisions on treatment strategies [1]. Clinical trials are more likely to be successful if the individuals selected for the trials are those most likely to benefit from the therapy. Many researchers in the field contend that preventative strategies initiated prior to the appearance of advanced symptoms are most likely to be successful [2–4]. Therefore, identifying candidates for therapies while they are still cognitively normal (CN) or mildly cognitively impaired (MCI) is key for clinical trials, and eventually clinical practice.

The pathology of AD is characterized by the accumulation of amyloid plaques and neurofibrillary tangles in the brain beginning as early as middle age.
The amyloid hypothesis posits that plaques caused by the gradual buildup of beta-amyloid (Aβ) peptides damage brain regions responsible for cognition, thereby leading to impairment. Recent studies have shown that the pathology of the disease occurs several years before the onset of clinical symptoms, making the disease difficult to detect at an early stage [5, 6]. In addition, prediction of the future diagnosis of an individual (CN, MCI, or dementia) is very challenging due to high subjectivity and individual-level variability in cognitive assessments and levels of biomarkers, which have typically been used for staging of AD. The assessment of an individual’s current diagnosis can vary from one clinician to the next, or from one day to the next.

*Correspondence: mdonohue@usc.edu
1 Alzheimer’s Therapeutic Research Institute, Keck School of Medicine, University of Southern California, San Diego, USA
Full list of author information is available at the end of the article
© The Author(s) 2019. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Classification and prediction based on expert knowledge, machine learning algorithms [7, 8], regression-based prediction models [9, 10] and some combinations of these [11] have been proposed. Beheshti et al. [12] recently developed a computer-aided diagnosis system to predict conversion from MCI to AD using magnetic resonance imaging (MRI) data. Zheng et al. [13] surveyed other automated techniques for classifying and predicting diagnosis with reasonable reliability using data from different imaging modalities. The reliability of these approaches is often assessed by the sensitivity and specificity of the methods, accuracy rate, and absolute error rates, among other criteria. Approaches with high accuracy rates and precision are desirable. The diagnosis of CN, MCI, or mild dementia by expert clinicians has traditionally relied on cognitive assessments such as the Mini-Mental State Examination (MMSE) [14], Logical Memory [15] and structured clinical assessments such as the Clinical Dementia Rating (CDR) [16]. However, including multiple domains might help explain and more accurately predict the varying rates of decline that are typical. For example, it is common to find individuals who present with symptoms consistent with MCI or mild AD dementia, but who lack biomarker evidence of AD pathology. Such an individual might have other pathology that will exhibit a different rate of progression. Going beyond the cognitive domain to multi-domain analysis is therefore appealing. Longitudinal cognitive assessments combined with neuroimaging and biomarkers can more easily facilitate diagnosis and increase prediction accuracy [3, 17]. While multi-domain analyses are interesting, intuitive and potentially more informative, they have been relatively uncommon due to modeling challenges.

The Alzheimer’s Disease Prediction Of Longitudinal Evolution (TADPOLE) Challenge [18] compares the performance of algorithms at making future predictions of AD disease markers and clinical diagnosis using historical data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) study. Motivated by this challenge, we propose a two-stage approach that can reliably predict an individual’s future course of disease, including transition to MCI and dementia, using only a single assessment (i.e., “baseline”). This emphasis on subject-level prediction from a single timepoint is distinct from much of the literature, which focuses on group-level prediction and the relative importance of various predictors. In the first stage, we model continuous disease markers using joint mixed-effects models, which allow the simultaneous modeling and prediction of multiple modalities such as cognitive and functional assessments, brain imaging, and biofluid assays with fixed effects for covariates like age, sex, and genetic risk. Joint models have the advantage of modeling the correlation among outcomes to improve prediction and precision of estimates [19, 20].

In the second stage of prediction, a random forest algorithm is used to categorize the panel of predicted continuous markers into a diagnosis of CN, MCI, or dementia. Random forests combine many decision trees created from random sampling of the data and predictors [21]. Each decision tree recursively partitions the predictors to classify individuals into one of the three diagnoses. While an alternative approach might view diagnosis as a random variable correlated with other disease markers, we view diagnosis as a deterministic categorization of the clinical presentation of each individual. That is, diagnosis should be algorithmically determined for a given presentation of the continuous markers. The random forest model gives us an estimate of this algorithmic categorization. Overall performance is assessed using an independent validation set.

2 Data description

The two-stage approach is applied to data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI). ADNI is a prospective observational cohort study, which began in 2004 and continues to this day. The study is carried out across 55 research centers in the USA and Canada. Over 1900 volunteers with normal cognition or impairment consistent with MCI or AD dementia were recruited for this study. The first cohort, referred to as ADNI-1, consists of 800 individuals: 200 CN, 400 with late MCI, and 200 with mild dementia. ADNI-GO, the second cohort, added about 200 additional individuals with early MCI. In ADNI-2, more participants at different stages of AD were recruited to monitor AD progression. ADNI-3 is presently enrolling additional individuals with CN, MCI, and dementia. At each new phase, prior cohorts were invited back for continued follow-up, with the exception of individuals enrolled with dementia, who were followed for a maximum of 2 years. Some ADNI-1 individuals have now been followed in excess of 10 years. Key objectives of ADNI are to validate the use of markers of AD for diagnosis and clinical trials, and to study rates of change in cognitive and functional assessments, brain imaging and a number of biomarkers. The inclusion and exclusion criteria, schedule of assessments, and other details can be found at http://adni.loni.usc.edu/. We focus on the following assessments: Alzheimer’s Disease Assessment–Cognitive 13-item scale (ADAS13), Clinical Dementia Rating–Sum of Boxes (CDRSB), Mini-Mental State Examination (MMSE), Montreal Cognitive Assessment (MOCA), Rey Auditory Verbal Learning Test Immediate (RAVLT Immediate), Everyday Cognition (ECog) total by participant (ECogPtTotal) and by study partner (ECogSPTotal), and Functional Assessment Questionnaire (FAQ). Brain imaging measures include volumetric Magnetic Resonance Imaging (MRI) summaries of entorhinal cortical thickness, and ventricular and hippocampal volume normalized to intracranial volume (ICV); and fluorodeoxyglucose positron emission tomography (FDG-PET) summaries of glucose metabolism. Baseline diagnosis, age, gender, and carriage of the APOE e4 allele were included as covariates.

We also focus on a second set of analyses among individuals for whom beta-amyloid data were available. The buildup of beta-amyloid in the brain and in cerebrospinal fluid (CSF) is known to be strongly involved in AD [22, 23]. For some patients in the ADNI study, florbetapir PET scans or CSF Aβ42 were acquired to detect amyloid levels in the brain. We classified individuals as having elevated amyloid (“amyloid positive”) if florbetapir PET standardized uptake value ratio (SUVR) was above 1.10 [22, 24] or if CSF Aβ was less than 909.6 pg/ml; and as amyloid negative otherwise. The CSF Aβ cutoff was determined so that it yielded the same proportion of amyloid positives as the florbetapir cutoff. Amyloid elevation status was included as a predictor in this second set of analyses.

3 Methodology

We propose a two-stage approach for prediction of continuous disease markers and categorical diagnosis. For the first stage, we propose the traditional joint, or multivariate outcome, mixed-effects model. We also consider two alternatives: a latent-time joint mixed-effects model, and Bayesian model averaging that combines posterior estimates of the aforementioned joint models. In the second stage, the predicted markers are submitted to a random forest to further predict diagnosis. We next describe the first-stage model in greater detail.

3.1 Methods for predicting continuous markers

Suppose $y_{ijk}$ represents outcome $k$ $(k = 1, \ldots, p)$ observed at time $t_{ij}$ $(j = 1, \ldots, q_i)$ for individual $i$ $(i = 1, \ldots, n)$, and $x_{ijk}'$ is a set of covariates for the $i$th individual at time $j$. The joint mixed-effects model (JMM) is defined as

$$y_{ijk} = x_{ijk}'\beta_k + \alpha_{0ik} + \alpha_{1ik} t_{ij} + \varepsilon_{ijk} \tag{1}$$

where $\beta_k$, $k = 1, 2, \ldots, p$, are sets of fixed-effect regression coefficients, and $\alpha_{0ik}$ and $\alpha_{1ik}$ are outcome- and individual-specific random intercepts and slopes, respectively. The random intercepts and slopes are assumed to follow a multivariate normal distribution with mean vector $0$ and variance–covariance matrix $D$ for the entire $2p$-dimensional vector of random effects for each subject. The error term follows $\varepsilon_{ijk} \sim N(0, \sigma_k^2)$; that is, the error variance for a given outcome is assumed homogeneous over time and across all subjects. We assume that the random components $\alpha_{ik}$ and $\varepsilon_{ijk}$ for $k = 1, 2, \ldots, p$ are independent. The random effects allow the model to accommodate both the temporal correlation and the correlation among the markers. A special case of this joint model is the independent mixed-effects model (IMM), which does not explicitly model the correlation among outcomes. This is similar to fitting a separate mixed-effects model per outcome.

We also consider the latent time joint mixed-effects model (LTJMM) [25]:

$$y_{ijk} = x_{ijk}'\beta_k + \gamma_k (t_{ij} + \delta_i) + \alpha_{0ik} + \alpha_{1ik} t_{ij} + \varepsilon_{ijk}. \tag{2}$$

The model is similar to (1), but introduces individual-specific latent time shifts, $\delta_i$, representing “long-term” disease time. The model also includes outcome-specific slopes $\gamma_k > 0$ with respect to $\delta_i$. The $\delta_i$ are assumed to be normally distributed with zero mean and variance $\sigma_\delta^2$. The random components $\delta_i$, $\alpha_{ik}$ and $\varepsilon_{ijk}$ for $k = 1, 2, \ldots, p$ are also assumed to be independent. An extension of this model to allow heterogeneous latent time (i.e., the variability of the latent time is made to vary across individuals) is described in [26].

Estimation of the joint models is by Markov Chain Monte Carlo (MCMC). Posterior draws are obtained from the posterior distributions of the joint models given respectively by:

$$P(\theta \mid Y) \propto P(Y \mid \theta)\, P(\theta \mid \tau)$$

$$P(\beta_k, \alpha_{ik} \mid y_{ijk}) \propto P(y_{ijk} \mid \beta_k, \alpha_{ik}, D, \sigma_k^2)\, P(\beta_k)\, P(\alpha_{ik} \mid D)\, P(D)\, P(\sigma_k^2)$$

where the variance–covariance matrix $D$ is decomposed as $D = V \Omega V$. For numerical stability, the Cholesky factorization is applied to the correlation matrix, $\Omega = LL'$, where $L$ is a lower triangular matrix. For the latent time joint mixed-effects model, $\theta = (\beta_k', \alpha_{ik}, \gamma_k, \delta_i)$ and $\tau = (D, \sigma_k^2)'$. The component $V$ is a diagonal matrix of standard deviations (square roots of the diagonal entries of $D$). Furthermore, the random component $\alpha_{ik}$ is standardized to $z \sim N(0, I)$, where $I$ is the identity matrix, and the random effects are then calculated as $VLz$. Prior distributions are placed on the hyperparameters. A weakly informative normal prior, $N(0, 10^2)$, is placed on $\beta_k$, and a weakly informative half-Cauchy prior, $\text{Cauchy}(0, 2.5)$, is assumed for the components of $V$, $\sigma_k$, $\gamma_k$ and $\sigma_\delta$. Finally, the LKJ prior is placed on the Cholesky factors of $\Omega$ [27]. MCMC sampling is done using the R software package RStan [28]. We used 5000 iterations, of which the first 2500 warmup iterations are discarded. Two MCMC chains were used and thinned by a factor of 5. Predictions of biomarkers and their corresponding credible intervals were based on posterior draws.

We apply Bayesian model averaging to the multivariate mixed models for the selected continuous biomarkers [29, 30]. The predictions of future values of biomarkers and the corresponding credible intervals are obtained after combining the posterior prediction estimates of all the models (model averaging). Suppose $y^*_{ijk}$ is the prediction of outcome $k$ for individual $i$ at future time $j$. The posterior distribution of the prediction given the data, $D$ (here $D$ denotes the observed data rather than the covariance matrix above), is the average of the posterior distributions of the models weighted by the posterior model probabilities and is given by

$$P(y^*_{ijk} \mid D) = \sum_{s=1}^{S} P(y^*_{ijk} \mid M_s, D)\, P(M_s \mid D),$$

where $M_s$, $s = 1, 2, \ldots, S$, represents the models. The posterior distribution of the models is expressed as

$$P(M_s \mid D) \propto P(D \mid M_s)\, P(M_s)$$

where $P(D \mid M_s) = \int P(D \mid \theta_s, M_s)\, P(\theta_s \mid M_s)\, d\theta_s$ and $\theta_s$ is the vector of parameters under model $s$. The predicted mean and variance are obtained from the posterior distribution of the predictions.

The JMM and LTJMM were fit to the training data described in Sect. 4. To demonstrate the benefit of joint modeling, single or independent mixed-effects (IMM) models were fit to the data for comparison. For the JMM and IMM models, age, gender, APOEe4, and baseline diagnosis were included as covariates. The latent-time models did not include baseline diagnosis, since including it would make the model parameters uninterpretable due to the presence of the latent-time component (see [25] for details). Two common model selection criteria are applied: the widely applicable information criterion (WAIC) and the leave-one-out information criterion (LOOIC) [31]. Models with lower values of WAIC and LOOIC are preferred.

The models described above are fitted to the training dataset in order to make follow-up predictions for subjects in the test dataset. However, in fitting these models to the training data, we propose to include baseline data for subjects in the test data to allow for the estimation of random effects for these subjects. The estimated outcome-specific random intercepts and slopes for each subject are required to make the subject-level predictions. The resulting follow-up predictions are then used as inputs in the random forest for the next stage of algorithmically predicting diagnosis status.

3.2 Method for predicting clinical diagnosis

The random forest algorithm is an ensemble learning method for classification and regression. It operates by generating several classification or regression trees and aggregating them. Each tree in the forest is constructed using bootstrap samples of the data. The algorithm, implemented in the R package “randomForest” [30], is fitted to the training dataset using 100 trees. In particular, diagnosis, which was re-evaluated at every visit by clinicians, was used as the target feature for the random forest, with predicted follow-up continuous markers and baseline predictors of subjects as input features. Observation times are also included as a continuous predictor. A number of individuals had incomplete assessments at some study visits, which the random forest algorithm is not able to accommodate. To avoid discarding these incomplete visits entirely when fitting the random forest, we apply an imputation method, the “MissForest” algorithm [32], to impute the missing values. This algorithm, implemented in the R package “missForest”, imputes missing values for mixed-type data (e.g., continuous and categorical) using a nonparametric random forest methodology. The method can flexibly accommodate mixed-type outcomes, complex interactions and nonlinear relationships among variables. In addition, it does not require the specification of a parametric model or distributional assumptions. To determine which variables are important for predicting the response, we use the variable importance plot, which depicts the influence of each variable characterized by the mean decrease in node impurity (Gini Index [21]).

3.3 Model performance metrics

To evaluate the quality of the predictions of the continuous markers, we use two performance metrics. The first metric, the mean absolute error (MAE), is calculated as

$$\text{MAE} = \frac{1}{N} \sum_{i=1}^{N} |\hat{P}_i - P_i|,$$

where $N$ is the observation count, $\hat{P}_i$ represents the predicted or forecasted future value, and $P_i$ is the observed value of the marker for an individual $i$ in the test data. The second metric, which takes confidence interval
widths into account, is the weighted error score (WES). It is the weighted sum of the absolute difference between the predicted and actual values for each continuous marker in the test data at each time point. That is,

$$\text{WES} = \frac{\sum_{i=1}^{N} \hat{C}_i\, |\hat{P}_i - P_i|}{\sum_{i=1}^{N} \hat{C}_i},$$

where the weights, $\hat{C}_i$, are the inverse of the width of the confidence interval of the predicted estimates for each individual. High values of MAE and WES denote poor predictive performance of the model.

The diagnoses provided by site clinicians are used as the ‘gold standard’ in assessing the accuracy of the predictions of diagnosis from the random forest algorithm. Performance is assessed on the basis of the overall accuracy and balanced classification accuracy (BCA). Overall accuracy is defined as the percentage of correct predictions out of all the predictions made. This metric tends to work better for data with balanced classes (e.g., equal numbers of CN, MCI, and dementia) but can provide a misleading assessment of performance for data with imbalanced classes. To account for possible class imbalance, we also use the overall BCA. The balanced classification accuracy for class $\ell = 1, 2, \ldots, L$ is obtained from

$$\text{BCA}_\ell = \frac{1}{2} \left[ \frac{TP_\ell}{TP_\ell + FN_\ell} + \frac{TN_\ell}{TN_\ell + FP_\ell} \right],$$

where $TP_\ell$ is the number of true positives, $FN_\ell$ is the number of false negatives, $TN_\ell$ is the number of true negatives, and $FP_\ell$ is the number of false positives. That is, for each class $\ell$, $TP_\ell$ is the number of cases in class $\ell$ that are correctly predicted by the model, and $FN_\ell$ is the number of cases in class $\ell$ which are incorrectly classified into any of the other classes. Similarly, $TN_\ell$ represents the number of cases in the other classes correctly labeled as not belonging to class $\ell$, and $FP_\ell$ is the number of cases which actually belong to the other classes but are wrongly classified to class $\ell$. These balanced accuracies are aggregated to obtain the overall BCA score as follows:

$$\text{BCA} = \frac{1}{L} \sum_{\ell=1}^{L} \text{BCA}_\ell.$$

A higher value of overall accuracy or BCA is indicative of good performance.

4 Application and model validation

4.1 Descriptive statistics and data preparation

The ADNI data consist of 1737 individuals enrolled in ADNI-1, ADNI-GO and ADNI-2, of whom 19.7% have dementia, 30.1% are CN and 50.2% have MCI at baseline. About 44.9% are females, and 55.1% are males. All follow-up data on ADNI-1 and ADNI-GO participants who did not continue into the ADNI-2 phase form part of the training dataset. In addition, baseline data from individuals in ADNI-2 are included in the training data to allow estimation of their random effects for individual-specific predictions. The training data consist of 273 ADs, 154 CNs and 414 MCIs. The validation dataset consisted of currently available longitudinal data for ADNI-2 (i.e., the ADNI-1 and ADNI-GO participants who continued into ADNI-2, and additional newly enrolled subjects). This validation data consist of 7.7% ADs, 41.2% CNs and 51.1% MCIs. Figure 7a, b, in “Appendix”, shows the number of individuals at each visit in the training and test sets, respectively. To impose a minimum standard for visit completion, time points where CDRSB was not observed are omitted from the analysis dataset. As expected, the number of observations decreases over time from baseline due to attrition and administrative censoring. Summary measures of baseline outcomes for each diagnosis group are presented in Table 1.

Figure 8a depicts the individual observed trajectories per outcome and also shows the length of follow-up in years. Figure 8b shows the individual trajectories after missing values have been imputed. It can be seen that the imputation algorithm appears to generate plausible values of missing data. Before fitting the models to the data, the original values of the outcomes were transformed into percentiles using a weighted empirical cumulative distribution function so that all outcomes are on a common scale. The weights were constructed using the inverse of the proportion of disease category for each outcome. The predicted values on the transformed scale are then back transformed into the original scale.

Next, we apply the two-stage approach to the data. Figure 1 shows a schematic diagram depicting the inputs and outputs at each modeling stage.

4.2 Stage 1

The joint mixed-effects models were trained on longitudinal data from ADNI-1, ADNI-GO, and only baseline data from ADNI-2. We then assessed the ability of the proposed methodology to accurately predict follow-up observations of individuals in ADNI-2. Table 2 summarizes WAIC and LOOIC. Based on these results, the JMM model seems to be the best fitting model, followed closely by the LTJMM model. Figure 2 shows the correlations between random intercepts (above anti-diagonal) and random slopes (below anti-diagonal) from the JMM. Cognitive outcomes share strong correlations [0.7–0.9) with other cognitive measures except for Everyday Cognition (ECog) by participant. There are generally moderate correlations [0.5–0.7) among cognitive measures and
FDG-PET but weaker correlation [0.3–0.5) between cognition and structural MRI measures. There are generally moderate correlations among slopes for structural MRI measures.

We also performed Bayesian model averaging to combine predictions from the JMM, LTJMM and IMM. Furthermore, the joint mixed-effects model was fitted separately to cognitive and functional outcomes (JMMCognitive) and to imaging markers (JMMImage) to demonstrate how these marker domains perform individually. Longitudinal predictions on the validation dataset were obtained from these fitted models. Figure 3 shows the observed data and predicted trajectories for five randomly selected individuals for each model (in Fig. 9, we show plots for subject #315 and subject #4263 where the models are all in the same panel, and subjects are in different panels for easy comparison). The graph shows that the models’ predicted profiles appear to differ only slightly. It is worth noting that the predicted values appear nonlinear because the models were fitted to transformed values of the outcome and back transformed to the original scale.

We evaluated the performance of our model predictions using metrics on both the continuous markers and the multi-class diagnosis. The metrics described in Sect. 3.3 are used. From Figs. 10 and 4, we observed that predictions from all the joint models performed quite well over 2 years, yielding lower mean absolute errors and weighted error scores as compared to the other models. As expected, the MAE and WES increased beyond 2 years. All models yielded consistent performance over time with the JMMs occasionally out-performing the other models. The JMM that combined both cognitive and imaging outcomes performed similarly to the JMM from cognitive/functional outcomes (JMMCognitive) and the JMM from imaging markers (JMMImage) in terms of weighted error scores. However, at time points where the models differed, the JMM with both cognitive and imaging outcomes was generally more accurate than JMMCognitive and JMMImage. The IMM performed worse for MCI and dementia subgroups.

Table 1 Summary measures at baseline for raw and imputed data

Diagnosis  Outcome                 Imputed data              Raw data
category                             n     Mean     SE         n     Mean     SE
Dementia   ADAS13                    342    29.91   0.43     330    29.87   0.44
           CDRSB                     342     4.39   0.09     338     4.41   0.09
           EcogPtTotal               342     1.91   0.02     144     1.90   0.05
           EcogSPTotal               342     2.75   0.03     145     2.74   0.05
           MMSE                      342    23.22   0.11     338    23.22   0.11
           MOCA                      342    17.52   0.20     142    17.12   0.38
           RAVLT immediate           342    22.81   0.41     335    22.85   0.41
           FAQ                       342    13.14   0.38     337    13.18   0.38
           FDG                       342     1.07   0.01     242     1.07   0.01
           Hippocampus/ICV (×100)    342     0.38   0.00     272     0.38   0.00
           Ventricles/ICV            342     0.03   0.00     315     0.03   0.00
           Entorhinal (mm)           342  2829.35  33.72     254  2819.26  42.54
MCI        ADAS13                    872    16.53   0.23     862    16.53   0.23
           CDRSB                     872     1.52   0.03     866     1.52   0.03
           EcogPtTotal               872     1.84   0.01     468     1.79   0.02
           EcogSPTotal               872     1.84   0.02     465     1.72   0.03
           MMSE                      872    27.59   0.06     866    27.59   0.06
           MOCA                      872    22.66   0.09     465    23.41   0.15
           RAVLT immediate           872    34.24   0.36     866    34.24   0.36
           FAQ                       872     3.18   0.14     862     3.17   0.14
           FDG                       872     1.23   0.00     665     1.25   0.01
           Hippocampus/ICV (×100)    872     0.44   0.00     737     0.44   0.00
           Ventricles/ICV            872     0.03   0.00     836     0.03   0.00
           Entorhinal (mm)           872  3497.43  24.18     733  3497.38  27.67
CN         ADAS13                    523     9.24   0.19     520     9.24   0.19
           CDRSB                     523     0.04   0.01     520     0.04   0.01
           EcogPtTotal               523     1.41   0.01     290     1.41   0.02
           EcogSPTotal               523     1.22   0.01     288     1.21   0.02
           FAQ                       523     0.24   0.04     520     0.24   0.04
           MMSE                      523    29.06   0.05     520    29.06   0.05
           MOCA                      523    25.54   0.09     287    25.76   0.14
           RAVLT immediate           523    44.66   0.43     518    44.67   0.43
           FDG                       523     1.31   0.00     391     1.31   0.01
           Hippocampus/ICV (×100)    523     0.49   0.00     471     0.49   0.00
           Ventricles/ICV            523     0.02   0.00     494     0.02   0.00
           Entorhinal (mm)           523  3828.36  26.57     468  3840.29  29.09

ADAS13 Alzheimer’s Disease Assessment Scale, CDRSB Clinical Dementia Rating–Sum of Boxes, EcogPtTotal everyday cognition participant, EcogSPTotal everyday cognition study partner, FAQ Functional Assessment Questionnaire, FDG FluoroDeoxyGlucose, MMSE Mini-Mental State Examination, MOCA Montreal Cognitive Assessment, RAVLT Rey Auditory Verbal Learning Test, ICV intracranial volume, CN control, MCI mild cognitive impairment, n number of observations, SE standard error

[Figure: Steps in the two-stage approach. Stage 1 (longitudinal models): inputs are baseline characteristics (age, gender, baseline diagnosis, years from baseline); outputs are continuous longitudinal outcomes (ADAS13, CDRSB, Ecog, MMSE, MOCA, RAVLT, FAQ, FDG, Hippocampus, Ventricles, Entorhinal). Stage 2 (random forest classification): inputs are baseline characteristics and Stage 1 outputs; output is diagnosis at all time points (CN, MCI, AD). Modeling metrics on training data: WAIC, LOOIC (Stage 1); OOB estimate (Stage 2). Performance metrics on test data: MAE, WES (Stage 1); BCA, accuracy (Stage 2).]
Fig. 1 Schematic diagram showing the inputs and outputs of the two-stage approach

Table 2 Model selection criteria

Model    WAIC        LOOIC
IMM      59,687.42   64,136.45
LTJMM    57,953.63   62,535.86
JMM      56,576.79   59,728.43

WAIC widely applicable information criterion, LOOIC leave-one-out information criterion, IMM independent mixed-effects model, JMM joint mixed-effects model, LTJMM latent-time joint mixed-effects model

4.3 Stage 2

Table 3 shows the confusion matrix summarizing the within-sample classification accuracy of the random forest using observed continuous markers and baseline predictors in the training set. Predictors in the random forest classification algorithm included all continuous markers, years from baseline, and baseline characteristics such as age, education, marital status, APOE4 status and gender. An overall out-of-bag (OOB) estimated error rate of 4.55% was achieved. The variable importance plot in Fig. 5 shows the influence of each variable in predicting clinical status. The baseline diagnosis, CDR Sum of Boxes, Study Partner Everyday Cognition, Functional
[Figure: heatmap of pairwise correlations (ρ, color scale from −1.0 to 1.0) among the random effects of ADAS13, CDRSB, EcogPtTotal, EcogSPTotal, FAQ, MMSE, MOCA, RAVLT_immediate, FDG, Entorhinal, Hippocampus_ICV and Ventricles_ICV]
Fig. 2 For each pair of outcomes, the correlations among random intercepts are above the anti-diagonal, and the correlations among random slopes are below the anti-diagonal. ADAS13 Alzheimer’s Disease Assessment Scale, CDRSB Clinical Dementia Rating–Sum of Boxes, EcogPtTotal everyday cognition participant, EcogSPTotal everyday cognition study partner, FAQ Functional Assessment Questionnaire, FDG FluoroDeoxyGlucose, MMSE Mini-Mental State Examination, MOCA Montreal Cognitive Assessment, RAVLT Rey Auditory Verbal Learning Test, ICV intracranial volume

Table 3 Confusion matrix

Actual          Predicted                                            Row total   Overall class error
                CN               MCI               Dementia
CN              1218 (96.51%)    44 (3.49%)        0 (0.00%)         1262        0.035
MCI             29 (1.16%)       2382 (95.17%)     92 (3.68%)        2503        0.048
Dementia        0 (0.00%)        107 (4.84%)       2105 (95.16%)     2212        0.048
Column total    1247             2533              2197              5977        0.046

This confusion matrix summarizes the performance of the random forest algorithm for classifying diagnoses based on contemporaneous observations. The table compares actual diagnoses observed in the training set with diagnoses predicted by the random forest based on observed continuous data and baseline predictors
CN control, MCI mild cognitive impairment
(2019) 6:6 Page 9 of 18 ADAS13 CDRSB EcogPtTotal EcogSPTotal ADAS13 CDRSB EcogPtTotal EcogSPTotal 30   2.1 30    2.1      3.0   3.0    4 1.8     4 1.8     20    2.5  20      2.5              1.5  2.0     1.5       2.0   2      2                 10           10                     1.5       1.5      1.2         1.2                                              0              1.0        0          1.0       FAQ MMSE MOCA RAVLT_immediate FAQ MMSE MOCA RAVLT_immediate 20 30            30                              50  SubjectID            SubjectID      25.0  50                15     315 25.010 28   31528                       1261    1261  22.5 40        40 10            22.5      4036  26   4036        26    5    20.0   4263    4263 5           30     20.0     30       4275     24  4275       0         24 17.5 0         FDG Entorhinal Hippocampus_ICV Ventricles_ICV FDG Entorhinal Hippocampus_ICV Ventricles_ICV 1.6        1.6   0.04  0.04  5000  5000   1.5  0.005    1.5   0.005                          1.4     0.03       0.03    1.4                         4000              4000                                                1.3                       1.3  0.02             0.004      0.004 0.02             1.2     1.2      3000   3000          0.01 1.1     0.01           1.1        0.0 2.5 5.0 7.5 10.0 0.0 2.5 5.0 7.5 10.0 0.0 2.5 5.0 7.5 10.0 0.0 2.5 5.0 7.5 10.0 0.0 2.5 5.0 7.5 10.0 0.0 2.5 5.0 7.5 10.0 0.0 2.5 5.0 7.5 10.0 0.0 2.5 5.0 7.5 10.0 Years from baseline Years from baseline a IMM. b LTJMM. 
ADAS13 CDRSB EcogPtTotal EcogSPTotal ADAS13 CDRSB EcogPtTotal EcogSPTotal 30   2.1     2.1       3.0   3.0    4 1.8     4 1.8 20      2.5  20        2.5                  1.5 2.0      1.5  2.0   2    2                      10        10                     1.5       1.5      1.2              1.2                                     0 1.0                            0          1.0       FAQ MMSE MOCA RAVLT_immediate FAQ MMSE MOCA RAVLT_immediate 30             30                                 50   SubjectID      SubjectID          50    25.0   9     10    315  25.0        28  315             28         1261   1261     40   22.5     6    40         26    4036 22.5         4036 5                4263 3 26   4263  20.0     30      30      24  4275     20.0      4275       0         0         24 FDG Entorhinal Hippocampus_ICV Ventricles_ICV FDG Entorhinal Hippocampus_ICV Ventricles_ICV 1.6     1.6      0.04   0.04   5000   5000   1.5  0.005    1.5  0.005                        1.4      0.03          0.03                  4000       1.4                 4000                                             1.3                         0.004 0.02  1.3             0.004 0.02          1.2           3000    1.2  3000              0.01   0.01 1.1      1.1             0.0 2.5 5.0 7.5 10.0 0.0 2.5 5.0 7.5 10.0 0.0 2.5 5.0 7.5 10.0 0.0 2.5 5.0 7.5 10.0 0.0 2.5 5.0 7.5 10.0 0.0 2.5 5.0 7.5 10.0 0.0 2.5 5.0 7.5 10.0 0.0 2.5 5.0 7.5 10.0 Years from baseline Years from baseline c JMM. d BMA. Fig. 3 Observed values (points) versus predicted lines (lines) of markers for five randomly selected individuals for each of the four modeling approaches. 
IMM Independent mixed-effects model, JMM joint mixed-effects model, LTJMM latent-time joint mixed-effects model, BMA Bayesian model averaging, ADAS13 Alzheimer’s Disease Assessment Scale, CDRSB Clinical Dementia Rating—Sum of Boxes, EcogPtTotal everyday cognition participant, EcogSPTotal everyday cognition study partner, FAQ Functional Assessment Questionnaire, FDG FluoroDeoxyGlucose, MMSE Mini-Mental State Examination, MOCA Montreal Cognitive Assessment, RAVLT Rey Auditory Verbal Learning Test, ICV intracranial volume Assessment Questionnaire, and Mini-Mental State along with time-varying age, APOEe4 status and gender, Examination are the features with the highest impor- achieve overall accuracy and balanced classification accu- tance. The random forest predictions using predicted racy above 80% for periods less than 2 years (see Fig. 6). longitudinal markers from the joint models as inputs Between 2 and 5  years, we achieve an overall accuracy Predicted response Predicted response Predicted response Predicted response Iddi et al. Brain Inf. (2019) 6:6 Page 10 of 18 ADAS13 CDRSB EcogPtTotal EcogSPTotal ADAS13 CDRSB EcogPtTotal EcogSPTotal 0.5 3 1.00 20 0.6 0.5 10 0.4 60.75 15 1.0 2 0.3 0.4 0.50 410 0.3 5 1 0.2 0.5 0.25 5 2 0.2 0.1 0 0.00 0 0.1 0.0 FAQ MMSE MOCA RAVLT_immediat FAQ MMSE MOCA RAVLT_immediate 10.0 4 6 16 Model Model 15 7.5 JMM 9 7.5 JMM 3 12 JMMCognitive 10 JMMCognitive4 10 5.0 JMMImage 6 5.0 JMMImage 2 8 IMM IMM 2.5 2 LTJMM 5 3 2.5 5 LTJMM 1 4 BMA BMA 0.0 0 FDG Entorhinal Hippocampus_ICV Ventricles_ICV FDG Entorhinal Hippocampus_ICV Ventricles_ICV 700 0.16 600 6e−04 0.00075 0.0150.0075 700 0.12 500 0.2 400 4e−04 0.0050 0.010 0.08 500 0.00050 300 0.1 0.005 0.04 0.00252e−04 300 0.00025 200 0.000 0.0 2.5 5.0 7.510.0 0.0 2.5 5.0 7.510.0 0.0 2.5 5.0 7.510.0 0.0 2.5 5.0 7.510.0 0.0 2.5 5.0 7.510.0 0.0 2.5 5.0 7.510.0 0.0 2.5 5.0 7.510.0 0.0 2.5 5.0 7.510.0 Years from baseline Years from baseline a CN. b MCI. 
ADAS13 CDRSB EcogPtTotal EcogSPTotal ADAS13 CDRSB EcogPtTotal EcogSPTotal 2.0 8 4 0.5 1.00 0.4 0.3 1.5 310 0.4 0.75 6 0.3 2 0.3 0.50 0.2 4 1.0 5 0.2 0.2 1 0.25 0.1 2 0.5 0.10 0.00 FAQ MMSE MOCA RAVLT_immediat FAQ MMSE MOCA RAVLT_immediate 3.5 5 15 4 4 Model 10.0 Model6 6 3.0 JMM 4 JMM 7.5 3 JMMCognitive 10 JMMCognitive2.5 3 3JMMImage 4 JMMImage 4 5.0 2.0 2 IMM 2 IMM 2 LTJMM 2.5 2 5 LTJMM1.5 2 1 BMA 1 BMA 1.0 0.0 FDG Entorhinal ippocampus_IC Ventricles_ICV FDG Entorhinal Hippocampus_ICV Ventricles_ICV 350 0.00030 0.16 600 0.005 0.075 300 6e−040.00025 0.0090.12 500 0.004 250 0.00020 400 4e−04 0.006 0.050 0.003 0.08 200 300 0.00015 0.002 0.04 0.0032e−04 0.025 150 200 0.00010 0.001 0.0 0.5 1.0 1.5 2.0 0.0 0.5 1.0 1.5 2.0 0.0 0.5 1.0 1.5 2.0 0.0 0.5 1.0 1.5 2.0 0.0 2.5 5.0 7.510.0 0.0 2.5 5.0 7.510.0 0.0 2.5 5.0 7.510.0 0.0 2.5 5.0 7.510.0 Years from baseline Years from baseline c Dementia. d All. Fig. 4 Validation set weighted error scores over time for each model by diagnosis. CN Control, MCI mild cognitive impairment, IMM independent mixed-effects model, JMM joint mixed-effects model, LTJMM latent-time joint mixed-effects model, BMA Bayesian model averaging, JMMCognitive JMM fitted to cognitive and function outcomes only, JMMImage JMM fitted to imaging markers only of between 60–80%. To facilitate overall comparisons, IMM, LTJMM, BMA, JMMCognitive and JMMimage, we computed BCA aggregated across all the time points respectively. This reinforces the interpretation that the and weighted according to the amount of data available JMM with both cognitive and imaging markers performs at each time point. These weighted aggregate BCAs were better than the models with either cognitive or imaging 88.9%, 85.2%, 86.6%, 87.4%, 87.7% and 85.7% for JMM, markers only. Weighted error score Weighted error score Weighted error score Weighted error score Iddi et al. Brain Inf. 
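The error rates in Table 3 and the weighted aggregate BCAs quoted above follow from simple arithmetic on confusion matrices. The Python sketch below recomputes the per-class and overall error from the Table 3 counts. The balanced-accuracy function assumes a one-vs-rest definition (the mean over classes of the average of sensitivity and specificity), which may differ in detail from the metric defined in Sect. 3.3. All function names are hypothetical, and the inputs passed to `weighted_aggregate` are made-up values for illustration, not results from the paper.

```python
# Rows are actual classes, columns are predicted classes (counts from Table 3).
CLASSES = ["CN", "MCI", "Dementia"]
TABLE3 = [
    [1218, 44, 0],
    [29, 2382, 92],
    [0, 107, 2105],
]


def class_error(matrix, k):
    """Fraction of actual class k that was assigned to another class."""
    return 1.0 - matrix[k][k] / sum(matrix[k])


def overall_error(matrix):
    """Fraction of all observations that were misclassified."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[k][k] for k in range(len(matrix)))
    return 1.0 - correct / total


def balanced_accuracy(matrix):
    """Assumed definition: mean over classes of (sensitivity + specificity) / 2,
    treating each class one-vs-rest."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    scores = []
    for k in range(n):
        tp = matrix[k][k]
        fn = sum(matrix[k]) - tp
        fp = sum(matrix[i][k] for i in range(n)) - tp
        tn = total - tp - fn - fp
        scores.append(0.5 * (tp / (tp + fn) + tn / (tn + fp)))
    return sum(scores) / n


def weighted_aggregate(values, counts):
    """Average of per-time-point values, weighted by the number of subjects."""
    return sum(v * c for v, c in zip(values, counts)) / sum(counts)


if __name__ == "__main__":
    for k, name in enumerate(CLASSES):
        print(name, round(class_error(TABLE3, k), 3))
    print("overall error", round(overall_error(TABLE3), 4))
    print("balanced accuracy", round(balanced_accuracy(TABLE3), 4))
    # Hypothetical BCAs at two time points with 3 and 1 subjects, respectively.
    print("weighted aggregate", weighted_aggregate([0.90, 0.60], [3, 1]))
```

Run as a script, the per-class errors round to 0.035, 0.048 and 0.048 and the overall error to 0.0455, consistent with Table 3 and with the reported OOB error rate of 4.55%. Weighting by subject count means that early time points, where more subjects are observed, dominate the aggregate.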
[Figure 5: horizontal bar chart of MeanDecreaseGini (axis from 0 to 800) for the variables, listed from top to bottom: DX_bl, CDRSB, EcogSPTotal, FAQ, MMSE, ADAS13, MOCA, RAVLT_immediate, EcogPtTotal, Hippocampus_ICV, Years_bl, Entorhinal, FDG, AGE, Ventricles_ICV, PTEDUCAT, PTMARRY, APOE4, PTGENDER.]

Fig. 5 Random forest variable importance for categorical diagnosis (cognitively normal, mild cognitive impairment, or dementia). ADAS13 Alzheimer's Disease Assessment Scale, CDRSB Clinical Dementia Rating Sum of Boxes, EcogPtTotal everyday cognition participant, EcogSPTotal everyday cognition study partner, FAQ Functional Assessment Questionnaire, FDG FluoroDeoxyGlucose, MMSE Mini-Mental State Examination, MOCA Montreal Cognitive Assessment, RAVLT Rey Auditory Verbal Learning Test, ICV intracranial volume, PTGENDER participant's gender, PTMARRY participant's marital status, PTEDUCAT participant's education, Years_bl years from baseline, DX_bl baseline diagnosis, APOE4 APOE e4 allele

[Figure 6: two panels, a Diagnosis accuracy (y-axis: overall diagnosis accuracy) and b Balanced classification accuracy (y-axis: overall balanced classification accuracy), plotted against years from baseline (0 to 10) for JMM, JMMCognitive, JMMImage, IMM, LTJMM and BMA. Numbers of subjects annotated at successive time points: 896, 845, 766, 734, 506, 498, 259, 181, 154, 126, 104, 68.]

Fig. 6 Comparison of performance metrics for categorical diagnosis. Note that only the LTJMM did not include baseline diagnosis as a covariate. The numbers on the graph represent the number of subjects at each of the time points. IMM independent mixed-effects model, JMM joint mixed-effects model, LTJMM latent-time joint mixed-effects model, BMA Bayesian model averaging, JMMCognitive JMM fitted to cognitive and function outcomes only, JMMImage JMM fitted to imaging markers only

4.4 Sub-analysis for subjects with amyloid pathology information
To explore the role of amyloid pathology, we applied our approach to a subset of the original data involving only individuals with amyloid information in both the training and test dataset as described in Sect. 2. Baseline amyloid elevation status was included as a predictor in both the random forest and multivariate mixed-effects models. To highlight the role of amyloid status in the models, we compare the out-of-bag accuracy of the random forest with versus without baseline amyloid status as a predictor on the subset of the training set with observed amyloid status. The OOB estimates of the error rate were 4.99% and 5.13% for analysis with and without amyloid information, respectively. Thus, there is a modest added benefit with the inclusion of amyloid elevation status. This is not too surprising as the diagnostic classification in ADNI is based solely on the clinical presentation done without the clinicians' knowledge of any biomarkers. Figure 11a, b shows the predictive performance of the continuous longitudinal markers under each of the joint models for groups of elevated and non-elevated amyloid individuals, respectively. We observed that the models predict follow-up biomarker outcomes better for the individuals with non-elevated amyloid, owing to the fact that these individuals are likely to be more stable over time. The joint mixed-effects model continues to outperform the other models in terms of accuracy. Classification accuracy of clinical diagnosis is also depicted in Fig. 12. The random forest based on predictions from the joint models and baseline characteristics again yields balanced classification accuracy above 80% for the first two and a half years, declining over time. Again, the joint mixed-effects model combined with the random forest algorithm consistently outperformed the others.

5 Discussion and conclusion
In this study, we have investigated the use of a two-stage data-driven approach to modeling and predicting the progression of AD markers and clinical diagnosis. Longitudinal data were jointly modeled to take advantage of correlations among outcomes and within individuals. Random forests were used to derive an algorithm to categorize diagnoses. Predictions were assessed on an independent validation set. The approach achieved overall accuracy and balanced classification accuracy of above 80% for the first 2 years, but accuracy diminished precipitously beyond 2 years. This finding supports the utility of our two-stage method for predicting disease course over a limited time frame. The findings also support the use of machine learning methods to derive algorithms which might help avoid subjectivity in diagnostic categorization.

A number of publications have addressed diagnostic prediction at various stages of AD. For example, Tierney et al. [33] attempted to predict the onset of dementia at 5 and 10 years based on an initial neurological test battery. By using a univariate logistic regression model, their approach yielded accuracies of 82% at 5 years and 71% at 10 years. Using a survival regression approach, Tabert et al. [34] predicted conversion from MCI to AD based on neurological batteries used as inputs and adjusted for other study participants' characteristics. Their approach resulted in a 3-year predictive accuracy of 86%. Time-to-event outcomes generally have the ability to improve predictions over univariate logistic regression models. A more recent review by Rathore et al. [35] details how different classification frameworks have been used as an effective tool for making individualized diagnosis and prediction. Classification accuracies ranged from 70 to 95% for binary classification. These accuracies are impressive, but might not be comparable to the accuracies that we have reported. One reason for the incomparability is that the accuracies that we report are based on a held-out test set that was not used to fit models. The accuracies we report also blend initial diagnoses and consider all possible transitions (multinomial outcome) of disease status rather than the binary approach adopted by these authors. For example, the classification approach by Tierney et al. [33] does not include MCI patients. However, it is generally more difficult to discriminate between adjacent diagnoses (e.g., cognitively normal and MCI) compared to non-adjacent diagnoses (e.g., cognitively normal and dementia).

The different approaches we considered for the "stage one" modeling each have their own strengths and weaknesses. The independent mixed model, for example, is easier to fit than the joint mixed-effects models and is also less cumbersome to interpret. However, this model ignores the correlations among outcomes, which are generally known to be mild to strong for some pairs of AD markers. The correlation matrix of the random effects estimated in this study provides evidence of these between-outcome associations. On the other hand, joint models are complex, take more computational time, and can be challenging to interpret. In the presence of baseline diagnosis, the conventional joint mixed-effects model was preferred by the model selection criteria we considered. The latent-time joint mixed-effects model, motivated by the desire to predict long-term trajectories with short-term follow-up data, may be useful when baseline diagnosis is unknown. The Bayesian model averaging, which aggregates the other models, is probably the most complex but helps to account for model uncertainty in the estimation of parameters and prediction.

Some modifications might improve the prediction accuracy of the proposed two-stage algorithm. Instead of relying on a single time point to predict future course, one could utilize run-in data from multiple time points, which would likely improve estimates of subject-specific trajectories. Also, our models only considered a simple linear time trend. While nonlinear trends were not supported by the data at hand, it is possible that a more flexible mean structure might improve model performance. Larger datasets and/or improved disease markers might also serve to enhance the quality of predictions in the future.

The approach can be applied to sharpen clinical trial inclusion and exclusion criteria to provide target populations with desired predicted longitudinal characteristics, e.g., a cognitively normal population with increased risk of imminent progression to MCI. However, such an application might complicate and prolong the recruitment process and eventual drug labeling.

In the clinic, these methods can be applied to improve the accuracy of prognosis. Improved prognostic accuracy can help physicians, patients, and families make more informed decisions regarding therapies and care through the transitions from healthy cognition, to mild impairment, to dementia. Once effective therapies have been discovered, the proposed two-stage approach could be fit to clinical trial data to provide a more sophisticated model of treatment response. Such a treatment response model would provide personalized "theragnoses," or predictions of treatment response, and help make decisions on when, and to whom, to prescribe therapies.

Abbreviations
AD: Alzheimer's disease; ADAS13: Alzheimer's Disease Assessment - Cognitive 13-item scale; ADNI: Alzheimer's Disease Neuroimaging Initiative; APOE: apolipoprotein E gene; BCA: balanced classification accuracy; BMA: Bayesian model averaging; CN: cognitively normal; CDRSB: Clinical Dementia Rating Sum of Boxes; CSF: cerebrospinal fluid; ECog: everyday cognition; ECogPtTotal: ECog participant total; ECogSPTotal: ECog study partner total; FAQ: Functional Assessment Questionnaire; FDG: fluorodeoxyglucose; ICV: intracranial volume; IMM: independent mixed-effects model; JMM: joint mixed-effects model; JMMCognitive: JMM fitted to cognitive and function outcomes only; JMMImage: JMM fitted to imaging markers only; LTJMM: latent time joint mixed-effects model; LOOIC: leave-one-out information criterion; MAE: mean absolute error; MCMC: Markov Chain Monte Carlo; MMSE: Mini-Mental State Examination; MOCA: Montreal Cognitive Assessment; MRI: magnetic resonance imaging; PET: positron emission tomography; RAVLT Immediate: Rey Auditory Verbal Learning Test Immediate; SUVR: standardized uptake value ratio; WAIC: widely applicable information criterion; WES: weighted error score.

Acknowledgements
We are grateful to the ADNI study volunteers and their families.
The Alzheimer's Disease Neuroimaging Initiative: Data used in preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (http://adni.loni.usc.edu). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found at: http://adni.loni.usc.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf.

Authors' contributions
SI, DL, WKT, MCD conceived the methodological idea for the study. SI, DL and MCD contributed to the writing of the computer codes and performed the analysis. PSA and MSR provided expertise in the selection of markers for inclusion and the clinical interpretations of the findings. SI drafted the manuscript with contributions, comments and editing from DL, WKT, MCD, PSA and MSR. All authors read and approved the final manuscript.

Funding
This work was supported by Biomarkers Across Neurodegenerative Disease (BAND-14-338179) Grant from the Alzheimer's Association, Michael J. Fox Foundation, and Weston Brain Institute; and National Institute on Aging Grant R01-AG049750. Data collection and sharing for this project was funded by the ADNI (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH-12-2-0012). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: AbbVie, Alzheimer's Association; Alzheimer's Drug Discovery Foundation; Araclon Biotech; BioClinica, Inc.; Biogen; Bristol-Myers Squibb Company; CereSpir, Inc.; Eisai Inc.; Elan Pharmaceuticals, Inc.; Eli Lilly and Company; EuroImmun; F. Hoffmann-La Roche Ltd and its affiliated company Genentech, Inc.; Fujirebio; GE Healthcare; IXICO Ltd.; Janssen Alzheimer Immunotherapy Research & Development, LLC.; Johnson & Johnson Pharmaceutical Research & Development LLC.; Lumosity; Lundbeck; Merck & Co., Inc.; Meso Scale Diagnostics, LLC.; NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Piramal Imaging; Servier; Takeda Pharmaceutical Company; and Transition Therapeutics. The Canadian Institutes of Health Research is providing funds to support ADNI clinical sites in Canada. Private sector contributions are facilitated by the Foundation for the National Institutes of Health (www.fnih.org). The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer's Therapeutic Research Institute at the University of Southern California.

Availability of data and materials
ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of Southern California. This work used the TADPOLE data sets https://tadpole.grand-challenge.org constructed by the EuroPOND consortium http://europond.eu funded by the European Union's Horizon 2020 research and innovation programme under Grant Agreement No. 666992.

Competing interests
The authors declare that they have no competing interests.

Appendix: Supplementary appendix
See Figs. 7, 8, 9, 10, 11 and 12.

Fig. 7 Number of individuals observed at each visit by initial diagnosis. CN control, MCI mild cognitive impairment

Fig. 8 Observed and imputed values. The MissForest algorithm was used to impute missing values which appear to be plausible when compared to observed values at other visits. CN control, MCI mild cognitive impairment

Fig. 9 Observed values (points) versus predicted values (lines) based on only their baseline data for each of the four modeling approaches for subject#314 and for subject#4263.
IMM independent mixed-effects model, JMM joint mixed-effects model, LTJMM latent-time joint mixed-effects model, BMA Bayesian model averaging, ADAS13 Alzheimer's Disease Assessment Scale, CDRSB Clinical Dementia Rating Sum of Boxes, EcogPtTotal everyday cognition participant, EcogSPTotal everyday cognition study partner, FAQ Functional Assessment Questionnaire, FDG FluoroDeoxyGlucose, MMSE Mini-Mental State Examination, MOCA Montreal Cognitive Assessment, RAVLT Rey Auditory Verbal Learning Test, ICV intracranial volume

Fig. 10 Mean absolute error. CN control, MCI mild cognitive impairment

Fig. 11 Weighted error score for subset of the population with amyloid burden information. IMM independent mixed-effects model, JMM joint mixed-effects model, LTJMM latent-time joint mixed-effects model

Fig. 12 Comparison of performance metrics on clinical status for subset of the population with amyloid burden information. Note that only the LTJMM did not include baseline diagnosis as a covariate. The numbers on the graph represent the number of subjects at each of the occasions. CN control, MCI mild cognitive impairment, IMM independent mixed-effects model, JMM joint mixed-effects model, LTJMM latent-time joint mixed-effects model

Author details
1 Alzheimer's Therapeutic Research Institute, Keck School of Medicine, University of Southern California, San Diego, USA. 2 Department of Family Medicine and Public Health, University of California, San Diego, USA. 3 Department of Statistics and Actuarial Science, University of Ghana, Legon-Accra, Ghana. 4 African Population and Health Research Center, APHRC Campus, Manga Close, Off Kirawa Road, P.O. Box 10787-00100, Nairobi, Kenya.

Received: 9 February 2019 Accepted: 17 June 2019

References
1. Steyerberg WE (2009) Clinical prediction models: a practical approach to development, validation and updating. Springer, New York
2. Petersen RC (2004) Mild cognitive impairment as a diagnostic entity. J Intern Med 256(3):183–194
3. Chong MS, Sahadevan S (2005) Preclinical Alzheimer's disease diagnosis and prediction of progression. Lancet Neurol 4:576–579
4. Sperling RA, Rentz DM, Johnson KA, Karlawish J, Donohue M, Salmon DP, Aisen P (2014) The a4 study: stopping ad before symptoms begin? Sci Transl Med 6(228):228-1322813
5. Rowe CC, Ellis KA, Rimajova M, Bourgeat P, Pike KE, Jones G, Fripp J, Tochon-Danguy H, Morandeau L, O'Keefe G et al (2010) Amyloid imaging results from the Australian Imaging, Biomarkers and Lifestyle (AIBL) study of aging. Neurobiol Aging 31(8):1275–1283
6. Donohue MC, Sperling RA, Petersen R, Sun C, Weiner MW, Aisen PS (2017) Association between elevated brain amyloid and subsequent cognitive decline among cognitively normal persons. JAMA 317(22):2305–2316
7. Gray KR, Aljabar P, Heckemann RA, Hammers A, Rueckert D, for the Alzheimer's Disease Neuroimaging Initiative (2013) Random forest-based similarity measures for multi-modal classification of Alzheimer's disease. Neuroimage 65:167–175
8. Ortiz A, Gorriz JM, Ramirez J, Martinez-Murcia FJ, for the Alzheimer's Disease Neuroimaging Initiative (2013) LVQ-SVM based CAD tool applied to structural MRI for the diagnosis of the Alzheimer's disease. Pattern Recognit Lett 34:1725–1733
9. Stefano FD, Epelbaum S, Coley N, Cantet C, Ousset P-J, Hampel H, Bakardjian H, Lista S, Vellas B, Dubois B, Andrieu S, for the GuidAge Study Group (2015) Prediction of Alzheimer's disease dementia: data from the guidage prevention trial. J Alzheimer's Dis 48:793–804
10. Buckley RF, Maruff P, Ames D, Bourgeat P, Martins RN, Masters CL, Rainey-Smith S, Lautenschlager N, Rowe CC, Savage G, Villemagne VL, Ellis KA, on behalf of the AIBL Study (2016) Subjective memory decline predicts greater rates of clinical progression in preclinical Alzheimer's disease. Alzheimer's Dement 12:776–785
11. Seixas FL, Zadrozny B, Laks J, Conci A, Saade DCM (2014) A Bayesian network decision model for supporting the diagnosis of dementia, Alzheimer's disease and mild cognitive impairment. Comput Biol Med 51:140–158
12. Beheshti I, Demirel H, Matsuda H, for the Alzheimer's Disease Neuroimaging Initiative (2017) Classification of Alzheimer's disease and prediction of mild cognitive impairment-to-Alzheimer's conversion from structural magnetic resource imaging using feature ranking and a genetic algorithm. Comput Biol Med 83:109–119
13. Zheng C, Xia Y, Pan Y, Chen J (2016) Automated identification of dementia using medical imaging: a survey from a pattern classification perspective. Brain Inform 3:17–27
14. Folstein MF, Folstein SE, McHugh PR (1975) Mini-mental state: a practical method for grading the cognitive state of patients for the clinician. J Psychiatr Res 12(3):189–198
15. Wechsler D (1987) WMS-R: Wechsler Memory Scale-revised. Psychological Corporation, New York
16. Morris JC (1993) The clinical dementia rating (CDR): current version and scoring rules. Neurology 43(11):2412–2414
17. Tang BL, Kumor R (2008) Biomakers of mild cognitive impairment and Alzheimer's disease. Ann Acad Med Singapore 37:406–410
18. Marinescu RV, Oxtoby NP, Young AL, Bron EE, Toga AW, Weiner MW, Barkhof F, Fox NC, Klein S, Alexander DC, the EuroPOND Consortium (2018) TADPOLE challenge: prediction of longitudinal evolution in Alzheimer's disease. arXiv:1805.03909
19. Tsiatis AA, Davidian M (2004) A joint modeling of longitudinal and time-to-event data: an overview. Stat Sin 14:809–834
20. Andrinopoulou ER, Eilers PHC, Takkenberg JJM, Rizopoulos D (2017) Improved dynamic predictions from joint models of longitudinal and survival data with time-varying effects using p-splines. Biometrics. https://doi.org/10.1111/biom.12814
21. Breiman L (2001) Random forests. Mach Learn 45(1):5–32
22. Johnson KA, Sperling RA, Gidicsin CM, Carmasin JS, Maye JE, Coleman RE, Reiman EM, Sabbagh MN, Sadowsky CH, Fleisher AS, Doraiswamy M, Carpenter AP, Clark CM, Joshi AD, Lu M, Grundman M, Mintun MA, Pontecorvo MJ, Skovronsky DM (2013) Florbetapir (f18-av-45) pet to assess amyloid burden in Alzheimer's disease dementia, mild cognitive impairment, and normal aging. Alzheimer's Dement 9(5):72–83
23. Tapiola T, Alafuzoff I, Herukka S-K, Parkkinen L, Hartikainen P, Soininen H, Pirttila T (2009) Cerebrospinal fluid β-amyloid 42 and tau proteins as biomarkers of Alzheimer-type pathologic changes in the brain. Arch Neurol 66(3):382–389
24. Joshi AD, Pontecorvo MJ, Clark CM, Carpenter AP, Jennings DL, Sadowsky CH, Adler LP, Kovnat KD, Seiby JP, Arora A, Saha K, Burns JD, Lowrey MJ, Mintun MA, Skovronsky DM, the Florbetapir F18 Study Investigators (2012) Performance characteristics of amyloid pet with florbetapir f18 in patients with Alzheimer's disease and cognitively normal subjects. J Nucl Med 53(3):378–384
25. Li D, Iddi S, Thompson WK, Donohue MC (2017) Bayesian latent time joint mixed effect models for multicohort longitudinal data. Stat Methods Med Res 28(3):835–845
26. Iddi S, Li D, Aisen P, Rafii M, Thompson WK, Litvan I, Donohue MC (2018) Estimating the evolution of disease in the Parkinson's Progression Markers Initiative. Neurodegener Dis (Accepted)
27. Stan Development Team (2016) Stan modeling language users guide and reference manual, Version 2.12.0. http://mc-stan.org/
28. Stan Development Team (2016) RStan: the R interface to Stan, Version 2.10.1. http://mc-stan.org
29. Hoeting JA, Madigan D, Raftery AE, Volinsky CT (1999) Bayesian model averaging: a tutorial. Stat Sci 14(4):382–417
30. Liaw A, Wiener M (2002) Classification and regression by randomforest. R News 2(3):18–22
31. Vehtari A, Gelman A, Gabry J (2017) Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Stat Comput 27:1413–1432. https://doi.org/10.1007/s11222-016-9696-4
32. Stekhoven DJ, Buhlmann P (2012) Missforest - nonparametric missing value imputation for mixed-type data. Bioinformatics 28(1):112–118
33. Tierney MC, Yao C, Kiss A, McDowell I (2005) Neuropsychological test accurately predict incident Alzheimer disease after 5 and 10 years. Neurology 64:1853–1859
34. Tabert MH, Manly JJ, Liu X, Pelton GH, Rosenblum S, Jacobs M, Zamora D, Goodkind M, Bell K, Stern Y, Devanand DP (2006) Neuropsychological prediction of conversion to Alzheimer disease in patients with mild cognitive impairment. Arch Gen Psychiatry 63:916–924
35. Rathore S, Habes M, Iftikhar MA, Shacklett A, Davatzikos C (2017) A review on neuroimaging-based classification studies and associated feature extraction methods for Alzheimer's disease and its prodromal stages. Neuroimage. https://doi.org/10.1016/j.neuroimage.2017.03.057

Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.