Millar et al. Malar J (2018) 17:343
https://doi.org/10.1186/s12936-018-2491-2
Malaria Journal

RESEARCH Open Access

Detecting local risk factors for residual malaria in northern Ghana using Bayesian model averaging

Justin Millar1*, Paul Psychas1, Benjamin Abuaku2, Collins Ahorlu2, Punam Amratia1, Kwadwo Koram2, Samuel Oppong3 and Denis Valle1

Abstract

Background: There is a need for comprehensive evaluations of the underlying local factors that contribute to residual malaria in sub-Saharan Africa. However, it is difficult to compare the wide array of demographic, socio-economic, and environmental variables associated with malaria transmission using standard statistical approaches while accounting for seasonal differences and nonlinear relationships. This article uses a Bayesian model averaging (BMA) approach for identifying and comparing potential risk and protective factors associated with residual malaria.

Results: The relative influence of a comprehensive set of demographic, socio-economic, environmental, and malaria intervention variables on malaria prevalence was modelled using BMA for variable selection. Data were collected in Bunkpurugu-Yunyoo, a rural district in northeast Ghana that experiences holoendemic seasonal malaria transmission, over six biannual surveys from 2010 to 2013. A total of 10,022 children between the ages of 6 and 59 months were used in the analysis. Multiple models were developed to identify important risk and protective factors, accounting for seasonal patterns and nonlinear relationships. These models revealed pronounced nonlinear associations between malaria risk and distance from the nearest urban centre and health facility. Furthermore, the association between malaria risk and age and some ethnic groups was significantly different in the rainy and dry seasons. BMA outperformed other commonly used regression approaches in out-of-sample predictive ability using a season-to-season validation approach.
Conclusions: This modelling framework offers an alternative approach to disease risk factor analysis that generates interpretable models, can reveal complex, nonlinear relationships, incorporates uncertainty in model selection, and produces accurate predictions. Certain modelling applications, such as designing targeted local interventions, require more sophisticated statistical methods which are capable of handling a wide range of relevant data while maintaining interpretability and predictive performance, and directly characterize uncertainty. To this end, BMA represents a valuable tool for constructing more informative models for understanding risk factors for malaria, as well as other vector-borne and environmentally mediated diseases.

Keywords: Risk factors, Bayesian model averaging, Nonlinear patterns, Statistical methods

*Correspondence: jjmillar@ufl.edu
1 Emerging Pathogens Institute, University of Florida, Gainesville, USA
Full list of author information is available at the end of the article

© The Author(s) 2018. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
Background

In spite of significant global reductions in malaria transmission and prevalence over the past decade [1], many districts and municipalities across sub-Saharan Africa continue to experience high malaria burden [2, 3]. In several instances, residual malaria transmission has persisted despite widespread coverage of conventional malaria interventions, such as insecticide-treated bed netting (ITN) and indoor-residual spraying of insecticides (IRS) [4, 5]. An important factor contributing to residual malaria transmission is a high degree of spatial heterogeneity [5]. Malaria prevalence can differ dramatically [6], even over relatively short distances [7], which has the potential to undermine universal intervention guidelines [8]. Similarly, some subpopulations might have a substantially higher malaria risk than other groups. Identifying these hotspots and hot-pops is critical for developing targeted approaches to reduce malaria burden and guide holoendemic areas towards malaria elimination [7, 9].

Local risk factors for malaria can be difficult to characterize due to the wide range of variables that can be relevant to malaria epidemiology [10]. Studies on malaria risk factors have often focused on particular types or categories of variables, such as models based on environmental data [11–13], or demographic and socio-economic factors [14–17]. However, as information becomes more accessible and available at finer geographic and temporal resolutions, malaria risk models have sought to incorporate a greater variety of explanatory data [18–21]. Additionally, the importance of complex patterns, such as nonlinear relationships and seasonally-dependent shifts, has emerged as a significant component to modelling malaria risk [22, 23].

Incorporating a wider range of explanatory information into disease risk factor models can be difficult when using traditional statistical approaches such as standard logistic regression. Having a large number of predictors or independent variables (i.e. potential risk factors) can lead to overfitting [24], which can decrease the accuracy of out-of-sample predictions and increase the probability of detecting spurious relationships. In the context of disease risk factor analysis, this can undermine the applicability of the results for guiding interventions. Additionally, traditional statistical models often make critical assumptions, such as linearity. These shortcomings have led to the application of sophisticated variable selection methods, which are able to incorporate more independent variables and model complex relationships without sacrificing forecasting accuracy by reducing dimensionality. Examples of variable selection methods that have been used for malaria-related data include stepwise regression [20], ridge regression [25], and Lasso regression [26, 27].

One limitation of these approaches for variable selection is that they do not account for uncertainty in the selection process, which can produce overconfident predictions. Consider the following contextualized example provided by Hoeting et al. [28]: a researcher has gathered a comprehensive data set on potential risk factors of malaria, and wants to construct a model in order to compare risk factors and make predictions. They use a variable selection procedure, which identifies a specific model, M*, as having the best fit based on some information criterion, which is then used to compare risk factors, make predictions, and inform interventions. Suppose that there exists an alternative model, M**, which has nearly as good a fit but consists of a different set of covariates and produces different effect sizes and/or predictions. In this case, the researcher should have less certainty in M*. Hoeting et al. [28] demonstrate that this scenario, where uncertainty in model selection is ignored, is very common, and unfortunately typical variable selection methods do not provide a mechanism for incorporating this uncertainty.

Bayesian model averaging (BMA) is an alternative approach to variable selection which fully accounts for uncertainty associated with the model selection process [29]. Previous studies outside of the field of disease control have demonstrated that BMA often outperforms other methods of variable selection [30–32]. This technique has been adopted in many modelling applications [33], such as weather forecasting [34, 35], phylogenetics [36], and hyperspectral image analysis [32, 37]. While BMA is not new to modelling disease risk factors [38, 39], recent applications (i.e. in the past 15 years) are uncommon, and to our knowledge BMA has yet to be used in the context of malaria or other arthropod-transmitted diseases. Given the wide variety of factors that contribute to malaria, the increased attention to complex patterns, and the increasing availability of data, BMA could represent a valuable statistical tool for enhancing risk factor models and designing targeted interventions.

In this study, BMA was used to identify the underlying factors that shape the spatiotemporal patterns of malaria prevalence in a district located in the Guinea savannah zone of northern Ghana that experiences high seasonal malaria transmission [40, 41]. The article demonstrates how BMA can be used to identify seasonal differences and nonlinear relationships in malaria risk factors, compares the performance of BMA to standard logistic and Lasso regression, and describes how BMA results can be useful for designing targeted malaria intervention strategies.
Malar J (2018) 17:343 Page 3 of 14 Methods marginalized group across northern Ghana, are recog- Site description nized as more culturally conservative and in general tend Data were collected from the Bunkpurugu-Yunyoo dis- to be less educated [43]. trict, Northern Region, which is in the Guinea savannah zone of northeastern Ghana and experiences recurring Data collection high levels of seasonal malaria transmission (Fig.  1). The individual-level longitudinal dataset was collected Two highly efficient malaria vectors predominate in this in the course of operations research on IRS, which was area, namely Anopheles gambiae sensu stricto (s.s.) and conducted by the University of Ghana with the support Anopheles funestus [42]. During the study period cover- of the President’s Malaria Initiative [44, 45]. The current age of long-lasting insecticide-treated bed net (LLINs) study, which was carried out at the University of Flor- was greater than 75%, having benefitted from two mass ida in collaboration the University of Ghana, consisted distribution campaigns in 2010 and 2012. Furthermore, fundamentally of enhancing that original dataset with annual IRS campaigns were conducted in 2011 and 2012 remote-sensed variables and conducting follow-on analy- using alphacypermethrin 0.4% WP (ICON®10CS, Syn- ses to address a different set of research objectives. genta, Basel Switzerland), with a second application of Children between the ages of 6 to 59 months were sur- IRS provided in the dry season in the eastern portion of veyed in six biannual surveys, three during the rainy sea- the district. son (late October to November) and three during the dry The district is composed of rural communities sup- season (late March to April), from 2010 to 2013. A new ported by small-scale farming and herding, and two representative sample was selected for each survey using modest urban centers: Bunkpurugu (population: 7436) a multi-stage randomized cluster sampling technique. 
and Nakpanduri (population: 5783). The major ethno-lin- Probability proportional to size estimates were used to guistic groups are the Bimoba (approximately 60%) and randomly select representative communities based on Konkomba (approximately 30%) with smaller populations a Ghana Health Service roster of communities in the of Mamprusi, Kusasis, Dagombas, Fulanis and others. district. This sample covered approximately 20% of the The Bimoba tend to predominate in the higher ground of under-five population in each survey, based on 2010 cen- the north and east portions of the district, including the sus data. Individuals under 6 months old were removed two urban areas, while the Konkomba are more prevalent from this analysis to eliminate the influence of maternal in the lower lying area of the south and west, where they immunity. Each survey was conducted over a 3-week graze their cattle in the riverine plains. The Konkomba, period. Malaria status was assessed via blood-film who tend to be a geographically and economically microscopy. The survey also captured data on relevant Fig. 1 Map of study district, Bunkpurugu-Yunyoo, in northern Ghana (red polygon show in insert map). Interpolations depict malaria prevalence in young children (ages 6–59 months) in Bunkpurugu-Yunyoo, Ghana during the rainy and dry seasons in the left and right maps, respectively. Six biannual surveys were collected from 2010 to 2013 and pooled by season. Black circles denote the sampled communities and yellow stars denote local urban centers. Interpolations were made using inverse-distance weighted function in ArcGIS 10.3 Millar et al. Malar J (2018) 17:343 Page 4 of 14 demographic, socioeconomic, and malaria intervention each of the subsequent surveys. The number of individuals variables (Table  1), using a modified Malaria Indicator in each survey ranged from 1341 to 1788. Survey questionnaire. GPS coordinates were recorded for a central point in each community center. 
The original Statistical methods dataset was enhanced by collecting additional informa- Base model tion on environmental variables using GIS software (Arc- All malaria risk models were constructed using the same Map 10.4) and freely available remote sensing sources general Bayesian framework. Let yijt be the binary micros- (Table  2). Childhood malaria prevalence in this district copy outcome (1 = positive, 0 = negative) for individual i exhibited a high degree of spatial heterogeneity over the in community j at time t. This variable was modelled using study period, in both the rainy and dry seasons (Fig. 1). a Bayesian probit regression model, assuming that: Correlations between all potential risk factors were cal- culated, and in cases of high correlations (R2 > 0.49), a sin- yijt = 1 if zijt > 0 y gle representative covariate was selected (see Additional ijt = 0 otherwise file 1). Selection of these covariates was based on the rele- vance of each covariate to malaria epidemiology and inter- In other words, individual i in community j at time t is vention strategies. The covariates dropped from all models positive for malaria only if zijt is greater than zero. This is were farming caretakers, indoor residual spraying (IRS) in determined by: past 7 months, average daytime land surface temperature, ( ) normalized difference vegetation index (NDVI), cumula- z Tijt ∼ N xijtβ , 1 tive rainfall, and historical precipitation trends. Because BMA requires each individual to have information for all covariates, all individuals with at least one covariate with where xTijt is a vector of the intercept and potential risk missing data were dropped from the analysis. As a conse- factors, and β is a vector with the corresponding regres- quence, all analyses were based on 10,029 children (84.0% sion parameters. Finally, the priors were specified as: of total dataset). 
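The latent-variable form of the probit base model can be illustrated with a short sketch. It is written in Python rather than the R used for the analysis, and the function names and the example coefficient values are hypothetical, chosen only for illustration. Because z ~ N(x^T β, 1), thresholding z at zero implies P(y = 1) = Φ(x^T β), where Φ is the standard normal distribution function.

```python
import math
import random


def std_normal_cdf(v):
    """Standard normal distribution function, Phi(v)."""
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))


def linear_predictor(x, beta):
    """x^T beta, where x[0] = 1 plays the role of the intercept term."""
    return sum(xi * bi for xi, bi in zip(x, beta))


def infection_probability(x, beta):
    """P(y = 1) = P(z > 0) = Phi(x^T beta) when z ~ N(x^T beta, 1)."""
    return std_normal_cdf(linear_predictor(x, beta))


def simulate_outcome(x, beta, rng):
    """Draw the latent z and threshold it at zero, as in the model definition."""
    z = rng.gauss(linear_predictor(x, beta), 1.0)
    return 1 if z > 0 else 0


# Hypothetical covariate vector (intercept, one risk factor) and coefficients.
x_example = [1.0, 1.0]
beta_example = [-0.2, 0.5]
print(round(infection_probability(x_example, beta_example), 3))
```

The two functions describe the same model from two angles: `simulate_outcome` follows the latent-variable definition literally, while `infection_probability` gives the implied prevalence that the simulated outcomes should approach on average.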
These data were distributed across 80 ( ) communities in the first survey and 71 communities in β ∼ N 20, σ Σ Table 1 Potential risk or protective covariates collected from surveys Variable Details Demographic and socio-economic Age From 6 to 59 months old Caretaker’s education Binary variable; either (1) for high school education and above or (0) otherwise Caretaker’s age In years Ethnicity Four groups; (1) Bimoba, (2) Konkomba, (3) Mamprusi, and (4) Other, based on language of caretaker Farming c aretakera Binary variable; either caretaker occupation being farming (1) or otherwise (0) Gender Binary variable; either male (1) or female (0) Surface water source Binary variable; either (1) source of drinking water from exposed surface water or (0) otherwise Thatch roofing Binary variable; either housing structure had a thatched roof (1) or otherwise (0) Wealth quintile Constructed from multiple variables, using the methodology of the Ghana Demographic Health Survey (2008) [80] Malaria intervention Health insurance—personal Binary variable; either personal access to health insurance (1) or not (0) Health insurance—community Binary variable; either (1) for ≥ 80%b community coverage of sampled population or (0) otherwise IRS in past 7 monthsa Binary variable; either individual household having been treated with IRS in past 7 months (1) or not (0) IRS in past year Binary variable; either individual household having been treated with IRS in past year (1) or not (0) Indoor residual spraying (IRS)—community Binary variable; either (1) for ≥ 80%b community coverage or (0) otherwise coverage Insecticide treated nets (ITN)—personal Binary variable; either (1) if net was used in previous night or (0) otherwise ITN—community coverage Binary variable; either (1) for ≥ 80b % community coverage or (0) otherwise Personal medication use Binary variable; either (1) used in the past 2 weeks or (0) otherwise a Removed from models due to high correlations (R2 ≥ 0.49) with one or more 
other variables b Based on targets from Roll Back Malaria Millar et al. Malar J (2018) 17:343 Page 5 of 14 Table 2 Potential risk or protective covariates collected from remote sensing and GIS-based sources Variable Source/satellite Details Distance to health facility GIS-derived Euclidean distance from active health facility at time of survey (based on survey location) Distance to main roads GIS-derived [81] Euclidean distance from major roads Distance to urban centers GIS-derived Euclidean distance from center with population ≥ 5000 individuals Distance to water bodies GIS-derived [82] Euclidean distance from rivers and standing water bodies Elevation CGIAR SRTM [83] Meters above sea level Land surface temperature—daya NASA (Terra) MOD13A3 (Aqua) MYD13A3 [84] Average monthly daytime temperature (in degrees Celsius) 30 days prior to a survey Land surface temperature—night NASA (Terra) MOD13A3 and (Aqua) MYD13A3 [84] Average monthly nighttime temperature (in degrees Celsius) 30 days prior to a survey Normalized difference vegetative indexa NASA (Terra) MOD13A3 and (Aqua) MYD13A3 [85] The maximum monthly index 30 days prior to a survey Population density WorldPop [86] Population density per 100 m grid, log-transformed Population density (≤ 5 y.o.)a WorldPop [86] Population under 5 years of age density per 100 m grid, log-transformed Rainfall (historical)a WorldClim [87] Average of the cumulative sum of precipitation from 3 to 1 month prior to the survey date from past 50 years Rainfall (current)a FEWSNET [88] Average of the cumulative sum of precipitation from 3 to 1 month prior to survey Slope GIS-derived (from elevation) a Removed from models due to high correlations (R2 ≥ 0.49) with one or more other variables health facility, through the use of linear splines. 
These σ ∼ Unif (0, 100) variables were selected based on outcomes from the base where the matrix Σ in the prior for β is a diagonal matrix model and their applicability towards design interven- [ ] with diag(Σ) = 100 1 . . . 1 . Similar Bayesian regres- tions. The creation of linear splines consists of first a set sion frameworks have been used in disease risk factors of m values within the domain of the covariate x, referred analyses, including for HIV and tuberculosis [46, 47], as to as knots k1, . . . , km . For which knot, a “new” derived well as malaria [48]. covariate xd is created in the following way: { 0, x < kd Complex models: seasonal differences and nonlinear xd = x − kd , x ≥ kd associations In addition to the general risk factor regression, extended versions of the base model were created by including resulting in m additional derived variables for each additional derived covariates in order to describe com- splined covariate. Knot values were selected at the 20, 40, plex patterns. First, a model was constructed to evaluate 60 and 80% quantiles of the observed variables. Including whether the effect of risk factors differed between the dry these splines allows the effect of these variables to shift at and rainy seasons. For example, distance to the nearest the knot values, which can reveal nonlinear associations health facility may be a strong risk factor in the rainy sea- in the specified risk factors. Seasonal interaction terms son but may be an irrelevant covariate during the dry sea- were also included in this model, which allowed these son. This was modelled by including additional elements nonlinear patterns to also differ in each season. in the design vector xTijt representing the interaction of each covariate with the binary variable representing the Bayesian model averaging rainy season. This model allows the parameter estimates Each of the models discussed above can be fitted using for each covariate to vary by season. 
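The linear spline construction can be sketched in a few lines of Python. This is a minimal illustration, not the code from Additional file 1: the function names are hypothetical, and the quantile rule used by `statistics.quantiles` is an assumption, since the text does not state how the 20, 40, 60 and 80% quantiles were computed.

```python
import statistics


def quantile_knots(values):
    """Four knots at the 20, 40, 60 and 80% quantiles of the observed values.

    Assumption: Python's default ("exclusive") quantile rule is used here.
    """
    return statistics.quantiles(values, n=5)


def linear_spline_terms(x, knots):
    """Derived covariates x_d = 0 if x < k_d, else x - k_d, one per knot."""
    return [max(0.0, x - k) for k in knots]


# Hypothetical distances (km) to the nearest urban centre.
distances = [float(d) for d in range(1, 100)]
knots = quantile_knots(distances)
print(knots)
print(linear_spline_terms(50.0, knots))
```

Each derived covariate is zero below its knot and grows linearly above it, so the fitted slope of the original covariate can change at every knot while the curve stays continuous.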
Risk or protective a Markov chain Monte Carlo (MCMC) algorithm. Vari- factors that vary substantially with season may suggest able selection was incorporated into this MCMC algo- that different malaria intervention strategies could be rithm by implementing a reversible jump MCMC [49]. required for each season. The MCMC is initialized with a model containing a sub- Finally, this framework was used to describe potential set of the possible covariates. At each iteration of the nonlinear patterns in two relevant continuous variables, MCMC a new candidate model is proposed using a ran- distance to nearest urban centre and distance to nearest domly selected move; either a birth (addition of a new Millar et al. Malar J (2018) 17:343 Page 6 of 14 covariate), death (removal of an included covariate), or Bayesian regression, the regression coefficient (β) for swap (switching an included covariate with an excluded each covariate is estimated based on posterior draws covariate). The candidate model is then either accepted and considered statistically significant if the 95% cred- or rejected based on the marginal log-likelihood. Inform- ible interval did not contain zero. Note that this is the ative covariates (and combination of covariates) will have “model averaging” component of BMA, as individual a tendency to increase the marginal likelihood, and there- posterior samples are based on different parameter fore tend to be retained in the selection process, while spaces. This allows for the uncertainty associated with less informative covariates are more likely to be excluded. variable selection to be incorporated into parameter The marginal probability associated with a particular estimation. More in-depth descriptions of this model model Mq, defined by the subset of covariates q, can be and how it is fit are provided by Zhao et  al. [32] and calculated in closed form after integrating out the associ- Denison [51]. ated regression parameters βq . 
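The "model averaging" step, in which coefficients are summarized over posterior draws that come from different models, can be sketched as follows. This is a hedged Python illustration with hypothetical names. It assumes that a covariate excluded from the model at a given iteration contributes a coefficient of exactly zero to that draw, and it uses a simple empirical rule for the 95% credible interval; the text does not confirm either detail.

```python
def summarize_draws(draws):
    """Summarize posterior draws of one coefficient across visited models.

    draws: one value per retained MCMC iteration, with 0.0 standing in for
    iterations where the covariate was not in the model (an assumption).
    """
    n = len(draws)
    ordered = sorted(draws)
    lower = ordered[int(0.025 * (n - 1))]  # empirical 2.5% quantile
    upper = ordered[int(0.975 * (n - 1))]  # empirical 97.5% quantile
    return {
        "mean": sum(draws) / n,
        "ci": (lower, upper),
        "inclusion": sum(1 for d in draws if d != 0.0) / n,
        # Significant when the interval lies strictly on one side of zero.
        "significant": lower > 0.0 or upper < 0.0,
    }


# Hypothetical draws: the covariate is in the model half of the time.
print(summarize_draws([0.0] * 50 + [0.2] * 50))
```

Under these assumptions, a covariate that is frequently dropped has many zero draws, which pulls its averaged coefficient towards zero and widens the interval to include zero.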
This is given by: ∫ Out‑of‑sample predictions ( ) ( ) ( ) p M |z σ 2 ∝ N z|X β I β | σ 2q , q q , N q 0, � dβq As illustrated in the preceding sections, allowance for greater model flexibility can be achieved through the additional of several derived covariates and their associ- ated parameters. Specifically, the base model contained pq+1 ( ) ( )− [ ]1 ∣ ∣ 1 29 covariates, adding interaction terms increased the 2 ∝ σ 2 exp − −µT −1 ∣ ∣ 2q Tq µq Tq 2 number of covariates to 56, and adding linear splines expanded the model to include a total of 73 covariates. Increasing the number of parameters in a model can lead where pq is the number of covariates in subset q, to overfitting, making variable selection an increasingly { } −1 = XTX 1 −1 TT q +  and µ = TqX z. important task. To assess the out-of-sample performance q q σ 2 q q q of BMA, we performed predictions by training the model The prior for each model were then set to on data from a particular year and estimating malaria sta- ( )−1 ( ) P tus for a future year. Due to high seasonality in malaria p M ∝ (P + 1)−1q , where P is the overall pq risk in the district, only same-season predictions were ( ) P considered (i.e. rainy season predictions were based on number of covariates. In this expression, counts all pq a rainy season training dataset). These predictions were compared to standard logistic regression, as well as least the possible combinations of pq elements out of P and 1P +1 absolute shrinkage and selection operator (Lasso) regres- is a discrete uniform distribution for all possible number sion. Lasso is an alternative method for variable selection of covariates 0, …, P. which has been shown to improve out-of-sample predic- As mentioned above, the algorithm explores model tions [52]. To demonstrate how these models performed space by randomly proposing the birth of a new covariate relative to the number of covariates, season-to-season or the death or swap of an existing covariate. 
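The model prior can be checked numerically with a short Python sketch (the function name and the value of P are hypothetical). The prior weight of one model with p_q covariates is the inverse of the number of models of that size, times 1/(P + 1), so every model size from 0 to P receives the same total prior mass.

```python
import math


def log_model_prior(p_q, P):
    """log p(M_q) for a model with p_q of the P candidate covariates."""
    return -math.log(math.comb(P, p_q)) - math.log(P + 1)


P = 10  # hypothetical number of candidate covariates
for size in (0, 5, 10):
    # Total prior mass of all models of this size: comb(P, size) * p(M_q).
    mass = math.comb(P, size) * math.exp(log_model_prior(size, P))
    print(size, round(mass, 6))
```

Working on the log scale avoids underflow when P is large, and it matches how the prior would enter a log acceptance ratio.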
These pro- predictions for the base model and the extended model posed moves are then accepted or rejected using a stand- which contained seasonal interactions and spline terms ard Metropolis–Hastings acceptance ratio given by: were performed. Out-of-sample predictive skill was eval-   pq∗+1 � � �� � � 1 � � � �  − � � � �  e σ 2 2 exp − 1 −µT −1   p M ∗ |z, σ 2 p M ∗ q∗T � � 2 2 q∗ µ ∗ T q q q q ∗  min 1, � � � � = min 1, × R p M z p +1 � � �� q| , σ 2 p M q � � 1� �q    σ 2 − 2 exp −1− 1 −µT µ T 2  T � �  2 q q q q where R is typically equal to 1 and Mq∗ and Mq are the uated based on the sum of the log likelihood, where the proposed and current models, respectively. model with the largest log likelihood sum was considered This approach was used to fit a customized Gibbs to have the best predictive ability. sampler (see Additional file 1) using R software (v3.3.1) [50]. Each model was run for 10,000 iterations with the Results first 1000 iterations dropped to account for the burn- Descriptive analysis in period. Convergence on the parameter estimates There was a slight decreasing trend in malaria preva- was confirmed using trace plots. Similar to traditional lence over the course study, however the distribution Millar et al. Malar J (2018) 17:343 Page 7 of 14 of seasonal community prevalence remained relatively risk factors are present amongst the early childhood consistent over the course of the study (see Additional populations in Bunkpurugu-Yunyoo (Fig. 2). The strong- file  1). Mean community prevalence (and interquartile est risk factor associated with malaria infection was rainy ranges) in the three rainy season surveys were 0.57 (0.39– season (mean regression coefficient equal to 0.647 with 0.75), 0.52 (0.33–0.73), and 0.46 (0.27–0.61), whereas a credible interval (CI) of 0.565–0.728), as evident in in the three dry season surveys these values were 0.35 the prevalence maps (Fig.  1). 
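One birth, death, or swap proposal followed by the Metropolis–Hastings accept/reject decision can be sketched as below. This is a simplified Python illustration, not the sampler from Additional file 1: the function names are hypothetical, moves are chosen uniformly among those that are possible, and the log score of a model (log marginal probability plus log model prior) is assumed to be computed elsewhere. It also assumes at least one candidate covariate exists.

```python
import math
import random


def propose_move(included, excluded, rng):
    """Return (move, new_included, new_excluded) for a random birth, death, or swap."""
    moves = []
    if excluded:
        moves.append("birth")
    if included:
        moves.append("death")
    if included and excluded:
        moves.append("swap")
    move = rng.choice(moves)
    new_inc, new_exc = set(included), set(excluded)
    if move in ("death", "swap"):
        out = rng.choice(sorted(included))   # covariate leaving the model
        new_inc.remove(out)
        new_exc.add(out)
    if move in ("birth", "swap"):
        new = rng.choice(sorted(excluded))   # covariate entering the model
        new_exc.remove(new)
        new_inc.add(new)
    return move, new_inc, new_exc


def accept_move(log_score_new, log_score_cur, rng, R=1.0):
    """Accept with probability min(1, exp(log_score_new - log_score_cur) * R)."""
    log_ratio = log_score_new - log_score_cur + math.log(R)
    if log_ratio >= 0.0:
        return True
    return rng.random() < math.exp(log_ratio)


rng = random.Random(42)
print(propose_move({"age", "elevation"}, {"slope", "gender"}, rng))
```

Comparing log scores rather than raw probabilities keeps the ratio numerically stable, since the marginal probabilities involved can be extremely small.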
Age was also a significant (0.15–0.50), 0.31 (0.14–0.47), and 0.23 (0.10–0.33). The risk factor (0.296, CI 0.268–0.324), as would be expected parasitaemia rate remained high during the final rainy among young children (i.e. less than 5  years old) in an season despite high coverage of ITNs and 2 years of IRS, area of stable, holoendemic malaria. Among the distance highlighting the importance of devising complementary measures, distance to nearest health facility (0.094, CI malaria control strategies based on the local risk factors. 0.056–0.131) and urban centers (0.183, CI 0.137–0.229) were significant risk factors, whereas distance to near- Risk factor outcomes est road (0.014, CI 0.00–0.060) and water body (− 0.013, Base model CI − 0.052 to 0.00) had little to no effect. The Konkomba The basic risk factor model with BMA variable selection communities experienced significantly higher malaria detected that many expected, classic patterns of malaria risk (0.233, CI 0.137–0.323), relative to the Bimoba, and Fig. 2 Mean slope estimates (circles) and 95% credible intervals (horizontal grey bars) from probit regression parameters. Variables whose credible intervals do not include zero are considered significant (labelled in bold). Risk factors (positive slopes) and protective factors (negative slopes) are shown in red and blue, respectively Millar et al. Malar J (2018) 17:343 Page 8 of 14 generally had a high mean prevalence overall. Note that experience higher malaria burden in the rainy season, this represents the risk associated with ethnicity are children in the upper end of the observed age range adjusting for other covariates in the model, such as edu- (50–59  months old) experienced nearly the same pre- cation, wealth, and elevation. Statistically significant pro- dicted prevalence in the dry season as they did in the tective factors were access to health insurance (− 0.463, rainy season (Fig. 3). 
CI −0.530 to −0.391) and mother’s education (−0.220, CI −0.300 to −0.141). Elevation was a significant factor; however, given the relatively narrow range in elevations (135–449 meters above sea level), this is likely a consequence of the two urban centers being at higher elevation, not of high-altitude effects on local climate. IRS in the past year was also a significant protective factor (−0.154, CI −0.254 to −0.050). The categorical variables for wealth quintiles did not have statistically significant effects individually; however, as a group these variables indicated that the lower wealth quintile groups (below median and well below median) were positively associated with malaria prevalence.

Seasonal differences

Modelling malaria risk with seasonal interaction terms (see Additional file 1 for regression coefficients) suggested that most risk factors did not exhibit prominent differences between the rainy and the dry seasons, with a few notable exceptions. Age was an important risk factor for malaria in both seasons; however, the slope estimate for this parameter was significantly lower in the rainy season than in the dry season, as illustrated in Fig. 3. These patterns suggest that while all ages

Another important finding refers to ethnicity. All ethnic groups experienced increased malaria burden in the rainy season; however, predicted mean prevalence based on seasonal-ethnicity interaction terms indicates that the increase in malaria prevalence during the rainy season was more intense for the Konkomba communities than for the other ethnic groups (Fig. 3). For example, the odds ratio associated with the effect of Konkomba ethnicity compared to Bimoba ethnicity increased from 1.27 in the dry season to 1.60 in the rainy season. By comparison, the odds ratios associated with the effect of Mamprusi ethnicity compared to Bimoba ethnicity were 1.09 and 1.15 in the dry and rainy seasons, respectively. Other marginal differences included health insurance, which was a less significant protective factor in the rainy season, and personal medication use, which was a moderate risk factor in the dry season but had almost no influence in the rainy season (see Additional file 1).

Fig. 3 Modelled patterns in malaria risk factors based on Bayesian probit regression containing seasonal interaction terms. The left panel depicts mean slope estimate (lines) and 95% credible intervals (polygons) for the predicted malaria prevalence based on age in the rainy and dry seasons. The right panel depicts the mean (points) and 95% credible intervals (vertical bars) for the predicted malaria prevalence based on ethnic group in the rainy and dry seasons

Nonlinear associations

The final model containing linear spline covariates revealed interesting nonlinear associations between malaria prevalence and distance to nearest urban centre, and distance to nearest health facility (Fig. 4). Distance to nearest urban centre was positively associated with malaria infection in a roughly linear pattern until about 12–14 kilometres (km), after which malaria risk began to plateau. Similarly, the implied malaria risk was greater for communities that were further away from the nearest health facilities; however, the relationship was less steep after approximately 2–4 km. These nonlinear patterns in malaria risk and proximity to urban centres and health facilities were consistent in the rainy and dry seasons.

Out-of-sample predictions

Based on the sum of the log-likelihood, BMA and Lasso regression both outperformed standard logistic regression for all predictions (Table 3). Both approaches improved the out-of-sample predictions compared to standard logistic regression by shrinking the regression coefficient estimates towards zero (Fig. 5). A notable difference between these approaches is that BMA allows for near-zero parameter estimates, whereas Lasso will force marginal factors to zero. For the base set of covariates, BMA and Lasso had similar likelihood values; however, BMA had higher likelihood values for all predictions based on the extended set of covariates, which included seasonal interactions and linear splines. In particular, note that the average out-of-sample predictive skill of BMA increased slightly for the extended model relative to the base model (average sum of log-likelihood from −976.47 to −975.06), whereas the predictive skill of the logistic regression model decreased for every comparison and that of the Lasso decreased for the three dry-season comparisons. These results suggest that BMA is noticeably more resistant to overfitting than Lasso or logistic regression as the number of parameters is substantially increased.

Table 3 Predictive comparisons of models based on the sum of the log-likelihood

| Training | Testing | Logistic | Lasso | BMA |
|---|---|---|---|---|
| Base model (p = 29)ᵃ | | | | |
| Rainy 2010 | Rainy 2011 | −1072.27 | −1049.24 | −1028.24ᵇ |
| Rainy 2011 | Rainy 2012 | −1055.39 | −1037.49 | −1032.57ᵇ |
| Rainy 2010 | Rainy 2012 | −1153.62 | −1110.05 | −1057.07ᵇ |
| Dry 2011 | Dry 2012 | −969.88 | −919.45ᵇ | −921.54 |
| Dry 2012 | Dry 2013 | −915.95 | −897.03ᵇ | −903.60 |
| Dry 2011 | Dry 2013 | −967.83 | −920.28 | −915.81ᵇ |
| Average | | −1022.49 | −988.92 | −976.47 |
| Model with interactions and splines (p = 73)ᵃ | | | | |
| Rainy 2010 | Rainy 2011 | −1079.63 | −1042.02 | −1027.85ᵇ |
| Rainy 2011 | Rainy 2012 | −1066.56 | −1035.44 | −1030.75ᵇ |
| Rainy 2010 | Rainy 2012 | −1156.76 | −1092.52 | −1050.55ᵇ |
| Dry 2011 | Dry 2012 | −1065.27 | −1029.66 | −921.05ᵇ |
| Dry 2012 | Dry 2013 | −922.40 | −902.79 | −902.32ᵇ |
| Dry 2011 | Dry 2013 | −1079.34 | −1059.24 | −917.82ᵇ |
| Average | | −1061.66 | −1026.95 | −975.06 |

The Logistic, Lasso and BMA columns report the sum of the log-likelihood. BMA: Bayesian model averaging
ᵃ p refers to the number of covariates in the model
ᵇ Indicates the model with the best fit
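The sum of the log-likelihood used to compare models in Table 3 can be illustrated with a minimal Python sketch. It assumes binary infection outcomes and per-child predicted probabilities from an already fitted model; the function name `sum_log_likelihood` and all toy values are hypothetical and are not taken from the authors' R code. The sketch shows why predictions pulled towards moderate probabilities, as coefficient shrinkage tends to produce, can score better out of sample than overconfident ones.

```python
import math


def sum_log_likelihood(y_true, p_pred, eps=1e-12):
    """Sum of Bernoulli log-likelihoods of observed 0/1 outcomes
    given predicted probabilities (closer to zero is better)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        # Clip probabilities so that log(0) can never occur.
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


# Hypothetical held-out outcomes (1 = infected, 0 = not infected).
y_test = [1, 0, 1, 1, 0]

# Hypothetical predictions: an overconfident model that is badly wrong
# on two children, and a model whose predictions are more moderate.
p_overconfident = [0.99, 0.01, 0.05, 0.99, 0.60]
p_shrunk = [0.80, 0.20, 0.40, 0.80, 0.45]

ll_overconfident = sum_log_likelihood(y_test, p_overconfident)
ll_shrunk = sum_log_likelihood(y_test, p_shrunk)

print("overconfident:", round(ll_overconfident, 3))
print("shrunk:       ", round(ll_shrunk, 3))
```

In this toy example the more moderate predictions give the larger (less negative) sum, which is the direction in which the entries of Table 3 are compared.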
Fig. 4 Implied patterns in malaria prevalence and distance to urban center (left) and distance to health facility (right) based on Bayesian probit regression model containing linear splines and seasonal interactions. Results for the rainy and dry seasons are shown in blue and yellow, respectively. The open circles depict where slopes are allowed to change (i.e., knot locations), selected at 20% quantiles of the observed data

Fig. 5 Regression parameter estimates using BMA (black), logistic regression (red), and Lasso regression (gray) models containing interaction terms and spline covariates (73 independent variables). Parameter estimates were ordered according to the logistic regression results to better illustrate the shrinkage of coefficients associated with the BMA and Lasso algorithms

Discussion

Methodological findings

These findings lend strong support for the usefulness of Bayesian model averaging (BMA) as a statistical tool for detecting complex patterns in malaria risk factors. In order to promote reproducibility of these methods and findings, the code used to run this analysis has been provided in Additional file 1, the data and R scripts have been placed in a public repository (see “Availability of data and materials” section), and packages for similar model selection and averaging approaches using OpenBUGS and R are available [53, 54]. Moreover, BMA in this context demonstrated advantages over standard variable selection procedures similar to those found in simulation studies [30–32] and in studies in other ecological contexts [32–37, 55], including epidemiological risk analysis [38, 39]. Unlike standard logistic and Lasso regression, increasing model complexity by including several additional covariates did not reduce the out-of-sample predictive performance when using BMA. Importantly, constructing confidence intervals for Lasso regression coefficients continues to be an ongoing area of research [56–58], whereas characterizing uncertainty via credible intervals in the Bayesian framework is straightforward. Credible intervals are also often a better approach for comparing the strength of associations than other traditional metrics, such as p-values [59]. Reversible jump MCMC tends to be computationally efficient and effective, and by integrating out the regression coefficients this Gibbs sampler avoided issues associated with poor mixing of chains that often plague these other variable selection approaches [32, 53]. Machine learning techniques, such as artificial neural networks and support vector machines, can also detect nonlinear and other relationships and often have better predictive performance than standard logistic regression, but it is typically difficult to make direct inferences about the role of individual covariates using these techniques [60]. BMA may be useful for specific applications to modelling malaria risk factors where both interpretability and predictive ability are important (such as designing locally targeted interventions).

The trade-off between interpretability and predictive skill, spatial and temporal scope, data accessibility, and computational limitations are important factors to consider when choosing a variable selection procedure. For example, Weiss et al. [23] describe an exhaustive analysis of variable selection for identifying environmental factors associated with Plasmodium falciparum prevalence across sub-Saharan Africa, containing over 50 million covariates. They then used a series of selection phases based on the Akaike information criterion (AIC) to reduce the number of covariates. This procedure was able to distill a parameter space that would be computationally impossible to explore using BMA; however, it lacks interpretability and does not account for uncertainty in the selection process. These trade-offs may not be significant for prediction applications, but are critical for the analysis in this study to appropriately generate inference on the significance of different predictor variables.

Inferences on malaria risk factors and control strategies

Another area for further analysis is utilizing the model interpretability characteristics of this framework for informing management applications. The BMA-based analysis described well-established patterns in malaria aetiology across sub-Saharan Africa, including the strong seasonal patterns in malaria transmission [40, 41] and age-related prevalence patterns [61]. Protective factors identified in our models, including access to health insurance and mother’s education, have also been described as important factors in similar settings [62, 63]. In addition to validating these data and the methodology, these findings may provide insight for guiding local intervention and control strategies. For instance, the protective effect of personal health insurance coverage, which was detectable in one of Ghana’s more remote corners, underscores the value of Ghana’s pioneering effort to institute and scale up a national health insurance scheme since 2006 [64].

The capacity to increase model complexity without sacrificing predictive performance is an important modelling characteristic, particularly when inference is used to inform management strategies [65]. The inclusion of seasonal interaction terms revealed seasonal differences in age- and ethnicity-related risk that may be useful for designing seasonal chemoprophylaxis interventions, which can be an extremely effective method for reducing cost and maximizing impact depending on the local malaria dynamics [66–69]. The linear spline covariates allowed the model to describe the nonlinear protective buffer provided by the modest urban centers. The link between urbanicity and malaria transmission has been extensively discussed in the literature [70–73], but understanding the relative impact modest urban centers can have on health outcomes in rural regions can be challenging [74], particularly at small spatial scales. The revealed nonlinear relationship between malaria risk and distance to urban centers suggested that the risk associated with living far from the urban center eventually reaches a plateau around 12 km in Bunkpurugu-Yunyoo. This is an interesting finding considering that the increased housing density, reduced non-polluted water resources, and other urban characteristics resolved about 2–3 km from the centers of the towns, based on field observation and satellite imagery. In addition, IRS coverage was universal across the district and ITN use was the same or higher at the more remote locations. This implies that ecological and entomological factors are less likely to be driving this phenomenon, suggesting that socio-economic factors may be important.

Furthermore, this framework may be useful for projecting the impact of future management efforts. For example, the association between malaria risk and distance to nearest health facility became less pronounced after about 2–4 km. Comparable rate stabilization patterns at similar distances have been described in health facilities in rural regions of Kenya [75]. Distance to nearest health facility is known to be an important factor in treatment-seeking behaviour and health outcomes [76–78]. Access to healthcare is a guiding management principle in Ghana, as demonstrated by the expansion of access to health insurance and revitalization of the Community-Based Health Planning and Services (CHPS) programme. Future work with these data will build upon these findings to describe the impact of CHPS facilities on early childhood malaria in Bunkpurugu-Yunyoo, as well as project the potential impact of new CHPS facilities and optimize their locations.

Bayesian model selection approaches, like BMA, are likely to find their greatest value in forecasting applications, in which model interpretability, predictive performance, and uncertainty characterization are equally valued. Bayesian frameworks often require a deeper understanding of statistical theory and programming, can be computationally intensive, and may lack accessible tools/software, but offer many advantages for modelling epidemiological data, including high flexibility and intuitive expressions of inference and uncertainty [79]. Based on background literature review, this appears to be the first instance of using BMA for variable selection to model malaria risk factors. This methodology offers a flexible framework with many advantages over other methods for modelling disease risk factors.

Limitations

From a methodological perspective, the outcomes from this study provide promising support for BMA as a useful statistical tool for modelling highly dimensional data on malaria risk factors; however, there are notable limitations. The analysis uses a single data set, and therefore further efforts are needed to corroborate these findings. It may be that at certain dimensionalities BMA is less effective than the other methods tested in this article, and therefore this analysis should be applied to other data sets, particularly at different spatio-temporal scales. Other comparison criteria (such as area under the receiver operating characteristic curve) or other tests (such as cross-validation) could also be used to compare predictive performance. From an epidemiological perspective, while this study incorporates many potential risks for malaria, there are additional variables that are not included. Most notably, these data do not include vector-related variables. The data are also limited by the periodicity of sampling time (seasonal), rather than a continuous sampling approach.

Conclusion

The BMA approach for variable selection produced easily interpretable models, which incorporate selection uncertainty and outperformed standard logistic and Lasso regressions in out-of-sample predictions. The risk factor models for malaria prevalence in young children from a holoendemic district in northern Ghana experiencing residual transmission revealed complex patterns of disease drivers, including nonlinear relationships between malaria status and distance from the nearest urban centre and health facility, as well as seasonal differences in risk associated with age and ethnicity. Models quickly become increasingly more complex with additional explanatory variables (and their associated parameters) to increase flexibility, underscoring the need for reliable methods for model selection. Bayesian approaches for variable selection, such as BMA, for identifying and describing risk factors have potential for expanding the understanding of local drivers of disease, leading to more efficient targeting and prioritization of existing interventions, and informing new interventions, for malaria and other vector-borne diseases.

Additional file

Additional file 1. Contains descriptive statistics on covariates, code for running the Gibbs sampler, and additional model outputs.

Abbreviations

BMA: Bayesian model averaging; Comm.: community; Edu.: education; GIS: geographic information systems; IRS: indoor-residual spraying of insecticides; ITN: insecticide-treated bed netting; Km: kilometres; Lasso: least absolute shrinkage and selection operator; LLIN: long-lasting insecticide-treated bed net; LST: land surface temperature; NDVI: normalized difference vegetation index; WQ: wealth quintile.

Authors’ contributions

JM conducted the analysis and wrote the primary draft. PP, BA, CA, SO, and KK organized and supervised the collection of survey data. PA collected remote-sensed and GIS-derived data. PA, JM, and PP cleaned and organized data. DV and JM constructed the statistical model and coded the algorithms. PP, PA, and DV provided substantial feedback and editing to the manuscript. All authors read and approved the final manuscript.

Author details

1 Emerging Pathogens Institute, University of Florida, Gainesville, USA. 2 Noguchi Memorial Institute for Medical Research, College of Health Sciences, University of Ghana, Legon, Ghana. 3 National Malaria Control Programme, Public Health Division, Ghana Health Service, Accra, Ghana.

Acknowledgements

We would like to thank Kok Ben Toh for providing comments on an earlier version of this manuscript, as well as Damian Adams, Gregory Glass, and Ethan White for providing critiques on the framework of the analysis. We would also like to thank Brooke Eckman and Syed Abdul-Rahman for early contributions to this research.

Competing interests

The authors declare that they have no competing interests.

Availability of data and materials

The data files and code used in this analysis are available at https://github.com/justinmillar/bma-malaria. The R code for constructing the Gibbs sampler is also provided in Additional file 1. (Note to reviewers, this is currently a private GitHub repository that will be made public upon article acceptance as well as be submitted to Zenodo to generate a permanent DOI).

Consent for publication

Not applicable.

Ethics approval and consent to participate

Ethical approval for the data collection was granted by the Institutional Review Board (IRB) of the Noguchi Institute for Medical Research at the University of Ghana (NMIMR IRB CPN#009-10-11 revd 2013, FWA 001824/IRB 908). Approval for faculty and student involvement in the follow-on analysis of de-identified data was given by the University of Florida (IRB201500051).

Funding

Funding for the original data collection was provided by the US President’s Malaria Initiative. The funders had no role in study design, data collection and analysis, or preparation of the manuscript. The submitted draft was approved by PMI without suggested revisions. Funding for this follow-on study was provided through a graduate research assistantship to JM from the University of Florida.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Received: 9 April 2018 Accepted: 21 September 2018

References

1. Bhatt S, Weiss DJ, Cameron E, Bisanzio D, Mappin B, Dalrymple U, et al. The effect of malaria control on Plasmodium falciparum in Africa between 2000 and 2015. Nature. 2015;526:207–11.
2. Kienberger S, Hagenlocher M. Spatial-explicit modeling of social vulnerability to malaria in East Africa. Int J Health Geogr. 2014;13:29.
3. World Health Organization. World malaria report 2014. Geneva: World Health Organization; 2014.
4. Durnez L, Coosemans M. Residual transmission of malaria: an old issue for new approaches. In: Manguin S, editor. Anopheles mosquitoes – new insights into malaria vectors. London: Intech Publ; 2013. p. 671–704.
5. Killeen GF. Characterizing, controlling and eliminating residual malaria transmission. Malar J. 2014;13:330.
6. Hagenlocher M, Castro MC. Mapping malaria risk and vulnerability in the United Republic of Tanzania: a spatial explicit model. Popul Health Metr. 2015;13:2.
7. Clark TD, Greenhouse B, Njama-Meya D, Nzarubara B, Maiteki-Sebuguzi C, Staedke SG, et al. Factors determining the heterogeneity of malaria incidence in children in Kampala, Uganda. J Infect Dis. 2008;198:393–400.
8. Valle D, Millar J, Amratia P. Spatial heterogeneity can undermine the effectiveness of country-wide test and treat policy for malaria: a case study from Burkina Faso. Malar J. 2016;15:513.
9. Chaccour C, Killeen GF. Mind the gap: residual malaria transmission, veterinary endectocides and livestock as targets for malaria vector control. Malar J. 2016;15:24.
10. Protopopoff N, Bortel WV, Speybroeck N, Geertruyden J-PV, Baza D, D’Alessandro U, et al. Ranking malaria risk factors to guide malaria control efforts in African highlands. PLoS ONE. 2009;4:e8022.
11. Craig MH, Snow RW, le Sueur D. A Climate-based distribution model of malaria transmission in sub-Saharan Africa. Parasitol Today. 1999;15:105–11.
12. Hay SI, Cox J, Rogers DJ, Randolph SE, Stern DI, Shanks GD, et al. Climate change and the resurgence of malaria in the East African highlands. Nature. 2002;415:905–9.
13. Pascual M, Ahumada JA, Chaves LF, Rodó X, Bouma M. Malaria resurgence in the East African highlands: temperature trends revisited. Proc Natl Acad Sci USA. 2006;103:5829–34.
14. Krefis AC, Schwarz NG, Nkrumah B, Acquah S, Loag W, Sarpong N, et al. Principal component analysis of socioeconomic factors and their association with malaria in children from the Ashanti Region, Ghana. Malar J. 2010;9:201.
15. Koram KA, Bennett S, Adiamah JH, Greenwood BM. Socio-economic risk factors for malaria in a peri-urban area of The Gambia. Trans R Soc Trop Med Hyg. 1995;89:146–50.
16. Kreuels B, Kobbe R, Adjei S, Kreuzberg C, von Reden C, Bäter K, et al. Spatial variation of malaria incidence in young children from a geographically homogeneous area with high endemicity. J Infect Dis. 2008;197:85–93.
17. Njama D, Dorsey G, Guwatudde D, Kigonya K, Greenhouse B, Musisi S, et al. Urban malaria: primary caregivers’ knowledge, attitudes, practices and predictors of malaria incidence in a cohort of Ugandan children. Trop Med Int Health. 2003;8:685–92.
18. Rulisa S, Kateera F, Bizimana JP, Agaba S, Dukuzumuremyi J, Baas L, et al. Malaria prevalence, spatial clustering and risk factors in a low endemic area of eastern Rwanda: a cross sectional study. PLoS ONE. 2013;8:e69443.
19. Adigun AB, Gajere EN, Oresanya O, Vounatsou P. Malaria risk in Nigeria: Bayesian geostatistical modelling of 2010 malaria indicator survey data. Malar J. 2015;14:156.
20. Sharma RK, Singh MP, Saha KB, Bharti PK, Jain V, Singh PP, et al. Socio-economic & household risk factors of malaria in tribal areas of Madhya Pradesh, central India. Indian J Med Res. 2015;141:567.
21. Ferrari G, Ntuku HMT, Ross A, Schmidlin S, Kalemwa DM, Tshefu AK, et al. Identifying risk factors for Plasmodium infection and anaemia in Kinshasa, Democratic Republic of Congo. Malar J. 2016;15:362.
22. Chirombo J, Lowe R, Kazembe L. Using structured additive regression models to estimate risk factors of malaria: analysis of 2010 Malawi malaria indicator survey data. PLoS ONE. 2014;9:e101116.
23. Weiss DJ, Mappin B, Dalrymple U, Bhatt S, Cameron E, Hay SI, et al. Re-examining environmental correlates of Plasmodium falciparum malaria endemicity: a data-intensive variable selection approach. Malar J. 2015;14:68.
24. Babyak MA. What you see may not be what you get: a brief, nontechnical introduction to overfitting in regression-type models. Psychosom Med. 2004;66:411–21.
25. Tremblay M, Dahm J, Wamae C, De Glanville W, Fèvre E, Döpfer D. Shrinking a large dataset to identify variables associated with increased risk of Plasmodium falciparum infection in Western Kenya. Epidemiol Infect. 2015;143:3538–45.
26. Sturrock HJ, Cohen JM, Keil P, Tatem AJ, Le Menach A, Ntshalintshali NE, et al. Fine-scale malaria risk mapping from routine aggregated case data. Malar J. 2014;13:421.
27. Kouwayè B, Fonton N, Rossi F. Lasso based feature selection for malaria risk exposure prediction. ArXiv Prepr. ArXiv151101284; 2015.
28. Hoeting JA, Madigan D, Raftery AE, Volinsky CT. Bayesian model averaging: a tutorial. Stat Sci. 1999;14:382–401.
29. Hoeting JA, Madigan D, Raftery AE, Volinsky CT. Bayesian model averaging. In: Proceedings of the AAAI workshop on integrating multiple learned models. 1998. p. 77–83.
30. Wang D, Zhang W, Bakhai A. Comparison of Bayesian model averaging and stepwise methods for model selection in logistic regression. Stat Med. 2004;23:3451–67.
31. Genell A, Nemes S, Steineck G, Dickman PW. Model selection in Medical Research: a simulation study comparing Bayesian Model Averaging and Stepwise Regression. BMC Med Res Methodol. 2010;10:108.
32. Zhao K, Valle D, Popescu S, Zhang X, Mallick B. Hyperspectral remote sensing of plant biochemistry using Bayesian model averaging with variable and band selection. Remote Sens Environ. 2013;132:102–19.
33. Wintle BA, McCarthy MA, Volinsky CT, Kavanagh RP. The use of Bayesian model averaging to better represent uncertainty in ecological models. Conserv Biol. 2003;17:1579–90.
34. Raftery AE, Gneiting T, Balabdaoui F, Polakowski M. Using Bayesian model averaging to calibrate forecast ensembles. Mon Weather Rev. 2005;133:1155–74.
35. Sloughter JML, Raftery AE, Gneiting T, Fraley C. Probabilistic quantitative precipitation forecasting using Bayesian model averaging. Mon Weather Rev. 2007;135:3209–20.
36. Posada D, Buckley TR. Model selection and model averaging in phylogenetics: advantages of Akaike information criterion and Bayesian approaches over likelihood ratio tests. Syst Biol. 2004;53:793–808.
37. Dobigeon N, Tourneret J-Y, Chang C-I. Semi-supervised linear spectral unmixing using a hierarchical Bayesian model for hyperspectral imagery. IEEE Trans Signal Process. 2008;56:2684–95.
38. Volinsky CT, Madigan D, Raftery AE, Kronmal RA. Bayesian model averaging in proportional hazard models: assessing the risk of a stroke. J R Stat Soc Ser C Appl Stat. 1997;46:433–48.
39. Viallefont V, Raftery AE, Richardson S. Variable selection and Bayesian model averaging in case-control studies. Stat Med. 2001;20:3215–30.
40. National Malaria Control Programme, University of Health & Allied Sciences, AGA Malaria Control Programme, World Health Organization and the INFORM Project. An epidemiological profile of malaria and its control in Ghana; 2013. https://www.linkmalaria.org/sites/www.linkmalaria.org/files/content/country/profiles/Ghana-epi-report-2014.pdf. Accessed 13 Oct 2017.
41. Owusu-Agyei S, Asante KP, Adjuik M, Adjei G, Awini E, Adams M, et al. Epidemiology of malaria in the forest-savanna transitional zone of Ghana. Malar J. 2009;8:220.
42. Coleman S, Dadzie SK, Seyoum A, Yihdego Y, Mumba P, Dengela D, et al. A reduction in malaria transmission intensity in Northern Ghana after 7 years of indoor residual spraying. Malar J. 2017;16:324.
43. Yelyang A. Conflict prevention strategies in Northern Ghana: a case study of the ethnic conflicts in Kpemale. J Confl Transform Secur. 2016;5:75–94.
44. President’s Malaria Initiative. Ghana Malaria Operational Plan FY 2014. https://www.pmi.gov/docs/default-source/default-document-library/malaria-operational-plans/fy14/ghana_mop_fy14.pdf?sfvrsn=20. Accessed 13 Oct 2017.
45. President’s Malaria Initiative. Ghana Malaria Operational Plan FY 2015. https://www.pmi.gov/docs/default-source/default-document-library/malaria-operational-plans/fy-15/fy-2015-ghana-malaria-operational-plan.pdf?sfvrsn=3. Accessed 13 Oct 2017.
46. Prata N, Morris L, Mazive E, Vahidnia F, Stehr M. Relationship between HIV Risk perception and condom use: evidence from a population-based survey in Mozambique. Int Fam Plan Perspect. 2006;32:192–200.
47. Jones RM, Masago Y, Bartrand T, Haas CN, Nicas M, Rose JB. Characterizing the risk of infection from Mycobacterium tuberculosis in commercial passenger aircraft using quantitative microbial risk assessment. Risk Anal. 2009;29:355–65.
48. Ayele DG, Zewotir TT, Mwambi HG. Prevalence and risk factors of malaria in Ethiopia. Malar J. 2012;11:195.
49. Green PJ. Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika. 1995;82:711–32.
50. R Core Team. R: A language and environment for statistical computing. Vienna: R Foundation for Statistical Computing; 2017. https://www.R-project.org/. Accessed 3 Mar 2018.
51. Denison DG. Bayesian methods for nonlinear classification and regression. Hoboken: John Wiley & Sons; 2002.
52. Tibshirani R. Regression shrinkage and selection via the Lasso. J R Stat Soc Ser B Methodol. 1996;58:267–88.
53. O’Hara RB, Sillanpää MJ. A review of Bayesian variable selection methods: what, how and which. Bayesian Anal. 2009;4:85–117.
54. Zeugner S, Feldkircher M. Bayesian model averaging employing fixed and flexible priors: the BMS package for R. J Stat Softw. 2015;68:1–37.
55. Hooten MB, Hobbs NT. A guide to Bayesian model selection for ecologists. Ecol Monogr. 2015;85:3–28.
56. Kyung M, Gill J, Ghosh M, Casella G. Penalized regression, standard errors, and Bayesian lassos. Bayesian Anal. 2010;5:369–411.
57. Javanmard A, Montanari A. Confidence intervals and hypothesis testing for high-dimensional regression. J Mach Learn Res. 2014;15:2869–909.
58. Lockhart R, Taylor J, Tibshirani RJ, Tibshirani R. A significance test for the Lasso. Ann Stat. 2014;42:413–68.
59. Altman N, Krzywinski M. Points of significance: P values and the search for significance. Nat Methods. 2016. https://doi.org/10.1038/nmeth.4120.
60. Tu JV. Advantages and disadvantages of using artificial neural networks versus logistic regression for predicting medical outcomes. J Clin Epidemiol. 1996;49:1225–31.
61. Snow RW, Omumbo JA, Lowe B, Molyneux CS, Obiero J-O, Palmer A, et al. Relation between severe malaria morbidity in children and level of Plasmodium falciparum transmission in Africa. Lancet. 1997;349:1650–4.
62. Dike N, Onwujekwe O, Ojukwu J, Ikeme A, Uzochukwu B, Shu E. Influence of education and knowledge on perceptions and practices to control malaria in Southeast Nigeria. Soc Sci Med. 2006;63:103–6.
63. Feachem RG, Phillips AA, Hwang J, Cotter C, Wielgosz B, Greenwood BM, et al. Shrinking the malaria map: progress and prospects. Lancet. 2010;376:1566–78.
64. Agyepong IA, Adjei S. Public social policy development and implementation: a case study of the Ghana National Health Insurance scheme. Health Policy Plan. 2008;23:150–60.
65. Guyant P, Corbel V, Guérin PJ, Lautissier A, Nosten F, Boyer S, et al. Past and new challenges for malaria control and elimination: the role of operational research for innovation in designing interventions. Malar J. 2015;14:279.
66. Worrall E, Rietveld A, Delacollette C. The burden of malaria epidemics and cost-effectiveness of interventions in epidemic situations in Africa. Am J Trop Med Hyg. 2004;71(2_suppl):136–40.
67. Greenwood B. Intermittent preventive treatment – a new approach to the prevention of malaria in children in areas with seasonal malaria transmission. Trop Med Int Health. 2006;11:983–91.
68. White MT, Conteh L, Cibulskis R, Ghani AC. Costs and cost-effectiveness of malaria control interventions – a systematic review. Malar J. 2011;10:337.
69. Cairns M, Roca-Feltrer A, Garske T, Wilson AL, Diallo D, Milligan PJ, et al. Estimating the potential public health impact of seasonal malaria chemoprevention in African children. Nat Commun. 2012;3:881.
70. Hay SI, Guerra CA, Tatem AJ, Noor AM, Snow RW. The global distribution and population at risk of malaria: past, present, and future. Lancet Infect Dis. 2004;4:327–36.
71. Murray CJ, Rosenfeld LC, Lim SS, Andrews KG, Foreman KJ, Haring D, et al. Global malaria mortality between 1980 and 2010: a systematic analysis. Lancet. 2012;379:413–31.
72. Noor AM, Kinyoki DK, Mundia CW, Kabaria CW, Mutua JW, Alegana VA, et al. The changing risk of Plasmodium falciparum malaria infection in Africa: 2000–10: a spatial and temporal analysis of transmission intensity. Lancet. 2014;383:1739–47.
73. Pond BS. Malaria indicator surveys demonstrate a markedly lower prevalence of malaria in large cities of sub-Saharan Africa. Malar J. 2013;12:313.
74. Vlahov D, Galea S. Urbanization, urbanicity, and health. J Urban Health. 2002;79:S1–12.
75. Feikin DR, Nguyen LM, Adazu K, Ombok M, Audi A, Slutsker L, et al. The impact of distance of residence from a peripheral health facility on pediatric health utilisation in rural western Kenya. Trop Med Int Health. 2009;14:54–61.
76. Blanford JI, Kumar S, Luo W, MacEachren AM. It’s a long, long walk: accessibility to hospitals, maternity and integrated health centers in Niger. Int J Health Geogr. 2012;11:24.
77. Huerta Munoz U, Källestål C. Geographical accessibility and spatial coverage modeling of the primary health care network in the Western Province of Rwanda. Int J Health Geogr. 2012;11:40.
78. Oppong JR. Accommodating the rainy season in Third World location-allocation applications. Socioecon Plann Sci. 1996;30:121–37.
79. Dunson DB. Commentary: practical advantages of Bayesian analysis of epidemiologic data. Am J Epidemiol. 2001;153:1222–6.
80. Ghana Statistical Service – GSS, Ghana Health Service – GHS, ICF Macro. Ghana Demographic and Health Survey 2008. Accra, Ghana: GSS, GHS, and ICF Macro; 2009. http://dhsprogram.com/pubs/pdf/FR221/FR221.pdf. Accessed 13 Oct 2017.
81. Center for International Earth Science Information Network (CIESIN)/Columbia University, and Information Technology Outreach Services (ITOS)/University of Georgia. Global Roads Open Access Data Set, Version 1 (gROADSv1). http://sedac.ciesin.columbia.edu/data/set/groads-global-roads-open-access-v1. Accessed 4 Apr 2016.
82. ESRI. Digital chart of the world (DCW): inland water bodies. http://www.diva-gis.org. Accessed 5 Apr 2014.
83. Jarvis A, Reuter HI, Nelson A, Guevara E. Hole-filled SRTM for the globe Version 4, CGIAR-CSI SRTM 90 m Database, available at: h ttp. Srtm Csi Cgiar Org Last Access. 2012;5:2008.
84. NASA LP DAAC. MODIS Level 1 Land Surface Temperatures Registered At-Sensor Radiance. Version 5. https://lpdaac.usgs.gov/dataset_discovery/modis/modis_products_table/mod11a2. Accessed 15 Jan 2016.
85. NASA LP DAAC. MODIS Level 1 Vegetation Indices Registered At-Sensor Radiance. Version 5. NASA EOSDIS Land Processes DAAC, USGS Earth Resources Observation and Science (EROS) Center, Sioux Falls, South Dakota. https://lpdaac.usgs.gov/dataset_discovery/modis/modis_products_table/mod13c1. Accessed 15 Jan 2016.
86. Stevens FR, Gaughan AE, Linard C, Tatem AJ, Sorichetta A, Hornby GM, et al. WorldPop-RF, Version 2b.1.1; 2015. https://doi.org/10.6084/m9.figshare.1491490.v3.
87. Hijmans RJ, Cameron SE, Parra JL, Jones PG, Jarvis A. Very high resolution interpolated climate surfaces for global land areas. Int J Climatol. 2005;25:1965–78.
88. Funk C, Verdin A, Michaelsen J, Peterson P, Pedreros D, Husak G. A global satellite assisted precipitation climatology. Earth Syst Sci Data Discuss. 2015;8:401–25.