Mweemba et al. Population Health Metrics (2022) 20:8
https://doi.org/10.1186/s12963-022-00286-3

RESEARCH Open Access

Estimating district HIV prevalence in Zambia using small-area estimation methods (SAE)

Chris Mweemba1*, Peter Hangoma1, Isaac Fwemba1,2, Wilbroad Mutale1 and Felix Masiye3

Abstract
Background: The HIV/AIDS pandemic has had a very devastating impact at a global level, with the Eastern and Southern African region being the hardest hit. The considerable geographical variation in the pandemic means varying impact of the disease in different settings, requiring differentiated interventions. While information on the prevalence of HIV at regional and national levels is readily available, the burden of the disease at smaller area levels, where health services are organized and delivered, is not well documented. This affects the targeting of HIV resources. There is a need, therefore, for studies to estimate HIV prevalence at appropriate levels to improve HIV-related planning and resource allocation.

Methods: We estimated the district-level prevalence of HIV using the Small-Area Estimation (SAE) technique, utilizing the 2016 Zambia Population-Based HIV Impact Assessment Survey (ZAMPHIA) data and auxiliary data from the 2010 Zambian Census of Population and Housing and the HIV sentinel surveillance data from selected antenatal care clinics (ANC). SAE models were fitted in R to ascertain the best HIV predicting model. We then used the Fay–Herriot (FH) model to obtain weighted, more precise and reliable HIV prevalence estimates for all the districts.

Results: The results revealed variations in the district HIV prevalence in Zambia, with the prevalence ranging from as low as 4.2% to as high as 23.5%. Approximately 32% of the districts (n = 24) had HIV prevalence above the national average, with one district having almost twice as much prevalence as the national level. Some rural districts have very high HIV prevalence rates.
Conclusions: HIV prevalence in Zambia is highest in districts located near international borders, along the main transit routes and adjacent to other districts with very high prevalence. The variations in the burden of HIV across districts in Zambia point to the need for a differentiated approach in HIV programming within the country. HIV resources need to be prioritized toward districts with high population mobility.

Keywords: SAE, Small-area estimation, HIV, Prevalence, District, Fay–Herriot, Auxiliary information

*Correspondence: chris.muna@gmail.com
1 Department of Health Policy, Systems and Management, School of Public Health, University of Zambia, Ridgeway Campus, P.O. Box 50110, Lusaka, Zambia
Full list of author information is available at the end of the article

© The Author(s) 2022. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Background
The HIV/AIDS pandemic has continued to be a global public health problem, with an estimated 38 million people globally living with HIV in 2019 and the African region bearing the largest burden of the global HIV/AIDS cases [1]. Interestingly, the burden of HIV varies considerably within Africa, with sub-Saharan Africa (SSA) alone accounting for about 70% of all global HIV cases [2]. However, a closer review of HIV in the SSA region reveals that the burden is mainly in the Eastern and Southern African (ESA) region where, with only 6.2% of the world population, the ESA region accounted for approximately 54% of the total global HIV infections and 43% of all AIDS-related deaths in 2019 [1]. There is substantial variation in the distribution of HIV within the ESA region. For instance, of the 24 countries in this region, more than a quarter of the new HIV infections in 2018 were in South Africa, while 50% of infections were in 7 other countries, namely, and in order of magnitude, Mozambique, Tanzania, Uganda, Zambia, Kenya, Malawi and Zimbabwe [3].

Similarly, the distribution of HIV within countries has been shown to vary remarkably. In Zambia, for instance, some provinces such as Lusaka (16.1%), Western (16%) and Copperbelt (14.2%) have relatively high prevalence compared to provinces like North-western (6.9%) and Muchinga (5.9%) (ZAMPHIA, 2016). This trend is similar for South Africa, where the burden of HIV among adult South Africans in 2016 ranged from as low as 12.6% in Western Cape to as high as 27% in KwaZulu-Natal (KZN) [4].

The information on the geographical variation in HIV prevalence at provincial level is certainly important for guiding government policy, prioritization of interventions and resource allocation both across and within countries. It should, however, be noted that the burden of diseases within the provinces can be heterogeneous. For example, within KZN province in South Africa, the available district-specific HIV prevalence in 2016 ranged from 16.1% in ILembe to 20.6% in uMgungundlovu [5]. Similarly, a study that modeled district-level estimates for HIV prevalence in South Africa found variations in the prevalence within the South African provinces [6]. This means that effective preventive and control strategies to combat HIV require knowledge of the burden of the disease at smaller and more similar areas such as districts [7]. This is challenging, however, because most data currently in use are not sufficiently powered to provide reliable estimates at small-area levels such as districts [7].

The Zambian Ministry of Health (MoH) acknowledges the importance of district-level estimates for more focused approaches in HIV programming and in facilitating the achievement of the Fast Track targets [8]. These targets are a set of 10 global guidelines for countries to adopt and implement in order to end the HIV pandemic by 2030 through ensuring, among other things, zero new HIV infections, zero discrimination and zero AIDS-related deaths. The Zambian MoH also acknowledges the importance of district-level HIV estimates in the achievement of these targets in an equitable manner. However, information on the prevalence of HIV at the district level is very limited and the MoH's Fast Track strategies are unlikely to be realized. Currently, existing district estimates for HIV in Zambia are from routine health facility data, which cannot be generalized to the general population due to the non-random nature of the people that present to test for HIV [9]. This problem can only be remedied with the use of robust techniques, such as SAE methods, designed to provide valid estimates of the burden of HIV at district level.

District-level HIV statistics are of particular importance for Zambia because a district is the lowest level of decentralization where health services are organized and delivered [10]. A previous study by Dwyer-Lindgren et al. [11] produced HIV prevalence estimates at a 5 × 5 km pixel resolution for countries in sub-Saharan Africa, including Zambia, which can be aggregated to the district level. However, the Dwyer-Lindgren et al. model is very computationally intensive, and not specifically tailored for Zambia. Our study, on the other hand, uses a novel methodology that is specifically tailored for Zambia and can easily be replicated in other country-specific contexts.

Methods
The district HIV prevalence was estimated using Small-Area Estimation (SAE) methods by utilizing multiple data sources. The SAE method is a statistical technique for obtaining reliable statistics for small areas that are mostly underrepresented in existing data sources due to small sample sizes. Using both direct and indirect methods, SAE models combine multiple data sources (censuses, surveys, etc.) containing other related information (auxiliary data) for these small areas [12].

Put simply, small-area estimates for HIV prevalence are a weighted average of two components: the direct prevalence estimate from existing data, which may be too unreliable due to sample size, and a prediction from a statistical model that utilizes auxiliary data from outside the survey to improve the estimates [13]. More weight is placed on the predicted prevalence if the variance of the direct prevalence is high, and vice versa [6].

Data sources
The outcome variable was HIV prevalence, obtained from the ZAMPHIA of 2016, while auxiliary predictors included HIV prevalence among pregnant women, the 2010 Zambian population; proportion of the population aged 15–35 years; dependence ratio (the ratio of population aged 0–14 years and persons aged 65 years and older per 100 persons in the working age group 15–64 years old [14]); the proportion of the population in formal dwelling; proportion of the population with higher education attainment; proportion of the population residing in the urban area, population density and the proportion of females in the population. Data on HIV prevalence among pregnant women were obtained from selected ANC facilities in the 74 districts in 2017 and 2018, while the rest of the auxiliary predictors were obtained from the 2010 Census of Population and Housing for Zambia.

The ZAMPHIA is a nationally representative cross-sectional, population-based survey of households across Zambia, aimed at measuring the status of Zambia's national HIV response [15]. The 2016 ZAMPHIA used a two-stage stratified cluster sampling. The first stage selected 511 enumeration areas (EAs) using the probability proportional to size method, and the second stage selected an average of 27 households per EA using the equal probability method. A total of 13,441 households and 28,142 individuals were sampled for the survey; 19,168 were adults aged 15–59 years, and 8974 were children aged 0 to 14 years. Those aged 15–59 years received home-based counselling and testing for HIV. Additional information on the ZAMPHIA methodology is provided in the ZAMPHIA report [15].

Our study and the Dwyer-Lindgren study, alluded to earlier, have similar data sources, with both studies having utilized the ZAMPHIA and the ANC sentinel surveillance data, for HIV seroprevalence estimates and HIV prevalence among pregnant women, respectively. However, in addition to the above sources, the Dwyer-Lindgren study also uses the Zambia Demographic and Health Survey (ZDHS) for HIV seroprevalence estimates, while our study uses the Zambia 2010 Census of Population and Housing to obtain additional auxiliary predictors for HIV. Both studies offer very useful insights for better targeting of HIV resources at district level and facilitating the achievement of the Ministry of Health's strategic HIV goals.

Variable description
HIV prevalence is the number of HIV positive cases per 100 people tested for HIV in the ZAMPHIA and in the selected ANC clinics dotted across all the districts. According to the 2010 Census of Population and Housing [14], population density is the total number of persons per square kilometer; proportion of urban area is the area considered to be urban out of the total area of the district; formal dwelling is defined as a room/set of rooms in a permanent building that could be structurally separated from a permanent building; dependence ratio is the ratio of the economically inactive persons to 100 economically active persons; and higher education is the proportion of the population that have attained tertiary education.

Statistical models
This study used the SAE technique to model and estimate HIV prevalence in Zambia, adapting methods from a similar study in South Africa [6]. Note that the outcome variable entered the modeling framework as a logit transformation of the direct district HIV prevalence from the ZAMPHIA survey. The ANC HIV prevalence rate was also modeled as a logit transformation. The HIV prevalence rates are the direct domain estimates of the Zambian district-level HIV prevalence proportions from the ZAMPHIA survey, while the ANC HIV prevalence rates are the prevalence proportions among pregnant women who obtained antenatal care services from clinics dotted across the various districts in Zambia. The logit transformation was necessary for converting prevalence proportions to the real line, which helps in satisfying the normality assumption. Similarly, the sampling error variance was estimated using a Delta-method approximation based on the variances of the domain estimates, as reported and elaborated elsewhere [5]. The model estimated the true HIV prevalence by combining the direct estimate (i.e., direct methods estimation) from the ZAMPHIA survey and the indirect model-based estimates, based on auxiliary predictors and the spatial correlation effects meant to improve the model prediction by borrowing strength from across the districts [6]. The direct estimate of HIV prevalence, $y_i$ for district $i$, was obtained as a weighted mean district-specific HIV prevalence from the ZAMPHIA survey. This estimate can be expressed as follows:

\[ y_i = \theta_i + \varepsilon_i \qquad (1) \]

where $y_i$ is the HIV prevalence estimate for district $i$ estimated from the survey data; $\theta_i$ is the district's true HIV prevalence being estimated; and $\varepsilon_i$ is the random error with mean 0 and variance $\sigma_i^2$ and is assumed to be normally distributed.

However, since the number of respondents sampled at district level during the ZAMPHIA is not sufficient to provide reliable district HIV prevalence estimates, the second part of the model, referred to as the indirect method, was estimated to improve the reliability of the estimates. Therefore, in addition to the direct prevalence estimates obtained from ZAMPHIA, the indirect method used auxiliary information from within the district and neighboring districts, and other data sources, to borrow strength and improve the precision of the HIV prevalence estimates [16]. Since the outcome variable was a logit transformation of HIV prevalence, we assumed that HIV prevalence is a linear function of covariates or HIV risk factors obtained from auxiliary data [6]. The true HIV prevalence ($\theta_i$ in Eq. 1) can therefore be thought of as:

\[ \theta_i = x_i \beta + v_i \qquad (2) \]

where $\beta$ is a set of regression coefficients obtained by regressing $y_i$ on HIV risk factors ($x_i$) and $v_i$ are normally distributed random errors with mean 0 and variance $\sigma_v^2$. Note that $v_i$ and $\varepsilon_i$, with variances $\sigma_v^2$ and $\sigma_i^2$, are independent of each other. Combining Eqs. 1 and 2 gives the following mixed-effects linear regression model:

\[ y_i = x_i \beta + v_i + \varepsilon_i \qquad (3) \]

To improve the precision of the HIV prevalence estimates from Eq. 3, there is a need for a model that combines direct and indirect estimates into a single estimate, such as the Fay–Herriot (FH) small-area estimator. The FH estimator is a linear combination of a direct and a synthetic estimator, which reduces estimation variance in the underrepresented small areas and in the whole model [17]. The FH estimator is given by:

\[ \hat{y}_i = \gamma_i y_i + (1 - \gamma_i)\, x_i \hat{\beta} \qquad (4) \]

where $\gamma_i$ and $1 - \gamma_i$ are weights for the direct estimate $y_i$ and the synthetic estimate $x_i \hat{\beta}$, respectively, which constitute the FH estimator. Note that $\gamma_i$ is simply the ratio of the model error variance to the total error variance, i.e., $\gamma_i = \sigma_v^2 / (\sigma_v^2 + \sigma_i^2)$. This means that if the survey-based estimates are precise, more weight is given to the direct estimate. Similarly, low precision of the survey-based estimates results in more weight being given to the synthetic or indirect estimate.

Spatial correlation
There is evidence that areas close to each other tend to have similar population dynamics, such as disease risk factors and disease burden [18]. This highlights the importance of location and geographical clustering in determining the spread and burden of disease, especially infectious diseases, for areas that are in close proximity [19, 20]. A study in Ethiopia documented the importance of geographical clustering in determining the prevalence of HIV and Tuberculosis (TB) [21].

To account for this spatial correlation, we built a spatial Fay–Herriot (SFH) model and tested it against a non-spatial model to ascertain the best fitting model for this study. A spatial adjacency matrix (W) was built in Excel, as follows:

The spatial adjacency matrix (W) is an n × n matrix, where n is the number of districts in Zambia.
The diagonal entries are $W_{ii} = 0$, indicating no correlation of district $i$ with itself.
The off-diagonal entries in each row add up to 1, i.e., $\sum_j W_{ij} = 1$.

This can be thought of as follows, as presented by Yakoi and Ando [22]:

\[ w^{1}_{ij} = \begin{cases} 1/d_{ij}^{\alpha} & \text{if } i \neq j \\ 0 & \text{otherwise} \end{cases} \qquad (5) \]

\[ w^{o}_{ij} = w^{1}_{ij} \Big/ \sum_{k=1}^{N} w^{1}_{ik} \qquad (6) \]

where $d_{ij}$, in Eq. (5), is the distance between districts $i$ and $j$; $\alpha$ is a parameter of the distance decay ($\alpha = 0$ if $i$ and $j$ do not share a border, otherwise $0 < \alpha < 1$). According to Eq. (6), the total amount of influence that one area receives from other areas is fixed [22].

The data analysis was conducted in R [23] utilizing the SAE package [24]. Figures were produced with the ggplot2 package [25].

Model selection
We fitted a variation of basic area-level models which differed in the inclusion of auxiliary predictors and assumptions about the random effects. Model 1 included only the logit of ANC prevalence proportion as an auxiliary predictor. Models 2–9 augmented model 1 with inclusion of the district-level percentages of dependency ratio (DR), formal dwelling (Formal), high education (HE), land considered to be urban (Urban), district population (Pop), population aged between 15 and 35 years (15–35 years), population density (PD) and female population (Female), respectively.

Model 10 augmented model 2 with inclusion of formal dwelling. Model 11 augmented model 10 with inclusion of higher education. Model 12 augmented model 11 with inclusion of urban prop. Model 13 augmented model 12 with inclusion of pop2010. This continued until model 15, which augmented model 14 with the inclusion of pop density. Model 16 augmented model 15 with female population.

Model 17 was reduced from model 16 by deletion of the logit of ANC prevalence and provides the contrast needed to assess the value of ANC prevalence. Models 18–35 relaxed the assumption of independent model errors in models 1, 2 through to 17, respectively, with inclusion of a simultaneously autoregressive (SAR) spatial covariance structure. Model 35 only contained the SAR covariance structure without any covariates. The spatial adjacency matrix, described earlier, accounted for the SAR covariance structure. Relative model performance was assessed using the Akaike Information Criterion (AIC). The AIC balances model fit against model complexity; smaller values of AIC indicate relatively better predictive ability. AIC is a dimensionless relative measure, and according to Gutreuter and others [6], differences of 5 between models are customarily considered to be important.

District-level estimates of the burden of HIV infection were estimated from the best fitting model (Model 19), which included the logit of ANC prevalence proportion and dependence ratio with the SAR spatial covariance structure. This model was thereafter used, in combination with the survey-based HIV prevalence estimates, to model the prevalence of HIV in all the 74 districts of Zambia. A table containing information on the fitted models has been included as an appendix (See Additional file 1).

Note that there are other models that can be used to account for the autocorrelation effect, such as the conditional autoregressive (CAR) model and its intrinsic version (intrinsic autoregressive [IAR] model). The decision to use SAR was made because these models are equivalent and in practice produce similar results [26, 27].

Note that there are differences in the modeling approaches between our study and the comparable study by Dwyer-Lindgren and others [11]. Our study was based on the small-area estimation process, while the Dwyer-Lindgren study focuses on estimating the sub-national variation of HIV prevalence using within-country variation at a 5 × 5-km resolution. Further, the paper reported use of a cross-walking model to link disparate data sources that leveraged existing microdata and linear regression estimates. Use of k-means clustering to generate a reduced set of locations based on the centroid of each k-means cluster helped to generate pseudo-points which were assigned to HIV prevalence observed for the polygon as a whole. This is different from our paper, where district-level data were obtained and not estimated or assumed. All the estimates in our study were linked to available survey data, which helped to provide associated survey parameters.

Further, Dwyer-Lindgren et al. fitted three sub-models to the HIV survey data using generalized additive models, boosted regression trees and lasso regression. They implemented a geostatistical modeling framework which allowed them to model HIV prevalence using a spatially and temporally explicit generalized linear mixed effects model. Unlike in our model, their logit-transformed HIV prevalence was modeled as a linear combination of a regional intercept, covariate effects, country random effects, and spatially and temporally correlated random effects. In our modeling framework, temporal (seasonal) effects were not included, although a spatial term was. Note also that the frequentist approach was the main inference strategy for our study, while Dwyer-Lindgren et al. used a Bayesian framework with a deterministic approach. Their model used the stochastic partial differential equation approach to approximate the continuous spatial and spatiotemporal Gaussian random fields. We note that this was appropriate given the complexity of their dataset, which would have suffered from serious computation cost if the frequentist or the sampling-based approach was implemented.

Results
Table 1 shows the population demographics of the auxiliary predictors used to predict district HIV prevalence. For instance, it can be seen that the population aged 15–35 years represented about 35% of the population, although it ranged from the lowest rate of about 32% in some districts to the highest of almost 45% in other districts. The median population with higher education was 3.3% (ranged from 1.2 to 16%), while the median population of HIV positive pregnant women was approximately 26%. Females made up 50.8% of the population. Table 1 provides more details.

Table 1 Population demographics of the auxiliary predictors

Auxiliary predictors                       Mean (median)   Min.–Max
Population aged 15–35 years                35.2 (34.4)     31.7–44.9
Dependence ratio                           99.8 (103.1)    64.7–114.9
Population living in formal dwelling       15.8 (8.3)      0.76–88.8
Population with higher Education           4.6 (3.3)       1.2–16.1
Urban area                                 25.4 (14.7)     0–100
HIV among pregnant women at ANC clinics    31.6 (25.5)     2–90
Population density                         107.9 (15.8)    2.8–4853
Female population                          50.8 (50.8)     49.2–53.4

Model diagnosis and validation
The results obtained using the SAE estimates model were consistently more precise than those obtained from the direct estimate methodology. For instance, the relative mean standard errors (RMSE) in Fig. 1 and the relative standard errors (See Additional file 2) for the SAE are consistently lower than those from the direct estimate model. In addition, the reduction in relative standard errors, due to SAE, was greatest in districts which produced the least precise direct estimates. For instance, districts like Chadiza, Milenge, Gwembe and Chavuma have relative standard errors reducing from 99.7 to 30.7%, 70.2 to 30%, 70.9 to 29.5% and 70.4 to 33.1%, respectively. Assuming, for example, that "useful" estimates are those for which RSE ≤ 20%, then our SAE model produced useful estimates in 52 of the 74 districts for which direct estimation failed to produce useful estimates.

Fig. 1 Relative mean standard errors (RMSE) for the FH HIV prevalence estimates and survey-based prevalence estimates: The RMSE show lower mean standard errors for the Fay–Herriot small-area estimations over the survey-based estimation for all the 74 Zambian districts

It is worth noting that the estimates from the Fay–Herriot estimator had narrower 95% confidence intervals than the direct estimates (See Fig. 2). Conversely, some point estimates for some districts such as Chadiza and Gwembe differed rather substantially between the design-based and model-based estimates. The design-based survey domain estimate of HIV prevalence in Gwembe and Chadiza was of little value for lack of precision, and at worst misleading. Smaller relative standard errors from the FH small-area estimates are more likely to be true, compared to those from the direct estimates, and are much more likely to be similar to surrounding districts.

Fig. 2 HIV prevalence estimates and confidence intervals for the FH and direct estimates in Zambia's districts: The confidence intervals of the FH estimates are narrower than those of the direct estimates for most of the districts

The conclusion from the model diagnostics and validation is that the FH estimator produces smaller standard errors compared to the survey-based estimates, across all the 74 districts of Zambia. This means that SAE prevalence estimates are more reliable than those obtained from the direct estimates.

District HIV prevalence estimates
The district HIV prevalence in Zambia ranges from as low as 4.3% (CI 2.6–6.9) in Lundazi to as high as 23.3% (CI 19.3–27.8) in Namwala. Other notable districts with high HIV prevalence, in order of magnitude, include Mongu (22.8%; CI 19.2–26.8), Mazabuka (18.7%; CI 15.4–22.5), Kalulushi (17.5%; CI 13.2–22.7), Choma (17.2%; CI 14.4–20.5), Itezhi-tezhi (17.1%; CI 11.8–24.1), Kafue (17.1%; CI 14.4–20.1) and Lusaka (16.5%; CI 15.3–17.8). On the other hand, the five districts with the lowest HIV prevalence, in descending order, were: Chama (5%; CI 3.3–7.6), Zambezi (4.9%; CI 3–8.1), Kabompo (4.8%; CI 2.9–7.5), Mafinga (4.6%; CI 2.7–7.5) and Lundazi (4.3%; CI 2.6–6.9). The results of the SAE reveal that 37 of the 74 districts had relatively low HIV prevalence (≤ 10%), 25 districts had relatively moderate HIV prevalence (between 10 and 15%), 10 districts had relatively high HIV prevalence (between 15 and 20%), while 2 districts had relatively very high HIV prevalence (between 18.1% and 23.5%). Table 2 (See Additional file 3) provides both direct and modeled HIV estimates for all the 74 districts, with confidence intervals.

The distribution of district HIV prevalence is further illustrated with the two maps in Fig. 3. Figure 3a shows the district prevalence map from the direct estimates, while Fig. 3b shows the map generated using SAE data. The notable difference between the maps is that the one developed using raw data has a wider HIV prevalence interval (0.8–25.4%) compared to the SAE map (4.3–23.3%). The spatial effect of HIV prevalence can also be seen from the SAE map (Fig. 3b), with relatively high HIV prevalence being concentrated in areas around central, southern and western Zambia. Note, however, that the prevalence intensities in maps 3a and 3b are based on relative prevalence between the lowest and highest prevalence estimates within each map. Therefore, comparing the two maps should be done with caution.

Fig. 3 Zambia district HIV prevalence maps for raw (a) and SAE (b) data: The color variations in the heat map show the magnitude of the HIV prevalence in the 74 districts

The mapping shows that, generally, the districts in the north and eastern parts of the country have moderate HIV prevalence, while districts in north-western and north eastern parts of the country, i.e., North-Western and Muchinga provinces, have the lowest HIV prevalence.

Discussion
This paper is the first to use SAE methods to estimate the prevalence of HIV at district level in Zambia. Our study has demonstrated that national HIV estimates currently being used for HIV programming fail to account for the full picture of the distribution, and the extent of the variations in HIV prevalence at lower levels [6, 7, 28, 29]. Amoako Johnson [28], for instance, warns against relying on national estimates for planning as this could lead to an "ecological fallacy," where planning and resource allocation fail to properly account for the variations that exist at small domains, but may not be apparent at national level. The one-size-fits-all approach, associated with national level estimates, is unlikely therefore to achieve the desired results at local levels [29].

In the midst of declining HIV funding [30], designing and targeting of HIV interventions require adequate knowledge on where the biggest resource needs lie. In the context of Zambia, for example, national HIV estimates would demand that more resources be allocated to the Western province, based on the disease burden. However, these national level estimates do not provide any information on the district-specific HIV burden, or sub-groups in greater need of HIV policy targeting within the province [31]. The revelations of the wide variations in the burden of HIV within districts should be a policy concern and effectively make the "bigger picture" approaches redundant, especially if the intention is to make HIV programs more pragmatic and optimal at the local levels [32, 33]. The importance of accounting for within district variation in HIV prevalence has been highlighted by our study. For instance, while the average HIV prevalence for Southern province is around 13%, the within province prevalence varies from as low as 7.4% to as high as 23.5%. Ensuring effective service delivery, under such circumstances, requires recognizing and tailoring interventions to the needs of the different subpopulations at the level at which service delivery is organized and delivered [34]. This remains a challenge for low resource countries, however, due to the higher cost of obtaining data to generate small-area estimates [35].

Our study has also revealed important information on the predictors of HIV prevalence at district level. For instance, our study has shown that ANC HIV prevalence and dependence ratio are the best out-of-survey predictors for district HIV prevalence. This is similar to the HIV prediction model in South Africa [6], except the one in our study included an SAR spatial covariance structure. Other studies [36] have also found HIV prevalence among pregnant women to be a good predictor of adult HIV prevalence. On the other hand, dependence ratio may be influencing HIV prevalence indirectly, i.e., high dependence ratio negatively affects economic well-being [37], which increases the vulnerability of the population and their susceptibility to HIV [38, 39].

Another important finding in this study is that district HIV prevalence in Zambia is spatially correlated, i.e., the prevalence in one district is correlated with the prevalence in adjacent districts. This is reasonable and expected since district boundaries are arbitrary, and therefore, individuals living in districts close to each other are likely to have similar characteristics and risk factors [28, 40]. Similar studies have acknowledged the importance of accounting for spatial correlation at small-area levels [6, 28], and this is especially true for communicable diseases such as HIV. It would be prudent, therefore, for neighboring districts to employ coordinated approaches to HIV programming and have a shared understanding of local HIV drivers and impact of the disease. The mapping of HIV prevalence in our study provides useful information to facilitate such a coordinated HIV response.

The national HIV prevalence for Zambia has generally been highest in urban areas [15, 41, 42], and this is similar to other countries in the region such as Malawi, Kenya, South Africa and Zimbabwe [43–46]. However, district-level estimates from our study have revealed that HIV prevalence in some rural districts is comparable to, and sometimes even higher than, the prevalence in urban districts. For instance, we found that the two highest HIV prevalence estimates in Zambia are in predominantly rural districts, with the highest district having almost seven percentage points higher prevalence than that of the most urbanized district of Lusaka. This is further proof that national-level estimates mask very important HIV dynamics that can guide resource allocation at local levels [47]. It is likely that the national-level HIV dynamics observed in most countries are different from the situation at lower levels. As long as lower-level prevalence estimates remain unknown, the allocation of HIV resources will remain sub-optimal [48].

The lessons that can be learnt from our study are that HIV prevalence is highest in districts located near international borders, along the main transit routes and adjacent to other districts with very high prevalence. Such districts tend to have high population mobility due to commerce and trade. Similarly, the two rural districts with the highest HIV prevalence in Zambia are fishing districts and attract a large number of people for fish-related trade every year [49–52]. Population mobility has been shown to be a driver of HIV infections in other settings as well [53–55]. Other similar countries can draw important lessons from this finding. To demonstrate the importance of population mobility in HIV transmission, our study found that districts that experience seasonality of employment located along the main transit routes and those along the international border have higher HIV prevalence than the national average. The above factors have been shown to be associated with HIV in other settings as well [56–58]. Districts experiencing high population mobility are potential HIV hotspots and should be prioritized for HIV interventions such as test and treat services, regardless of location. Similarly, areas that are in close proximity to districts with known high HIV preva-

HIV interventions. Profiling the burden of disease at appropriate levels is a key aspect in designing responsive HIV interventions, and SAE models will increasingly become important tools in guiding policy making and decision making, especially for low resource settings.

Study limitations
The SAE model used in this study helped produce district HIV prevalence estimates; however, the use of relative mean standard errors and confidence intervals to validate the model has a potential bias.
It should be noted that lence need close attention due to the spatial nature of the ZAMPHIA is not designed to collect representative data HIV epidemic, as revealed by our study. at district level, and by design therefore, SAE methods There are some notable differences between our district are always going to produce relatively better estimates, HIV estimates and those of Dwyer-Lindgren et al., [11] a with smaller standard errors than ZAMPHIA estimates comparable study. For instance, our HIV prevalence esti- because they utilize additional data, in addition to the mates, which are based on the 2016 ZAMPHIA survey, survey-based estimates. An additional validation method ranged from 4.2% in Lundazi to 23.3% in Namwala, while would have been useful. Additionally, the model was built the prevalence estimates for Dwyer-Lindgren and others, with covariates as collected by the Census data and ZAM- over the same period, ranged from 4.4% in Isoka to 17.9% PHIA, and there is a chance that other HIV-related covar- in Mongu. The differences in results may be attributed to iates, not collected by the Census and the ZAMPHIA, e.g., the different modeling principles employed by the two the prevalence of transactional sex, could have strength- studies, or the fact that the Dwyer-Lindgren estimates ened the model. The other limitation is that this study are based on the age group 15–49 years with data from 9 uses data from different time points, i.e., the 2016 ZAM- provinces and 72 districts, while our estimates are based PHIA, 2017–18 ANC and 2010 census data, which may on the age group 15–59  years with data from 10 prov- affect the observed relationship between the outcome and inces and 74 districts. the explanatory covariates. It is, however, unlikely that any demographic changes over the review period would Conclusion significantly change our findings. 
Generally, this study This is the first study in Zambia to present and map HIV has provided policy relevant information that can be uti- prevalence estimates at district level using SAE methods. lized to improve targeting of HIV resources at local levels It is clear from the results that national estimates mask where interventions are planned and delivered. the wide variation in HIV prevalence within the districts. Ensuring that HIV resources are allocated where they are Abbreviations needed require knowledge on the distribution of HIV HIV/AIDS: Human immunodeficiency virus/acquired immunodeficiency syn- at smaller, more homogeneous areas such as districts. drome; SAE: Small-area estimation; ZAMPHIA: Zambia population-based HIV This study has been able to provide this information and impact assessment survey; ANC: Antenatal care; FH: Fay–Herriot; ESA: Eastern and Southern African region; KZN: Kwazulu-Natal; MoH: Ministry of Health; EA: mapped the distribution of district HIV in Zambia. Enumeration area; TB: Tuberculosis; SAR: Simultaneously autoregressive; CAR The revelation that HIV prevalence is very high in some : Conditional autoregressive; IAR: Intrinsic autoregressive; AIC: Akaike informa- rural districts is an important finding for HIV program- tion criterion; RMSE: Relative mean standard errors; UNZABREC: University of Zambia Biomedical Research Ethics Committee; ZamStats: Zambia Statistics ming. It is useful for policy makers to realize that relying Agency; US: United States; NIH: National Institutes of Health. on national level prevalence to plan interventions at dis- trict level may not be optimal because the HIV dynam- Supplementary Information ics at district level are likely to be different. Utilizing The online version contains supplementary material available at https:// doi. results from SAE techniques for planning and resource org/1 0. 1186/ s12963- 022-0 0286-3. 
allocation would ensure achievement of universal access to resources by underserved and underrepresented Additional file 1. Fitted models to estimate district HIV prevalence in populations. Zambia. Our results have documented drivers and markers of Additional file 2. Relative Standard Errors for direct and modeled HIV high HIV prevalence at district level; information that estimates. can be used to plan prevention and treatment interven- Additional file 3. Direct and modeled HIV estimates with confidence intervals. tions. Population mobility is a key driver of HIV and should be an important consideration when designing Mweemba et al. Population Health Metrics (2022) 20:8 Page 10 of 11 Acknowledgements 8. Ministry of Health. AIDS response fast track strategy 2015–2020. 2015 We are grateful to the Zambia Statistical Agency and the Zambian Ministry of [cited 2021 Mar 30]. https://w ww.n ac.o rg.z m/ sites/d efau lt/fi les/ publi Health for providing the data for this study. catio ns/ Zambia% 20Fast% 20Tra ck% 20Str ategy_0. pdf 9. Ouma J, Jeffery C, Valadez JJ, Wanyenze RK, Todd J, Levin J. Combining Authors’ contributions national survey with facility-based HIV testing data to obtain more CM contributed to the designing of the study, data analysis, interpretation and accurate estimate of HIV prevalence in districts in Uganda. BMC Public writing of the manuscript. IF contributed to data analysis, writing of the meth- Health. 2020;20(1):1–14. odology section and model building. PH contributed to building the model 10. Ministry of Health. National health strategic plan monitoring and and validating it and also writing up of the manuscript. WM contributed to evaluation framework 2017–2021 (2019). https://w ww.m oh. gov. zm/? writing up of the results and discussion section. FM contributed to critically wpfb_ dl= 121 reviewing and finalizing the write up of the manuscript. All authors read and 11. 
Dwyer-Lindgren L, Cork MA, Sligar A, Steuben KM, Wilson KF, Provost approved the final manuscript. NR, et al. Mapping HIV prevalence in sub-Saharan Africa between 2000 and 2017. Nature. 2019;570(7760):189–93. Funding 12. Hidiroglou M. Small-area estimation: theory and practice. In: JSM pro- This research was supported by the Fogarty International Center of the US ceedings, survey research methods section. Alexandria, VA: American National Institutes of Health (NIH) under Award Number D43 TW009744. The Statistical Association; 2007 [cited 2018 Sep 4]. /paper/Small-Area- content is solely the responsibility of the authors and does not necessarily Estimation-%3A-Theory-and-Practice-Hidiroglou/327266333da6fa5f51f- represent the official views of the National Institutes of Health. 71bf74f19fcfd0c2b24df 13. Chandra H, Chambers R. Multipurpose weighting for small area estima- Availability of data and materials tion. J Off Stat. 2009;25:379–95. The datasets used and/or analyzed during the current study are available from 14. Central Statistics Office. 2010 Census of population and housing: Zam- the corresponding author on reasonable request. bia National Analytical Report. 2012 [cited 2015 Apr 20]. http://w ww. zamst ats. gov.z m/ report/C ensus/ 2010/ Natio nal/2 010%2 0Cen sus% 20of% 20Pop ulati on% 20Nat ional%2 0Ana lytic al% 20Repo rt. pdf Declarations 15. Ministry of Health, Zambia. Zambia population-based HIV impact assessment (ZAMPHIA) 2016: First Report. Zambia. 2017. Ethics approval and consent to participate 16. Pfeffermann D, Tiller R. Small-area estimation with state–space Ethical approval was obtained from the University of Zambia Biomedical models subject to benchmark constraints. J Am Stat Assoc. Research Ethics Committee (UNZABREC) (Ref. No. 937-2020) and permission 2006;101(476):1387–97. from the Zambia National Health Research Authority. The study utilized sec- 17. Porter AT, Holan SH, Wikle CK, Cressie N. 
Spatial Fay-Herriot mod- ondary data from the Zambia Statistics Agency (ZamStats) and the Zambian els for small area estimation with functional covariates. Spat Stat. Ministry of Health. 2014;10:27–42. 18. Legendre P, Legendre L. Numerical ecology. Amsterdam: Elsevier; 2012. Consent for publication 19. Anselin L, Bera AK. Spatial dependence in linear regression models Not applicable. with an introduction to spatial econometrics. Urbana-Champaign; 1996. (Office of Research working paper / University of Illinois Competing interests at Urbana-Champaign, College of Commerce and Business The authors declare that they have no competing interests. Administration). 20. Riley S. Large-scale spatial-transmission models of infectious disease. Author details Science. 2007;316(5829):1298–301. 1 Department of Health Policy, Systems and Management, School of Public 21. Alene KA, Viney K, Moore HC, Wagaw M, Clements ACA. Spatial pat- Health, University of Zambia, Ridgeway Campus, P.O. Box 50110, Lusaka, terns of tuberculosis and HIV co-infection in Ethiopia. PLoS ONE. Zambia. 2 School of Public Health, University of Ghana, P.O. Box LG 571, Accra, 2019;14(12):e0226127. Ghana. 3 Department of Economics, School of Humanities and Social Science, 22. Yokoi T, Ando A. One-directional adjacency matrices in spatial autoregres- University of Zambia, Great East Road Campus, P.O Box 32379, Lusaka, Zambia. sive model: A land price example and Monte Carlo results. Econ Model. 2012;29(1):79–85. Received: 17 May 2021 Accepted: 8 February 2022 23. R Core Team. R: a language and environment for statistical computing. 2013; 24. Molina I, Marhuenda Y. sae: an R package for small area estimation. R J. 2015;7(1):81–98. 25. Wickham H. ggplot2. New York: Springer; 2009. https:// doi. org/1 0. 1007/ References 978-0-3 87- 98141-3. 1. UNAIDS. Global HIV & AIDS statistics—2020 fact sheet. 2020 [cited 2021 26. Ver Hoef JM, Hanks EM, Hooten MB. On the relationship between con- Apr 15]. https://w ww. 
unaids.o rg/e n/ resou rces/ fact- sheet ditional (CAR) and simultaneous (SAR) autoregressive models. Spat Stat. 2. UNAIDS. AIDSinfo. 2020 [cited 2021 May 10]. https://a idsi nfo.u naids. org/ 2018;25:68–85. 3. UNAIDS. AIDSinfo. 2019 [cited 2020 Nov 3]. http:// aidsin fo.u naids.o rg/ 27. De Smith MJ. Statistical Analysis Handbook 2018 edition. The Winchelsea 4. Simbayi L, Zuma K, Zungu NP, Moyo S. South African National HIV Press; 2018 [cited 2021 Apr 25]. https://w ww. statsr ef. com/ HTML/i ndex. prevalence, incidence, behaviour and communication survey, 2017—The html? car_m odels.h tml Human Sciences Research Council (HSRC). 2017 [cited 2020 Oct 21]. 28. Amoako Johnson F, Padmadas SS, Chandra H, Matthews Z, Madise NJ. https://w ww. hsrcpr ess.a c.z a/ books/ south- afric an-n atio nal-h iv- preva Estimating unmet need for contraception by district within Ghana: lence- incid ence- behavi our- and- commu nicati on-s urvey- 2017 an application of small-area estimation techniques. Popul Stud. 5. KwaZulu-Natal Office of the Premier HIV/AIDS Directorate. The KwaZulu- 2012;66(2):105–22. Natal Provincial multi-sectoral HIV, TB and STIs implementation plan 29. Makurumidze R, Decroo T, Lynen L, Chinwadzimba ZK, Van Damme W, 2017–2022. 2017. https:// sanac. org. za/w p- conte nt/ uploa ds/2 019/ 02/ Hakim J, et al. District-level strategies to control the HIV epidemic in Zim- PIP_K ZN_F inal-1. pdf babwe: a practical example of precision public health. BMC Res Notes. 6. Gutreuter S, Igumbor E, Wabiri N, Desai M, Durand L. Improving esti- 2020;13(1):393. mates of district HIV prevalence and burden in South Africa using small 30. ten Brink D, Martin-Hughes R, Kelly SL, Wilson DP. What is the impact of area estimation techniques. PLoS ONE. 2019;14(2):e0212445. a 20% funding cut in international HIV aid from the United States? AIDS. 7. Kondlo LO, Manda SOM. Small area estimation of HIV prevalence using 2019;33(8):1406–8. National Survey data in South Africa (2011). 
M weemba et al. Population Health Metrics (2022) 20:8 Page 11 of 11 31. Asian Development Bank. Introduction to small area estimation tech- operat ions/l ist/ impro ving- access- to- wash- and-h iv- servi ces- in- limul unga- niques: a practical guide for national statistics offices. Asian Development distr ict-i n-z ambia Bank; 2020. https:// www.a db. org/ publi catio ns/s mall- area- estim ation- 52. Mweemba CE, Funder M, Nyambe I, Van Koppen B. Poverty and Access to guide- natio nal- stati stics- offic es Water in Namwala District, Zambia: Report on the results from a House- 32. Rao JNK. Some methods for small area estimation. Riv Internazionale Sci hold Questionnaire Survey, Zambia. 2011. https:// www. diis. dk/ en/ resea Sociali. 2008;116(4):387–406. rch/p over ty- and- access- to-w ater- in- namwa la- distr ict- zambia 33. Niragire F, Achia TNO, Lyambabaje A, Ntaganira J. Bayesian mapping of 53. Solomon S, Kumarasamy N, Ganesh AK, Amalraj RE. Prevalence and risk HIV infection among women of reproductive age in Rwanda. PLoS ONE. factors of HIV-1 and HIV-2 infection in urban and rural areas in Tamil 2015;10(3):e0119944. Nadu, India. Int J STD AIDS. 1998;9(2):98–103. 34. Jain S, Wilk AS, Thorpe KE, Hammond PS. A model for delivering popula- 54. Camlin CS, Charlebois ED. Mobility and its effects on HIV acquisition and tion health across the care continuum. AJMC. 2018. https:// www. ajmc. treatment engagement: recent theoretical and empirical advances. Curr com/ view/a-m odel- for- deliv ering- popul ation- health- across- the-c are- HIV/AIDS Rep. 2019;16(4):314–23. contin uum. 55. Cassels S. Time, population mobility, and HIV transmission. Lancet HIV. 35. Bernal RTI, de Carvalho QH, Pell JP, Leyland AH, Dundas R, Barreto ML, 2020;7(3):e151–2. et al. A methodology for small area prevalence estimation based on 56. Coulibaly I. The impact of HIV/AIDS on the labour force in sub-Saharan survey data. Int J Equity Health. 2020;19(1):124. 
Africa: a preliminary assessment. Int Labour Organ Res Policy Anal No 3. 36. Grassly NC, Morgan M, Walker N, Garnett G, Stanecki KA, Stover J, et al. 2005; Uncertainty in estimates of HIV/AIDS: the estimation and application of 57. Avert. HIV and AIDS in Zambia. Avert. 2018 [cited 2021 Mar 18]. https:// plausibility bounds. Sex Transm Infect. 2004;80(suppl 1):i31–8. www.a vert. org/ profes siona ls/ hiv-a round- world/ sub- sahar an- africa/ 37. Ashraf Q, Weil D, Wilde J. The effect of fertility reduction on economic zambia growth. Popul Dev Rev. 2013;39:97–130. 58. Jawando JO, Adeyemi EO. Sexual exchange and cross-bor- 38. Bunyasi EW, Coetzee DJ. Relationship between socioeconomic status and der trade: implications for HIV/AIDS in Nigeria. SAGE Open. HIV infection: findings from a survey in the Free State and Western Cape 2020;10(2):2158244020917949. Provinces of South Africa. BMJ Open. 2017;7(11):e016232. 39. Igulot P, Magadi MA. Socioeconomic status and vulnerability to HIV infec- tion in Uganda: evidence from multilevel modelling of AIDS indicator Publisher’s Note survey data. AIDS Res Treat. 2018. https:// www. hindaw i.c om/j ourn als/ art/ Springer Nature remains neutral with regard to jurisdictional claims in pub- 2018/ 781214 6/ lished maps and institutional affiliations. 40. Pratesi M, Salvati N. Small area estimation: the EBLUP estimator based on spatially correlated random area effects. Stat Methods Appl. 2008;17(1):113–41. 41. Nakazwe C, Michelo C, Sandøy IF, Fylkesnes K. Contrasting HIV prevalence trends among young women and men in Zambia in the past 12 years: data from demographic and health surveys 2002–2014. BMC Infect Dis. 2019;19(1):432. 42. Fylkesnes K, Musonda RM, Kasumba K, Ndhlovu Z, Mluanda F, Kaetano L, et al. The HIV epidemic in Zambia: socio-demographic prevalence pat- terns and indications of trends among childbearing women. AIDS Lond Engl. 1997;11(3):339–45. 43. Nutor JJ, Duah HO, Agbadi P, Duodu PA, Gondwe KW. 
Spatial analysis of factors associated with HIV infection in Malawi: indicators for effective prevention. BMC Public Health. 2020;20(1):1167. 44. Magadi MA. Understanding the urban–rural disparity in HIV and poverty nexus: the case of Kenya. J Public Health. 2017;39(3):e63-72. 45. Gibbs A, Reddy T, Dunkle K, Jewkes R. HIV-Prevalence in South Africa by settlement type: a repeat population-based cross-sectional analysis of men and women. PLoS ONE. 2020;15(3):e0230105. 46. Schaefer R, Gregson S, Takaruza A, Rhead R, Masoka T, Schur N, et al. Spatial patterns of HIV prevalence and service use in East Zimba- bwe: implications for future targeting of interventions. J Int AIDS Soc. 2017;20(1):21409. 47. Kayeyi N, Fylkesnes K, Michelo C, Makasa M, Sandøy I. Decline in HIV prevalence among young women in Zambia: national-level estimates of trends mask geographical and socio-demographic differences. PLoS ONE. 2012;7(4):e33652. 48. National Research Council (US) Committee on Health. Improving health in the United States: the role of health impact assessment: why we need health-informed policies and decision-making. Washington (DC): Ready to submit your research ? Choose BMC and benefit from: National Academies Press (US); 2011 [cited 2021 Apr 24]. https:// www. • fast, convenient online submission ncbi. nlm.n ih. gov/ books/ NBK83 538/ 49. Ndubani P, Kamwanga J, Tembo R, Tete J, Buckner B. PLACE in Zambia: • thorough peer review by experienced rese archers in your field identifying gaps in HIV prevention in Mongu, Western Province, 2005— • rapid publication on acceptance MEASURE Evaluation. 2006 [cited 2021 Mar 18]. https:// www.m easu reeva • support for research data, including large and complex data types luati on. org/ resour ces/ publi catio ns/ tr-0 6- 42 50. Singh K, Buckner B, Tate J, Ndubani P, Kamwanga J. Age, poverty and alco- • gold Open Access which fosters wider collaboration and increased citations hol use as HIV risk factors for women in Mongu. 
Zambia Afr Health Sci. • maximum visibility for your research: over 100M website views per year 2011;11(2):204–10. 51. The OPC Fund for International Development. Improving Access to WASH At BMC, research is always in progress.and HIV Services in Limulunga District in Zambia—OPEC Fund for Inter- national Development. https:// opecf und. org. 2014. https:// opecf und. org/ Learn more biomedcentral.com/submissions