ARTICLE https://doi.org/10.1038/s41467-020-15779-8 OPEN The impact of antimalarial resistance on the genetic structure of Plasmodium falciparum in the DRC Robert Verity 1,17✉, Ozkan Aydemir 2,17, Nicholas F. Brazeau 3,17, Oliver J. Watson 1, Nicholas J. Hathaway4, Melchior Kashamuka Mwandagalirwa5, Patrick W. Marsh2, Kyaw Thwai3, Travis Fulton6, Madeline Denton6, Andrew P. Morgan 6, Jonathan B. Parr6, Patrick K. Tumwebaze7, Melissa Conrad8, Philip J. Rosenthal8, Deus S. Ishengoma9, Jeremiah Ngondi 10, Julie Gutman 11, Modest Mulenga12, Douglas E. Norris13, William J. Moss14, Benedicta A. Mensah15, James L. Myers-Hansen 15, Anita Ghansah15, Antoinette K. Tshefu5, Azra C. Ghani1, Steven R. Meshnick3, Jeffrey A. Bailey 2,18 & Jonathan J. Juliano 3,6,16,18✉ The Democratic Republic of the Congo (DRC) harbors 11% of global malaria cases, yet little is known about the spatial and genetic structure of the parasite population in that country. We sequence 2537 Plasmodium falciparum infections, including a nationally representative population sample from DRC and samples from surrounding countries, using molecular inversion probes - a high-throughput genotyping tool. We identify an east-west divide in haplotypes known to confer resistance to chloroquine and sulfadoxine-pyrimethamine. Fur- thermore, we identify highly related parasites over large geographic distances, indicative of gene flow and migration. Our results are consistent with a background of isolation by dis- tance combined with the effects of selection for antimalarial drug resistance. This study provides a high-resolution view of parasite genetic structure across a large country in Africa and provides a baseline to study how implementation programs may impact parasite populations. 1Medical Research Council Centre for Global Infectious Disease Analysis, Department of Infectious Disease Epidemiology, Imperial College London, London, UK. 2 Department of Pathology and Laboratory Medicine, Warren Alpert Medical School, Brown University, Providence, RI, USA. 3 Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, USA. 4 Program in Bioinformatics and Integrative Biology, University of Massachusetts, Worcester, MA, USA. 5 Kinshasa School of Public Health, Hôpital Général Provincial de Référence de Kinshasa, Kinshasa, Democratic, Republic of Congo. 6Division of Infectious Diseases, Department of Medicine, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA. 7 Infectious Disease Research Collaboration, Kampala, Uganda. 8 Department of Medicine, University of California- San Francisco, San Francisco, CA, USA. 9 National Institute for Medical Research, Tanga, Tanzania. 10 RTI International, Dar es Salaam, Tanzania. 11Malaria Branch, Center for Global Health, Centers for Disease Control, Atlanta, GA, USA. 12 Tropical Disease Research Centre, Ndola, Zambia. 13 Department of Molecular Microbiology and Immunology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA. 14 Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA. 15 Noguchi Memorial Institute of Medical Research, University of Ghana, Accra, Ghana. 16 Curriculum in Genetics and Molecular Biology, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA. 17These authors contributed equally: Robert Verity, Ozkan Aydemir, Nicholas F. Brazeau. 18These authors jointly supervised this work: Jeffrey A. Bailey, Jonathan J. Juliano. ✉email: r.verity@imperial.ac.uk; jjuliano@med.unc.edu NATURE COMMUNICATIONS | (2020) 11:2107 | https://doi.org/10.1038/s41467-020-15779-8 | www.nature.com/naturecommunications 1 1234567890():,; ARTICLE NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-020-15779-8 Malaria remains one of the largest global public health comprehensively genotype known antimalarial resistance genes inchallenges, with an estimated 219 million cases world- several hundred samples from the DRC10. Here, we introduce anwide in 20171. Despite decades of scale-up in control, expanded MIP panel targeted at 1834 single-nucleotide poly- there has been a recent resurgence, particularly in high trans- morphisms (SNPs) distributed throughout the P. falciparum mission countries in sub-Saharan Africa1. In addition, the genome, and designed to quantify differentiation and relatedness emergence of antimalarial resistance poses a major threat to between samples. Using this panel of genome-wide SNP MIPs, in current control and elimination efforts worldwide, and new tools combination with the previous drug resistance MIP panel, we are needed to quantify the changing landscape of drug resistance evaluate the parasite population diversity in 2537 parasite isolates on timescales relevant to malaria control programmes. Genomics from the DRC and surrounding countries in East and West has emerged as an useful method for better understanding Africa. We use this information to quantify relatedness of and parasite populations that can be leveraged to support the design gene-flow between parasites over large geographic scales and to of effective interventions against a continually evolving parasite. assess the origins of antimalarial resistance mutations. Data from genomic studies provides information that is com- plementary to epidemiological data2, and can help to answer Results several key questions, including how parasites are transmitted, Sample quality and filtering. We obtained 2537 samples col- how drug resistance spreads, and how malaria control efforts lected in 2013–2015 from the DRC and surrounding countries impact the diversity of the parasite population. However, to date, (DRC= 2039, Ghana= 194, Tanzania= 120, Uganda= 63, efforts to use genomics to inform malaria control efforts have Zambia= 121). All samples were sequenced using two separate suffered from three major limitations. First, much of the work has MIP panels: a genome-wide panel designed to capture overall been conducted in low transmission regions, such as Asia and levels of differentiation and relatedness, and a drug resistance transmission fringe regions of Africa, leaving it unclear how panel designed to target polymorphic sites known to be associated useful information can be gathered in the highest transmission with antimalarial resistance10. The genome-wide panel included settings. Some of these high burden regions have experienced 739 ostensibly geographically informative SNPs, chosen on the increasing malaria prevalence in recent years and are now the basis of high differentiation (FST) between surrounding African center of strategic plans for control efforts3,4. Second, most countries in publicly available genomic sequences made available genomic studies in Africa have relied upon convenience sampling by the Pf3K project (see Supplementary Note 1 and Supple- from a few sites usually collected for other purposes, rather than mentary Data 1), and 1151 putatively neutral SNPs distributed population-representative samples. Lastly, studies have either throughout the genome, with an overlap of 56 SNPs that were relied on relatively few genetic markers, providing limited insight both neutral and geographically informative. The drug resistance into the complete genome, or on expensive whole-genome panel included SNPs in known and putative drug resistance genes sequencing, limiting the number of samples studied. Over- and has been described elsewhere10. The median number of coming these limitations is essential for genomics to have broader unique molecular identifiers (UMIs) per MIP was 31 (range: impacts on malaria control. 1–8490) for the genome-wide panel, and 10 (range: 1–32,511) for Within Africa, parasite populations have been shown to vary the drug resistance panel. Complete UMI depth distributions are significantly between East and West, as demonstrated by their shown in Supplementary Fig. 1. After filtering for samples and distinct antimalarial drug susceptibilities and population loci with sufficient UMI coverage, we were left with 1382 samples genetics5,6. However, few genomic studies have incorporated and 1079 loci from the genome-wide panel, and 674 samples and samples from central Africa, limiting our understanding of the 1000 loci from the drug resistance panel, with an overlap of connectivity of parasite populations across the continent. The 452 samples between both panels. In addition to these samples, Democratic Republic of the Congo (DRC) is the largest malaria- 114 controls consisting of known mixtures were sequenced and endemic country in Africa, borders nine countries, and harbors used to assess the accuracy of allele calls and frequencies. ~11% of global P. falciparum malaria cases1. The DRC harbors a Expected versus measured allele frequencies for each SNP, cal- large, understudied parasite population that likely serves as a culated from these controls, are shown in Supplementary Fig. 2. bridge between African parasite populations. Limited previous work has shown that the DRC represents a watershed between East and West African drug resistant parasite populations for Complexity of infection in the DRC. Initial analyses focused on sulfadoxine-pyrimethamine and chloroquine resistance7–9. More the genome-wide MIP panel only. Complexity of infection (COI)13 recently, parasite population structuring due to mutations at these for each sample was estimated using THE REAL McCOIL and other loci associated with antimalarial resistance has been (Supplementary Fig. 3). The mean COI was estimated at 2.2 con rmed within the DRC10. However, studies focusing on (range 1–8) for the study as a whole. We observed significantfi hypervariable surface antigen diversity or neutral microsatellites differences in COI between countries (Ghana: 1.55 (non-para- have been unable to detect significant structure in the parasite metric bootstrap 95% CI: 1.39–1.73), DRC: 2.23 (2.15– 2.31), population10,11, likely due to a lack of high-quality genome-wide Tanzania: 2.17 (1.83–2.51), Zambia: 2.68 (2.39–3.00), Uganda signal. A better understanding of parasite populations and the 2.18 (1.87– 2.51), and within the DRC we observed a statistically spread of antimalarial resistance in the DRC will allow for the significant relationship between COI and P. falciparum pre- design of more effective interventions accounting for evolutionary valence by microscopy at both the province and cluster levels forces. (Supplementary Fig. 4), with higher COIs observed at higher To address this knowledge gap, we leverage a recent advance in prevalences. malaria genomics, high-throughput molecular inversion probe (MIP) capture and sequencing, to characterize and map parasite Population structure in the DRC. We explored population population structure and antimalarial resistance profiles in the structure through principal component analysis (PCA) evaluated DRC and to define the connections of parasites within the DRC on within-sample allele frequencies at all 1079 genome-wide loci. to East and West African parasite populations12. This approach We found the same separation between East and West Africa provides a cost-effective and scalable method of genome inter- described in previous studies (Fig. 1) as well as finer structure rogation, without the expense or informatic complexities of between regions within East Africa. DRC samples comprised a whole-genome sequencing. We previously employed MIPs to continuum between the East and West African clusters. 2 NATURE COMMUNICATIONS | (2020) 11:2107 | https://doi.org/10.1038/s41467-020-15779-8 | www.nature.com/naturecommunications NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-020-15779-8 ARTICLE a b 4 3 2 Country 2 1 DRC 0 PC3 (1.23%) Ghana −1 Tanzania 0 −2 −3 −6 Uganda −4 −3 Zambia −2 −2 −1 0 0 PC1 (3.08%) −2 1 PC2 (1.53%) 2 2 3 −5.0 −2.5 0.0 2.5 4 4 PC1 (3.08%) Fig. 1 Principal component analysis. The first two (a) and three (b) principal components calculated from within-sample allele frequencies using the genome-wide MIP panel. Colors indicate country of origin of each sample. Principal Component 1 3 2 1 0 Principal Component 2 3 pfcrt 2 1 0 Principal Component 3 3 2 1 0 Principal Component 4 3 dhps 2 1 0 SNP position (ordered chromosomes) Geographic Informative Non-informative status of site Fig. 2 Per-locus contributions to principal components. The relative contribution (%) of each locus to the first four principal components. Chromosomes are plotted in order, separated by vertical white gridlines. Point colors indicate sites that were chosen in the design based on FST values to be geographically informative (blue) or not (red). The relative contribution of each locus to each principal and a relatively larger contribution (65.2%) came from putatively component was quantified through normalized loading values. geographically informative SNPs (non-parametric bootstrap, p < Relative contributions to the first four principal components are 0.001). In contrast, contributions to PC2 were concentrated in a shown in Fig. 2. After the fourth principal component the percent region on chromosome seven in close proximity to P. falciparum variance explained by subsequent components plateaued (Sup- chloroquine resistance transporter (pfcrt), a known drug plementary Fig. 5). For principal component 1 (PC1) large resistance locus, suggesting that resistance to chloroquine or contributions came from loci distributed throughout the genome, amodiaquine may be driving differentiation along this secondary NATURE COMMUNICATIONS | ( 2020) 11:2107 | https://doi.org/10.1038/s41467-020-15779-8 | www.nature.com/naturecommunications 3 Relative contribution (%) PC2 (1.53%) ARTICLE NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-020-15779-8 a PC1 b PC2 e pfcrt K76T prevalence Prevalence 10 10 5 5 8 8 0.8 6 6 0 4 4 0 −4 −2 0 2 −4 −2 0 2 0.6 −5 −5 0.4 PC value −10 4 −10 0.2 2 10 15 20 25 30 Longitude c PC3 d PC4 0 dhps K540E prevalence 10 10 f 5 −2 Prevalence8 8 5 6 6 4 4 −4 0.5 0 −4 −2 0 2 −4 −2 0 2 0 0.4 −5 −5 0.3 −10 −10 0.2 10 20 30 10 20 30 10 15 20 25 30 Longitude Longitude Fig. 3 Spatial patterns in principal components. a–d Show the mean principal component value per DHS cluster. e, f Show estimated distributions of the prevalence of molecular markers of resistance for pfcrt and pfdhps. axis. For PC3, locus contributions were concentrated in three genic regions: PF3D7_0215300 (8.5%), PF3D7_0220300 (5.0%), and PF3D7_1127000 (4.3%). The first and largest of these 0.0020 encodes an acyl-CoA synthetase and is part of a diverse gene 60 family known to undergo extensive gene conversion and 0.0015 recombination14. For PC4 we observed a region of high locus contribution on chromosome eight in close proximity to the 0.0010 known antifolate drug resistance gene dihydropteroate synthase (dhps). Combined, these results suggest that geography and drug 40 0.0005 resistance are both contributors to the observed population structure. 0.0000 The relationship between the PCA results and the spatial 0.5 0.6 0.7 0.8 0.9 1.0 IBD distribution of parasites was explored by plotting raw principal 20 component values against the geographic location of samples (Fig. 3a–d). For PC1 this revealed a complex pattern of spatial variation, containing both north–south and east–west clines. For PC2 and PC4 the maps essentially recapitulate the known geographic distribution of pfcrt and dhps resistance mutations, 0 respectively (Fig. 3e, f). For PC3 the map indicates some 0.00 0.25 0.50 0.75 1.00 east–west spatial structuring that is not explained by known IBD markers of antimalarial resistance and warrants further investigation. Fig. 4 Histogram of pairwise IBD. Pairwise IBD between all samples, estimated by maximum likelihood. Inset shows the heavy tail of the distribution, with some pairs of samples having IBD > 0.9. Between sample relatedness of parasites. The relatedness of all pairs of samples was explored through pairwise identity by des- on the number of genotyped positions, with estimates becoming cent (IBD), estimated using a maximum likelihood approach. IBD increasingly unreliable for smaller datasets of 100 or 20 SNP loci. describes the relatedness of samples in terms of their shared In the real data, the overall distribution of pairwise IBD was evolutionary history, and consequently is not influenced by a found to be heavy-tailed, consisting of a large body of weakly particular allele frequency distribution. This makes it a better related samples and a tail of very highly related samples (Fig. 4). measure than simple identity by state (IBS) when comparing Mean IBD was significantly higher within clusters compared to between studies, as values can be compared directly15. We first between clusters (0.06 vs. 0.02, two-sample t-test, p < 0.001). carried out a simulation-based analysis to explore the accuracy of When plotted against geographic separation there was a clear fall- our maximum likelihood estimator (see Supplementary Note 2 off of IBD with distance (Fig. 5a), consistent with the classical and Supplementary Fig. 6), finding that we were conservatively pattern expected under isolation-by-distance16,17. Focussing on biased in cases of high polyclonality. Hence, we expect to the tail of highly related samples, which includes the major strain underestimate true IBD by this method. This result did depend in complex infections, there were 12 sample pairs with a 4 NATURE COMMUNICATIONS | ( 2020) 11:2107 | https://doi.org/10.1038/s41467-020-15779-8 | www.nature.com/naturecommunications Latitude Frequency (%) Latitude Latitude Frequency (%) NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-020-15779-8 ARTICLE a 0.06 b 5 Congo river Pairwise IBD 0 910 1.000 0.04 1228 0.975 291 1174 1330 −5 0.950 0.925 0.02 0.900 −10 0 500 1000 1500 2000 15 20 25 30 Spatial distance (km) Longitude Fig. 5 Spatial patterns in IBD. a Shows the mean IBD between clusters, binned by the spatial distance between clusters. Vertical lines show 95% confidence intervals. b Shows the spatial distribution of highly related (IBD > 0.9) parasite pairs. Values above edges give distances in km. Black areas indicate major water bodies, including the Congo River, which is labeled. relatedness greater than IBD= 0.9. Comparison of raw allele divide with the K540E and A581G mutants concentrated in the frequency distributions confirmed that these were likely clones east, and S436A and A437G concentrated in the west. For crt we (Supplementary Fig. 7). These highly related pairs were found also find evidence of an east–west divide, with haplotypes con- more often within the same cluster than in different clusters (7 vs. taining N326S and F325C concentrated in the east and those 5, respectively, chi-squared test, p < 0.001), suggesting the containing I356T concentrated in the west. presence of local clonal transmission chains. The five between- cluster highly related pairs (Fig. 5b) were spread over large Selective sweep and haplotype analysis of antimalarial resis- geographic distances (281–1331 km), far beyond the normal tance. Using the antimalarial resistance MIPs and genome-wide expected scale of the breakdown in genetic relatedness (Fig. 5a), SNP MIPs combined, the extended haplotypes of the monoclonal suggesting recent long distance migration. infections were determined for 200 kb upstream and downstream of each putative drug resistance allele that had at least 5% overall Prevalence of markers of antimalarial resistance. Based on prevalence in the DRC. The CVIET haplotype within the crt gene previous findings of an east–west divide in molecular markers of showed a signal of positive selection, with longer haplotype blocks antimalarial resistance in the DRC8,9, all samples in the DRC in western DRC as compared to eastern DRC (Fig. 7; p’XP- were divided by geographically weighted K-means clustering into EHHD < 0.05). In the east, patterns of haplotype homozygosity two populations (Supplementary Fig. 8). The prevalence of every are consistent with positive selection for the derived I356T hap- mutation identified by the drug resistance MIP panel was cal- lotype (Supplementary Fig. 9), although a XP-EHHD statistic culated in eastern and western DRC, as well as at the country could not be calculated for this locus because the derived hap- level. Table 1 gives a summary of all mutations that reached a lotype was absent in western DRC, supporting the geographic prevalence >5% in any geographic unit, and a complete list of all localization of the I356T mutation in the east (Fig. 6). identified mutations along with their prevalence is given in Mutations in dhps were more difficult to interpret. This gene Supplementary Data 2. Note that in the dhpsmutation G437A the has undergone multiple selective sweeps associated with increas- reference is resistant, hence this is re-coded as A437G and pre- ing drug resistance. The most recently introduced mutation into valence values indicate the prevalence of the reference allele. the DRC, dhps A581G, showed relatively conserved local Estimated prevalences of these alleles in the DRC as a whole were haplotypes around the mutation in both eastern and western broadly similar to previously published estimates10. However, we DRC (Supplementary Fig. 10). Extended haplotypes around the did identify several polymorphisms in known and putative other mutations (Supplementary Figs. 11 and 12) are inconsistent resistance genes not previously reported in the DRC, including with a classical hard sweep, perhaps due to selection on multiple kelch K189T and pfatp6 N569K, both of which have been independent haplotypes or to interference between A581G and described at appreciable frequencies elsewhere in Africa18–20. other linked alleles. Finally, we did not detect any strong signals of differing patterns of recent positive selection between the Geographic distribution of antimalarial resistant haplotypes. eastern and western DRC among the dhfr and mdr2 genes Previous studies have demonstrated that mutations associated (Supplementary Table 1, Supplementary Fig. 13). with antimalarial resistance are clustered into east–west group- ings within DRC8,10. Focusing on the 107 samples from DRC that Discussion were identified as monoclonal from the REAL McCOIL analysis, Here we provide the first large-scale, robustly sampled study of we explored the joint distribution of all combinations of mutant falciparum malaria in central Africa using MIP capture and haplotypes in both the dhps and crt genes. Raw combinations of sequencing, a high-throughput genotyping approach that is mutations were visualized using the UpSetR package in R21, and appropriate for large population based surveys. Using a panel of the spatial distribution of haplotypes in the DRC was explored by probes designed to detect genome-wide SNPs, combined with a plotting these same mutant combinations against their corre- second panel targeting drug resistance genes, we were able to sponding DHS cluster locations (Fig. 6). Our results for dhps show that the parasite population in the DRC contains a signal of recapitulate those found previously, showing a clear east–west differentiation by geographic separation, consistent with the NATURE COMMUNICATIONS | ( 2020) 11:2107 | https://doi.org/10.1038/s41467-020-15779-8 | www.nature.com/naturecommunications 5 Mean IBD Latitude ARTICLE NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-020-15779-8 Table 1 Prevalence (%) of mutations identified by the drug resistance MIP panel. Prevalence Gene Chromosome Position Mutation Name Overall DRC DRC West DRC East Ghana Uganda Zambia atp6 chr1 267007 I723V 1.1 0.3 0.7 0.0 4.2 7.3 0.0 atp6 chr1 267257 G639D 2.0 1.8 2.9 1.0 0.0 7.3 0.0 atp6 chr1 267467 N569K 24.1 21.9 18.8 24.0 16.7 41.5 28.9 atp6 chr1 267882 E431K 15.3 17.0 18.8 15.7 16.7 9.8 6.7 atp6 chr1 267970 L402V 7.1 8.2 10.1 6.9 12.5 0.0 2.2 dhfr-ts chr4 748239 N51I 83.0 79.5 81.2 78.4 75.0 100.0 97.8 dhfr-ts chr4 748262 C59R 71.2 63.2 63.0 63.2 95.8 95.1 97.8 dhfr-ts chr4 748410 S108N 97.8 97.1 97.1 97.1 100.0 100.0 100.0 dhfr-ts chr4 748577 I164L 3.1 0.6 0.0 1.0 0.0 29.3 0.0 mdr1 chr5 958145 N86Y 12.4 14.3 18.8 11.3 16.7 7.3 0.0 mdr1 chr5 958440 Y184F 37.4 36.5 39.9 34.3 58.3 31.7 37.8 mdr1 chr5 958484 T199S 1.3 0.0 0.0 0.0 0.0 14.6 0.0 mdr1 chr5 958584 S232Y 2.7 3.5 5.1 2.5 0.0 0.0 0.0 mdr1 chr5 961625 D1246Y 4.4 2.9 3.6 2.5 0.0 24.4 0.0 crt chr7 403620 M74I 30.3 28.7 37.7 22.5 16.7 85.4 0.0 crt chr7 403621 N75E 30.3 28.7 37.7 22.5 16.7 85.4 0.0 crt chr7 403625 K76T 30.3 28.7 37.7 22.5 16.7 85.4 0.0 crt chr7 404407 A220S 28.1 24.6 31.9 19.6 8.3 100.0 0.0 crt chr7 405600 I356T 7.1 9.4 21.0 1.5 0.0 0.0 0.0 dhps chr8 549681 S436A 15.0 17.3 28.3 9.8 37.5 0.0 0.0 dhps chr8 549685 G437A 26.8 32.7 27.5 36.3 4.2 0.0 17.8 dhps chr8 549993 K540E 25.4 17.0 9.4 22.1 0.0 85.4 48.9 dhps chr8 550117 A581G 8.2 6.1 2.2 8.8 0.0 34.1 4.4 k13 chr13 1726431 K189T 14.8 14.9 18.8 12.3 54.2 0.0 6.7 mdr2 chr14 1956202 I492V 23.2 21.3 22.5 20.6 20.8 31.7 31.1 mdr2 chr14 1956408 F423Y 31.4 30.1 28.3 31.4 29.2 36.6 37.8 Includes all mutations that reached a prevalence >5% in any given geographic unit. classical pattern of isolation-by-distance. This background values in publically available samples from surrounding countries. population structure is overlaid with the clear impacts of drug This increases our power to detect geographic differentiation, but resistance mutations, which cause distinct structure between East comes at the cost of not being able to comment on the relative and West African parasite populations. Additionally, the use of importance of geography vs. drug resistance, which would require relatively dense genome-wide SNPs allowed us to carry out random genetic sampling or alternatively whole genomes. Simi- relatedness analysis, revealing a handful of cases where human larly, we should be cautious when interpreting spatial clines in hosts separated by many hundreds of kilometers were infected by population structure from our data, as we may have greater power essentially identical clones. Given the rapid breakdown of distinct to detect structure along some axes than others due to the genotypes by recombination in high transmission areas, it is unequal distribution of surrounding countries in publically highly likely that these events represent relatively recent infection available samples, although in general we have good representa- and migration events. With this in mind, it is interesting to note tion in both the East–West and North–South directions. that pairwise links of high relatedness tend to fall along the Congo The flexible nature of MIP panels allows for multiplex detec- River, an important route of transportation in DRC. Lastly, the tion of SNPs associated with drug resistance in any known or combination of the two MIP panels allowed us to examine putative resistance loci for which they are designed. This allowed extended haplotypes surrounding drug resistance genes, revealing for a more detailed evaluation of molecular markers associated rapid breakdown of haplotypes in the population and different with antimalarial resistance than has previously been possible in signals of selection in East vs. West DRC. the DRC. To date, studies of antimalarial resistance markers in We previously investigated population structure using MIPs the DRC have focused primarily on pfcrt (K76T), dhfr (N51I, targeting 20 microsatellites in the DRC10, failing to detect a C59R, S108N, I164L), dhps (I431V, S436A, A437G, K540E, strong signal of population structure based upon these markers. A581G, A613S), pfmdr (N86Y, F184Y, D1246Y), and a few kelch Here we leveraged the same 552 samples as the previous study, mutations23–29. The data suggests that mutations associated with plus additional samples from the DRC and neighboring countries, artemisinin resistance remained absent in the country as of 2014. to identify clear structure with an improved SNP-based geno- The World Health Organization identified nine mutations within typing method. Our ability to detect population structure in the the K13 propeller region that are validated in terms of their present study is likely due to several factors. First, the SNP panel clinical phenotype of artemisinin resistance, and a further 11 contains nearly two orders of magnitude more markers than the mutations that are candidates associated with the phenotype of previous panel. While this SNP MIP panel expanded the number delayed clearance30. We identified 14 mutations within the K13 of loci interrogated, we have yet to achieve the full potential of gene (Supplementary Data 2), although none of these correspond MIPs. Specifically, massively increased, multiplexed probe sets to validated or candidate artemisinin resistance mutations. that target additional portions of the genome are feasible. MIPs Beyond looking at mutations within drug resistance genes, have now been used in human studies to detect as many as 55,000 differences in extended haplotypes around drug resistance genes markers in a single reaction22. Second, a large number of have been used to understand evolution and spread31. Though genome-wide SNPs in this study were chosen based on high FST not originally designed for this purpose, the genome-wide MIP 6 NATURE COMMUNICATIONS | (2020) 11:2107 | https://doi.org/10.1038/s41467-020-15779-8 | www.nature.com/naturecommunications NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-020-15779-8 ARTICLE a b 43 5 40 30 23 0 20 13 11 10 10 7 –5 A581G S436A K540E –10 None A437G 60 40 20 0 15 20 25 30 Set size Longitude c 80 d 70 5 60 40 0 20 18 7 2 1 3 1 1 2 2 –5 G32D F325C N326S I356T A220S –10 K76T N75E M741 None 60 40 20 0 15 20 25 30 Longitude Set size Fig. 6 Spatial distribution of drug resistant haplotypes. The spatial distribution of all combinations of mutant haplotypes for dhps and crt from the monoclonal DRC samples. a, c UpSetR plots showing the number of times each combination of mutations was seen for dhps and crt, respectively. b, d Show these same haplotypes on a map of DRC. Colors correspond horizontally between panels, i.e. between a and b, and between c and d, with the exception of wild-type haplotypes (gray) which are not shown in b, d. panel can be leveraged for conducting similar analyses. For Methods example, the differences in CVIET EHH between the West and Study populations. Chelex-extracted DNA samples from dried blood spots, col- East suggests that the CVIET haplotype in the West has poten- lected as part of the 2013–2014 DRC Demographic Health Survey (DHS), were tested using quantitative real-time PCR to detect Plasmodium falciparum lactate tially been more recently introduced, has experienced less dehydrogenase (pfldh)32,33. Previously published DRC samples10 were included breakdown through recombination, or has undergone stronger (n= 589), and used to set a Ct threshold of <30 for inclusion for sequencing, which recent positive selection as compared to the East. Redesign of the was applied to the remaining DRC samples (n= 1450), resulting in a total of 2039 selected targets with denser sampling around known drug resis- DRC samples sent for sequencing. These samples represented 369 of the overall 539 DHS clusters. In addition, dried blood spot samples from four further counties tance genes will allow for more robust assessment of these were used: Ghana (n= 194), Tanzania (n= 120), Uganda (n= 63), and Zambia selected regions. (n= 121). Samples from Ghana were collected in 2014 from symptomatic RDT DRC’s location in central Africa and the enormous number of and/or microscopy positive individuals presenting at health care facilities in Begoro malaria cases in the country means that malaria control in Africa (n= 94) and Cape Coast (n= 98)34. Samples from Tanzania were collected in 2015 likely depends on improving our understanding on Congolese from symptomatic RDT-positive patients of all ages at Kharumwa Health Center inNorthwest Tanzania35. Samples from Uganda were collected in 2013 from RDT- malaria. This represents the largest study of falciparum popula- positive symptomatic patients at Kanungu in Southwest Uganda36. Finally, samples tion genetics in the DRC and, unlike other large population from Zambia were collected in 2013 from RDT-positive individuals from a com- genetic studies of malaria in Africa, leverages a nationally munity survey of all ages in Nchelenge District in northeast Zambia on the border representative sampling approach. Thus, this study provides the with the DRC. All non-DRC samples were Chelex extracted, except for the Gha- naian samples which were extracted using QiaQuick per protocol (Qiagen, Hilden, first data on fine-scale genetic structure of parasites at a national Germany). This study was approved by the Internal Review Board at UNC and the scale in Africa, and provides a baseline that can be used to study Ethics Committee of the Kinshasa School of Public Health. how implementation programs impact parasite populations in the region. The MIP platform represents a highly scalable and cost- Design of MIP panels. We used two distinct MIP panels—a genome-wide panel effective means of providing genome-wide genetic data, relative to designed to capture overall levels of differentiation and relatedness, and a drug whole-genome sequencing10. The highly flexible nature of the resistance panel designed to target polymorphic sites known to be associated with platform allows it to be rapidly scaled in terms of targets and antimalarial resistance (Supplementary Note 1). When selecting targets for the samples leading it to be applicable across malaria-endemic genome-wide panel, we used the publicly available P. falciparum whole-genomesequences provided by the Pf3k and P. falciparum Community projects from the countries. MalariaGEN Consortium. This consisted of sample sets from Cameroon (n= 134), NATURE COMMUNICATIONS | ( 2020) 11:2107 | https://doi.org/10.1038/s41467-020-15779-8 | www.nature.com/naturecommunications 7 Intersection size Intersection size Latitude Latitude ARTICLE NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-020-15779-8 a b K76T in DRC-East K76T in DRC-West 1.00 1.00 0.75 0.75 Allele Allele 0.50 Ancestral 0.50 Ancestral Derived Derived 0.25 0.25 0.00 0.00 8 12 16 20 8 12 16 20 Position (cM) Position (cM) c d 395 400 405 410 395 400 405 410 Position (kbp) Position (kbp) Fig. 7 Extended haplotype homozygosity and bifurcation plots for pfcrt K76T. a, b Display extended haplotype homozygosity (EHH) curves from the monoclonal samples with no missing genotype data 200 kb upstream and downstream from the K76T core single-nucleotide polymorphism in centimorgans among the samples from the eastern Democratic Republic of the Congo (DRC) and western DRC. c, d Show haplotype bifurcation plots with respect to the core allele ancestry and the eastern DRC and western DRC for a subsetted region. Position is considered in kilobases, and segregating sites for each haplotype are displayed at the nodes. Overall, there is strong evidence for recent positive selection of the pfcrt CVIET haplotype in the west that is mitigated in the east. DRC (n= 285), Kenya (n= 52), Malawi (n= 369), Nigeria (n= 5), Tanzania (n= 150 bp paired end sequencing with dual indexing using Nextseq 500/550 Mid- 66) and Uganda (n= 12) (Supplementary Data 1). The genomic sequence from output Kit v2. these samples underwent alignment, variant calling, and variant-filtering following the Pf3k strategy consistent with the Genome Analysis Toolkit (GATK, version 3.6 unless otherwise indicated) Best Practices with minor modifications37–40. Full MIP variant calling and filtering. MIP variant calling is summarized in the details of the bioinformatic pipeline used in MIP design are given in the Supple- Supplementary Note 110. Within each sample, variants were dropped if they had a mentary Note 1. Samples from Nigeria and Uganda were dropped after variant Phred-scaled quality score of <20. Across samples, variant sites were dropped if calling due to small sample sizes, and the final filtered sequences were used to they were observed only in one sample, or if they had a total UMI count of <5 calculate Weir and Cochran’s F 41ST with respect to country for each biallelic locus. across all samples. This data set was considered the final raw data used for addi- The 1000 loci with the highest FST values were considered for MIP design as tional filtering. phylogeographically informative loci. Of these 1000 potential loci, 739 were Additional filters were applied to both genome-wide and drug resistance identified as regions that were suitable for MIP-probe design. Separately, from the datasets prior to carrying out analysis. Sites were restricted to SNPs, and in the case combined SNP file, we identified 1595 loci that had a minor-allele frequency >5%, of the genome-wide panel these were filtered to the pre-designed biallelic target had an FST value between 0.005 and 0.2, and were annotated by SnpEff (version 4- SNP sites. Any variant that was represented by a single UMI in a sample, or that 3t) as functionally silent mutations. These loci were identified as putatively neutral had a within-sample allele frequency (WSAF), UMI count of allele/total UM less SNPs, and 1151 were found to be suitable for MIP design. The distribution of MIPs than 1%, was eliminated. Any site that was invariant across the entire dataset after is shown in Supplementary Fig. 14 and MIP sequences and targets are shown in this procedure was dropped. Samples were assessed for quality in terms of the Supplementary Data 3. proportion of low-coverage sites, where low-coverage was defined as fewer than 10 supporting UMIs. Samples with >50% low-coverage loci were dropped. Variant sites were then assessed by the same means in terms of the proportion of low- MIP capture and sequencing. In addition to patient samples, control samples coverage samples, and sites with >50% low-coverage samples were dropped. were known mixtures of 4 strains of genomic DNA from malaria at the following Samples were then combined with metadata, including geographic information, ratios: 67% 3D7 (MRA-102, BEI Resources, Manasas, VA), 14% HB3 (MRA-155), and were only retained if there were at least 10 samples in a given country. This 13% 7G8 (MRA-154) and 6% DD2 (MRA-156). They were also represented at two resulted in dropping Tanzanian samples from the drug resistance dataset, but no different parasite densities (29 and 467 parasites/µl). MIP capture and sequencing other countries were dropped. Post-filtering, genome-wide data consisted of library preparation were carried out as described in the Supplementary Note 110. 1382 samples (DRC= 1111, Ghana= 114, Tanzania= 30, Uganda= 45, Zambia Drug resistance libraries were sequenced on Illumina MiSeq instrument using = 82) and 1079 loci, and drug resistance data consisted of 674 samples (DRC= 250 bp paired end sequencing with dual indexing using MiSeq Reagent Kit v2. 557, Ghana= 29, Uganda= 43, Zambia= 45) and 1000 loci. The complete Genome-wide libraries were sequenced on Illumina Nextseq 500 instrument using bioinformatic pipeline is shown in Supplementary Fig. 15. 8 NATURE COMMUNICATIONS | ( 2020) 11:2107 | https://doi.org/10.1038/s41467-020-15779-8 | www.nature.com/naturecommunications EHH statistic EHH statistic Ancestral Derived Ancestral Derived NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-020-15779-8 ARTICLE Complexity of infection. We applied THE REAL McCOIL (v2) categorical method the drug resistance filtered biallelic SNPs into a single dataset (Supplementary to the SNP genotyped samples to estimate the COI of each individual13. Details of Note 1). All associated EHH calculations were carried out using the R-package rehh the analysis are in the Supplementary Note 1. (version 2.0.4), and were truncated when fewer than two haplotypes were present or the EHH statistic fell below 0.0544,45. In addition, we allowed EHH integration Analysis of population structure. WSAFs were calculated for all genome-wide calculations to be made without respect to “borders,” which were frequent due to SNPs, with missing values imputed as the mean per locus. Principal component the MIP-probe design. Although this would result in an inflated integration statistic analysis (PCA) was carried out on WSAFs using the prcomp function in R version if the EHH statistic had not yet reached 0 within the region of investigation, this 3.5.1. The relative contribution of each locus was calculated from the loading values problem was mitigated by only comparing between subpopulations, and not j j PL j j between loci. EHH decay, bifurcation plots, and haplotype plots were adapted fromas li = i¼1 li , where jlij is the absolute value of the loading at locus i, and L is the the rehh package objects and modified using ggplot46. total number of loci. PCA results were explored in a spatial context by taking the mean of the raw principal component values over all samples in a given DHS cluster, and plotting this against the geoposition of the cluster. Reporting summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article. Identity by descent analysis. Pairwise IBD was calculated between all samples from the genome-wide SNPs. We used Malécot’s42 definition of f as the probability Data availability of identity by descent, where fuv can be defined as the probability of a randomly DHS data for the 2013 DRC DHS is available here: https://dhsprogram.com/what-we-do/ chosen locus being IBD between samples u and v. At locus i, let A denote the survey/survey-display-421.cfm. This includes clinical and GPS information and is reference allele, which occurs at population allele frequency pi , and let a denote the available upon request from the DHS program. All raw sequencing data is available at the non-reference allele, which occurs at population allele frequency qi ¼ 1 pi . NCBI SRA (Accession numbers: PRJNA454490, PRJNA545345, and PRJNA545347). Assuming that both samples u and v are monoclonal, let Xui denote the observed allele at locus i in sample u, and equivalently let Xvi denote the observed allele in sample v. Then the probabilities of all possible observed allele combinations Code availability between the two samples can be written: Tools for MIP variant calling and filtering are available at https://github.com/bailey-lab/ PrðXui ¼ A;Xvi ¼ AjfuvÞ ¼ fuvpi þ ð MIPTools (v.0.19.12.13) and https://github.com/Mrc-ide/mipanalyzer (v.1.0.0). Code 1 fuvÞp2i and data are available for each figure at https://github.com/bobverity/ PrðXui ¼ A;Xvi ¼ ajfuvÞ ¼ ð1 fuvÞpiqi ð Þ antimalarial_resistance_DRC. Code access is unrestricted. PrðXui ¼ a;Xvi ¼ j 1 A fuvÞ ¼ ð1 fuvÞpiqi PrðXui ¼ a;Xvi ¼ ajfuvÞ ¼ fuvqi þ ð1 f 2uvÞqi Received: 31 May 2019; Accepted: 28 March 2020; from which we can calculate the likelihood of a given value of fuv over all loci as: YL Lðfuv jXu;XvÞ ¼ PrðXui;XvijfuvÞ: ð2Þ i¼1 In practice, population allele frequencies (pi) were calculated using the mean References WSAF for that locus over all samples. Samples were then coerced to monoclonal by 1. WHO. World Malaria Report 2017 (WHO, 2018). calling the dominant allele at every locus. The likelihood was evaluated using 2. Neafsey, D. E. & Volkman, S. K. Malaria genomics in the era of eradication. Eq. (2) in log-space for a range of values of fuv distributed between 0 and 1 in equal Cold Spring Harb. Perspect. Med. 7, https://doi.org/10.1101/cshperspect. increments of 0.02. The maximum likelihood estimate f̂uv ¼ argmaxf Lðf jXu;XvÞ a025544 (2017). was calculated between all sample pairs. Hereafter the terms IBD and f̂ are used 3. Bill & Melinda Gates Foundation. Malaria, https://gatesfoundation.org/What-uv interchangeably. We-Do/Global-Health/Malaria (2019). The validity of this method of coercing samples to monoclonal before 4. World Health Organization. WHO: High Burden to High Impact. A Targeted estimating IBD via maximum likelihood was rigorously explored in a simulation- Malaria Response (WHO, 2019). based analysis. First, a simulation framework was created that permitted simulating 5. Pearce, R. J. et al. Multiple origins and regional dispersal of resistant dhps in samples with variable polyclonality. This framework is described in detail in African Plasmodium falciparum malaria. PLoS Med. 6, e1000055 (2009). Supplementary Note 2. Second, true vs. estimated IBD were plotted for a range of 6. Ocholla, H. et al. Whole-genome scans provide evidence of adaptive evolution in polyclonal settings and a range of sub-sampled data sizes going down from the true Malawian Plasmodium falciparum isolates. J. Infect. Dis. 210, 1991–2000 (2014). data to 500, 100, and 20 SNPs. Any positive or negative bias introduced by forcing 7. Carrel, M. et al. The geography of malaria genetics in the Democratic Republic samples to be monoclonal would be reflected and quantified in this plot. of Congo: a complex and fragmented landscape. Soc. Sci. Med. 133, 233–241 Mean IBD was calculated within and between DHS clusters, and compared (2015). using a two-sample t-test. Sample pairs were also binned into groups based on 8. Taylor, S. M. et al. Plasmodium falciparum sulfadoxine resistance is geographic separation (great circle distance) in 100 km bins, with an additional bin geographically and genetically clustered within the DR Congo. Sci. Rep. 3, at distance 0 km to capture within-cluster comparisons. Mean and 95% confidence 1165 (2013). intervals of IBD were calculated for each group. Finally, sample pairs with IBD > 9. Antonia, A. L. et al. A cross-sectional survey of Plasmodium falciparum pfcrt 0.9 were identified, and explored in terms of their WSAFs and their spatial mutant haplotypes in the Democratic Republic of Congo. Am. J. Trop. Med. distribution. Hyg. 90, 1094–1097 (2014). 10. Aydemir, O. et al. Drug resistance and population structure of Plasmodium Estimating mutation prevalence from drug resistance panel. Given previous falciparum across the Democratic Republic of Congo using high-throughput findings of an East–West divide in molecular markers of antimalarial resistance in molecular inversion probes. J. Infect. Dis. https://doi.org/10.1093/infdis/ the DRC8,9, all samples in the DRC were divided by geographically weighted K- jiy223 (2018). means clustering into two populations. The prevalence of every mutation identified 11. Verity, R. et al. Plasmodium falciparum genetic variation of var2csa in the by the drug resistance MIP panel was then calculated in East and West DRC, as Democratic Republic of the Congo. Malar. J. 17, 46 (2018). well as at the country level. Prevalences in each DHS cluster were used to produce 12. O’Roak, B. J. et al. Multiplex targeted sequencing identifies recurrently smooth prevalence maps using PrevMap version 1.4.2 in R43. mutated genes in autism spectrum disorders. Science 338, 1619–1622 (2012). 13. Chang, H.-H. et al. THE REAL McCOIL: a method for the concurrent estimation of the complexity of infection and SNP allele frequency for malaria Analysis of monoclonal haplotypes. Results of the previous COI analysis on the genome-wide SNPs with THE REAL McCOIL were used to identify samples that parasites. PLoS Comput. Biol. 13, e1005348 (2017). were monoclonal with a high degree of con dence. Samples were de ned as 14. Bethke, L. L. et al. Duplication, gene conversion, and genetic diversity in thefi fi monoclonal if the upper 95% credible interval did not include any COI greater than species-specific acyl-CoA synthetase gene family of Plasmodium falciparum. one. This resulted in 408 monoclonal samples, of which 143 overlapped with the drug Mol. Biochem. Parasitol. 150, 10–24 (2006). resistance MIP dataset and therefore could be used to explore the joint distribution of 15. Taylor, A. R., Jacob, P. E., Neafsey, D. E. & Buckee, C. O. Estimating mutations in drug resistance genes. 107 of these were from DRC. Analysis focussed relatedness between malaria parasites. Genetics https://doi.org/10.1534/ on the dhps and crt genes. Raw combinations of mutations were visualized using the genetics.119.302120 (2019). UpSet package in R21, and the spatial distribution of haplotypes was explored by 16. Rousset, F. Genetic differentiation and estimation of gene flow from F- plotting these same mutant combinations against DHS cluster geoposition. statistics under isolation by distance. Genetics 145, 1219–1228 (1997). 17. Crow, J. F. & Kimura, M. in An Introduction to Population Genetics Theory Ch. 9.9 (Harper & Row, New York, 1970). Extended haplotype homozygosity analysis. In order to improve our power to 18. Talundzic, E. et al. Molecular epidemiology of Plasmodium falciparum detect hard-sweeps and capture patterns of linkage-disequilibrium with EHH kelch13 mutations in senegal determined by using targeted amplicon deep statistics among putative drug resistance SNPs, we combined the genome-wide and NATURE COMMUNICATIONS | (2020)1 1:2107 | https://doi.org/10.1038/s41467-020-15779-8 | www.nature.com/naturecommunications 9 ARTICLE NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-020-15779-8 sequencing. Antimicrob. Agents Chemother. 61, https://doi.org/10.1128/ 43. Giorgi, E. & Diggle, P. J. PrevMap: An R package for prevalence apping. J. Stat. AAC.02116-16 (2017). Softw. 78, https://doi.org/10.18637/jss.v078.i08 (2017). 19. Torrentino-Madamet, M. et al. Limited polymorphisms in k13 gene in 44. Gautier, M., Klassmann, A. & Vitalis, R. rehh 2.0: a reimplementation of the R Plasmodium falciparum isolates from Dakar, Senegal in 2012–2013. Malar. J. package rehh to detect positive selection from haplotype structure. Mol. Ecol. 13, 472 (2014). Resour. 17, 78–90 (2017). 20. Dahlström, S. et al. Diversity of the sarco/endoplasmic reticulum Ca(2+)-ATPase 45. Gautier, M. & Vitalis, R. rehh: an R package to detect footprints of selection in orthologue of Plasmodium falciparum (PfATP6). Infect. Genet. Evol. 8, 340–345 genome-wide SNP data from haplotype structure. Bioinformatics 28, (2008). 1176–1177 (2012). 21. Lex, A., Gehlenborg, N., Strobelt, H., Vuillemot, R. & Pfister, H. UpSet: 46. Wickham, H. ggplot2: Elegant Graphics for Data Analysis (Springer, New Visualization of intersecting sets. IEEE Trans. Vis. Comput. Graph. 20, York, 2009). 1983–1992 (2014). 22. Turner, E. H., Lee, C., Ng, S. B., Nickerson, D. A. & Shendure, J. Massively parallel exon capture and library-free resequencing across 16 genomes. Nat. Acknowledgements Methods 6, 315–316 (2009). This work was supported by the National Institutes of Health (R01AI107949, 23. Mvumbi, D. M. et al. Falciparum malaria molecular drug resistance in the R01AI139520, K24AI134990, R21AI121465, F30AI143172, U19AI089680). R.V is Democratic Republic of Congo: a systematic review. Malar. J. 14, 354 (2015). funded by a Skills Development Fellowship: this award is jointly funded by the UK 24. Leroy, D. et al. African isolates show a high proportion of multiple copies of Medical Research Council (MRC) and the UK Department for International Devel- the Plasmodium falciparum plasmepsin-2 gene, a piperaquine resistance opment (DFID) under the MRC/DFID Concordat agreement and is also part of the marker. Malar. J. 18, 126 (2019). EDCTP2 programme supported by the European Union. We would also like to thank 25. Nkoli Mandoko, P. et al. Prevalence of Plasmodium falciparum parasites everyone who participated in the studies and all members of the study teams in Ghana, resistant to sulfadoxine/pyrimethamine in the Democratic Republic of the Zambia, Uganda, and Tanzania. We would like to thank the DHS Program and Congo: emergence of highly resistant pfdhfr/pfdhps alleles. J. Antimicrob. USAID for the collection and access to samples from the DRC Demographic Health Chemother. 73, 2704–2715 (2018). Survey. 26. Baraka, V. et al. Impact of treatment and re-treatment with artemether- lumefantrine and artesunate-amodiaquine on selection of Plasmodium Author contributions falciparum multidrug resistance gene-1 polymorphisms in the Democratic R.V., O.A., N.F.B., J.A.B., and J.J.J. contributed to data analysis, writing and experimental Republic of Congo and Uganda. PLoS One 13, e0191922 (2018). design. O.J.W. contributed to data analysis and writing. N.J.H. and A.P.M. contributed 27. Ruh, E., Bateko, J. P., Imir, T. & Taylan-Ozkan, A. Molecular identification of software design. M.K.M, J.B.P., P.K.T, M.C., P.J.R., D.S.I., J.N., J.G., M.M., D.E.N., sulfadoxine-pyrimethamine resistance in malaria infected women who W.J.M., B.A.M., J.L.M.H., A.G., and A.K.T. contributed samples from studies conducted received intermittent preventive treatment in the Democratic Republic of at their sites and reviewed the paper. A.C.G. contributed to analysis design and reviewed Congo. Malar. J. 17, 17 (2018). the manuscript. P.W.M., K.T., T.F., and M.D. contributed laboratory analysis. S.R.M. 28. Mvumbi, D. M. et al. Molecular surveillance of Plasmodium falciparum contributed coordination with DRC investigators, experimental design, and writing. resistance to artemisinin-based combination therapies in the Democratic Republic of Congo. PLoS One 12, e0179142 (2017). 29. Taylor, S. M. et al. Absence of putative artemisinin resistance mutations Competing interests among Plasmodium falciparum in Sub-Saharan Africa: a molecular The authors declare no competing interests. epidemiologic study. J. Infect. Dis. 211, 680–688 (2015). 30. World Health Organization. Status Report on Artemisinin and ACT Resistance (WHO, 2017). Additional information 31. Malaria GEN Plasmodium falciparum Community Project. Genomic Supplementary information is available for this paper at https://doi.org/10.1038/s41467- epidemiology of artemisinin resistant malaria. eLife e08714 (2016). 020-15779-8. 32. Ministère du Plan et Suivi de la Mise en œuvre de la Révolution de la Modernité (MPSMRM), Ministère de la Santé Publique (MSP) et ICF Correspondence and requests for materials should be addressed to R.V. or J.J.J. International. Enquête Démographique et de Santé en République Démocratique du Congo (MPSMRM, MSP & ICF, 2014). Peer review information Nature Communications thanks Roberto Amato and the other, 33. Pickard, A. L. et al. Resistance to antimalarials in Southeast Asia and genetic anonymous, reviewer(s) for their contribution to the peer review of this work. Peer polymorphisms in pfmdr1. Antimicrob. Agents Chemother. 47, 2418 2423 (2003). reviewer reports are available.– 34. Abuaku, B. K. et al. Efficacy of Artesunate/Amodiaquine in the treatment of uncomplicated malaria among children in Ghana. Am. J. Trop. Med. Hyg. 97, Reprints and permission information is available at http://www.nature.com/reprints 690–695 (2017). 35. Ngondi, J. M. et al. Surveillance for sulfadoxine-pyrimethamine resistant Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in malaria parasites in the Lake and Southern Zones, Tanzania, using pooling published maps and institutional affiliations. and next-generation sequencing. Malar. J. 16, 236 (2017). 36. Tumwebaze, P. et al. Changing antimalarial drug resistance patterns identified by surveillance at three sites in Uganda. J. Infect. Dis. 215, 631–635 (2017). Open Access This article is licensed under a Creative Commons 37. McKenna, A. et al. The genome analysis Toolkit: a MapReduce framework for Attribution 4.0 International License, which permits use, sharing, analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 adaptation, distribution and reproduction in any medium or format, as long as you give (2010). appropriate credit to the original author(s) and the source, provide a link to the Creative 38. Van der Auwera, G. A. et al. From FastQ data to high confidence variant calls: Commons license, and indicate if changes were made. The images or other third party the Genome Analysis Toolkit best practices pipeline. Curr. Protoc. Bioinform. material in this article are included in the article’s Creative Commons license, unless 43, 11.10.1–33 (2013). indicated otherwise in a credit line to the material. If material is not included in the 39. DePristo, M. A. et al. A framework for variation discovery and genotyping article’s Creative Commons license and your intended use is not permitted by statutory using next-generation DNA sequencing data. Nat. Genet. 43, 491–498 (2011). regulation or exceeds the permitted use, you will need to obtain permission directly from 40. The Pf3K Project. Pf3k pilot data release 5, https://malariagen.net/data/pf3k-5 the copyright holder. To view a copy of this license, visit http://creativecommons.org/ (2016). licenses/by/4.0/. 41. Weir, B. S. & Cockerham, C. C. Estimating F-statistics for the analysis of population structure. Evolution 38, 1358–1370 (1984). 42. Malécot, G. The Mathematics of Heredity (W.H. Freeman, 1970). © The Author(s) 2020 10 NATURE COMMUNICATIONS | (2020) 11:2107 | https://doi.org/10.1038/s41467-020-15779-8 | www.nature.com/naturecommunications