TYPE Original Research
PUBLISHED 01 June 2023
DOI 10.3389/fgene.2023.1071896

Performance of SNP barcodes to determine genetic diversity and population structure of Plasmodium falciparum in Africa

Dionne C. Argyropoulos 1, MunHua Tan 1, Courage Adobor 2, Benedicta Mensah 2, Frédéric Labbé 3, Kathryn E. Tiedje 1, Kwadwo A. Koram 4, Anita Ghansah 2 and Karen P. Day 1*

1 Department of Microbiology and Immunology, Bio21 Institute and Peter Doherty Institute, The University of Melbourne, Melbourne, VIC, Australia
2 Department of Parasitology, Noguchi Memorial Institute for Medical Research, College of Health Sciences, University of Ghana, Accra, Ghana
3 Department of Ecology and Evolution, The University of Chicago, Chicago, IL, United States
4 Epidemiology Department, Noguchi Memorial Institute for Medical Research, University of Ghana, Accra, Ghana

OPEN ACCESS

EDITED BY
Charles Masembe, Makerere University, Uganda

REVIEWED BY
Kenji Hirayama, Nagasaki University, Japan
Jaishree Raman, National Institute of Communicable Diseases (NICD), South Africa
Anders Björkman, Karolinska Institutet (KI), Sweden

*CORRESPONDENCE
Karen P. Day, karen.day@unimelb.edu.au

RECEIVED 17 October 2022
ACCEPTED 17 May 2023
PUBLISHED 01 June 2023

CITATION
Argyropoulos DC, Tan MH, Adobor C, Mensah B, Labbé F, Tiedje KE, Koram KA, Ghansah A and Day KP (2023), Performance of SNP barcodes to determine genetic diversity and population structure of Plasmodium falciparum in Africa. Front. Genet. 14:1071896. doi: 10.3389/fgene.2023.1071896

COPYRIGHT
© 2023 Argyropoulos, Tan, Adobor, Mensah, Labbé, Tiedje, Koram, Ghansah and Day. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

Panels of informative biallelic single nucleotide polymorphisms (SNPs) have been proposed to be an economical method to fast-track the population genetic analysis of Plasmodium falciparum in malaria-endemic areas. Whilst used successfully in low-transmission areas where infections are monoclonal and highly related, we present the first study to evaluate the performance of these 24- and 96-SNP molecular barcodes in African countries, characterised by moderate-to-high transmission, where multiclonal infections are prevalent. For SNP barcodes it is generally recommended that the SNPs chosen i) are biallelic, ii) have a minor allele frequency greater than 0.10, and iii) are independently segregating, to minimise bias in the analysis of genetic diversity and population structure. Further, to be standardised and used in many population genetic studies, these barcodes should maintain characteristics i) to iii) across various iv) geographies and v) time points. Using haplotypes generated from the MalariaGEN P. falciparum Community Project version six database, we investigated the ability of these two barcodes to fulfil these criteria in moderate-to-high transmission African populations in 25 sites across 10 countries. Predominantly clinical infections were analysed, with 52.3% found to be multiclonal, generating high proportions of mixed-allele calls (MACs) per isolate thereby impeding haplotype construction. Of the 24- and 96-SNPs, loci were removed if they were not biallelic and had low minor allele frequencies in all study populations, resulting in 20- and 75-SNP barcodes respectively for downstream population genetics analysis. Both SNP barcodes had low expected heterozygosity estimates in these African settings and consequently biased analyses of similarity.
Both minor and major allele frequencies were temporally unstable. These SNP barcodes were also shown to identify weak genetic differentiation across large geographic distances based on Mantel Test and DAPC. These results demonstrate that these SNP barcodes are vulnerable to ascertainment bias and as such cannot be used as a standardised approach for malaria surveillance in moderate-to-high transmission areas in Africa, where the greatest genomic diversity of P. falciparum exists at local, regional and country levels.

KEYWORDS
malaria, high-transmission, molecular surveillance, population genetics, Pf6, single nucleotide polymorphisms, minor allele frequencies, ascertainment bias

1 Introduction

Plasmodium falciparum malaria remains a persistent threat for sub-Saharan Africa, where approximately 95% of total malaria cases and 96% of all malaria deaths occur (World Health Organisation, 2022). With unprecedented rebounds in prevalence since 2016, made worse with the COVID-19 pandemic (World Health Organisation, 2022), elimination targets that have been set to be achieved by 2030 are ambitious. The potential contribution of molecular surveillance to determine changes in population diversity and structure in routine monitoring and evaluation of control and elimination strategies is a topic of active research, with a variety of approaches using putatively neutral variation or antigen-encoding loci being explored.

In the microbiological world, P. falciparum presents a special case in the use of these molecular surveillance methods for several reasons. There is a spectrum of population structures from clonal in epidemic settings, to highly diverse in the high burden countries of Africa. This is directly related to transmission intensity (Anderson et al., 2000). With frequent exposure to infected mosquitoes in moderate and high transmission settings, the majority of infections in humans contain multiple distinct P. falciparum genomes (ranging from 1 to 20 diverse genomes in a microlitre of blood), which can frequently recombine due to the obligatory sexual (meiotic) phase of the life cycle in the mosquito (Babiker et al., 1994; Paul et al., 1995). Identifying markers that are informative regardless of recombination intensity, and that remain stable across time, is challenging due to the high rate of genetic recombination in P. falciparum populations (Escalante et al., 2015). Given the “many epidemiologies of malaria” with associated diverse population structures, the development and performance of molecular surveillance methods need to be evaluated in a range of transmission settings (see Escalante and Pacheco (2019) for an extensive review of population genetics in Plasmodium spp.). One method may not be the solution for all malaria endemic areas nor for comparative studies.

“Molecular barcodes” of single nucleotide polymorphisms (SNPs) have been proposed as a molecular surveillance tool and heralded as the new frontier of malaria surveillance, revisiting research in human, animal, and plant genetics almost 20 years ago (Syvänen, 2001; Vignal et al., 2002; Ohashi and Tokunaga, 2003; Langridge and Chalmers, 2005). This has been prompted by the needs of scientists in endemic countries for genotyping methods that can be used with standard laboratory equipment, at reasonable costs and without specialised skills. As malaria control and elimination interventions are actioned locally, it is therefore imperative for analyses of genetic diversity and population structure to be performed in-country (Vignal et al., 2002). SNPs are typically biallelic, and the benefits of using SNPs include the abundance of annotated markers, low scoring error rates, transferability of data across laboratories, the ability to genotype neutral and non-neutral regions in the same run, and, in contrast to multiallelic markers such as microsatellites, the fact that genotyping can largely be fully automated (Khlestkina and Salina, 2006). While microsatellites have been successfully used in moderate-to-high transmission, genotyping these markers is more laborious and cannot be fully automated. Therefore, we wish to evaluate whether SNP barcodes would be useful in these settings.

Small molecular barcodes have been applied to evaluate changes in diversity and population structure of P. falciparum as a result of malaria interventions; detect geographic origins of infection, whether local or imported; distinguish parasite clones from one another, using neutral theory; as well as identify spatial differentiation between parasite populations. 24-SNP (Daniels et al., 2008) and 96-SNP (Nkhoma et al., 2013) barcodes have been successfully deployed in low-transmission countries such as those in Southeast Asia (Thailand (Daniels et al., 2008), Thai-Cambodia border (Nkhoma et al., 2013)), South America (Charles et al., 2016), and also in areas of Africa having undergone intense malaria control programmes (Senegal (Daniels et al., 2008; Daniels et al., 2013; Daniels et al., 2015; Bei et al., 2018), Ndirande, Malawi (Sisya et al., 2015) and Madagascar (Rice et al., 2016)). Other genome-wide SNP genotyping panels have successfully detected intercontinental (Neafsey et al., 2008) and within-country (Aydemir et al., 2018; Tessema et al., 2020; Verity et al., 2020) population structure but require many more SNPs (>500) for the same purpose. However, their utility in highly diverse moderate-to-high transmission settings, where the burden of malaria remains the highest, has not been rigorously assessed.

The immediate problem with the use of SNP barcodes on samples from moderate-to-high transmission settings is the high prevalence of multiclonal infections and whether haplotypes can be accurately constructed for population genetic analysis. This is known as phasing and is more challenging with biallelic SNPs (Chang et al., 2017; Zhu et al., 2018; Gerlovina et al., 2022), compared to more polyallelic microsatellite markers (Anderson et al., 1999). The standard empirical solution in malaria population genetics (Anderson et al., 2000; Tessema et al., 2020) used by the originators of the 24-SNP barcode (Daniels et al., 2008) is to use only single-clone infections, with the consequence of drastically reducing the numbers of loci and sample size for analysis. Here we illustrate this point with an analysis of a 24-SNP barcode dataset of asymptomatic infections from a high-transmission malaria endemic region in Obuasi, Ghana (Supplementary Material, ethics approval: CPN 11/04-05). In this dataset, approximately 80% of infections were multiclonal, resulting in a median of 25%–33% of loci with mixed-allele calls (MACs) (i.e., heteroallelic calls) per haplotype. These MACs severely limited the number of isolates available for haplotype construction, which is necessary to perform population genetics analysis. Motivated by the difficulties in analysing the Obuasi dataset due to the high prevalence of multiclonal infections, we decided to explore further whether this issue was more widespread in other endemic areas in Africa. We tested the suitability of two published SNP barcodes (Daniels et al., 2008; Nkhoma et al., 2013) to identify genetic diversity and population structure in 25 moderate-to-high transmission settings in Africa.

TABLE 1 Epidemiological and Study Population Information. Genetic data were obtained for N = 2,317 isolates from the Pf6 MalariaGEN repository and epidemiological metadata were obtained from study references as indicated in the table.

| Region | Country | Study location | Year | Latitude | Longitude | Isolates | Endemicity | Transmission | Malaria disease status | References |
|---|---|---|---|---|---|---|---|---|---|---|
| West | Benin | Homel | 2014 | 6.3607027 | 2.4381709 | 36 | Moderate | Double Peak | Clinical | Bertin et al. (2013) |
| West | The Gambia | Basse | 2014 | 13.30944 | −14.21925 | 81 | High | Seasonal | Clinical | Amambua-Ngwa et al. (2018) |
| West | The Gambia | Brikama | 2014 | 13.27479 | −16.64092 | 42 | Moderate | Seasonal | Clinical | Amambua-Ngwa et al. (2018) |
| West | Ghana | Cape-Coast | 2014 | 5.55602 | −0.1969 | 100 | High | Perennial | Clinical | Kamau et al. (2015), Mensah et al. (2020) |
| West | Ghana | Kintampo | 2012 | 8.0564 | −1.72446 | 35 | High | Perennial | Clinical | Mensah-Brown et al. (2015) |
| West | Ghana | Navrongo | 2009 | 10.885568 | −1.086617 | 46 | High | Seasonal | Clinical | MalariaGEN et al. (2021) |
| West | Ghana | Navrongo | 2010 | 10.885568 | −1.086617 | 135 | High | Seasonal | Clinical | MalariaGEN et al. (2021) |
| West | Ghana | Navrongo | 2011 | 10.885568 | −1.086617 | 93 | High | Seasonal | Clinical | MalariaGEN et al. (2021) |
| West | Ghana | Navrongo | 2012 | 10.885568 | −1.086617 | 39 | High | Seasonal | Clinical | Duffy et al. (2015) |
| West | Ghana | Navrongo | 2013 | 10.885568 | −1.086617 | 241 | High | Seasonal | Clinical | Kamau et al. (2015) |
| West | Ghana | Navrongo | 2015 | 10.885568 | −1.086617 | 57 | High | Seasonal | Clinical | MalariaGEN et al. (2021) |
| West | Guinea | Faranah | 2011 | 10.0438 | −10.7351 | 37 | High | Perennial | Clinical | Mobegi et al. (2014) |
| West | Guinea | Nzerekore | 2011 | 7.753857 | −8.818703 | 112 | High | Perennial | Clinical | Mobegi et al. (2014) |
| West | Mali | Faladje | 2013 | 13.1333 | −8.3333 | 124 | Moderate | Seasonal | Clinical | Kone et al. (2013), Kone et al. (2020), Ghansah et al. (2014), Kamau et al. (2015) |
| West | Mali | Nioro du Sahel | 2014 | 15.23199 | −9.58863 | 49 | Moderate | Unstable | Clinical | Duffy et al. (2018), Diakité et al. (2019) |
| Central | Cameroon | Buea | 2013 | 4.14638 | 9.245531 | 235 | High | Seasonal | Clinical/Asymptomatic | Apinjoh et al. (2015) |
| Central | Democratic Republic of Congo (DRC) | Kinshasa | 2012 | −4.36939 | 15.320977 | 171 | High | Double Peak | Clinical | Onyamboko et al. (2014) |
| Central | Democratic Republic of Congo (DRC) | Kinshasa | 2013 | −4.36939 | 15.320977 | 108 | High | Double Peak | Clinical | Onyamboko et al. (2014) |
| East | Kenya | Kisumu | 2014 | −0.0917 | 34.76796 | 34 | High | Perennial | Clinical | Ngalah et al. (2015), U.S. President’s Malaria Initiative (2015), U.S. President’s Malaria Initiative (2017), Laurent et al. (2018) |
| East | Kenya | Kombewa | 2014 | −0.1035 | 34.5183 | 26 | High | Perennial | Clinical | Ngalah et al. (2015), U.S. President’s Malaria Initiative (2015), U.S. President’s Malaria Initiative (2017), Laurent et al. (2018) |
| East | Malawi | Chikwawa | 2011 | −16.193575 | 34.7715 | 221 | High | Perennial | Clinical | Ocholla et al. (2014), Ravenhall et al. (2016) |
| East | Malawi | Zomba | 2011 | −15.3891 | 35.3292 | 33 | High | Perennial | Clinical | U.S. President’s Malaria Initiative (2012), Ravenhall et al. (2016) |

(Continued)

It is recommended that SNP barcodes i) are biallelic, ii) have a minor (least frequent) allele frequency greater than 0.10 and iii) are independently segregating, so that genetic diversity and population structure analyses are not biased. Further, for these barcodes to be standardised as a one-size-fits-all panel, they iv) should work across a range of geographies and v) be temporally stable. SNP genotypes of isolates obtained from the MalariaGEN P. falciparum Community Project version 6 (MalariaGEN et al., 2021) were used to test these criteria in SNP barcodes across African populations. We describe high levels of multiclonal infections and MACs that hindered accurate haplotype construction for population genetics analyses. Nonetheless, there was sufficient data to show haplotype variation with large-scale geographic distance across Africa.
Whilst these barcodes have proven practical and meaningful in low-transmission settings with a high proportion of monoclonal infections, we suggest that other molecular surveillance methods, not restricted by these limitations, are needed to guide malaria control programmes in endemic settings characterised by moderate-to-high transmission in Africa.

2 Methods

2.1 MalariaGEN Africa P. falciparum dataset

SNP genotypes in African countries were obtained from the MalariaGEN Plasmodium falciparum Community Project (version 6, https://www.malariagen.net/resource/26) (MalariaGEN et al., 2021), hereinafter referred to as the “Pf6 dataset”. All samples in the Pf6 dataset were obtained from blood samples from patients with P. falciparum malaria, with informed consent from the patient or parent/guardian and with ethical approval as described in MalariaGEN et al. (2021). Standard laboratory protocols were used to determine the DNA quantity and proportion of human DNA per sample (Manske et al., 2012; Miles et al., 2016). As P. falciparum samples were obtained from human blood samples, the parasite is in its haploid stage. Available metadata included the study ID, country, location and year that each isolate was collected. Isolates were retained if they met the following criteria: i) the Whole Genome Sequencing library strategy was used, ii) they passed quality control (“QC pass”), and iii) sequencing was performed using the Illumina HiSeq 2000 paired-end sequencing platform (MalariaGEN et al., 2021). We used the term “study population” to represent isolates collected from the same location and year. From a total of 2,922 African isolates in the database, study populations that had at least 25 isolates and were defined as moderate- or high-transmission by their respective study or, if not specified, by us using World Health Organisation definitions (WHO/GMP, 2017) were selected to undergo further analysis (N = 2,317 isolates) (Table 1; Supplementary Figure S1).
This threshold was used to minimise statistical bias while maximising the number of populations included in the study (Pruett and Winker, 2008; Hoban and Schlarbaum, 2014; Flesch et al., 2018; Qu et al., 2020). These isolates were sampled across 10 countries from 25 study populations in West Africa (Benin, The Gambia, Ghana, Guinea, and Mali), Central Africa (Cameroon and DRC), and East Africa (Kenya, Malawi, and Tanzania) (Figure 1). Supplementary Figure S1 outlines the inclusion/exclusion criteria used to filter isolates and SNP loci to generate final datasets for downstream analyses.

TABLE 1 (Continued) Epidemiological and Study Population Information. Genetic data were obtained for N = 2,317 isolates from the Pf6 MalariaGEN repository and epidemiological metadata were obtained from study references as indicated in the table.

| Region | Country | Study location | Year | Latitude | Longitude | Isolates | Endemicity | Transmission | Malaria disease status | References |
|---|---|---|---|---|---|---|---|---|---|---|
| East | Tanzania | Mkuzi-Muheza | 2013 | −5.241083 | 38.82872 | 145 | High | Seasonal | Clinical | Baraka et al. (2015) |
| East | Tanzania | Muleba | 2013 | −1.750317 | 31.61992 | 52 | Moderate | Double Peak | Clinical | West et al. (2013), Baraka et al. (2015) |
| East | Tanzania | Nachingwea | 2013 | −10.36795 | 38.75465 | 65 | High | Seasonal | Clinical | Baraka et al. (2015) |

FIGURE 1 Map of countries and locations in the Pf6 database from Africa included in this study. 2,317 isolates were chosen from locations per year where there was a minimum of 25 isolates (see Methods). Colours indicate the country that isolates were obtained from, and diamonds indicate the specific regions where individuals with P. falciparum infections were sampled. The map is segregated into three regions: West Africa (green hues; n = 5), Central Africa (blue hues; n = 2), and East Africa (red hues; n = 3). Latitude/Longitude coordinates for study locations were obtained from the MalariaGEN Plasmodium falciparum Community Project version 6 (MalariaGEN et al., 2021) isolate study and metadata.

2.2 Description of SNP barcodes

To be able to understand the genetic diversity and population structure of each parasite isolate and test whether small panels or “barcodes” provide enough information, we chose to analyse published 24- and 96-SNP barcodes (Daniels et al., 2008; Nkhoma et al., 2013) that were found to successfully work in low-transmission settings.

2.2.1 24-SNP barcode

We mined each parasite isolate genome for its genotype at the 24 genome-wide SNPs in the “molecular barcode” TaqMan assay as described (Daniels et al., 2008). Briefly, Daniels et al. (2008) first genotyped over 2,100 SNPs that were discovered through comparative genome sequencing (Volkman et al., 2007) before developing a panel of 24 SNPs that were found to be biallelic, with a high minor allele frequency (MAF >0.35), and had a conserved region around the SNP to design locus-specific primers for amplification (i.e., type-able for genotyping). These 24 SNPs were also chosen as they were unlinked and independently segregating from each other as determined by linkage disequilibrium analysis. These 24 SNPs were verified to detect genetic diversity and population structure of 22 and 16 clinical isolates from Senegal and Thailand, respectively (Daniels et al., 2008).

2.2.2 96-SNP barcode

We also examined each parasite isolate genome for its genotypes at the 96 SNPs in a genome-wide panel using the Illumina GoldenGate platform as described (Nkhoma et al., 2013). These SNPs were gleaned from PlasmoDB version 6.2 (www.plasmodb.org) and were chosen if they were highly polymorphic for parasites from the Thai-Burma border, assayable, not in genes encoding surface proteins (e.g., var, rifin, surfin, stevor), transporters or telomeric genes that may be under strong selection, were distributed across all 14 chromosomes and were found to have MAFs between 0.10 and 0.50. No formal linkage or neutrality analysis was reported in regard to the generation of the SNP barcode. The 96-SNP panel was used to analyse genetic diversity and population structure of asymptomatic and clinical isolates from pregnant women and children younger than 5 years old at the Thai-Burma border (N = 1,731) from 2001 to 2010 (Nkhoma et al., 2013).

2.3 Genotype extraction from the Pf6 database

Published positions of the 24- and 96-SNP barcodes (Daniels et al., 2008; Nkhoma et al., 2013) were based on versions 5.0 and 6.2 of the P. falciparum 3D7 genome on PlasmoDB (Bahl et al., 2003), respectively (Supplementary Table S1). Variants in the Pf6 database were called through read mapping to the P. falciparum 3D7 v3 reference genome (see Methods in MalariaGEN et al. (2021)). Using blastn (Altschul et al., 1990), we aligned sequences containing the SNP loci of interest to the Pf3D7 v3 reference genome to obtain their corresponding positions in the Pf6 dataset. Genotypes with read depths of five or greater were retained (read depth, DP ≥ 5). In addition, alleles were only included if supported by at least two reads (allelic depth, AD ≥ 2) or 5% of reads for genotypes with higher read depths (DP > 50) (Hamilton et al., 2019). Alleles for a locus were excluded if they were single nucleotide insertions or deletions (indels), as they are strictly not defined as SNPs (Khlestkina and Salina, 2006). This excluded 0.0014% (N = 1) and 0.0036% (N = 10) of alleles in the filtered 24- and 96-SNP datasets, respectively.

2.4 Addressing multiple P. falciparum infections

2.4.1 Defining monoclonal and multiclonal P. falciparum infections

To determine the clonality of infections, we obtained data on the within-host inbreeding index (FWS) for each isolate from the Pf6 dataset (MalariaGEN et al., 2021). This metric compares the heterozygosity of parasites within an individual isolate (HW) with the heterozygosity within the total parasite population (HS), using the read count for each locus in the Pf6 dataset. FWS is presented as a proportion that ranges from 0 to 1, where FWS values closer to 1 indicate high inbreeding rates (less genetically diverse) and lower FWS values indicate low inbreeding rates (more diverse/mixed genotypes) in the parasite population. An infection is said to predominantly contain a single genotype when FWS ≥ 0.95 (Manske et al., 2012; Mobegi et al., 2014; Duffy et al., 2018; Amambua-Ngwa et al., 2019; Amegashie et al., 2020). Based on this, N = 1,105 isolates were found to predominantly have a monoclonal infection (Figure 2A). To maintain study population sizes ≥25, nine study populations with <25 isolates were removed from analysis, resulting in N = 956 isolates (Supplementary Figure S1).

2.4.2 Mixed-allele calls (MACs)

To determine whether multiclonal infections could be used for downstream population genetics analyses, we needed to ensure constructed multilocus haplotypes did not include more than 5% of the barcode with mixed-allele calls (MACs, reported as “N” in other studies, e.g., Daniels et al. (2008)). Including haplotypes with many MACs would introduce a high degree of uncertainty into each haplotype and affect subsequent results. Further, in studies where whole-genome sequence data is not available, the clonality of isolates is determined by the percentage of MACs for an isolate. Isolates in which more than one allele was observed for greater than or equal to 5% of loci are conventionally termed multiclonal infections, and monoclonal infections are those with less than 5% (e.g., Daniels et al., 2008; Rice et al., 2016). We therefore kept a tally of the number of MACs per locus to understand the genetic complexity per locus and whether it was evenly distributed. The Pearson’s correlation coefficient was calculated using the function “cor.test” in the R package “stats” v. 3.6.2, to test the association between MACs and FWS.

2.4.3 Investigating two approaches to handling multiclonal infections

We tested two common methods of accounting for multiclonal infections in SNP or whole-genome data analysis. The first approach (i.e., the “dominant allele” method) attempts to include both mono- and multiclonal infections (N = 2,317) in analyses by constructing the “dominant” haplotype for each isolate that has a MAC. This artificially generates a monoclonal infection for all genotypes. A “dominant” allele was defined as the allele call with the highest number of supporting reads (i.e., higher AD) per SNP locus, using the ratio of AD (dividing the larger AD by the smaller AD). For loci where both alleles were supported equally (i.e., AD ratio = 1), an allele was selected at random to complete the construction of haplotypes without MACs (Manske et al., 2012). Higher AD ratio values (i.e., AD ratios >1) indicated that one allele had more supporting reads than the other.

The second, more “conservative” method removes all multiclonal infections, as defined by FWS, retaining only monoclonal infection data for subsequent analysis (N = 1,105). The percentage of data loss for the latter method was calculated as the number of multiclonal infections divided by the total number of infections per study population.

2.5 Using the performance criteria to analyse SNP barcodes

Performance of the 24- and 96-SNP molecular barcodes was analysed to estimate genetic diversity and population structure as described below.

2.5.1 Minor allele frequency (MAF) calculation

MAFs are central to analyses using SNP data and are therefore important to estimate accurately. Because our investigation of methods for handling multiclonal infections found the conservative approach (Anderson et al., 2005; Taylor et al., 2017; Amegashie et al., 2020; Han et al., 2022) to be the more stringent and reliable method, MAFs in downstream analyses were estimated using only monoclonal infections. A custom R script was used to calculate the MAFs according to the genotype data that was input (available on GitHub at: https://github.com/UniMelb-Day-Lab/SNP_MinorAlleleFreq). In short, MAFs for each locus were calculated by removing MACs from the numerator and denominator to reduce bias. This custom script generates a table describing in each row a locus with the number of isolates with data, the number of MACs, the major and minor alleles, and the minor allele frequency calculated. Because samples were haploid, Hardy-Weinberg Equilibrium was not applicable in this study.

2.5.2 Spatial analysis of MAFs

A MAF <0.10 indicated that a locus was not representative and that alleles were moving towards fixation in the population, while a MAF ≥0.10 indicated that the locus can discriminate between isolates in the population. As MAFs impact the inference of population structure (Anderson et al., 2005), MAFs were analysed by region, country and study population (study location per year). Four loci were removed from the 24-SNP panel and 21 loci were removed from the 96-SNP panel as they were not strictly biallelic and/or had MAFs <0.10, resulting in 20-SNP and 75-SNP barcodes analysed downstream for informative population genetics analyses (Supplementary Figure S1).

FIGURE 2 Clonality of infections in African study populations with moderate-to-high malaria transmission.
(A) Within-host diversity using within-host inbreeding index (FWS). The dotted red line indicates the FWS ≥ 0.95 threshold below which isolates were considered to have diverse multiclonal infections. The country for the study population is indicated for reference on the top with total number (N) of isolates represented for that country. (B, C) Correlation between FWS and the proportion of mixed-allele calls (MACs) per isolate for the (B) 24-SNP barcode and (C) 96-SNP barcode. Each dot represents one isolate per study population (location by year). For both barcodes, the FWS and MACs per isolate were significantly negatively correlated (Pearson’s correlation coefficient (r) and p-value are shown). (D) Positively-skewed distributions of allelic depth ratios (AD ratio) from exploring the potential use of the “dominant allele” method. AD ratios close to one indicate approximately similar read coverage for both alleles whereas large AD ratio values represent a substantial difference in read coverage for two alleles. Numbers above the box plots represent the number of genotypes with MACs considered in these calculations. Horizontal central solid line represents the median, the box represents the interquartile range (IQR) from the 25th to 75th percentiles, the whiskers indicate the most extreme data point, which is no more than 1.5 times the interquartile range from the box, and the dots show the outliers. (E) Data loss in all study populations from using the “conservative” approach of excluding multiclonal infections. Total number of isolates per study population with monoclonal and multiclonal infections are shown as light and dark orange bars, respectively. Data is separated by study population (study location by year) and values above each bar indicate the percent of data lost when removing multiclonal infections (FWS < 0.95) from the analysed datasets.
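The depth filters in Section 2.3, the “dominant allele” rule in Section 2.4.3 and the MAC-excluding MAF calculation in Section 2.5.1 can be illustrated with a short Python sketch. This is not the authors' R script: the function names, the dictionary input format and the use of "N" to encode a MAC are assumptions made for illustration, and the 5% rule is read here as "at least 5% of the reads at that genotype when DP > 50".

```python
import random
from collections import Counter

MIN_DP = 5              # genotype kept only if read depth DP >= 5
MIN_AD = 2              # allele kept only if supported by >= 2 reads
HIGH_DP = 50            # above this depth the 5% rule is applied (assumed reading)
MIN_AD_FRACTION = 0.05
MAC = "N"               # hypothetical encoding of a mixed-allele call


def supported_alleles(allelic_depths):
    """Return {allele: reads} for alleles passing the depth filters.

    An empty dict means the genotype is treated as missing.
    """
    dp = sum(allelic_depths.values())
    if dp < MIN_DP:
        return {}
    threshold = MIN_AD_FRACTION * dp if dp > HIGH_DP else MIN_AD
    return {a: n for a, n in allelic_depths.items() if n >= threshold}


def call_genotype(allelic_depths):
    """Return None (missing), a single allele, or MAC if several alleles pass."""
    alleles = supported_alleles(allelic_depths)
    if not alleles:
        return None
    if len(alleles) == 1:
        return next(iter(alleles))
    return MAC


def dominant_allele(allelic_depths, rng=random):
    """'Dominant allele' rule: most-supported allele, ties broken at random."""
    alleles = supported_alleles(allelic_depths)
    if not alleles:
        return None
    top = max(alleles.values())
    best = sorted(a for a, n in alleles.items() if n == top)
    return best[0] if len(best) == 1 else rng.choice(best)


def minor_allele_frequency(calls):
    """MAF at one locus; MACs and missing calls are dropped from both
    the numerator and the denominator."""
    counts = Counter(c for c in calls if c is not None and c != MAC)
    total = sum(counts.values())
    if total == 0:
        return None          # no usable calls at this locus
    if len(counts) < 2:
        return 0.0           # monomorphic among usable calls
    # Second most common allele is the minor allele at a biallelic locus.
    return counts.most_common(2)[1][1] / total


if __name__ == "__main__":
    locus = [{"A": 10, "T": 1}, {"A": 6, "T": 4}, {"T": 12}, {"A": 3}]
    calls = [call_genotype(d) for d in locus]
    print(calls, minor_allele_frequency(calls))
```

Under these assumptions a locus would be retained for the reduced barcodes only if it is biallelic and its MAF is at least 0.10, as described in Section 2.5.2.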
2.5.3 Testing multilocus association within SNP barcodes

The standardised index of association (rd) was used to estimate the extent of multilocus linkage disequilibrium (LD), i.e., the non-random association of alleles (Agapow and Burt, 2001), across the 20- and 75-SNP barcodes. Pairwise rd was calculated to determine whether any patterns of LD were due to any pairs of SNP loci, or if there were any significantly associated pairs of loci masked by an overall LD. If the SNP loci were putatively neutral, then multilocus LD would provide evidence of past and/or current selection on the local parasite population (Ruybal-Pesántez et al., 2017a). The rd and pairwise rd among loci were estimated using a Monte Carlo simulation method of 999 samplings, where alleles were reshuffled at random among haplotypes, using the R package “poppr” v. 2.7.1 (Kamvar et al., 2014). To calculate rd and pairwise rd, only isolates with complete infection haplotypes (i.e., no missing data) were used so that the permutation analysis shuffled the alleles per haplotype without bias.

2.5.4 Genetic diversity

To estimate genetic diversity, i.e., the number of multilocus haplotypes (h) and expected heterozygosity (He), and to perform population genetics analyses, MACs were replaced with no value (“NA”); the resulting dataset is hereinafter known as the “cleaned monoclonal infections” dataset (Supplementary Figure S1). The mean values of h and He were calculated for both barcodes across each region, country and study population using the “cleaned monoclonal infection” haplotypes (where MACs were removed) via R package “poppr” v. 2.7.1 (Kamvar et al., 2014).

2.5.5 Allelic differentiation by locus and over spatial scales

Pairwise population genetic distances across each population scale were determined by Weir and Cockerham’s FST using the R package “hierfstat” v. 0.5-10 (Winter, 2012). FST is a measure of the extent an allele is fixed between populations (Jost et al., 2018), and was calculated as the proportion of allelic variance between loci for the 20- and 75-SNP genotypes. FST values range from 0 to 1, where values close to 1 indicate that populations are fixed for different alleles, while values close to 0 denote that allele frequencies are identical in both populations. Pairwise FST was used to calculate estimates of allele differentiation between pairs of populations at the regional, country and study population levels. Only cleaned multilocus haplotypes (i.e., with no missing data) were used to calculate FST and pairwise FST, referred to as the “complete monoclonal infections” dataset (Supplementary Figure S1), which resulted in 653 and 690 isolates for the 20- and 75-SNP barcodes respectively. A Mantel test was calculated using the R package “vegan” v. 1.3.3 (Oksanen et al., 2020) with 999 iterations to evaluate the relationship between geographic distance (latitude and longitude, Table 1) and genetic divergence (pairwise FST). Given that there were only three regions to generate a matrix, comparisons between regions could not be performed.

2.5.6 Population structure analysis

Population differentiation between study populations, countries, and regions was evaluated by discriminant analysis of principal components (DAPC) using the “complete monoclonal infections” dataset of the 20- and 75-SNP barcodes (Supplementary Figure S1). DAPC is a multivariate method that aims to summarise genetic differentiation between groups and was calculated using the R package “adegenet” v. 2.1.5 (Jombart, 2008). The DAPC can detect population structure below a threshold detectable by FST, providing an estimate of how much data was required to find population structure given genetic differentiation in the population (Patterson et al., 2006). Pairwise distance matrices (i.e., PCA) were first built from evaluating the proportion of SNPs that had different alleles for two isolates. The outputs were a series of uncorrelated eigenvectors (principal components) that determined the directionality of space in the PCA plot, and eigenvalues that determined the magnitude or variation of genetic diversity along the axis. Eigenvalues greater than one accounted for more variance than one of the original variables in the data. Discriminant analyses of these matrices identified the contribution of alleles to possible clusters that may have been driving genetic differentiation between populations. Ellipses were drawn that contained 95% of the genotypes per population. The term “discriminant function” (DF) was used to explain the principal components input to calculate the DAPC. Plots of DF eigenvalues and the contribution of each allele to explain population structuring were generated using “adegenet” for each corresponding DAPC (Jombart, 2008).

2.5.7 Genetic similarity

To identify finer-scale levels of structure without geospatial location data for each individual isolate, we calculated the pairwise allele sharing (PAS) score for isolates within each study population for each barcode using the “complete monoclonal infections” haplotypes with no missing data. PAS is an identity-by-state (IBS) measure of genetic similarity that can be used across relatively few loci and was calculated as the number of alleles shared between two multilocus haplotypes (NAB) divided by the number of SNP loci (NL) (PAS = NAB/NL) (Ruybal-Pesántez et al., 2017a; Argyropoulos et al., 2021). The PAS score characterised variation in multilocus haplotypes from clones (PAS = 1.0) to genetically dissimilar (PAS ≤ 0.25) (Argyropoulos et al., 2021). Larger-scale genomic measures like identity-by-descent (IBD) are performed for larger genome sequences (a minimum of 200 biallelic loci) to infer similarity or “relatedness” over a range of DNA segments (Henden et al., 2018; Schaffner et al., 2018; Taylor et al., 2019) and therefore could not be pursued here.

2.5.8 Temporal analysis of genetic diversity and similarity

Study locations with isolate data in more than one time point were used to investigate whether the SNP loci in each panel were able to be used longitudinally. Temporal data using the “cleaned monoclonal infections” dataset were available for Navrongo, Ghana (2010, 2011, and 2013) and Kinshasa, DRC (2012 and 2013) (Supplementary Figure S1). MAFs and He were compared within each study location over time using the cleaned monoclonal infections data. The function “Hs.test” in “adegenet” was used to test the difference in He between two time points (x and y) using the equation He(x) - He(y) using 999 Monte-Carlo test simulations (Jombart, 2008). Subsequent analysis of variation of loci on chromosome 7 of the 20-SNP barcode led to a closer investigation of its association with a known gene under selection, Plasmodium falciparum chloroquine resistance transporter (pfcrt), which may be in close proximity to these SNP loci. We obtained data on the drug resistance classification (sensitive/resistant/undetermined) and marker genotypes for each isolate from the Pf6 dataset (MalariaGEN et al., 2021). Resistance against chloroquine (CQ) and other 4-aminoquinolines, including the artemisinin combination partner drug amodiaquine (AQ), is primarily governed by the K76T mutation in pfcrt on chromosome 7. A chi-squared test (χ2) was used for univariate analyses of discrete variables to compare proportions. PAS were compared between study locations over time using the “complete monoclonal infections” data. As such, isolates from Navrongo 2011 were removed as there were <25 complete infection haplotypes for analysis. A non-parametric Wilcoxon rank-sum test was used to compare the PAS between two time points in Base R v. 3.5.0 (R Core Team, 2018).

2.6 Statistical tests

All statistical analyses were carried out in R (R Core Team, 2018) implemented in RStudio v. 1.1.383 (RStudio Team, 2015) with Base R and the R package “tidyverse” v. 1.3.1 (Wickham et al., 2019) for data curation and visualisation. A test was deemed statistically significant if the p-value was <0.05.

3 Results

3.1 Description of the Pf6 database study populations and epidemiology

The availability of SNP genotypes in the Pf6 database allowed us to test the performance of the 24- and 96-SNP barcodes to examine population diversity and structure. There were 2,922 isolates sampled in Africa that met the selection criteria (see Methods). Of these, haplotypes were generated for 2,317 (79%) isolates from 25 study populations (study location by year) across 10 moderate-to-high transmission countries in Africa. Study population sample sizes varied from 26 isolates (Kombewa, Kenya, 2014) to 235 isolates (Buea, Cameroon, 2013) (Table 1). There were seven study populations across three countries in East Africa (Kenya, Malawi, and Tanzania), three study populations across two countries in Central Africa (Cameroon and DRC), and the remaining 15 study populations across five countries in West Africa (Benin, The Gambia, Ghana, Guinea, and Mali) (Table 1; Figure 1). The number of isolates, MACs, major and minor alleles, and minor allele frequencies (MAFs) were generated per locus for each study population (Supplementary Tables S2, 3 for the 24- and 96-SNP barcodes, respectively). Malaria transmission in these study populations was predominantly seasonal and year-round (perennial), with few populations exhibiting double peak (two higher-transmission seasons) and unstable (large variation year-to-year) transmission (Table 1). All isolates in these studies were obtained from clinical malaria cases across all ages, from newborns to above 65 years old, with only one study collecting additional data from individuals across all ages with asymptomatic malaria infections (Table 1).

3.2 Majority of overall infections in African study populations were multiclonal

We investigated the clonality of infections using the within-host inbreeding index, FWS, where values ≥0.95 indicated that infections predominantly contained a single genome. We showed that 52.3% of overall infections were found to be multiclonal and that these multiclonal infections dominated in most study populations (Figure 2A). Similarly, more than half of infections in the 24-SNP barcode (53.0%) and the 96-SNP barcode (56.4%) had more than 5% mixed-allele calls (MACs) in a haplotype (Supplementary Table S4), which is the threshold typically used to determine clonality of infections (see Methods). Comparisons of FWS values to proportions of MACs for the same infection revealed a significant negative correlation between the two metrics for both barcodes (24-SNP: r = −0.948 [95% CI: −0.952, −0.944], p < 0.001; 96-SNP: r = −0.977 [95% CI: −0.979, −0.975], p < 0.001), demonstrating that the proportion of MACs in an isolate’s haplotype is a reliable predictor of within-host diversity for cases where FWS is unavailable (Figures 2B,C).

Given that such large proportions of infections in all study populations were reported as multiclonal, we further explored two prevailing approaches that have been used in the literature to either include or exclude multiclonal infections in downstream analyses (see Methods for detailed descriptions of both approaches). For the “dominant allele” method, distributions of AD ratios were positively skewed for both barcodes (Figure 2D). The median of AD ratios for genotypes with MACs of the 24-SNP and 96-SNP barcode was 2.63 (IQR: 1.58-4.68) and 2.62 (IQR: 1.59-4.75), respectively, indicating that most MACs were due to alleles that were found in approximately similar proportions. Given this result, the use of this “dominant allele” method potentially introduces uncertainty in downstream calculations of MAF as the assignments of most alleles would be at random or possibly confounded by systematic biases in read coverage. This poses the risk of reconstructing inaccurate haplotypes for the majority of infections.

Consequently, we chose to perform all subsequent analyses using the “conservative” approach of excluding isolates with multiclonal infections (FWS < 0.95). While this approach ensured a higher confidence in the constructed haplotypes, the result was a reduction in the total number of isolates from 2,317 to 1,105 (Supplementary Figure S1). When inspected by study populations, the exclusion of multiclonal infections resulted in data loss for every study population analysed (Figure 2E). The smallest reduction in the number of isolates was observed for Homel 2014, Benin (30.6% of infections) whereas the largest reduction in the number of isolates was reported for Navrongo 2013, Ghana (64.3% of infections).

3.3 Criteria I and II: low minor allele frequencies (MAFs) and non-biallelic nature of multiple SNP loci resulted in reduced barcode sizes and lower expected heterozygosity

The monoallelic, triallelic, and multiallelic loci observed in >70% of the study populations were removed from downstream analysis, resulting in 20-SNP and 81-SNP barcodes (Supplementary Table S5). See Supplementary Results section 1.2.2 for a detailed description of the observed polymorphisms in the two molecular barcodes.
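The two locus-level checks in this section (criterion i, biallelic loci; criterion ii, minor allele frequency of at least 0.10) can be sketched as follows. This is an illustrative Python example under stated assumptions, not the authors' R workflow: the function names `locus_summary` and `keep_locus` are hypothetical, missing or mixed-allele calls are represented as `None`, and the check is applied to a single population, whereas the study removed loci that failed across most study populations.

```python
from collections import Counter


def locus_summary(calls):
    """Return (number of distinct alleles, minor allele frequency) for one locus.

    `calls` holds one base per isolate; None marks a missing or mixed-allele call.
    MAF is only computed for biallelic loci; otherwise None is returned for it.
    """
    counts = Counter(c for c in calls if c is not None)
    n_alleles = len(counts)
    if n_alleles != 2:
        return n_alleles, None
    maf = min(counts.values()) / sum(counts.values())
    return n_alleles, maf


def keep_locus(calls, min_maf=0.10):
    """Keep a locus only if it is biallelic and its MAF is at least `min_maf`."""
    n_alleles, maf = locus_summary(calls)
    return n_alleles == 2 and maf >= min_maf


# Hypothetical calls for one locus in one population (illustrative only).
example_calls = ["A", "G", None, "A", "A"]
example_summary = locus_summary(example_calls)  # two alleles, one G among four calls
```

A monoallelic or triallelic locus fails the first check, and a biallelic locus with a rare second allele fails the second, which mirrors how the panels shrank from 24 and 96 SNPs to 20 and 75.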
Using the “cleaned monoclonal infections” dataset, six loci in the 81-SNP barcode had MAFs below 0.10 (Supplementary Table S6), indicating that these loci would not be informative to differentiate isolates from each other in the population. These loci were removed from downstream analyses, resulting in a 20-SNP and 75-SNP barcode, respectively. Table 2 shows MAFs by region, country, and study population. The median MAF across all loci was 0.352 (IQR: 0.254-0.422) and 0.333 (IQR: 0.234-0.419) for the 20- and 75-SNP panels, respectively, across all 25 study populations, and was similar across all regions, countries, and study populations for both barcodes (Table 2).

There were 96.8% and 97.2% unique multilocus haplotypes (h) observed in the 20-SNP and 75-SNP molecular barcodes, respectively, using the “cleaned monoclonal infections” dataset for all locations (Table 2; Supplementary Figure S2). Of the haplotypes that were repeated, they were only found in two or three isolates for both barcodes in Basse 2014, The Gambia and Mkuzi-Muheza 2013, Tanzania (Supplementary Figure S2). Despite finding many unique haplotypes, the mean expected heterozygosity (He) was low when using both barcodes (20-SNP: He = 0.433; 75-SNP: He = 0.432) (Table 2) and did not vary between regions (Kruskal–Wallis: p = 0.368; p = 0.368), countries (Kruskal–Wallis: p = 0.444; p = 0.444) nor study populations (Kruskal–Wallis: p = 0.451; p = 0.451). This is best explained by the low minor allele frequencies for individual loci per barcode across the continent.

3.4 Criteria III: overall, loci in the 20- and 75-SNP barcodes were found to be independently segregating from each other

The standardised index of association was used to assess multilocus linkage disequilibrium (LD), or non-random associations among SNP loci, using “complete monoclonal infection” haplotypes with no missing data. Overall, there was no evidence of linkage disequilibrium for both SNP barcodes (rd: p < 0.05, Supplementary Table S7). However, at the regional-, country- and study population scale, there was significant LD when using the 20-SNP barcode in Basse 2014 (The Gambia), Cape-Coast 2014 and Navrongo 2013 (Ghana), and when using the 75-SNP barcode in Basse 2014 (The Gambia), Navrongo 2010 and 2013 (Ghana), Kinshasa 2013 (DRC) and Mkuzi-Muheza 2013 (Tanzania) (Supplementary Table S7). For the 20-SNP barcode, significant pairwise rd values (p < 0.05) were found in 63 pairs of loci across all populations; the most common pairs were Pf3D7_02_v3_842805 vs. Pf3D7_10_v3_1402510, and Pf3D7_07_v3_628392 vs. P3D7_10_v3_82375 that were observed in only 3/63 pairs (4.76%) each (Supplementary Table S8). For the 75-SNP barcode, significant pairwise rd (p < 0.05) was found in 1,179 pairs of loci across all populations, with the most common pair, Pf3D7_06_v3_1184506 vs. Pf3D7_06_v3_1206498, found in only 12/1,179 pairs (1.02%), indicating weak evidence of physical linkage of two markers on chromosome 6 (Supplementary Table S9). Overall, there was no evidence of prevalent LD when using the 20- and 75-SNP barcodes in these populations.

3.5 Criteria IV: genetic differentiation over geographic space found to be consistent with isolation-by-distance despite high genetic similarity (PAS)

We investigated the level of allelic differentiation by calculating pairwise Weir and Cockerham’s FST between regions, countries, and study populations using the complete monoclonal infections dataset (i.e., no missing data, Supplementary Figure S1). Overall, FST was low for each locus for the 20-SNP (mean FST: 0.0165) and 75-SNP (mean FST: 0.00339) barcodes (Supplementary Table S10) and pairwise FST values were very low across regions, countries, and study populations per SNP barcode (Supplementary Figure S3). The greatest genetic differentiation was between East and West Africa (20-SNP: FST = 0.0026, 75-SNP: FST = 0.0046) at the regional-level, between Guinea and Tanzania (20-SNP: FST = 0.0078) and Malawi and Cameroon (75-SNP: FST = 0.0080) at the country-level, and between Nzerekore 2011 (Guinea) and Mkuzi-Muheza 2013 (Tanzania) (20-SNP: FST = 0.0087) and Chikwawa 2011 (Malawi) and Buea 2013 (Cameroon) (75-SNP: FST = 0.0080) at the study population level (Supplementary Figure S3). Genetic and geographic variation were found to be positively correlated, signifying that genetic variation increased across greater geographic distance and vice versa, by country (20-SNP: Mantel: r = 0.373, p = 0.038, Figure 3A; 75-SNP: Mantel: r = 0.794, p = 0.006; Figure 3B) and by study population (75-SNP: Mantel: r = 0.657, p < 0.001, Figure 3D), consistent with a pattern of isolation-by-distance, except for the 20-SNP barcode at the study population level (Mantel: r = -0.068, p = 0.661, Figure 3C).

DAPC was used to explore the extent of population structure of P. falciparum across the African continent using “complete monoclonal infection” haplotypes. All principal components of the PCA were retained during the preliminary variable transformation which accounted for 100% of the total genetic variability. Genetic structure was captured by the first two DFs for the 20-SNP (Figure 3E, inset) and 75-SNP (Figure 3F, inset) barcodes. The first DF separates West and East Africa, and the second DF separates Central Africa from West and East Africa. The same patterns were reflected when the DAPCs were calculated with prior information for the country and study population per isolate for the 20-SNP (Supplementary Figures 4A, B) and 75-SNP (Supplementary Figures 4C, D) barcodes. We observed a sharp decrease in DFs when calculating DAPC by country and study populations for both barcodes, but ellipses were removed due to high overlap, indicating that smaller scale structure was not as easily identifiable (Supplementary Figure S4).

To understand local population structure, we calculated the genetic similarity of barcode haplotypes within the same study population using PAS scores, an IBS method. To minimise bias, “complete monoclonal infection” haplotypes with no missing data were used to generate PAS scores (Supplementary Figure S1). Across each study population, using both 20- and 75-SNP barcodes, we found that the majority of infection haplotypes shared more than 50% of their alleles (20-SNP: median PAS = 0.550; 75-SNP: median PAS = 0.573) (Figure 4; Supplementary Table S11). Using the 75-SNP barcode, we saw an absence of isolate pairs that did not share any alleles (i.e., PAS = 0) and very few (8.9%) sharing up to 50% of alleles (0.2 ≤ PAS < 0.5) (Supplementary Figure S12).

TABLE 2 Patterns of P. falciparum genetic diversity of monoclonal infections in African study populations in Pf6 database for the 20- and 75-SNP barcodes.
| Population | N | h (20-SNP) | h (75-SNP) | He (20-SNP) | He (75-SNP) | MAFs (20-SNP) | MAFs (75-SNP) |
|---|---|---|---|---|---|---|---|
| West Africa | | | | 0.425 | 0.429 | | |
| Benin | | | | 0.432 | 0.421 | 0.320 [0.270–0.410] | 0.320 [0.208–0.400] |
| Homel 2014 | 25 | 25 | 25 | 0.432 | 0.421 | 0.320 [0.270–0.410] | 0.320 [0.208–0.400] |
| The Gambia | | | | 0.411 | 0.433 | 0.326 [0.186–0.452] | 0.333 [0.255–0.431] |
| Basse 2014 | 51 | 45 | 45 | 0.411 | 0.433 | 0.326 [0.186–0.452] | 0.333 [0.255–0.431] |
| Ghana | | | | 0.422 | 0.427 | 0.345 [0.258–0.423] | 0.337 [0.240–0.419] |
| Cape-Coast 2014 | 58 | 57 | 57 | 0.422 | 0.417 | 0.362 [0.272–0.448] | 0.293 [0.198–0.400] |
| Navrongo 2010 | 69 | 69 | 69 | 0.426 | 0.428 | 0.350 [0.247–0.407] | 0.348 [0.261–0.398] |
| Navrongo 2011 | 39 | 39 | 39 | 0.408 | 0.439 | 0.315 [0.250–0.433] | 0.359 [0.266–0.436] |
| Navrongo 2013 | 86 | 84 | 84 | 0.421 | 0.422 | 0.345 [0.262–0.384] | 0.349 [0.238–0.424] |
| Guinea | | | | 0.407 | 0.432 | 0.327 [0.239–0.458] | 0.373 [0.239–0.436] |
| Nzerekore 2011 | 59 | 58 | 59 | 0.407 | 0.432 | 0.327 [0.239–0.458] | 0.373 [0.239–0.436] |
| Mali | | | | 0.436 | 0.429 | 0.381 [0.226–0.421] | 0.355 [0.230–0.419] |
| Faladje 2013 | 62 | 62 | 62 | 0.435 | 0.427 | 0.377 [0.280–0.407] | 0.355 [0.232–0.419] |
| Nioro du Sahel 2014 | 31 | 31 | 31 | 0.442 | 0.436 | 0.392 [0.226–0.452] | 0.355 [0.226–0.419] |
| Central Africa | | | | 0.439 | 0.431 | | |
| Cameroon | | | | 0.446 | 0.425 | 0.353 [0.287–0.429] | 0.336 [0.234–0.408] |
| Buea | 116 | 113 | 112 | 0.446 | 0.425 | 0.353 [0.287–0.429] | 0.336 [0.234–0.408] |
| Democratic Republic of Congo (DRC) | | | | 0.430 | 0.430 | 0.345 [0.261–0.417] | 0.336 [0.247–0.429] |
| Kinshasa 2012 | 73 | 72 | 72 | 0.436 | 0.430 | 0.388 [0.268–0.431] | 0.356 [0.243–0.434] |
| Kinshasa 2013 | 49 | 47 | 48 | 0.420 | 0.429 | 0.310 [0.261–0.366] | 0.327 [0.265–0.418] |
| East Africa | | | | 0.436 | 0.420 | | |
| Malawi | | | | 0.421 | 0.416 | 0.358 [0.244–0.409] | 0.310 [0.228–0.409] |
| Chikwawa 2011 | 88 | 86 | 86 | 0.421 | 0.416 | 0.358 [0.244–0.409] | 0.310 [0.228–0.409] |
| Tanzania | | | | 0.441 | 0.419 | 0.354 [0.259–0.411] | 0.312 [0.198–0.406] |
| Mkuzi-Muheza 2013 | 91 | 82 | 83 | 0.436 | 0.415 | 0.380 [0.201–0.426] | 0.333 [0.222–0.418] |
| Muleba 2013 | 27 | 26 | 26 | 0.436 | 0.415 | 0.321 [0.259–0.416] | 0.333 [0.222–0.434] |
| Nachingwea 2013 | 32 | 32 | 32 | 0.439 | 0.429 | 0.344 [0.281–0.406] | 0.312 [0.188–0.375] |
| Total | 956 | 925 | 929 | 0.433 | 0.432 | 0.352 [0.254–0.422] | 0.333 [0.234–0.419] |

h = multilocus haplotypes; He = mean expected heterozygosity; MAF = minor allele frequency. He and MAF are provided for each study population, country and region; MAF are presented as medians with interquartile ranges (IQRs). Rows without N and h values correspond to regions and countries.

3.6 Criteria V: temporal analysis found an interchange of major and minor alleles for many loci and dynamic PAS scores

Given the likelihood of high outcrossing in these moderate-to-high transmission settings, we investigated the trends of the SNP barcodes over time. We analysed the two study locations with available temporal data: Navrongo, Ghana (2010, 2011 and 2013) and Kinshasa, DRC (2012 and 2013). Firstly, we used the “cleaned monoclonal infections” dataset to investigate whether the genetic diversity was stable over time. The mean He values were not significantly different over time (Hs test: p > 0.05, Table 3). MAFs across loci were similar over time in Navrongo and Kinshasa for both 20- and 75-SNP barcodes (Supplementary Tables S2, 3). Interestingly, for 8/20 and 23/75 SNP loci, respectively, the nucleotide base that was defined as the minor allele changed to the major allele from one year to the next in both Navrongo and Kinshasa (Figure 5). There were five SNP loci with interchanging bases on chromosome 7 for the 20-SNP barcode (Figure 5A), while the 75-SNP barcode had loci with interchanging bases spread across 10 of the 14 chromosomes (Figure 5B). Therefore, we analysed the level of drug resistance marker pfcrt that is also found on chromosome 7, moderated by the K76T mutation, in the two study locations over time. For pfcrt, 59.3% of the overall African population included in this study had the sensitive K76 allele (Figure 5C). Over time, we observed near fixation of this allele in Navrongo and Kinshasa (Figure 5D). Only the Pf3D7_07_v3_435497 ‘A’ allele was significantly related to the prevalence of CQ sensitivity in Navrongo and Kinshasa (χ2: p < 0.001, Supplementary Table S13) and should be reconsidered for population genetic analyses using neutral theory.

Moreover, to understand whether isolates were genetically similar over time, we calculated the PAS scores between study locations over time using the “complete monoclonal infections” data. PAS scores were significantly different in Kinshasa over time (Wilcoxon: 20-SNP: p = 0.031 and 75-SNP: p < 0.001) and in Navrongo from 2010 to 2013 for the 20-SNP barcode (Wilcoxon: p = 0.032) but not the 75-SNP barcode (Wilcoxon: p = 0.078) (Table 3).

FIGURE 3 Pairwise allelic differentiation for monoclonal infections using the 20-SNP and 75-SNP barcodes. Pairwise FST values between West, Central and East Africa were calculated from N = 653 and N = 690 isolates for the 20- and 75-SNP barcodes, respectively. Pairwise FST was calculated as the proportion of allelic variance for the (A, C) 20- and (B, D) 75-SNP genotypes by (A, B) country or (C, D) by study population (study location per year). The Mantel’s test (r) and p-values are indicated for each SNP panel and population level (country and study population). Further, a discriminant analysis of principal components (DAPC) based on monoclonal infections with no missing data for the (E) 20-SNP and (F) 75-SNP barcode are shown. Two eigenvalues were used to plot the DAPC, as indicated in the bottom inset. The plot can be segregated into three regions (ellipses): West Africa (green dots), Central Africa (blue dots), and East Africa (red dots).
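The pairwise allele sharing score used in the comparisons above follows directly from its definition in the Methods (PAS = NAB/NL). The sketch below is a minimal Python illustration rather than the authors' R code; the function names `pas` and `pairwise_pas` and the four-locus example haplotypes are hypothetical, and it assumes complete haplotypes of equal length with no missing data, as in the “complete monoclonal infections” dataset.

```python
from itertools import combinations


def pas(hap_a, hap_b):
    """Pairwise allele sharing: shared alleles (NAB) divided by number of loci (NL)."""
    if len(hap_a) != len(hap_b) or len(hap_a) == 0:
        raise ValueError("haplotypes must be non-empty and of equal length")
    shared = sum(1 for a, b in zip(hap_a, hap_b) if a == b)
    return shared / len(hap_a)


def pairwise_pas(haplotypes):
    """PAS for every unordered pair of haplotypes within one study population."""
    return [pas(a, b) for a, b in combinations(haplotypes, 2)]


# Hypothetical four-locus haplotypes (illustrative only; real barcodes have 20 or 75 loci).
example_haplotypes = ["ACGT", "ACGA", "TTTT"]
example_scores = pairwise_pas(example_haplotypes)
```

With only 20 or 75 loci, PAS can take few distinct values, so two distributions of such scores (for example, from two sampling years) are naturally compared with a rank-based test such as the Wilcoxon rank-sum test used in the study.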
4 Discussion

Here we present the first study to critically evaluate the use of two SNP barcodes in moderate-to-high transmission countries in Africa with a high proportion of multiclonal P. falciparum infections. Both 24- and 96-SNP barcodes could recapitulate a signal of large-scale genetic differentiation by geographic distance as shown by Mantel Test and DAPC, consistent with minimal divergence of loci with high gene flow across the African continent (Mobegi et al., 2012; Mobegi et al., 2014; Duffy et al., 2017; MalariaGEN et al., 2021). But finer-scale estimates of genetic diversity (He) and similarity (PAS) were not reflective of highly outcrossing populations, likely because these small molecular barcodes were not strictly biallelic and had similar and low minor allele frequencies (Table 4). Although multilocus SNP haplotypes were found to be largely unique, they only differed at one or two loci. This paucity of informative loci led to the erroneous conclusion that they appeared to be clonal or genetically similar, which may result in a less suitable solution to control. Additionally, analysing two study locations with temporal data showed that the allele frequencies per locus changed rapidly over short one-year periods, concordant with a large effective population size and high outcrossing rates (Anderson et al., 2000). Our results highlight two key points for SNP barcodes in moderate-to-high transmission settings in Africa: i) the high number of multiclonal infections led to the loss of approximately half of the data and ii) the low minor allele frequencies across SNP loci biased genetic diversity and population genetic estimates.

FIGURE 4 Genetic similarity within each study population using the (A) 20-SNP and (B) 75-SNP barcodes. 653 and 690 complete multilocus monoclonal haplotypes (i.e., with no missing data) for the 20- and 75-SNP barcodes were used to calculate the pairwise allele sharing (PAS) scores comparing isolates within each study population. There were 18,346 and 20,266 pairwise comparisons between haplotypes using the 20- and 75-SNP barcodes, respectively (see Supplementary Table S11). Colours represent populations in West Africa (green hues), Central Africa (blue hues) and East Africa (red hues). Horizontal central solid line represents the median, the box represents the interquartile range (IQR) from the 25th to 75th centiles, the whiskers indicate the most extreme data point, which is no more than 1.5 times the interquartile range from the box, and the dots show the outliers.

Biallelic SNP markers have proven highly informative in molecular surveillance for P. falciparum in low-transmission settings. But our results underline that in moderate-to-high transmission settings, where the number of multiclonal infections outweighs monoclonal infections, the use of SNP barcodes as a molecular marker for surveillance is constrained. We demonstrated that reconstructing haplotypes from assigning a dominant allele is effectively random as alleles in a mixed infection are found at approximately equal proportions. This led to only retaining monoclonal infections, removing half of the isolate data to perform reliable genetic diversity and population genetics analyses. This is concerning due to possible introduced bias in reducing sample size when performed in the real-world, seen with our case study in Obuasi where only approximately 15%–20% of the surveyed population had monoclonal infections. This is further exacerbated when accounting for the cost of equipment, reagents, and labour involved in the data generation. An additional cost that has not been considered is the need to survey large numbers of individuals to get enough monoclonal infections. The lack of useable data differs from many other scenarios in the literature where SNPs have been used in low-transmission regions with predominantly monoclonal infections.

TABLE 3 Temporal changes in genetic diversity (expected heterozygosity, He) and genetic similarity (pairwise allele sharing scores, PAS) in Navrongo, Ghana (2010, 2011, and 2013) and Kinshasa, Democratic Republic of Congo (DRC) (2012 and 2013) for the 20- and 75-SNP barcodes.

| Study populations over time | He* (20-SNP) | He* (75-SNP) | PAS‡ (20-SNP) | PAS‡ (75-SNP) |
|---|---|---|---|---|
| Navrongo (Ghana), 2010 and 2013 | 0.656 | 0.500 | 0.032 | 0.078 |
| Navrongo (Ghana), 2010 and 2011 | 0.125 | 0.494 | | |
| Navrongo (Ghana), 2011 and 2013 | 0.204 | 0.465 | | |
| Kinshasa (DRC), 2012 and 2013 | 0.135 | 0.631 | 0.013 | <0.001 |

He = expected heterozygosity, PAS = pairwise allele sharing, N = number of isolates. *Data are presented as the p-value calculated by “Hs.test” function. ‡Data are presented as the p-value calculated by Wilcoxon test. He was calculated using all multilocus haplotypes for both 20- and 75-SNP barcodes: Navrongo N = 194, Kinshasa N = 122. PAS was calculated using complete multilocus haplotypes (no missing data): 20-SNP: Navrongo N = 108, Kinshasa N = 86; 75-SNP: Navrongo N = 120, Kinshasa N = 81. Navrongo 2011 was removed as there were ≤25 complete infection haplotypes for analysis.

While software packages such as THE REAL McCOIL (Chang et al., 2017), DEploid (Zhu et al., 2018), and DEploidIBD (Zhu et al., 2019) attempt to phase or reconstruct SNP datasets with multiclonal infections using Bayesian and/or Markov Chain Monte Carlo methods, they introduce a large degree of uncertainty and assumptions, particularly when there are three or more genotypes per infection with a high number of MACs (Labbé et al., 2023). In fact, in areas of such high transmission and endemicity, it is not uncommon for infections to contain five or more distinct P. falciparum clones per microlitre of blood (Chang et al., 2017; Tiedje et al., 2017; 2022; World Health Organisation, 2018). This drawback extends to larger SNP-based panels (>500 SNPs) due to the high occurrence of MACs, frequent outcrossing, and large effective population size in high-transmission settings. For example, a study by Verity et al. (2020) sequenced 2,537 isolates in the Democratic Republic of Congo, Ghana, Tanzania, Uganda and Zambia using a panel of 739 geographically informative SNPs and another panel of 1,151 putatively neutral SNPs across the P. falciparum genome. Of these isolates, only 1,382 (54.5%) and 674 (26.6%) respectively passed the quality control and filtering steps, resulting in an enormous loss of data and expense. These issues of cost-effectiveness are of relevance to public health where only approximately $1-10 per person per annum is spent on malaria control in endemic countries in Africa (World Health Organisation, 2022).

Of the remaining monoclonal samples that were able to be analysed, our results from SNP barcodes did not reflect diversity, similarity and structure estimates as found in other studies using a higher magnitude of genome-wide SNPs (Mobegi et al., 2014; Daniels et al., 2015; Amambua-Ngwa et al., 2019; Moser et al., 2020; Verity et al., 2020; MalariaGEN et al., 2021), putatively neutral microsatellites (Anderson et al., 2000; Mobegi et al., 2012; Duffy et al., 2017; Argyropoulos et al., 2021) and antigenic markers (Ruybal-Pesántez et al., 2017b; Day et al., 2017; Rorick et al., 2018). One possible explanation for these observed discrepancies is the “ascertainment bias” phenomenon, where polymorphisms that were discovered in few samples or locations can result in a deviation from an expected allele frequency distribution (Kuhner et al., 2000; Wakeley et al., 2001; Helyar et al., 2011). While these loci were polymorphic in Senegal and Thailand (24-SNP barcode (Daniels et al., 2008)) and along the Thai-Burma border over 10 years (96-SNP barcode (Nkhoma et al., 2013)), when applied to these African populations, some loci were mono or triallelic, indicating fixation or hypermutable sites respectively, and other loci had lower average MAFs than would be useful. This consequently biases estimates which rely on allele frequencies, such as expected heterozygosity, linkage disequilibrium, genetic similarity, and population structure (Wakeley et al., 2001; Nielsen and Signorovitch, 2003; Helyar et al., 2011; Speed and Balding, 2015; Taylor et al., 2019). To minimise these biases and for barcodes to potentially work across multiple populations, loci must be carefully selected by local- and large-scale geospatial sampling and whole-genome sequencing of multiple isolates (Helyar et al., 2011); if these loci were to be analysed using neutral theory, as with these barcodes discussed, then these SNP loci must also be assessed for signals of selection (e.g., using Tajima’s D). A study of this magnitude is currently very expensive (approximately $86 USD per isolate) (Tessema et al., 2020), laborious, and is not guaranteed to produce a SNP barcode that is temporally stable, particularly in highly recombining settings (as reviewed in (Escalante et al., 2015)), due to the profound effects of sexual recombination.

Longitudinal investigations using SNP barcodes must err on the side of caution. Given the recent changes in antimalarial drug policy and use (World Health Organisation, 2022), it is possible that selection of the K76 allele of pfcrt (i.e., chloroquine sensitivity) is driving variation at Pf3D7_07_v3_435497 in the 20-SNP barcode. This observation corresponds to the policy change to artemether-lumefantrine (AL) and artesunate-amodiaquine (ASAQ) in Ghana in 2007, where reports have indicated a higher use of AL (World Health Organisation, 2015) that selects for the K76 allele (Sisowath et al., 2009; Venkatesan et al., 2014); increased prevalence of K76 has also been reported in a nearby region of Bongo District, Ghana (Narh et al., 2020). In DRC, there has been a low yet steady increase in ACT use from 2% in 2010 to 30% in 2017–2018 (U.S. President’s Malaria Initiative, 2020), coinciding with the slow increase in chloroquine sensitivity (pfcrt K76). This provides an example of how important longitudinal investigations of molecular panels are to ensure population genetics theories are being upheld. Any temporal variation in allele frequencies related to outcrossing must complicate the calculation of priors for Bayesian inference.

FIGURE 5 Longitudinal changes in two P. falciparum populations over time with monoclonal infections and chloroquine (CQ) resistance in Africa over time. Temporal study locations include Navrongo, Ghana (2010, 2011 and 2013) and Kinshasa, DRC (2012 and 2013). Changes in the minor allele using (A) 20-SNP and (B) 75-SNP barcodes are shown for Navrongo, Ghana and Kinshasa, DRC, and include interchanging base from 1 year to the next (red), changes from or to a minor allele frequency (MAF) below 0.10 approximating fixation at that locus (black), and no base or significant MAF change over time (grey). Chloroquine (CQ) drug resistance patterns are shown (C) within each study population and (D) temporal study locations. Note, colours correspond to the classification on the Pf6 database for resistance (pink), sensitive (green), and undetermined (blue). See methods for resistance classification for these drugs.

TABLE 4 Summary of SNP barcode performance in Africa to determine the genetic diversity and population structure of P. falciparum.

A key assumption when analysing SNPs in population genetics is that they are biallelic (Schlötterer, 2004), but as shown when using the whole-genome sequence data to generate our barcode haplotypes, this is not always the case. The two SNP barcodes used in our analysis, however, were designed to be genotyped using platforms that are only able to detect two previously identified bases (alleles) per locus (e.g., Taq-Man or Illumina GoldenGate). How then can we monitor genetic diversity and population structure in moderate-to-high transmission settings? The answer likely lies in the use of polymorphic markers such as putatively neutral markers that permit the inclusion of “dominant” infections (e.g., short tandem repeats (STRs) or microsatellites) (Anderson et al., 1999; Tessema et al., 2020). For example,
Metric of interest Purpose 24-SNP barcode 96-SNP barcode Clonality (mono- or To perform downstream analyses on Only useful for MOI = 1 (Due to high Only useful for MOI = 1 (Due to high multiclonal) genetic diversity and population MACs >5%); >50% of isolates removed from MACs >5%); >50% of isolates removed from structure data analysis data analysis Biallelic (polymorphism) Common assumption and is required to Must remove 16.7% loci: 20-SNP barcode Must remove 25% of loci: 81-SNP barcode capture variation Minor Allele Frequencies Required to capture variation No further SNPs removed Must remove 7.5% of loci: 75-SNP barcode (MAFs) > 0.10 Independent segregation of Variation maintained irrespective of SNPs on chromosome 7 may be in LD to pfcrt Yes loci/selectively neutral areas under genetic selection Genetic diversity (He) Reflects variation in the gene pool of Due to low MAFs, He is also lower Due to low MAFs, He is also lower population Spatial genetic Reflects local evolutionary history of the Association at low resolution Association with slightly better resolution differentiation (FST) populations Genetic similarity (PAS) Detect presence of clonal/similar Due to low He, PAS is higher Due to low He, PAS is higher with smaller IQR parasites (clone outbreak) Temporal stability Similar allele frequencies maintained MAFs change over one-year MAFs change over one-year over time for longitudinal studies Abbreviations: MOI = multiplicity of infection; MACs = mixed allele calls; LD = linkage disequlibrium; MAF = minor allele frequency. microsatellites were able to resolve global P. falciparum structure with Molecular barcodes are a practical and low-cost solution to avoid only 12 markers (Anderson et al., 2000), while 9-10 microsatellite relying on whole-genome sequencing for surveillance. 
However here we markers were able to give realistic assessments of these measures related show the application of SNP barcodes encounters challenges in sub- to both long-lasting insecticidal net (LLIN) (Kattenberg et al., 2019) and Saharan Africa in moderate-to-high transmission settings due to the indoor residual spraying (IRS) (Argyropoulos et al., 2021) interventions, high number of multiclonal infections, frequent outcrossing, and large respectively, in moderate-to-high transmission settings. With respect to effective population size of P. falciparum as well as spatial and temporal neutral variation, STR loci are more useful to detect recent population variation. Alternative markers such as STRs and microhaplotypes are expansions than SNPs as they accumulate new mutations at a faster possible solutions to study P. falciparum population structure using rate, are multiallelic often in excess of 10 alleles, and have more private neutral theory. alleles; thus they remain the most informative putatively neutral markers in population genetic studies across many organisms (Ellegren, 2004; Selkoe and Toonen, 2006; Guichoux et al., 2011), Data availability statement including in P. falciparum and P. vivax genomes across various geographic populations (Han et al., 2022). Microhaplotypes, regions The datasets presented in this study can be found in online of 100–200 bp with high genetic diversity unbroken by recombination, respositories. The names of the repository/repositories and of SNPs and STR loci are currently proposed as a high-throughput and accession number(s) can be found at: https://datadryad.org/stash/ automated alternative to microsatellite genotypingmethods that rely on dataset/doi:10.5061/dryad.zw3r228bc, and https://www.malariagen. capillary electrophoresis (Tessema et al., 2020). However current net/resource/26. microhaplotype genotyping for P. falciparum is largely SNP-based and yet to be deployed in high-transmission settings in the field. 
Alternatively, adaptive genes may present an innovative Ethics statement approach (Barton, 2010) consistent with the large parasite population size seen within and between human hosts in sub- The studies involving human participants were reviewed and Saharan Africa. Antigenic markers, which rely on size- or approved by Noguchi Memorial Institute for Medical Research coding-sequence polymorphisms (e.g., msp2, csp, ama1, var), can Institutional Review Board. Written informed consent to participate distinguish highly diverse multiclonal infections, but cannot in this study was provided by the participants’ legal guardian/next of kin. construct haplotypes (Snounou et al., 1999; Ruybal-Pesántez et al., 2017b; Nelson et al., 2019). A recent study (Ghansah et al., 2023) compared the use of SNPs, microsatellites and var DBLa Author contributions typing (“varcoding”) to evaluate genetic diversity and population structure in a high-transmission setting in Ghana and found that KD conceptualised the research idea and project design. DA and while microsatellites provided greater resolution than SNPs, MHT designed the analysis. KK and KD acquired funding. KK and AG varcoding was superior in identifying finer-scale relatedness and completed the Obuasi field studies and 24-SNP barcode genotyping population structuring. with BA. KT assisted with analysis of Obuasi data. CA and MHT Frontiers in Genetics 16 frontiersin.org Argyropoulos et al. 10.3389/fgene.2023.1071896 extracted data from Pf6 database. DA completed population genetics Wellcome Open Research 2021642 DOI: 10.12688/ and data analysis, with support fromMHT. DAwrote the original draft wellcomeopenres.16168”. This research was supported by The of themanuscript. DA,MHT, and KD critically revised the manuscript. University of Melbourne’s Research Computing Services and the FL and KT provided critical review of the manuscript. All authors Petascale Campus Initiative. 
contributed to the article and approved the submitted version. Conflict of interest Funding The authors declare that the research was conducted in the TheObuasi project was funded by theAngloGoldAshanti, awarded absence of any commercial or financial relationships that could be to KK. The subsequent researchwas supported by the National Institute construed as a potential conflict of interest. of Allergy and Infectious Diseases, National Institutes of Health (Grant number: R01-AI084156 awarded to KD and KK). The funders had no role in study design, data collection and analysis, decision to publish, or Publisher’s note preparation of the manuscript. All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or Acknowledgments those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its We wish to thank the participants, communities and the Ghana manufacturer, is not guaranteed or endorsed by the publisher. Health Service in Obuasi, Ghana for their willingness to participate in this study. We would like to thank the field teams for their technical assistance in the field and laboratory personnel for sample Supplementary material collection and parasitological assessments. This publication uses data from the MalariaGEN Plasmodium falciparum Community The Supplementary Material for this article can be found online Project as described in “An open dataset of Plasmodium falciparum at: https://www.frontiersin.org/articles/10.3389/fgene.2023.1071896/ genome variation in 7,000 world-wide samples. MalariaGEN et al., full#supplementary-material References Agapow, P., and Burt, A. (2001). Indices of multilocus linkage disequilibrium. Mol. Aydemir, O., Janko, M., Hathaway, N. J., Verity, R., Mwandagalirwa, M. K., Tshefu, A. Ecol. Notes 1, 101–102. doi:10.1046/j.1471-8278.2000.00014.x K., et al. (2018). 
Drug-resistance and population structure of Plasmodium falciparum across the Democratic Republic of Congo using high-throughput molecular inversion probes. J. Infect. Dis. 218, 946–955. doi:10.1093/infdis/jiy223
Altschul, S. F., Gish, W., Miller, W., Myers, E. W., and Lipman, D. J. (1990). Basic local alignment search tool. J. Mol. Biol. 215, 403–410. doi:10.1016/s0022-2836(05)80360-2
Amambua-Ngwa, A., Jeffries, D., Amato, R., Worwui, A., Karim, M., Ceesay, S., et al. (2018). Consistent signatures of selection from genomic analysis of pairs of temporal and spatial Plasmodium falciparum populations from the Gambia. Sci. Rep. 8, 9687. doi:10.1038/s41598-018-28017-5
Amambua-Ngwa, A., Amenga-Etego, L., Kamau, E., Amato, R., Ghansah, A., Golassa, L., et al. (2019). Major subpopulations of Plasmodium falciparum in sub-Saharan Africa. Science 365, 813–816. doi:10.1126/science.aav5427
Amegashie, E. A., Amenga-Etego, L., Adobor, C., Ogoti, P., Mbogo, K., Amambua-Ngwa, A., et al. (2020). Population genetic analysis of the Plasmodium falciparum circumsporozoite protein in two distinct ecological regions in Ghana. Malar. J. 19, 437. doi:10.1186/s12936-020-03510-3
Anderson, T., Su, X. Z., Bockarie, M., Lagog, M., and Day, K. P. (1999). Twelve microsatellite markers for characterization of Plasmodium falciparum from finger-prick blood samples. Parasitology 119, 113–125. doi:10.1017/S0031182099004552
Anderson, T., Haubold, B., Williams, J. T., Estrada-Franco, J. G., Richardson, L., Mollinedo, R., et al. (2000). Microsatellite markers reveal a spectrum of population structures in the malaria parasite Plasmodium falciparum. Mol. Biol. Evol. 17, 1467–1482. doi:10.1093/oxfordjournals.molbev.a026247
Anderson, T., Nair, S., Sudimack, D., Williams, J. T., Mayxay, M., Netwon, P. N., et al. (2005). Geographical distribution of selected and putatively neutral SNPs in Southeast Asian malaria parasites. Mol. Biol. Evol. 22, 2362–2374. doi:10.1093/molbev/msi235
Apinjoh, T. O., Tata, R. B., Anchang-Kimbi, J. K., Chi, H. F., Fon, E. M., Mugri, R. N., et al. (2015). Plasmodium falciparum merozoite surface protein 1 block 2 gene polymorphism in field isolates along the slope of Mount Cameroon: A cross-sectional study. BMC Infect. Dis. 15, 309. doi:10.1186/s12879-015-1066-x
Argyropoulos, D. C., Ruybal-Pesántez, S., Deed, S. L., Oduro, A. R., Dadzie, S. K., Appawu, M. A., et al. (2021). The impact of indoor residual spraying on Plasmodium falciparum microsatellite variation in an area of high seasonal malaria transmission in Ghana, West Africa. Mol. Ecol. 30, 3974–3992. doi:10.1111/mec.16029
Babiker, H. A., Ranford-Cartwright, L. C., Currie, D., Carlwood, J. D., Billingsley, P., Teuscher, T., et al. (1994). Random mating in a natural population of the malaria parasite Plasmodium falciparum. Parasitology 109, 413–421. doi:10.1017/s0031182000080665
Bahl, A., Brunk, B., Crabtree, J., Fraunholz, M. J., Gajria, B., Grant, G. R., et al. (2003). PlasmoDB: The Plasmodium genome resource. A database integrating experimental and computational data. Nucleic Acids Res. 31, 212–215. doi:10.1093/nar/gkg081
Baraka, V., Ishengoma, D. S., Fransis, F., Minja, D. T. R., Madebe, R. A., Ngatunga, D., et al. (2015). High-level Plasmodium falciparum sulfadoxine-pyrimethamine resistance with the concomitant occurrence of septuple haplotype in Tanzania. Malar. J. 14, 439. doi:10.1186/s12936-015-0977-8
Barton, N. (2010). Understanding adaptation in large populations. PLoS Genet. 6, e1000987. doi:10.1371/journal.pgen.1000987
Bei, A. K., Niang, M., Deme, A. B., Daniels, R. F., Sarr, F. D., Sokhna, C., et al. (2018). Dramatic changes in malaria population genetic complexity in Dielmo and Ndiop, Senegal, revealed using genomic surveillance. J. Infect. Dis. 217, 622–627. doi:10.1093/infdis/jix580
Bertin, G. I., Lavstsen, T., Guillonneau, F., Doritchamou, J., Wang, C. W., Jespersen, J. S., et al. (2013). Expression of the domain cassette 8 Plasmodium falciparum Erythrocyte membrane protein 1 is associated with cerebral malaria in Benin. PLoS One 8, e68368. doi:10.1371/journal.pone.0068368
Chang, H.-H., Worby, C. J., Yeka, A., Nankabirwa, J., Kamya, M. R., Staedke, S. G., et al. (2017). The REAL McCOIL: A method for the concurrent estimation of the complexity of infection and SNP allele frequency for malaria parasites. PLOS Comput. Biol. 13, e1005348. doi:10.1371/journal.pcbi.1005348
Charles, M., Das, S., Daniels, R., Kirkman, L., Delva, G. G., Destine, R., et al. (2016). Plasmodium falciparum K76T pfcrt gene mutations and parasite population structure, Haiti, 2006–2009. Emerg. Infect. Dis. 22, 786–793. doi:10.3201/eid2205.150359
Daniels, R. F., Volkman, S. K., Milner, D. A., Mahesh, N., Neafsey, D. E., Park, D. J., et al. (2008). A general SNP-based molecular barcode for Plasmodium falciparum identification and tracking. Malar. J. 7, 223. doi:10.1186/1475-2875-7-223
Daniels, R., Chang, H. H., Séne, P. D., Park, D. C., Neafsey, D. E., Schaffner, S. F., et al. (2013). Genetic surveillance detects both clonal and epidemic transmission of malaria following enhanced intervention in Senegal. PLoS One 8, e60780. doi:10.1371/journal.pone.0060780
Daniels, R. F., Schaffner, S. F., Wenger, E. A., Proctor, J. L., Chang, H. H., Wong, W., et al. (2015). Modeling malaria genomics reveals transmission decline and rebound in Senegal. Proc. Natl. Acad. Sci. U. S. A. 112, 7067–7072. doi:10.1073/pnas.1505691112
Day, K. P., Artzy-Randrup, Y., Tiedje, K. E., Rougeron, V., Chen, D. S., Rask, T. S., et al. (2017). Evidence of strain structure in Plasmodium falciparum var gene repertoires in children from Gabon, West Africa. Proc. Natl. Acad. Sci. 114, E4103–E4111. doi:10.1073/pnas.1613018114
Diakité, S. A. S., Traoré, K., Sanogo, I., Clark, T. G., Campino, S., Sangaré, M., et al. (2019). A comprehensive analysis of drug resistance molecular markers and Plasmodium falciparum genetic diversity in two malaria endemic sites in Mali. Malar. J. 18, 361. doi:10.1186/s12936-019-2986-5
Duffy, C. W., Assefa, S. A., Abugri, J., Amoako, N., Owusu-Agyei, S., Anyorigiya, T., et al. (2015). Comparison of genomic signatures of selection on Plasmodium falciparum between different regions of a country with high malaria endemicity. BMC Genomics 16, 527. doi:10.1186/s12864-015-1746-3
Duffy, C. W., Ba, H., Assefa, S. A., Ahouidi, A. D., Deh, Y. B., Tandia, A., et al. (2017). Population genetic structure and adaptation of malaria parasites on the edge of endemic distribution. Mol. Ecol. 26, 2880–2894. doi:10.1111/mec.14066
Duffy, C. W., Amambua-Ngwa, A., Ahouidi, A. D., Diakite, M., Awandare, G. A., Ba, H., et al. (2018). Multi-population genomic analysis of malaria parasites indicates local selection and differentiation at the gdv1 locus regulating sexual development. Sci. Rep. 8, 15763. doi:10.1038/s41598-018-34078-3
Ellegren, H. (2004). Microsatellites: Simple sequences with complex evolution. Nat. Rev. Genet. 5, 435–445. doi:10.1038/nrg1348
Escalante, A. A., and Pacheco, M. A. (2019). Malaria molecular epidemiology: An evolutionary genetics perspective. Microbiol. Spectr. 7. doi:10.1128/microbiolspec.ame-0010-2019
Escalante, A. A., Ferreira, M. U., Vinetz, J. M., Cui, L., Volkman, S. K., Pacheco, M. A., et al. (2015). Malaria molecular epidemiology: Lessons from the international centers of excellence for malaria research network. Am. J. Trop. Med. Hyg. 93, 79–86. doi:10.4269/ajtmh.15-0005
Flesch, E. P., Rotella, J. J., Thomson, J. M., Graves, T. A., and Garrott, R. A. (2018). Evaluating sample size to estimate genetic management metrics in the genomics era. Mol. Ecol. Resour. 18, 1077–1091. doi:10.1111/1755-0998.12898
Gerlovina, I., Gerlovin, B., Rodríguez-Barraquer, I., and Greenhouse, B. (2022). Dcifer: An IBD-based method to calculate genetic distance between polyclonal infections. bioRxiv. doi:10.1101/2022.04.14.488406
Ghansah, A., Amenga-Etego, L., Amambua-Ngwa, A., Andagalu, B., Apinjoh, T., Bouyou-Akotet, M., et al. (2014). Monitoring parasite diversity for malaria elimination in sub-Saharan Africa. Science 345, 1297–1298. doi:10.1126/science.1259423
Ghansah, A., Tiedje, K. E., Argyropoulos, D. C., Onwona, C. O., Deed, S. L., Labbé, F., et al. (2023). Comparison of molecular surveillance methods to assess changes in the population genetics of Plasmodium falciparum in high transmission. Front. Parasitol. 2, 1067966. doi:10.3389/fpara.2023.1067966
Guichoux, E., Lagache, L., Wagner, S., Chaumeil, P., Léger, P., Lepais, O., et al. (2011). Current trends in microsatellite genotyping. Mol. Ecol. Resour. 11, 591–611. doi:10.1111/j.1755-0998.2011.03014.x
Hamilton, W. L., Amato, R., van der Pluijm, R. W., Jacob, C. G., Quang, H. H., Thanh, T.-N. N., et al. (2019). Evolution and expansion of multidrug-resistant malaria in Southeast Asia: A genomic epidemiology study. Lancet Infect. Dis. 19, 943–951. doi:10.1016/s1473-3099(19)30392-5
Han, J., Munro, J. E., Kocoski, A., Barry, A. E., and Bahlo, M. (2022). Population-level genome-wide STR discovery and validation for population structure and genetic diversity assessment of Plasmodium species. PLOS Genet. 18, e1009604. doi:10.1371/journal.pgen.1009604
Helyar, S. J., Hemmer-Hansen, J., Bekkevold, D., Taylor, M. I., Ogden, R., Limborg, M. T., et al. (2011). Application of SNPs for population genetics of nonmodel organisms: New opportunities and challenges. Mol. Ecol. Resour. 11, 123–136. doi:10.1111/j.1755-0998.2010.02943.x
Henden, L., Lee, S., Mueller, I., Barry, A., and Bahlo, M. (2018). Identity-by-descent analyses for measuring population dynamics and selection in recombining pathogens. PLOS Genet. 14, e1007279. doi:10.1371/journal.pgen.1007279
Hoban, S., and Schlarbaum, S. (2014). Optimal sampling of seeds from plant populations for ex-situ conservation of genetic biodiversity, considering realistic population structure. Biol. Conserv. 177, 90–99. doi:10.1016/j.biocon.2014.06.014
Jombart, T. (2008). adegenet: a R package for the multivariate analysis of genetic markers. Bioinformatics 24, 1403–1405. doi:10.1093/bioinformatics/btn129
Jost, L., Archer, F., Flanagan, S., Gaggiotti, O., Hoban, S., and Latch, E. (2018). Differentiation measures for conservation genetics. Evol. Appl. 11, 1139–1148. doi:10.1111/eva.12590
Kamau, E., Campino, S., Amenga-Etego, L., Drury, E., Ishengoma, D., Johnson, K., et al. (2015). K13-Propeller polymorphisms in Plasmodium falciparum parasites from sub-Saharan Africa. J. Infect. Dis. 211, 1352–1355. doi:10.1093/infdis/jiu608
Kamvar, Z. N., Tabima, J. F., and Grünwald, N. J. (2014). Poppr: an R package for genetic analysis of populations with clonal, partially clonal, and/or sexual reproduction. PeerJ 2, e281. doi:10.7717/peerj.281
Kattenberg, J. H., Razook, Z., Keo, R., Koepfli, C., Jennison, C., Lautu-Ninda, D., et al. (2019). Monitoring of Plasmodium falciparum and Plasmodium vivax using microsatellite markers indicates limited changes in population structure after substantial transmission decline in Papua New Guinea. bioRxiv. doi:10.1101/817320
Khlestkina, E. K., and Salina, E. A. (2006). SNP markers: Methods of analysis, ways of development, and comparison on an example of common wheat. Russ. J. Genet. 42, 585–594. doi:10.1134/s1022795406060019
Kone, A., Mu, J., Maiga, H., Beavogui, A. H., Yattara, O., Sagara, I., et al. (2013). Quinine treatment selects the pfnhe–1 ms4760–1 polymorphism in Malian patients with falciparum malaria. J. Infect. Dis. 207, 520–527. doi:10.1093/infdis/jis691
Kone, A., Sissoko, S., Fofana, B., Sangare, C. O., Dembele, D., Haidara, A. S., et al. (2020). Different Plasmodium falciparum clearance times in two Malian villages following artesunate monotherapy. Int. J. Infect. Dis. 95, 399–405. doi:10.1016/j.ijid.2020.03.082
Kuhner, M. K., Beerli, P., Yamato, J., and Felsenstein, J. (2000). Usefulness of single nucleotide polymorphism data for estimating population parameters. Genetics 156, 439–447. doi:10.1093/genetics/156.1.439
Labbé, F., He, Q., Zhan, Q., Tiedje, K. E., Argyropoulos, D. C., Tan, M. H., et al. (2023). Neutral vs. non-neutral genetic footprints of Plasmodium falciparum multiclonal infections. PLOS Comput. Biol. 19, e1010816. doi:10.1371/journal.pcbi.1010816
Langridge, P., and Chalmers, K. (2005). "Molecular marker systems in plant breeding and crop improvement," in Biotechnology in agriculture and forestry, 3–22. doi:10.1007/3-540-26538-4_1
Laurent, Z. R. D., Chebon, L. J., Ingasia, L. A., Akala, H. M., Andagalu, B., Ochola-Oyier, L. I., et al. (2018). Polymorphisms in the K13 gene in Plasmodium falciparum from different malaria transmission areas of Kenya. Am. J. Trop. Med. Hyg. 98, 1360–1366. doi:10.4269/ajtmh.17-0505
MalariaGEN, Ahouidi, A. D., Ali, M., Almagro-Garcia, J., Amambua-Ngwa, A., Amaratunga, C., et al. (2021). An open dataset of Plasmodium falciparum genome variation in 7,000 worldwide samples. Wellcome Open Res. 6, 42. doi:10.12688/wellcomeopenres.16168.2
Manske, M., Miotto, O., Campino, S., Auburn, S., Almagro-Garcia, J., Maslen, G., et al. (2012). Analysis of Plasmodium falciparum diversity in natural infections by deep sequencing. Nature 487, 375–379. doi:10.1038/nature11174
Mensah, B. A., Aydemir, O., Myers-Hansen, J. L., Opoku, M., Hathaway, N. J., Marsh, P. W., et al. (2020). Antimalarial drug resistance profiling of Plasmodium falciparum infections in Ghana using molecular inversion probes and next-generation sequencing. Antimicrob. Agents Chemother. 64, e01423-19. doi:10.1128/aac.01423-19
Mensah-Brown, H. E., Amoako, N., Abugri, J., Stewart, L. B., Agongo, G., Dickson, E. K., et al. (2015). Analysis of erythrocyte invasion mechanisms of Plasmodium falciparum clinical isolates across 3 malaria-endemic areas in Ghana. J. Infect. Dis. 212, 1288–1297. doi:10.1093/infdis/jiv207
Miles, A., Iqbal, Z., Vauterin, P., Pearson, R., Campino, S., Theron, M., et al. (2016). Indels, structural variation, and recombination drive genomic diversity in Plasmodium falciparum. Genome Res. 26, 1288–1299. doi:10.1101/gr.203711.115
Mobegi, V. A., Loua, K. M., Ahouidi, A. D., Satoguina, J., Nwakanma, D. C., Amambua-Ngwa, A., et al. (2012). Population genetic structure of Plasmodium falciparum across a region of diverse endemicity in West Africa. Malar. J. 11, 223. doi:10.1186/1475-2875-11-223
Mobegi, V. A., Duffy, C. W., Amambua-Ngwa, A., Loua, K. M., Laman, E., Nwakanma, D. C., et al. (2014). Genome-wide analysis of selection on the malaria parasite Plasmodium falciparum in West African populations of differing infection endemicity. Mol. Biol. Evol. 31, 1490–1499. doi:10.1093/molbev/msu106
Moser, K. A., Madebe, R. A., Aydemir, O., Chiduo, M. G., Mandara, C. I., Rumisha, S. F., et al. (2020). Describing the current status of Plasmodium falciparum population structure and drug resistance within mainland Tanzania using molecular inversion probes. Mol. Ecol. 30, 100–113. doi:10.1111/mec.15706
Narh, C. A., Ghansah, A., Duffy, M. F., Ruybal-Pesántez, S., Onwona, C. O., Oduro, A. R., et al. (2020). Evolution of antimalarial drug resistance markers in the reservoir of Plasmodium falciparum infections in the Upper East Region of Ghana. J. Infect. Dis. 222, 1692–1701. doi:10.1093/infdis/jiaa286
Neafsey, D. E., Schaffner, S. F., Volkman, S. K., Park, D., Montgomery, P., Milner, D. A., et al. (2008). Genome-wide SNP genotyping highlights the role of natural selection in Plasmodium falciparum population divergence. Genome Biol. 9, R171. doi:10.1186/gb-2008-9-12-r171
Nelson, C. S., Sumner, K. M., Freedman, E., Saelens, J. W., Obala, A. A., Mangeni, J. N., et al. (2019). High-resolution micro-epidemiology of parasite spatial and temporal dynamics in a high malaria transmission setting in Kenya. Nat. Commun. 10, 5615. doi:10.1038/s41467-019-13578-4
Ngalah, B. S., Ingasia, L. A., Cheruiyot, A. C., Chebon, L. J., Juma, D. W., Muiruri, P., et al. (2015). Analysis of major genome loci underlying artemisinin resistance and pfmdr1 copy number in pre- and post-ACTs in western Kenya. Sci. Rep. 5, 8308–8316. doi:10.1038/srep08308
Nielsen, R., and Signorovitch, J. (2003). Correcting for ascertainment biases when analyzing SNP data: Applications to the estimation of linkage disequilibrium. Theor. Popul. Biol. 63, 245–255. doi:10.1016/S0040-5809(03)00005-4
Nkhoma, S. C., Nair, S., Al-Saai, S., Ashley, E. A., McGready, R., Phyo, A. P., et al. (2013). Population genetic correlates of declining transmission in a human pathogen. Mol. Ecol. 22, 273–285. doi:10.1111/mec.12099
Ocholla, H., Preston, M. D., Mipando, M., Jensen, A. T. R., Campino, S., MacInnis, B., et al. (2014). Whole-genome scans provide evidence of adaptive evolution in Malawian Plasmodium falciparum isolates. J. Infect. Dis. 210, 1991–2000. doi:10.1093/infdis/jiu349
Ohashi, J., and Tokunaga, K. (2003). Power of genome-wide linkage disequilibrium testing by using microsatellite markers. J. Hum. Genet. 48, 487–491. doi:10.1007/s10038-003-0058-7
Oksanen, J., Blanchet, F. G., Friendly, M., Kindt, R., Legendre, P., McGlinn, D., et al. (2020). vegan: Community ecology package.
Onyamboko, M. A., Fanello, C. I., Wongsaen, K., Tarning, J., Cheah, P. Y., Tshefu, K. A., et al. (2014). Randomized comparison of the efficacies and tolerabilities of three artemisinin-based combination treatments for children with acute Plasmodium falciparum malaria in the Democratic Republic of the Congo. Antimicrob. Agents Chemother. 58, 5528–5536. doi:10.1128/aac.02682-14
Patterson, N., Price, A. L., and Reich, D. (2006). Population structure and eigenanalysis. PLoS Genet. 2, e190. doi:10.1371/journal.pgen.0020190
Paul, R. E., Packer, M. J., Walmsley, M., Lagog, M., Ranford-Cartwright, L. C., Paru, R., et al. (1995). Mating patterns in malaria parasite populations of Papua New Guinea. Science 269, 1709–1711. doi:10.1126/science.7569897
Pruett, C. L., and Winker, K. (2008). The effects of sample size on population genetic diversity estimates in song sparrows Melospiza melodia. J. Avian Biol. 39, 252–256. doi:10.1111/j.0908-8857.2008.04094.x
Qu, W., Liang, N., Wu, Z., Zhao, Y., and Chu, D. (2020). Minimum sample sizes for invasion genomics: Empirical investigation in an invasive whitefly. Ecol. Evol. 10, 38–49. doi:10.1002/ece3.5677
R Core Team (2018). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing.
Ravenhall, M., Benavente, E. D., Mipando, M., Jensen, A. T. R., Sutherland, C. J., Roper, C., et al. (2016). Characterizing the impact of sustained sulfadoxine/pyrimethamine use upon the Plasmodium falciparum population in Malawi. Malar. J. 15, 575. doi:10.1186/s12936-016-1634-6
Rice, B. L., Golden, C. D., Anjaranirina, E. J. G., Botelho, C. M., Volkman, S. K., and Hartl, D. L. (2016). Genetic evidence that the Makira region in northeastern Madagascar is a hotspot of malaria transmission. Malar. J. 15, 596. doi:10.1186/s12936-016-1644-4
Rorick, M. M., Artzy-Randrup, Y., Ruybal-Pesántez, S., Tiedje, K. E., Rask, T. S., Oduro, A., et al. (2018). Signatures of competition and strain structure within the major blood-stage antigen of Plasmodium falciparum in a local community in Ghana. Ecol. Evol. 8, 3574–3588. doi:10.1002/ece3.3803
RStudio Team (2015). RStudio: Integrated development for R. Boston, MA: PBC.
Ruybal-Pesántez, S., Tiedje, K. E., Rorick, M. M., Amenga-Etego, L., Ghansah, A., Oduro, A. R., et al. (2017a). Lack of geospatial population structure yet significant linkage disequilibrium in the reservoir of Plasmodium falciparum in Bongo District, Ghana. Am. J. Trop. Med. Hyg. 97, 1180–1189. doi:10.4269/ajtmh.17-0119
Ruybal-Pesántez, S., Tiedje, K. E., Tonkin-Hill, G., Rask, T. S., Kamya, M. R., Greenhouse, B. R., et al. (2017b). Population genomics of virulence genes of Plasmodium falciparum in clinical isolates from Uganda. Sci. Rep. 7, 11810. doi:10.1038/s41598-017-11814-9
Schaffner, S. F., Taylor, A. R., Wong, W., Wirth, D. F., and Neafsey, D. E. (2018). HmmIBD: Software to infer pairwise identity by descent between haploid genotypes. Malar. J. 17, 196–213. doi:10.1186/s12936-018-2349-7
Schlötterer, C. (2004). The evolution of molecular markers – Just a matter of fashion? Nat. Rev. Genet. 5, 63–69. doi:10.1038/nrg1249
Selkoe, K. A., and Toonen, R. J. (2006). Microsatellites for ecologists: A practical guide to using and evaluating microsatellite markers. Ecol. Lett. 9, 615–629. doi:10.1111/j.1461-0248.2006.00889.x
Sisowath, C., Petersen, I., Veiga, M. I., Mårtensson, A., Premji, Z., Björkman, A., et al. (2009). In vivo selection of Plasmodium falciparum parasites carrying the chloroquine-susceptible pfcrt K76 allele after treatment with artemether-lumefantrine in Africa. J. Infect. Dis. 199, 750–757. doi:10.1086/596738
Snounou, G., Zhu, X., Siripoon, N., Jarra, W., Thaithong, S., Brown, K. N., et al. (1999). Biased distribution of msp1 and msp2 allelic variants in Plasmodium falciparum populations in Thailand. Trans. R. Soc. Trop. Med. Hyg. 93, 369–374. doi:10.1016/s0035-9203(99)90120-7
Speed, D., and Balding, D. J. (2015). Relatedness in the post-genomic era: Is it still useful? Nat. Rev. Genet. 16, 33–44. doi:10.1038/nrg3821
Syvänen, A.-C. (2001). Accessing genetic variation: Genotyping single nucleotide polymorphisms. Nat. Rev. Genet. 2, 930–942. doi:10.1038/35103535
Taylor, A. R., Schaffner, S. F., Cerqueira, G. C., Nkhoma, S. C., Anderson, T., Sriprawat, K., et al. (2017). Quantifying connectivity between local Plasmodium falciparum malaria parasite populations using identity by descent. PLOS Genet. 13, e1007065. doi:10.1371/journal.pgen.1007065
Taylor, A. R., Jacob, P. E., Neafsey, D. E., and Buckee, C. O. (2019). Estimating relatedness between malaria parasites. Genetics 212, 1337–1351. doi:10.1534/genetics.119.302120
Tessema, S. K., Hathaway, N. J., Teyssier, N. B., Murphy, M., Chen, A., Aydemir, O., et al. (2020). Sensitive, highly multiplexed sequencing of microhaplotypes from the Plasmodium falciparum heterozygome. J. Infect. Dis. 225, 1227–1237. doi:10.1093/infdis/jiaa527
Tiedje, K. E., Oduro, A. R., Agongo, G., Anyorigiya, T., Azongo, D., Awine, T., et al. (2017). Seasonal variation in the epidemiology of asymptomatic Plasmodium falciparum infections across two catchment areas in Bongo District, Ghana. Am. J. Trop. Med. Hyg. 97, 199–212. doi:10.4269/ajtmh.16-0959
Tiedje, K. E., Oduro, A. R., Bangre, O., Amenga-Etego, L., Dadzie, S. K., Appawu, M. A., et al. (2022). Indoor residual spraying with a non-pyrethroid insecticide reduces the reservoir of Plasmodium falciparum in a high-transmission area in northern Ghana. PLOS Glob. Public Heal 2, e0000285. doi:10.1371/journal.pgph.0000285
U.S. President's Malaria Initiative (2012). FY 2012 Malawi malaria operational plan.
U.S. President's Malaria Initiative (2015). FY 2015 Kenya malaria operational plan.
U.S. President's Malaria Initiative (2017). FY 2017 Kenya malaria operational plan.
U.S. President's Malaria Initiative (2020). FY 2020 Democratic Republic of Congo malaria operational plan.
Venkatesan, M., Gadalla, N. B., Stepniewska, K., Dahal, P., Nsanzabana, C., Moriera, C., et al. (2014). Polymorphisms in Plasmodium falciparum chloroquine resistance transporter and multidrug resistance 1 genes: Parasite risk factors that affect treatment outcomes for P. falciparum malaria after artemether-lumefantrine and artesunate-amodiaquine. Am. J. Trop. Med. Hyg. 91, 833–843. doi:10.4269/ajtmh.14-0031
Verity, R., Aydemir, O., Brazeau, N. F., Watson, O. J., Hathaway, N. J., Mwandagalirwa, M. K., et al. (2020). The impact of antimalarial resistance on the genetic structure of Plasmodium falciparum in the DRC. Nat. Commun. 11, 2107. doi:10.1038/s41467-020-15779-8
Vignal, A., Milan, D., SanCristobal, M., and Eggen, A. (2002). A review on SNP and other types of molecular markers and their use in animal genetics. Genet. Sel. Evol. 34, 275–305. doi:10.1186/1297-9686-34-3-275
Volkman, S. K., Sabeti, P. C., DeCaprio, D., Neafsey, D. E., Schaffner, S. F., Milner, D. A., et al. (2007). A genome-wide map of diversity in Plasmodium falciparum. Nat. Genet. 39, 113–119. doi:10.1038/ng1930
Wakeley, J., Nielsen, R., Liu-Cordero, S. N., and Ardlie, K. (2001). The discovery of single-nucleotide polymorphisms – And inferences about human demographic history. Am. J. Hum. Genet. 69, 1332–1347. doi:10.1086/324521
West, P. A., Protopopoff, N., Rowland, M., Cumming, E., Rand, A., Drakeley, C., et al. (2013). Malaria risk factors in north west Tanzania: The effect of spraying, nets and wealth. PLoS One 8, e65787. doi:10.1371/journal.pone.0065787
WHO/GMP (2017). A framework for malaria elimination. Geneva: World Health Organization, 100. Available at: https://apps.who.int/iris/bitstream/handle/10665/254761/9789241511988-eng.pdf.
Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L., François, R., et al. (2019). Welcome to the tidyverse. J. Open Source Softw. 4, 1686. doi:10.21105/joss.01686
Winter, D. J. (2012). mmod: an R library for the calculation of population differentiation statistics. Mol. Ecol. Resour. 12, 1158–1160. doi:10.1111/j.1755-0998.2012.03174.x
World Health Organisation (2015). Guidelines for case management of malaria in Ghana. Third Edition.
World Health Organisation (2018). High burden to high impact: A targeted malaria response. doi:10.1071/EC12504
World Health Organisation (2022). World malaria report 2022.
Zhu, S. J., Almagro-Garcia, J., and McVean, G. (2018). Deconvolution of multiple infections in Plasmodium falciparum from high throughput sequencing data. Bioinformatics 34, 9–15.
doi:10.1093/bioinformatics/btx530 Sisya, T. J., Kamn’gona, R. M., Vareta, J. A., Fulakeza, J. M., Mukaka, M. F. J., Seydel, Zhu, S. J., Hendry, J. A., Almagro-Garcia, J., Pearson, R. D., Amato, R., Miles, A., et al. K. B., et al. (2015). Subtle changes in Plasmodium falciparum infection complexity (2019). The origins and relatedness structure of mixed infections vary with local following enhanced intervention in Malawi. Acta Trop. 142, 108–114. doi:10.1016/j. prevalence of P. falciparum malaria. Elife 8, e40845. doi:10.7554/elife.40845 actatropica.2014.11.008 Frontiers in Genetics 19 frontiersin.org