999640 EVB0010.1177/1176934321999640Evolutionary BioinformaticsTandoh et al research-article2021 Plasmodium falciparum Malaria Parasites in Ghana Show Evolutionary BioinformaticsVolume 17: 1–9 Signatures of Balancing Selection at Artemisinin © The Author(s) 2021Article reuse guidelines: Resistance Predisposing Background Genes sagepub.com/journals-permissions hDttOpsI:: /1/d0o.i1.o1rg7/71/01.1177679/131473629134939291694909640 Kwesi Z Tandoh1 , Lucas Amenga-Etego1, Neils B Quashie2,3, Gordon Awandare1, Michael Wilson4 and Nancy O Duah-Quashie2 1West African Centre for Cell Biology of Infectious Pathogens, Department of Biochemistry, Cell and Molecular Biology, College of Basic and Applied Sciences, University of Ghana, Accra, Ghana. 2Department of Epidemiology, Noguchi Memorial Institute for Medical Research, College of Health Sciences, University of Ghana, Accra, Ghana. 3Centre for Tropical Clinical Pharmacology and Therapeutics, School of Medicine and Dentistry, College of Health Sciences, University of Ghana, Accra, Ghana. 4Department of Parasitology, Noguchi Memorial Institute for Medical Research, College of Health sciences, University of Ghana, Accra, Ghana. ABSTRACT: Sub-Saharan Africa is courting the risk of artemisinin resistance (ARTr) emerging in Plasmodium falciparum malaria parasites. Current molecular surveillance efforts for ARTr have been built on the utility of P. falciparum kelch13 (pfk13) validated molecular markers. However, whether these molecular markers will serve the purpose of early detection of artemisinin-resistant parasites in Ghana is hinged on a pfk13 dependent evolution. Here, we tested the hypothesis that the background pfk13 genome may be present before the pfk13 ARTr-conferring variant(s) is selected and that signatures of balancing selection on these genomic loci may serve as an early warning signal of ARTr. We ana- lyzed 12 198 single nucleotide polymorphisms (SNPs) in Ghanaian clinical isolates in the Pf3K MalariaGEN dataset that passed a stringent filter- ing regimen. We identified signatures of balancing selection in 2 genes (phosphatidylinositol 4-kinase and chloroquine resistance transporter) previously reported as background loci for ARTr. These genes showed statistically significant and high positive values for Tajima’s D, Fu and Li’s F, and Fu and Li’s D. This indicates that the biodiversity required to establish a pfk13 background genome may have been primed in clinical isolates of P. falciparum from Ghana as of 2010. Despite the absence of ARTr in Ghana to date, our finding supports the current use of pfk13 for molecular surveillance of ARTr in Ghana and highlights the potential utility of monitoring malaria parasite populations for balancing selection in ARTr precursor background genes as early warning molecular signatures for the emergence of ARTr. KeywoRdS: Population genomics, artemisinin resistance, molecular surveillance, malaria, Plasmodium falciparum, signatures of balanc- ing selection ReCeIVed: August 29, 2020. ACCePTed: February 5, 2021. deClARATIon oF ConFlICTInG InTeReSTS: The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this TyPe: Original Research article. FundInG: The author(s) disclosed receipt of the following financial support for the CoRReSPondInG AuTHoRS: Nancy O Duah-Quashie, Department of Epidemiology, research, authorship, and/or publication of this article: KZT is supported by PhD fellowships Noguchi Memorial Institute for Medical Research, College of Health Sciences, University from Building a New Generation of Academics (BANGA-Africa, University of Ghana, and of Ghana, Accra, Ghana. Email: NDuah@noguchi.ug.edu.gh Carnegie Corporation of New York); and WACCBIP-World Bank ACE. The DELTAS Africa Initiative is an independent funding scheme of the African Academy of Sciences (AAS)’s Alliance for Accelerating Excellence in Science in Africa (AESA) and supported by the New Kwesi Z Tandoh, West African Centre for Cell Biology of Infectious Pathogens, Department Partnership for Africa’s Development Planning and Coordinating Agency (NEPAD Agency) of Biochemistry, Cell and Molecular Biology, College of Basic and Applied Sciences, with funding from the Wellcome Trust (107755/Z/15/Z: Awandare) and the UK government. University of Ghana, Accra, Ghana. Email: kztandoh@st.ug.edu.gh The views expressed in this publication are those of the author(s) and not necessarily those of AAS, NEPAD Agency, Wellcome Trust, or the UK government. Introduction (AL), and dihydroartemisinin-piperaquine (DHAPQ).6 The global Plasmodium falciparum malaria burden is highest in Although artemisinin-resistant (ARTr) P. falciparum has not sub-Saharan Africa (SSA). In 2018, the World Health been confirmed in Ghana or SSA yet, there is a real risk of such Organization (WHO) estimated that 213 million people in SSA parasites emerging and severely undermining malaria control suffered from malaria, with an estimated 380 000 deaths.1 In efforts.7-9 Since the discovery of ARTr parasites in South-East Ghana, malaria transmission is hyperendemic, with a high Asia (SEA),10,11 there is global concern about the spread of these malaria attributable morbidity and mortality, particularly among parasites to SSA. Currently, molecular surveillance for ARTr is pregnant women and children under 5 years of age.2 Until 2005, achieved with validated P. falciparum kelch-13 (pfk13) molecular chloroquine was the main drug used for malaria treatment in markers.12-14 However, with the possibility of involvement of Ghana, however, widespread P. falciparum resistance led to the multiple mechanisms in ARTr development, there are concerns replacement of chloroquine with artemisinin-based combination about the universal utility of pfk13 as a marker of ARTr.15-21 therapy (ACT) as the first-choice antimalarial chemotherapy for Additionally, SSA has been shown to harbor a biodiverse the management of uncomplicated malaria in line with the P. falciparum population with sufficient variation at the pfk13 WHO recommendations.3-5 Currently, ACTs recommended by locus to allow a soft selection sweep under ART drug the national malaria control program (NMCP) of Ghana include p ressure.9,18,22 Despite this, ARTr malaria parasites are yet to artesunate–amodiaquine (AS AQ), artemether-lumefantrine emerge in SSA.8,9 Creative Commons Non Commercial CC BY-NC: This article is distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 License (https://creativecommons.org/licenses/by-nc/4.0/) which permits non-commercial use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access pages (https://us.sagepub.com/en-us/nam/open-access-at-sage). 2 Evolutionary Bioinformatics A possible explanation for the delay of artemisinin resist- Results ance emergence in SSA is the prerequisite for a pfk13 ARTr- Multiplicity of infection conferring background genome. This model is framed on the hypothesis that the pfk13 ARTr-conferring variants only Multiplicity of infection (MOI) can result in spurious signa- 22 express the phenotype in the presence of a particular pfk13 tures of selection metrics if it is not evaluated. The proportion genomic background that weathers any fitness cost of the pfk13 of samples with multiple infections was determined by calcu- ARTr conferring variant(s) and may augment any baseline lating the genome-wide within sample F statistic (Fws). We partner drug resistance required for the parasites to be trans- found 56% (343) of sample to have Fws < 0.95, which is indica- mitted.23,24 Whether this pfk13 genome background evolves tive of multi-genomic infection (Figure 1). These samples were before, with, or after the pfk13 ARTr variant is selected remains excluded from tests of neutrality analysis to determine signa- unclear.18 This ARTr genomic model presents a high genetic tures of balancing selection. barrier for the emergence of ARTr and may explain the delay in the manifestation of the phenotype in SSA. The pfk13 back- P. falciparum population structure ground genome has been characterized by population associa- Next, we interrogated the hypothesis that there is no popula- tion studies to involve single nucleotide polymorphisms (SNPs) tion structure in the malaria parasite population in Ghana. in the following loci: ferredoxin (fd-D193Y), apicoplast ribo- Population structure analysis was undertaken using principal somal protein S10 (arps10-V127M), multidrug resistance pro- component analysis (PCA). We sought to uncover any effect of tein2 (mdr2-T484I), and chloroquine resistance transporter 23 varying local transmission and geographical distribution on the (crt-N326S). A retrospective longitudinal genomic study also parasite population structure. We adopted an approach that identified background SNPs that were associated with the evaluates population structure by utilizing within-sample allele ARTr phenotype in malaria parasites from North-Western 25 Thailand.24 frequencies (WSAF) across all variable genomic loci. This study used genotype-phenotype association Principal components were plotted using ggplot2 package in R tests and genomic scans for signals of selection to determine version 4.0. No significant parasite population structure was genomic loci in P. falciparum that may contribute to the pfk13 observed (Figure 2). ARTr background genome. These variants followed a similar PCA analysis for samples with single clonal infection non-reference allele frequency temporal trajectory as the domi- (N = 274). Sample coordinates for PCA were obtained using nant pfk13 ARTr conferring variant C580Y in SEA and vcfdo. A subsequent plot was generated in R.4.0 with the included SNPs in the following genes: Phosphatidylinositol ggplot2 package. Principal components 1 and 2 show no genetic 4-kinase (S915G), Sec14 domain-containing protein (L498F differentiation within the Ghana samples of P. falciparum and N615D), Ubiquitin-protein ligase (S57T), Ubiquitin car- boxyl-terminal hydrolase (R3138H), and Sentrin-specific pro- tease 2, putative protein (H423Y).24 Summary of genomic diversity among single clone To mitigate the risk of ARTr spreading and taking hold in infections SSA, molecular surveillance for parasites with ARTr is criti- We subsequently only analyzed sequence data for samples with cal and a molecular-based early warning metric that predicts single clone infections (Fws > 0.95, N = 274) with 12 198 bial- ARTr will be desirable. A tool that may serve this purpose is lelic SNPs. Sixty percent of SNPs in the population had a non- the detection of signatures of balancing selection on loci reference allele frequency of less than 25% (Figure 3). Forty-five demonstrated to act as a background for ARTr emergence. To percent of SNPs were missense variants, 28% of SNPs were in explore this, here we hypothesized that the background pfk13 intergenic regions, 22% of SNPs were synonymous variants, genome may be present before the pfk13 ARTr-conferring and 4% were in intronic regions (Figure 4). The SNPs analyzed variant is selected and that signatures of balancing selection were distributed across 2256 protein-coding genes. on these genomic loci may serve as an early warning signal. The most abundant biological effect of the SNPs analyzed Our analysis has identified loci previously reported as pfk13 were missense variants, variants in intergenic regions, synony- background mutations required for the emergence of ARTr to mous variants, and intron variants in decreasing order. be under balancing selection in P. falciparum clinical samples from Ghana. Our findings show that 2 of these genomic loci required for ARTr to emerge have sufficient standing varia- Pfk13 ARTr genome background loci under tion to allow a soft selective sweep and emergence of ART balancing selection tolerant parasites under ART pressure. Our data also provides We used clonal samples (Fws > 0.95) to screen for signatures of empirical evidence for malaria molecular surveillance in selection in the P. falciparum genome. We found 579 P. falcipa- Ghana continue to use pfk13 as the marker for ARTr despite rum genes with Tajima’s D greater than 1 (Figure 5, the possibility of pfk13 independent emergence of ARTr Supplemental Table S1).26-28 Of these, genomic positions p arasites in Ghana. above the 90th percentile of the Tajima’s D distribution showed Tandoh et al 3 Figure 1. Distribution of multiplicity of infection in Ghana samples of P. falciparum analyzed (n = 617): (A) is a histogram showing the number of samples with within sample F statistic (Fws) on the vertical axis within the range specified on the horizontal axis and (B) is a scatter plot that shows the distribution of the samples with Fws. The red line is at Fws = 0.95, the cutoff point for MOI. Figure 2. P. falciparum population structure analysis for Ghana samples. gene ontology biochemical processes spanning multiple path- genomic background on which ARTr-conferring pfk13 vari- ways (Supplemental Table S2). There were no SNPs in the ants emerged.23,24 Among the 9 genomic loci implicated in this pfk13 gene and none of the reported SNPs that form the back- background phenomenon (Table 1), 2 were found to be under ground genome for ARTr to emerge were identified in this balancing selection in the Ghana sample set collected in 2010 analysis (Supplemental Table S3). (Table 2). One of these genes (phosphatidylinositol 4-kinase) Five hundred seventy-nine P. falciparum genes had Tajima’s stand out in their gene ontology terms (http://plasmodb.org) D > 1. Chromosomes are identified by the alternately black as directly connected to the molecular mechanisms of ARTr and blue coloring, with SNPs plotted as individual points based development.15 on their position in the chromosome. The red dashed line The phosphatidylinositol 4-kinase (PI4K) gene product marks Tajima’s D = 1. shares similar gene ontology terms (phosphatidylinositol meta- Among the genes under balancing selection, we looked for bolic process, kinase activity, and cellular membrane compo- loci previously reported to be part of the required multi-locus nent) with phosphatidylinositol 3-kinase (PI3K), which has 4 Evolutionary Bioinformatics Figure 3. Summary of SNPs characteristics: (A) Frequency distribution of the non-reference allele for each of the biallelic SNPs in the sample of P. falciparum clinical isolates from Ghana (N = 274) and (B) distribution of numbers of protein-coding genes (N = 2256) with each given number of SNPs in the Ghana population sample of P. falciparum clinical isolates. Figure 4. Distribution of SNPs effect across all genomic positions analyzed. been associated with the primary ARTr-conferring variant 20 missense variants, and 2 intron variants. The ubiquitin car- pfk13 C580Y. The gene is located on chromosome 4 and has 6 boxyl-terminal hydrolase 1 locus (UBP1) gene is located on exons that encode for a protein with 5035 amino acid residues. chromosome 3 and has 3 exons that encode for a protein with We found 38 SNPs in this gene of which 16 were synonymous, 3499 amino acid residues. We found 6 SNPs in this gene of Tandoh et al 5 Figure 5. Genome-wide distribution of Tajima’s D values summarizing the site frequency spectra for P. falciparum genes. Table 1. List of genes of interest reported to be associated with the genomic background for artemisinin resistance in P. falciparum. GENE NAME GENE ID SNP Ferredoxin PF3D7_1318100 D193Y Apicoplast ribosomal protein S10 PF3D7_1460900 V127M Multidrug resistance protein 2 PF3D7_1447900 T484I Chloroquine resistance transporter PF3D7_0709000 N326S Phosphatidylinositol 4-kinase PF3D7_0419900 S915G Sec14 domain-containing protein PF3D7_0626400 L498F, N615D Ubiquitin-protein ligase PF3D7_1448400 S57T Ubiquitin carboxyl-terminal hydrolase PF3D7_0104300 R3138H Sentrin-specific protease 2, putative PF3D7_0801700 H423Y which 2 were synonymous and 4 missense variants. The chlo- with ARTr genotypes. Given the absence of confirmed ARTr roquine resistance transporter (CRT) gene is located on chro- in SSA and the complex evolution of ARTr, which includes mosome 7 and has 13 exons that encode for a protein with 424 precursor genomic background, we explored the plausibility of amino acid residues. We found 6 SNPs in this gene of which 3 using molecular early warning signatures of balancing selection were missense variants and 3 were intron variants. One mis- in 9 genes associated with the genomic background on which sense variant was the chloroquine-resistance K76T mutation ARTr conferring variants may emerge.18,23,24 We found signa- with an allele frequency of 0.57%.29 tures of balancing selection in 2 loci reported to contribute to The Sentrin-specific protease 2, putative (SENP2) gene is the genomic background on which definitive ARTr-conferring located on chromosome 8 and has 3 exons that encode for a variants arise (Table 2). Genomic loci under balancing selec- protein with 1784 amino acid residues. We found 2 SNPs in tion provide a standing variation on which a soft selective this gene of which one was a missense variant and the other a sweep can occur, especially in the presence of high transmission synonymous variant (Table 3). rates, which is common in Ghana.30-32 Although we did not find any of the reported multi-loci ARTr predisposing back- Discussion ground SNPs (Supplemental Table S3) and none of the vari- It is important to improve malaria molecular surveillance, par- ants identified at these loci have been reported to be associated ticularly detection and monitoring of P. falciparum parasites with ARTr,33 the increased genetic diversity observed at these 6 Evolutionary Bioinformatics Table 2. Tests of neutrality on 4 genes of interest under balancing selection in Ghana population of Plasmodium falciparum. GENE NAME GENOMIC SITES1 π (×10−5) K (×10−3) T’ D F D HKA (π/K) MK P LENGTH (BP) VALUE Chloroquine resistance 3096 6 64 559.71 2.7* 1.96* 1.04 0.11 .68 transporter Phosphatidylinositol 4-kinase 15 845 38 67 645.02 2.2 * 2.6* 2.18* 0.10 .32 Ubiquitin carboxyl-terminal 10 962 6 17 615.51 1.8 1.5 1.04 0.03 .64 hydrolase Sentrin-specific protease 2, 5579 2 0.07 591.25 1.1 0.9 0.62 11 × 10−3 1 putative 1Number of polymorphic sites; π is nucleotide diversity index; K is nucleotide divergence; T’D is Tajima’s D; F is Fu and Li’s F; D is Fu and Li’s D. *P value < .05. Table 3. Summary of number of SNPs and effect at loci of interest. GENE NAME ANNOTATION EFFECT MISSENSE VARIANT SYNONYMOUS VARIANT INTRON VARIANT Chloroquine resistance transporter (crt) 3 0 3 Phosphatidylinositol 4-kinase 20 16 2 Ubiquitin carboxyl-terminal hydrolase (ubp1) 4 2 0 Sentrin-specific protease 2, putative (senp2) 1 1 0 loci as of 2010 (Table 2) suggests that the biodiversity required of balancing selection in loci, that predispose to the emergence to establish a pfk13 background genome may be primed in of pfk13-dependent ARTr, highlights the need for molecular Ghana. Since balanced alleles may be randomly lost over long surveillance of parasite populations in SSA to generate data for periods, whether these variants will be maintained, increase in interrogating the risk of ARTr emerging in SSA whether de frequency or be lost over time remains unclear. Against the novo or imported by gene flow from SEA. Regardless of how backdrop of the absence of ARTr in Ghana despite this back- ARTr arises in SSA, our analysis suggests that the requisite ground, the delay may be attributed to the use of ACTs or adaptation for ARTr in Ghana has occurred and conditions are breakdown of standing variation by rampant recombination ripe for a soft selective sweep either via importation or inde- due to the high transmission intensity in Ghana. pendent emergence occur sooner than later. It is in the light of PI4K may have an impact on the levels of phosphatidylino- this that we recommend prospective genomic surveillance for sitol-3-phosphate (PI3P) which has been associated with ARTr signatures of balancing selection in ARTr precursor background in vitro.16 PI4K may also contribute to ARTr development by genes as they accumulate more variation and could serve as a modulating the activity of members of the phosphoinositol molecular early warning system for ARTr emergence. pathways in triggering the unfolded protein response to ART However, it is worth noting some limitations of this study. A stress via the inositol requiring enzyme 1α/β (IRE1).24,34 UBP1 key weakness of this study is the absence of statistical power is a putative protein predicted to function in a ubiquitin- evaluations to ascertain whether the hard filtering pipeline and dependent protein catabolic process (https://plasmodb.org/ the resulting number of SNPs were sufficient to interrogate the plasmo/app/record/gene/PF3D7_0104300#category:function- central hypothesis for this analysis. It may be argued that the analysis). The ubiquitination/proteasome pathway has been size of the relatively small variant set was not powered enough associated with ARTr16,20,35 via the cellular stress response to to capture the veracity or otherwise of our hypothesis. In addi- ART assault. Thus our finding of signatures of balancing selec- tion, the evidence from Africa on pfk13 ARTr suggests that dif- tion in UBP1 gene suggests an association with these pathways ferent variants from those seen in SEA may emerge.36 However, related to the molecular mechanisms of ARTr. Our data sup- we are confident, the absence of any known ARTr–conferring port the point conveyed by Cerqueira et al24 that the complex variants in the samples analyzed is a strong indication that the multi-loci nature of ARTr makes it less likely to occur de novo background genome may be established before the emergence in SSA. This is because of the high rate of outcrossing and of the definitive ARTr-conferring variant(s). Although tests of recombination that may derail the stability of the required com- departures from neutrality are best suited for evaluating devia- plex genomic background. However, the presence of signatures tions of nucleotide diversity from expectations under a neutral Tandoh et al 7 model of population evolution, they may also suggest a demo- discovery. After mapping each sample to the 3D7 reference graphically driven change such as population bottleneck (eg, genome, PCR, and sequencing duplicates were marked using drug pressure).37,38 What remains unknown and merits further Picard SortSam and the MarkDuplicates function of GATKs investigation is the time scale between these events. Given that pipeline. Realignment around InDels (insertion and deletion the effective population size is greater in SSA compared to sequence variants) was done using GATK’s RealignerTargetCreator/ SEA, the high recombination rates in SSA may prolong the IndelRealigner. Then base quality score recalibration was done evolutionary time scale to ARTr emergence. Therefore, the with GATK’s BaseRecalibrator. The following P. falciparum departures from neutrality from different complementary genetic crosses 1.0 databases were used for the base quality recali- approaches observed in this study, though indicative of balanc- bration: ftp://ngs.sanger.ac.uk/production/malaria/pf-crosses/1.0/ ing selection in some of these genes, are restricted by this fact. 7g8_gb4.combined.final.vcf.gz; ftp://ngs.sanger.ac.uk/produc- However, the evidence of balancing selection supported by tion/malaria/pf-crosses/1.0/hb3_dd2.combined.final.vcf.gz; and Tajima’s D, Fu and Li’s D, Fu and Li’s F is quite strong for 2 of ftp://ngs.sanger.ac.uk/production/malaria/pf-crosses/1.0/3d7_ our genes of interest (Table 2). The lack of concordance between hb3.combined.final.vcf.gz. Tajima’s D and HKA in our analysis is consistent with previous Variants calling was done using GATK’s CombineGVCFs findings for genes widely reported to be under balancing selec- and GenotypeGVCFs. This was followed by variant quality tion in African parasite populations (Supplemental Table score recalibration using GATKs VariantRecalibrator/ S5).26,27 This may be attributed to high recombination rates, ApplyRecalibration functions. The same set of P. falciparum changing effective population size, and the presence of muta- crosses 1.0 databases above were used. Finally, variants annota- tions that may not be selectively neutral.39,40 tion was done using SnpEff (v4.1) with a database created from In general, we have detected loci under balancing selection, ftp://ftp.sanger.ac.uk/pub/project/pathogens/gff3/2015- in Ghanaian P. falciparum isolates, which intersect with the 08/Pfalciparum.gff3.gz; annotate variants by regions identified genome background required for ARTr to emerge in SEA. in Pf crosses 1.0 data (ftp://ngs.sanger.ac.uk/production/ Although Ghana and SSA are yet to report solid evidence of malaria/pf-crosses/1.0/regions-20130225.onebased.txt). ARTr, our analysis shows that molecular surveillance efforts Variants were filtered out where VQSLOD ⩽ 0 or RegionType may be improved by scanning the background of such loci ! = “Core” to generate a final VCF file available at the associated with pfk13 ARTr. This approach can signal the MalariaGEN Pf3k data repository. emergence of the ARTr phenotype but may require optimiza- tion to serve as the missing tool needed in the current land- Hard filtering pipeline of Ghana 2009/2010 scape of ARTr surveillance of P. falciparum malaria parasites in clinical isolates variants Africa. Thus, we have suggested the potential utility of survey- ing the genome of parasites for balancing selection in ARTr Bcftools was used in a custom bash script to extract the Ghana precursor background genes as a potential molecular early 2009/2010 samples and concatenate all the variant information warning system for ARTr surveillance efforts. into a single VCF without structural variants such as InDels. Quality hard filtering of single nucleotide polymorphisms Methods (SNPs) was done to filter out SNPs with the following fea- Sequencing and genotyping tures: SNPs with very poor sequencing depth (<10 reads in 1 sample) and coverage across samples; SNPs that had more than DNA sequences were retrieved from the Malaria Genomic 2 alleles; and SNPs located in the highly polymorphic var, Epidemiology Network (MalariaGEN) Pf3k data repository rif in, and stevor regions.22 Further snpEFF annotation was (https://www.malariagen.net/projects/parasite/pf3k). Paired- done using the global barcode SNPs from a genome-wide end sequencing was done using the Illumina HiSeq platform41 study of P. falciparum population divergence by natural selec- by Sanger sequencing core. Read lengths ranged between 200 tion.42 The final VCF file was filtered again using the default and 300 bp and approximately 1 Gbp of data per sample was parameters of the GATK SelectVariants function. Samples produced as described.18 Quality control, variant discovery, and with missing genotype calls of >5% were filtered out. VCFtools sample genotyping were done using custom pipelines, details of (0.1.16) was used to determine the genome-wide average which may be reviewed in Manske et  al.22 Briefly, short nucleotide diversity. sequence reads from P. falciparum isolates that passed stringent quality filters were aligned against the P. falciparum 3D7 refer- ence sequence V3 (ftp://ftp.sanger.ac.uk/pub/pathogens/ Evaluation of multiplicity of infection and population structure Plasmodium/falciparum/3D7/3D7.latest_version/version3/), using the bwa program (Li and Durbin, 2009) (http://bio-bwa. We used the within-sample F statistic (Fws) to determine sourceforge.net/) as previously described.23 MOI in the Ghana cohort of clinical isolates of P. falcipa- Genotyping was done with the Genome Analysis Tool Kit rum.23,43 The moimix package (https://github.com/bahlolab/ (GATK 3.3-46) best practices pipeline for de novo variant moimix) was used to determine the Fws in R version 4.0. 8 Evolutionary Bioinformatics Principal component analysis (PCA) was done using vcfdo Author Contributions (https://github.com/IDEELResearch/vcfdo) function “pca.” KZT conceptualized the research question, did the computa- tional data analysis, and wrote the manuscript. LA, MW, NBQ, Tests for departures from neutrality GA, and NOD reviewed and supervised the work. All authors read and approved the final manuscript. We used the R package PopGenome for genome scans and determination of the allele frequency distribution selection ORCID iD metric Tajima’s D at each SNP site.44 Subsequently, we used Kwesi Z Tandoh https://orcid.org/0000-0002-1628-2845 custom bash scripts to extract the DNA sequences of the 9 genes of interest (Supplemental Table S1). DnaSP6.0 was used to determine departures from neutrality, for the 9 genes of Supplemental Material interest, using allele frequency distribution measures Tajima’s Supplemental material for this article is available online. D, Fu and Li’s F, and Fu and Li’s D, and comparisons of varia- tions within and between species with Hudson–Kreitman– Aguade (HKA) and McDonald–Kreitman (MK) ratios.45 RefeRenCeS Tajima’s D measures the difference between the average pair- 1. Organization WH. World Malaria Report 2019. 2019. wise nucleotide diversity between sequences ( ) and Watterson’s 2. Ministry of Health G. NMCP: Annual Report of the National Malaria Control π Programme of Ghana, 2017. 2017. population nucleotide diversity (θ).27,46 This will make the 3. Koram KA, Abuaku B, Duah N, Quashie N. Comparative efficacy of antima- expected value of Tajima’s D for populations under neutrality to larial drugs including ACTs in the treatment of uncomplicated malaria among children under 5 years in Ghana. Acta Trop. 2005;95:194-203. be zero. When conditions of neutrality are violated, Tajima’s D 4. Quashie NB, Duah NO, Abuaku B, Koram KA. The in-vitro susceptibilities of will capture a skew in the allele frequency distribution. These Ghanaian Plasmodium falciparum to antimalarial drugs. Ann Trop Med Parasi-tol. 2007;101:391-398. significant deviations from zero can either be positive or nega- 5. Organization WH. Guidelines for the Treatment of Malaria. 2006. tive. Positive values of Tajima’s D indicate an excess of interme- 6. Quashie NB, Ranford-Cartwright LC, de Koning HP. Uptake of purines in Plasmodium falciparum-infected human erythrocytes is mostly mediated by the diate frequency alleles. This can be caused by population human equilibrative nucleoside transporter and the human facilitative nucleo- bottlenecks, population structure, and/or balancing selection.47 base transporter. Malar J. 2010;9:36. Fu and Li’s F statistic is determined by the difference 7. Amambua-Ngwa A, Amenga-Etego L, Kamau E, et al. Major subpopulations of Plasmodium falciparum in sub-Saharan Africa. Science. 2019;365:813-816. between the number of single nucleotide variants observed 8. Conrad MD, Rosenthal PJ. Antimalarial drug resistance in Africa: the calm only once in a sample and the total number expected under before the storm? Lancet Infect Dis. 2019;19:e338-e351. 9. Taylor SM, Parobek CM, DeConti DK, et al. Absence of putative artemisinin neutrality given the number of segregating sites and Watterson’s resistance mutations among Plasmodium falciparum in Sub-Saharan Africa: a estimate of nucleotide diversity (θ); Fu and Li’s D is the differ- molecular epidemiologic study. J Infect Dis. 2015;211:680-688.1 0. Dondorp AM, Nosten F, Yi P, et al. Artemisinin resistance in Plasmodium fal- ence between the number of derived nucleotide variants ciparum malaria. N Engl J Med. 2009;361:455-467. observed only once in a sample with the total number of 11. Noedl H, Se Y, Schaecher K, Smith BL, Socheat D, Fukuda MM. Evidence of artemisinin-resistant malaria in western Cambodia. N Engl J Med. derived nucleotide variants.47,48 HKA ratio was used to deter- 2008;359:2619-2620. mine P. falciparum genes with high ratios of polymorphism(p) 1 2. Ariey F, Witkowski B, Amaratunga C, et al. A molecular marker of artemisinin- over divergence(K) from the closely related chimpanzee para- resistant Plasmodium falciparum malaria. Nature. 2014;505:50. 13. Nyunt MH, Wang B, Aye KM, et al. Molecular surveillance of artemisinin site Plasmodium reichenowi.40,49 The MK test is based on the resistance falciparum malaria among migrant goldmine workers in Myanmar. ratios of non-synonymous (NS) and synonymous (S) nucleo- Malar J. 2017;16:97. 14. Mvumbi DM, Bobanga TL, Kayembe JN, et al. Molecular surveillance of Plas- tide divergence and polymorphism within species and a closely modium falciparum resistance to artemisinin-based combination therapies in the related species, using Fisher’s exact test on the 2 × 2 contin- Democratic Republic of Congo. PLoS One. 2017;12:e0179142. 15. Suresh N, Haldar K. Mechanisms of artemisinin resistance in Plasmodium fal- gency table.50 ciparum malaria. Curr Opin Pharmacol. 2018;42:46-54. Finally, we used the PlasmoDB Gene Ontology (GO) anal- 16. Mbengue A, Bhattacharjee S, Pandharkar T, et al. A molecular mechanism of artemisinin resistance in Plasmodium falciparum malaria. Nature. ysis tool (http://plasmodb.org) to identify the biological pro- 2015;520:683-687. cesses of genes significantly deviating from neutrality. 17. Bhattacharjee S, Coppens I, Mbengue A, et al. Remodeling of the malaria para- site and host human red cell by vesicle amplification that induces artemisinin resistance. Blood. 2018;131:1234-1247. Acknowledgements 18. Project MPfC. Genomic epidemiology of artemisinin resistant malaria. Elife. This publication uses data generated by the Pf3k project (www. 2016;5:e08714. 19. Mok S, Ashley EA, Ferreira PE, et al. Population transcriptomics of human malariagen.net/pf3k). We thank the MalariaGEN Consortium malaria parasites reveals the mechanism of artemisinin resistance. Science. for allowing the use of this data. KZT is also indebted to C.M. 2015;347:431-435. Morang’a and V. Appiah of the bioinformatics laboratory at 20. Rocamora F, Zhu L, Liong KY, et al. Oxidative stress and protein damage responses mediate artemisinin resistance in malaria parasites. PLoS Pathogens. WACCBIP, Department of Biochemistry, Cell and Molecular 2018;14:e1006930. Biology for their invaluable help in troubleshooting some parts 21. Birnbaum J, Scharf S, Schmidt S, et al. A Kelch13-defined endocytosis pathway mediates artemisinin resistance in malaria parasites. Science. 2020;367:51-59. of the computational analysis pipeline. We acknowledge the 2 2. Manske M, Miotto O, Campino S, et al. Analysis of Plasmodium falciparum University of Ghana for providing the high-performance com- diversity in natural infections by deep sequencing. Nature. 2012;487:375-379. 23. Miotto O, Amato R, Ashley EA, et al. Genetic architecture of artemisinin- puting resources (the ZUPUTO) used for this work. resistant Plasmodium falciparum. Nat Genet. 2015;47:226. Tandoh et al 9 2 4. Cerqueira GC, Cheeseman IH, Schaffner SF, et al. Longitudinal genomic sur- 3 6. Uwimana A, Legrand E, Stokes BH, et al. Emergence and clonal expansion of in veillance of Plasmodium falciparum malaria parasites reveals complex genomic vitro artemisinin-resistant Plasmodium falciparum kelch13 R561H mutant par- architecture of emerging artemisinin resistance. Genome Biol. 2017;18:78. asites in Rwanda. Nat Med. 2020;26:1602-1608. 2 5. Verity R, Aydemir O, Brazeau NF, et al. The impact of antimalarial resistance on 37. Kloch A, Wenzel MA, Laetsch DR, et al. Signatures of balancing selection in the genetic structure of Plasmodium falciparum in the DRC. Nat Commun. toll-like receptor (TLRs) genes – novel insights from a free-living rodent. Sci 2020;11:2107. Rep. 2018;8:8361. 26. Ochola LI, Tetteh KK, Stewart LB, Riitho V, Marsh K, Conway DJ. Allele fre- 38. Ramírez-Soriano A, Ramos-Onsins SE, Rozas J, Calafell F, Navarro A. Statisti- quency-based and polymorphism-versus-divergence indices of balancing selec- cal power analysis of neutrality tests under demographic expansions, contrac- tion in a new filtered set of polymorphic genes in Plasmodium falciparum. Mol tions and bottlenecks with recombination. Genetics. 2008;179:555-567. Biol Evol. 2010;27:2344-2351. 39. Hudson RR. Gene genealogies and the coalescent process. Oxford Surv Evol Biol. 27. Ochola-Oyier LI, Wamae K, Omedo I, et al. Few Plasmodium falciparum mer- 1990;7:44. ozoite ligand and erythrocyte receptor pairs show evidence of balancing selec- 40. Hudson RR, Kreitman M, Aguadé M. A test of neutral molecular evolution tion. Infect Genet Evol. 2019;69:235-245. based on nucleotide data. Genetics. 1987;116:153-159. 2 8. Amambua-Ngwa A, Tetteh KK, Manske M, et al. Population genomic scan for 41. Bentley DR, Balasubramanian S, Swerdlow HP, et al. Accurate whole human candidate signatures of balancing selection to guide antigen characterization in genome sequencing using reversible terminator chemistry. Nature. malaria parasites. PLoS Genet. 2012;8:e1002992. 2008;456:53-59. 29. Wellems TE, Plowe CV. Chloroquine-resistant malaria. J Infect Dis. 42. Neafsey DE, Schaffner SF, Volkman SK, et al. Genome-wide SNP genotyping 2001;184:770-776. highlights the role of natural selection in Plasmodium falciparum population 30. Barrett RD, Schluter D. Adaptation from standing genetic variation. Trends Ecol divergence. Genome Biol. 2008;9:R171. Evol. 2008;23:38-44. 43. Auburn S, Campino S, Miotto O, et al. Characterization of within-host Plasmo- 31. Botwe AK, Asante KP, Adjei G, et al. Dynamics in multiplicity of Plasmodium dium falciparum diversity using next-generation sequence data. PLoS One. falciparum infection among children with asymptomatic malaria in central 2012;7:e32891. Ghana. BMC Genet. 2017;18:67. 4 4. Pfeifer B, Wittelsbürger U, Ramos-Onsins SE, Lercher MJ. PopGenome: an 32. Lamptey H, Ofori MF, Kusi KA, et al. The prevalence of submicroscopic Plas- efficient Swiss army knife for population genomic analyses in R. Mol Biol Evol. modium falciparum gametocyte carriage and multiplicity of infection in chil- 2014;31:1929-1936. dren, pregnant women and adults in a low malaria transmission area in Southern 45. Rozas J, Ferrer-Mata A, Sánchez-DelBarrio JC, et al. DnaSP 6: DNA sequence Ghana. Malar J. 2018;17:331. polymorphism analysis of large data sets. Mol Biol Evol. 2017;34:3299-3302. 3 3. WWARN K13 Genotype-Phenotype Study Group. Association of mutations in 46. Tajima F. Statistical method for testing the neutral mutation hypothesis by DNA the Plasmodium falciparum Kelch13 gene (Pf3D7_1343700) with parasite clear- polymorphism. Genetics. 1989;123:585-595. ance rates after artemisinin-based treatments – a WWARN individual patient 47. Casillas S, Barbadilla A. Molecular population genetics. Genetics. data meta-analysis. BMC Med. 2019;17:1. 2017;205:1003-1035. 3 4. Adams CJ, Kopp MC, Larburu N, Nowak PR, Ali MMU. Structure and molec- 48. Fu YX, Li WH. Statistical tests of neutrality of mutations. Genetics. ular mechanism of ER stress signaling by the unfolded protein response signal 1993;133:693-709. activator IRE1. Front Mol Biosci. 2019;6:11. 49. Innan H. Modified Hudson-Kreitman-Aguade test and two-dimensional evalu- 35. Dogovski C, Xie SC, Burgio G, et al. Targeting the cell stress response of Plas- ation of neutrality tests. Genetics. 2006;173:1725-1733. modium falciparum to overcome artemisinin resistance. PLoS Biol. 2015;13: 50. McDonald JH, Kreitman M. Adaptive protein evolution at the Adh locus in e1002132. Drosophila. Nature. 1991;351:652-654.