De novo mutation rates at the single-mutation resolution in a human HBB gene region associated with adaptation and genetic disease

Daniel Melamed,1,2 Yuval Nov,3 Assaf Malik,4 Michael B. Yakass,5,6 Evgeni Bolotin,1,2 Revital Shemer,7 Edem K. Hiadzi,6 Karl L. Skorecki,8 and Adi Livnat1,2

1Department of Evolutionary and Environmental Biology, University of Haifa, Haifa 3498838, Israel; 2Institute of Evolution, University of Haifa, Haifa 3498838, Israel; 3Department of Statistics, University of Haifa, Haifa 3498838, Israel; 4Bioinformatics Unit, Faculty of Natural Sciences, University of Haifa, Haifa 3498838, Israel; 5West African Centre for Cell Biology of Infectious Pathogens (WACCBIP), Department of Biochemistry, Cell and Molecular Biology, University of Ghana, Legon-Accra 00233, Ghana; 6Assisted Conception Unit, Lister Hospital and Fertility Centre, Accra CT966, Ghana; 7The Ruth and Bruce Rappaport Faculty of Medicine and Research Institute, Technion-Israel Institute of Technology, Haifa 3525433, Israel; 8The Azrieli Faculty of Medicine, Bar-Ilan University, Safed 1311502, Israel

Although it is known that the mutation rate varies across the genome, previous estimates were based on averaging across various numbers of positions. Here, we describe a method to measure the origination rates of target mutations at target base positions and apply it to a 6-bp region in the human hemoglobin subunit beta (HBB) gene and to the identical, paralogous hemoglobin subunit delta (HBD) region in sperm cells from both African and European donors. The HBB region of interest (ROI) includes the site of the hemoglobin S (HbS) mutation, which protects against malaria, is common in Africa, and has served as a classic example of adaptation by random mutation and natural selection.
We found a significant correspondence between de novo mutation rates and past observations of alleles in carriers, showing that mutation rates vary substantially in a mutation-specific manner that contributes to the site frequency spectrum. We also found that the overall point mutation rate is significantly higher in Africans than in Europeans in the HBB region studied. Finally, the rate of the 20A→T mutation, called the “HbS mutation” when it appears in HBB, is significantly higher than expected from the genome-wide average for this mutation type. Nine instances were observed in the African HBB ROI, where it is of adaptive significance, representing at least three independent originations; no instances were observed elsewhere. Further studies will be needed to examine mutation rates at the single-mutation resolution across these and other loci and organisms and to uncover the molecular mechanisms responsible.

[Supplemental material is available for this article.]

It is widely known that mutation rates vary across the genome at multiple scales (Hodgkinson and Eyre-Walker 2011; Rahbari et al. 2016; Carlson et al. 2018) and are affected by multiple factors, from the mutation type (Gojobori et al. 1982; Bulmer 1986), to the local genetic context (Gojobori et al. 1982; Bulmer 1986; Blake et al. 1992; Hwang and Green 2004; Rahbari et al. 2016; Carlson et al. 2018), to the general location in the genome (Wolfe et al. 1989; Matassi et al. 1999; Lercher et al. 2001; Ellegren et al. 2003). Although this knowledge is highly advanced now compared with what was known a mere decade ago (Campbell et al. 2012; Michaelson et al. 2012; Francioli et al. 2015; Rahbari et al. 2016; Carlson et al. 2018), it could be enhanced further. In particular, rate measurements to date all have been based on averages of various kinds, such as an average across the genome (Nachman and Crowell 2000; Rahbari et al.
2016), or across the instances of any particular motif (Hwang and Green 2004; Carlson et al. 2018), or in certain cases, across the entire stretch of a gene (Haldane 1949; Vogel and Motulsky 1997; Kondrashov 2003). In contrast, technological limitations have precluded measuring mutation rates at particular base positions and of particular mutations at such positions. However, such high-resolution knowledge of the mutation rate variation would bear on multiple open questions in genetics and evolution, from the relative importance of mutation rate variation to the site frequency spectrum (SFS) (Harpak et al. 2016; Lek et al. 2016; Mathieson and Reich 2017), to its importance for adaptive evolution and parallelism (Inoue et al. 2001; Crow et al. 2009; Dumas et al. 2012; Losos 2017; Kratochwil et al. 2019; Kratochwil and Meyer 2019; Lind 2019; Xie et al. 2019), to its contribution to recurrent genetic disease and cancer (Lupski 1998; McClellan and King 2010; Veltman and Brunner 2012; Shendure and Akey 2015).

The most precise way of measuring mutation rates, free of biases attributable to past natural selection or random genetic drift events, is offered by de novo mutations, that is, mutations that appeared for the first time in their carrier (Goldmann et al. 2016; Rahbari et al. 2016). These mutations are usually detected by studies comparing the genomes of children to those of their parents, also known as “trio studies” (Roach et al. 2010; Conrad et al. 2011).

Corresponding author: alivnat@univ.haifa.ac.il
Article published online before print. Article, supplemental material, and publication date are at https://www.genome.org/cgi/doi/10.1101/gr.276103.121. Freely available online through the Genome Research Open Access option.
© 2022 Melamed et al. This article, published in Genome Research, is available under a Creative Commons License (Attribution 4.0 International), as described at http://creativecommons.org/licenses/by/4.0/.
Method. Genome Research 32:488–498. Published by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/22; www.genome.org

However, because each individual carries only a small number (e.g., several dozen in humans) of de novo mutations scattered across the genome, the chance of encountering any particular target mutation of interest is minuscule, rendering it impractical to measure rates of target mutations using such studies.

To overcome this barrier, we have developed a method that enables identifying and counting, with high accuracy, ultrarare genetic variants of choice in extremely narrow regions of interest (ROIs) within large populations of cells, such as a single target mutant in 100 million genomes. Because this method has both an error rate lower than the human mutation rate and sufficient yield for the purpose, it enables measuring the frequencies of target mutations of choice in human sperm samples by counting their de novo instances at a single-digit resolution. For variants that are not expected to affect sperm fertility and viability (as in the case below), this frequency is the evolutionarily relevant mutation rate in males. Note that aside from this evolutionary application, ultra-accurate methods of mutation detection are sought after for early detection of cancer, noninvasive prenatal testing, early identification of virus within host, and more (Salk et al. 2018).
As a first target for this method, we chose two sites: a 6-bp region spanning three codons within the human hemoglobin subunit beta (HBB) gene that is of great importance for adaptation and hematologic disease, and the identical, paralogous region within the hemoglobin subunit delta (HBD) gene. The former region includes, among others, the site of the hemoglobin S (HbS) mutation. The most iconic balanced polymorphism mutation (Pauling et al. 1949; Allison 1954; Ingram 1957; Cavalli-Sforza and Feldman 2003; Feng et al. 2004; Hartl and Clark 2007), the HbS mutation is an A to T transversion (GAG→GTG, Glu→Val) in codon 6 of HBB causing sickle-cell anemia in homozygotes (Pauling et al. 1949) and providing substantial protection against severe malaria in heterozygotes (Allison 1954; Flint et al. 1998; Kwiatkowski 2005; Piel et al. 2010). Malaria, in turn, has been a leading cause of human morbidity and mortality, often causing more than a million deaths per year in the recent past, with Africa bearing the brunt of the disease burden (Carter and Mendis 2002), and thus has been possibly the strongest known agent of selection in humans in recent history (Kwiatkowski 2005). Besides the HbS mutation, many other mutations, both point mutations and indels, are also known at this site, many of which are involved in hematologic illness (Hardison et al. 2002; Hardison and Miller 2002). In contrast to HBB, mutations in HBD have a more limited effect and are not thought to confer resistance to malaria, because the lower expression level of HBD makes it account for <3% of the circulating red blood cell hemoglobin in adults (Steinberg and Adams 1991). Although the population prevalence of the HBB mutations, whether beneficial or detrimental, is normally attributed to natural selection, so far it has not been possible to examine to what degree, if at all, mutational phenomena may also be relevant to their prevalence.
To address this gap, we sought to characterize the rates of mutations, including the HbS mutation, in the HBB and HBD ROIs in sperm samples of both African and European donors.

Results

To substantially reduce the false positive rate resulting from PCR amplification or high-throughput sequencing errors, following extraction of the DNA from the sperm of the donors, we first remove the majority of wild-type (WT) ROI molecules from each sample. Specifically for the target sites, we use the restriction enzyme (RE) Bsu36I, which cleaves the WT sequence CCTGAGG at positions 16–22 of HBB and the paralogous positions of HBD while leaving the HbS mutant and other mutants in these positions intact. Besides substantially reducing the false positive rate, this WT depletion has the additional benefit of reducing the sequencing costs by the same factor, because it removes the majority of fragments whose sequences are known to be WT (Fig. 1; Supplemental Text; Supplemental Figs. S1–S4).

Importantly for the mutation rate calculation, we keep track of the number of WT molecules removed by accurately calculating the protected mutants’ enrichment factor on a per sample basis. For this purpose, we generate two mixtures, each of which includes, in addition to the DNA studied, known amounts of mock DNA that is resistant to the RE digestion (Supplemental Text S2; Supplemental Fig. S2). Next, we apply the same protocol to the two mixtures, with the exception that the RE digestion step is applied to only one of them (Supplemental Text S2; Supplemental Fig. S2). The ratio of the ratios of sensitive to resistant molecules identified for the two mixtures after treatment at the sequence analysis step provides the enrichment factor of the protected mutants (Supplemental Text S2; Supplemental Fig. S2).
This enrichment factor, multiplied by the number of WT molecules called, with the addition of the small number of mutants called, provides the number of cells analyzed (Supplemental Text S2; Supplemental Fig. S2). We set up the system in such a manner that the calculation of the enrichment factor depends only on quantities that are precisely known, including volume measurements (Supplemental Text S2; Supplemental Fig. S2) and numbers of WT and mutant molecules called during the barcode-based sequence analysis stage as described below.

Following this mutation enrichment step, we attach unique barcodes to the DNA fragments to reduce error by consensus sequencing of copies originating from the same original fragment. For this purpose, we build on and improve the maximum depth sequencing method (MDS) (Jee et al. 2016), which allows one to focus on a narrow region of interest (ROI) and whose key idea is to attach the barcodes directly to a cleaved end of one of the two strands of each original target DNA fragment via a DNA polymerase–assisted extension reaction, as opposed to including the barcode only in the first copy of the DNA by extending the target-specific primer that carries it. In this manner, errors that occur during the first critical copying step are also detected via consensus sequencing of reads sharing the same barcode (Fig. 1; Supplemental Text; Supplemental Fig. S1; Jee et al. 2016). To all of the above, we add multiple innovations that increase sequencing accuracy, handle the large amounts of genomic DNA required, and enable accurate measurement of the Bsu36I enrichment factor per sample as needed for the mutation rate calculation (Supplemental Figs. S1–S5). We refer to this whole method as mutation enrichment followed by upscaled maximum depth sequencing (MEMDS) (for a complete protocol, see Supplemental Text S1–S9 and Methods).

Finally, following sequence analysis (Supplemental Figs.
S6–S10; Supplemental Table S2), the number of appearances of any mutation that confers resistance to the restriction enzyme is counted and divided by the calculated number of cells analyzed, providing the evolutionarily relevant de novo origination rate for each specific mutation in males per donor and per group of donors (Supplemental Figs. S11–S13; Supplemental Table S3). Following previous literature, we ignore G→T and C→T mutations in the barcoded strand (C→A and G→A in the sequenced strand) because they are thought to reflect not lasting mutations but the experimental disruption of an ongoing in vivo process of base damage and repair as well as in vitro mutations attributed to guanine oxidation and cytosine deamination (Supplemental Text S8; Supplemental Figs. S12–S14; Arbeithuber et al. 2016; Jee et al. 2016). In addition, we exclude C→A, the complement of G→T, owing to its association with the latter and its frequent appearance in the data (Supplemental Text S8; Supplemental Fig. S12). Following normal loss of material of ∼65%, true positives of mutations other than G→T, C→T, and C→A are identified with a false positive rate (error rate) <2.5×10⁻⁹ per base (Fig. 2). Overall, MEMDS surpasses recent cutting-edge methods in both accuracy (Fig. 2A) and yield (Fig. 2B; see also Supplemental Fig. S11).

With the help of this method, we examined a total of more than half a billion gene fragments individually taken from sperm of 12 donors. Because one of the samples was a mixture from two African donors with a total number of cells similar to the other African samples, we consider it here as a single sample of mixed African origins, bringing the total to 11 samples, seven from African and four from European donors (Supplemental Table S1). The numbers of cells scanned and de novo mutations observed per person are shown in Table 1.
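As a rough illustration of the bookkeeping described above, the following Python sketch computes an enrichment factor as a ratio of ratios, converts it into a number of cells analyzed, and divides a mutation count by that number. The function names, the direction in which the ratio is taken (untreated mixture over digested mixture, so that the factor exceeds 1), and all molecule counts are assumptions made for illustration only; the actual protocol also relies on volume measurements that are not modeled here (Supplemental Text S2).

```python
def enrichment_factor(untreated_sensitive, untreated_resistant,
                      digested_sensitive, digested_resistant):
    """Ratio of the sensitive:resistant ratios of the two mixtures.

    Assumption: the untreated ratio is divided by the digested ratio,
    so depletion of RE-sensitive (WT) molecules gives a factor above 1.
    """
    untreated_ratio = untreated_sensitive / untreated_resistant
    digested_ratio = digested_sensitive / digested_resistant
    return untreated_ratio / digested_ratio


def cells_analyzed(factor, wt_called, mutants_called):
    """Enrichment factor times WT molecules called, plus mutants called."""
    return factor * wt_called + mutants_called


def mutation_rate(mutation_count, n_cells):
    """De novo instances of one mutation per cell analyzed."""
    return mutation_count / n_cells


# Hypothetical counts, not taken from the study.
factor = enrichment_factor(100_000, 100, 2_000, 200)   # (1000) / (10) = 100
n_cells = cells_analyzed(factor, wt_called=500_000, mutants_called=5)
rate = mutation_rate(2, n_cells)
print(f"enrichment factor: {factor:.0f}, cells analyzed: {n_cells:.0f}, rate: {rate:.2e}")
```

With these made-up inputs, 500,000 WT molecules called after a 100-fold depletion stand for about 50 million genomes scanned, which is why the per sample enrichment factor has to be known accurately before any rate can be reported.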
Average per ROI mutation rates

The average per base point mutation rates in the HBB and HBD ROIs are 3.3×10⁻⁸ and 2.79×10⁻⁸, respectively, significantly higher by ∼2.6-fold (P < 2×10⁻⁸, 95% CI 2.4×10⁻⁸–4.4×10⁻⁸) and ∼2.2-fold (P < 6.7×10⁻⁵, 95% CI 1.9×10⁻⁸–4×10⁻⁸, two-sided binomial exact test) than 1.25×10⁻⁸, which we use as an estimate of the genome-wide per base per generation point mutation rate (Supplemental Text S10). The average indel rates in these ROIs are 1.1×10⁻⁸ and 4.3×10⁻⁹, respectively, significantly higher by approximately ninefold (P < 4.3×10⁻²⁵, 95% CI 8×10⁻⁹–1.5×10⁻⁸) and ∼3.4-fold (P < 1.8×10⁻⁴, 95% CI 2.3×10⁻⁹–7.3×10⁻⁹; two-sided binomial exact test) than the expected 1/10 of the point mutation rate (Supplemental Text S10). The average point mutation rate of the HBB ROI is not significantly higher than that of the HBD ROI (P = 0.49, two-sided Fisher’s exact test), whereas the average indel rate of the former is significantly higher by ∼2.6-fold than that of the latter (P = 0.0015, OR 95% CI 1.42–5.01; two-sided Fisher’s exact test).

Figure 1. Experiment overview. Sperm samples are obtained from world regions with high or low malaria infection burden (malaria impact map adjusted from the CDC map) (CDC Division of Parasitic Diseases and Malaria 2019). Whole-genome DNA is extracted and an amount equivalent to 60–80 million sperm cells per donor is subjected to Bsu36I digestion. Bsu36I cleaves the DNA at multiple sites, including the HBB and HBD ROIs, which carry a specific recognition sequence. The HbS mutation blocks Bsu36I digestion and is thus enriched over the wild-type (WT). A primary barcode is added directly to each antisense DNA strand that carries the HBB or HBD ROI via a DNA polymerase–assisted fill-in reaction. Because each barcode consists of a random sequence of nucleotides, each of the numerous target fragments has its own unique barcode, illustrated by a unique color on the left end of the representation of each barcoded fragment. Multiple single-strand copies are each generated directly from each uniquely barcoded target fragment by linear amplification. A secondary barcode composed of a random sequence of nucleotides is added to the other end of each of these copies by a single primer extension reaction, illustrated by a unique color on the right end of each barcoded fragment. Thus, only full-length fragments (i.e., mutant or WT ROI sequences that evaded Bsu36I digestion) carry both the primary and the secondary barcodes and can be amplified by PCR for high-throughput sequencing. At the sequence analysis step, sequencing reads representing the PCR products of the linearly amplified copies are grouped together into families (see boxes), where in each family, reads share the same primary barcode sequence. Sporadic sequencing errors or DNA-polymerase errors generated during linear or subsequent amplification steps are unlikely to be repeated in multiple copies and are removed. De novo mutations, such as the HbS mutation, are easily identified by their appearance in multiple reads from distinct linear-amplification events. For a complete description of the library preparation protocol, which includes additional steps, see Supplemental Figures S1–S3.
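The family-based error filtering described in the Figure 1 legend can be sketched as follows. This is a minimal Python illustration, not the authors' analysis pipeline: the read representation, the function name, and the thresholds (min_reads, min_events, min_agreement) are hypothetical, and distinct secondary barcodes are used here as a stand-in for distinct linear-amplification events. Only the WT sequence CCTGAGG and the HbS change at position 20 (giving CCTGTGG) come from the text.

```python
from collections import Counter, defaultdict

WT_ROI = "CCTGAGG"  # Bsu36I recognition sequence at HBB positions 16-22


def call_families(reads, min_reads=3, min_events=2, min_agreement=0.7):
    """Return {primary_barcode: consensus ROI sequence} for accepted families.

    reads: iterable of (primary_barcode, secondary_barcode, roi_sequence).
    All thresholds are illustrative assumptions.
    """
    families = defaultdict(list)
    for primary, secondary, seq in reads:
        families[primary].append((secondary, seq))

    calls = {}
    for primary, members in families.items():
        if len(members) < min_reads:
            continue  # too few reads to form a consensus
        consensus, votes = Counter(seq for _, seq in members).most_common(1)[0]
        # Distinct secondary barcodes supporting the consensus sequence.
        events = {secondary for secondary, seq in members if seq == consensus}
        if votes / len(members) >= min_agreement and len(events) >= min_events:
            calls[primary] = consensus
    return calls


# Toy reads, invented for illustration.
reads = [
    ("AAA", "s1", "CCTGTGG"), ("AAA", "s2", "CCTGTGG"),
    ("AAA", "s2", "CCTGTGG"), ("AAA", "s3", "CCTGAGG"),  # one discordant read
    ("BBB", "t1", "CCTGTGG"),                            # singleton, discarded
    ("CCC", "u1", "CCTGAGG"), ("CCC", "u2", "CCTGAGG"), ("CCC", "u3", "CCTGAGG"),
]
calls = call_families(reads)
mutant_families = [barcode for barcode, seq in calls.items() if seq != WT_ROI]
print(calls, mutant_families)
```

In this toy input, the single mutant read in family BBB is dropped because an error introduced in one copy cannot be distinguished from a real variant, whereas family AAA is called as mutant because the same change recurs under two different secondary barcodes.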
Basic characteristics of mutation rate variation

The variance in the rates of de novo point mutations is higher than expected from the genome-wide average (GWA) rates of these mutations (e.g., Harris 2015; Harris and Pritchard 2017), and their relative rates are different from those expected from the GWA rates (P < 10⁻⁶ in an omnibus multinomial test, adjusted for the excluded mutations, compared to the rates of Rahbari et al. 2016), even when adjusting the latter for the 3-mer, 5-mer, and 7-mer nucleotide contexts (P < 10⁻⁵ in all cases, compared to the rates of Carlson et al. 2018). The overall de novo rates of the six observed deletion types are highly nonuniform (P < 10⁻⁶, multisample proportion test).
Correspondence between de novo rates and observations of alleles in carriers

The HbS and Hb-Leiden mutations both have been notably observed on multiple different genetic backgrounds in human populations, the former particularly in Africans (Flint et al. 1998; Hardison et al. 2002; Hardison and Miller 2002). Here, they are, respectively, the point mutation of highest de novo rate in the African HBB ROI and the deletion mutation of highest de novo rate in any gene and ethnicity. Furthermore, of the 23 potential deletions of up to size 3 that are observable by our method per ROI, only five deletions (16delC; 17_18delCT; 18_19delTG; 19_21delGAG or the equivalent 22_24delGAG, i.e., the Hb-Leiden mutation; and 20delA) have been reported to date on the HbVar database, a large collection of hemoglobin variants (Hardison et al. 2002; Hardison and Miller 2002), all in HBB; and of these deletion types, a significantly higher fraction is observed here de novo compared to deletion types not reported on HbVar (Supplemental Text S11). Pooling together both the HBB and HBD ROIs given the similarity of de novo indel types observed between them, this effect is significant both with (P = 0.0078, OR 95% CI 2.17–818.08, two-sided Fisher’s exact test) and without (P = 0.024, OR 95% CI 1.44–653.93, two-sided Fisher’s exact test) the Hb-Leiden mutation, showing that the correspondence between de novo rates and alleles in populations extends beyond the HbS and Hb-Leiden mutations. Although the same analysis cannot be repeated for the point mutations because of the smaller number of observable mutation types and the synonymous versus nonsynonymous mutation confound, further observations are in the expected direction (Supplemental Text S11). The correspondence observed could not have been predicted from the mutations’ GWA rates, even when adjusting for the genetic context (Supplemental Text S10, S11).
Between-population comparisons

To provide a conservative statistical test of a population-level difference that excludes individual- or sample-level variation alone as accounting for the result, we compared the per person overall point mutation rates in the HBB ROI between the African and European groups. Results showed that these rates were significantly higher in the African than in the European group both with (P = 0.0061) and without (P = 0.043, two-sided Wilcoxon rank-sum test) counting the HbS mutation. Next, pooling together cells from all donors within each population to estimate the overall point mutation rate in the HBB ROI shows it to be significantly higher by 2.57-fold in the African than in the European donors (P < 0.006, OR 95% CI 1.27–5.49, two-sided Fisher’s exact test). Thus, there is a significant population-level difference between the continental groups in the overall point mutation rate in this narrow ROI that is not attributable to individual- or sample-level variation. In contrast, in the HBD ROI, the number of mutations was not high enough to establish such a difference above and beyond individual- or sample-level variation (P = 0.18, two-sided Wilcoxon rank-sum test). In contrast to the HBB overall point mutation rate, the overall indel rate did not vary significantly between these groups in either ROI (P = 0.35 and P = 1, respectively, two-sided Fisher’s exact test).

Position 20 mutation rates

Two particularly notable mutations are the HbS and Hb-Leiden mutations (details below). Considering codons 6 and 7 equivalent with respect to the latter mutation, both mutations can be said to affect position 20. Using the aforementioned conservative test to exclude sample-level variation alone as accounting for the result, the overall per person point mutation rates at position 20 specifically are significantly higher in the HBB than in the HBD ROI in Africans (P = 0.017, two-sided Wilcoxon rank-sum test) but not in Europeans (P = 1).
In the former, the overall point mutation rate at position 20 pooled across individuals is ∼6.1× higher in HBB than in HBD (P = 0.0061, OR 95% CI 1.50–37.14, two-sided Fisher’s exact test). In the case of the overall indel rates at position 20, although the pooled rates are significantly higher in HBB than in HBD for both Africans and Europeans (P = 0.044, OR 95% CI 1.03–6.54 and P = 0.027, OR 95% CI 1.11–7.02, respectively; two-sided Fisher’s exact tests), sample-level variation cannot be excluded as the source of the differences (P = 1 and P = 0.69 for Africans and Europeans, respectively; two-sided Wilcoxon rank-sum tests).

Figure 2. Accuracy and yield of MEMDS compared with current cutting-edge methods for studying target regions. (A) Under a highly conservative estimate, MEMDS increases accuracy by at least 40-fold compared to duplex sequencing (DS) (Kennedy et al. 2014) and maximum depth sequencing (MDS) (Jee et al. 2016). (B) MEMDS also increases yield per sequenced base (i.e., the number of MEMDS confirmed bases divided by the number of paired-end sequenced bases) by orders of magnitude over both DS and MDS (Kennedy et al. 2014; Jee et al. 2016). Notice that in MEMDS, the yield can be higher than 1 because the mutation enrichment factor is accurately calculated (Supplemental Text S2) and the base identity is known for the ROI sequences that were digested and removed from the final sequencing libraries (they have the restriction enzyme recognition sequence). Although the accuracy of DS has been improved in the context of sequencing large parts of the genome (Abascal et al. 2021), yield considerations and other limitations preclude applying current DS-based methods to narrow ROIs and target mutations (Kennedy et al. 2014; Supplemental Text S1) with the same efficiency as that of MEMDS.

Rates of the Hb-Leiden mutation

The 3-bp in-frame deletion variant of either codon 6 or codon 7 that is called “the Hb-Leiden mutation” when it occurs in HBB recurs noticeably more often than other mutations (comparing its per person rates to those of all other deletions combined to exclude sample-level variation, P < 0.0005, two-sided Wilcoxon rank-sum test). Pooled across individuals, it appears at rates of 1.11×10⁻⁷ and 3.96×10⁻⁸ in the HBB and HBD ROIs, respectively, ∼88.86× and ∼31.66× higher than the 1.25×10⁻⁹ estimate used above for the expected indel rate (P = 4.04×10⁻⁵⁸, 95% CI 7.82×10⁻⁸–1.53×10⁻⁷; and P = 1.62×10⁻¹³, 95% CI 1.98×10⁻⁸–7.08×10⁻⁸), where the HBB rate is significantly (∼2.81×) higher than the HBD rate (P = 0.002, OR 95% CI 1.40–5.63, two-sided Fisher’s exact test).

Rates of the HbS mutation

The 20A→T mutation, called “the HbS mutation” when it appears in the HBB ROI, appears nine times in the African HBB ROI and not at all in the other cases combined (the European HBB ROI and the European and African HBD ROIs) (P = 0.023, 95% CI 1.5077–Inf; two-sided Fisher’s exact test classifying each individual and gene case as having [>0] or not having [=0] de novo 20A→T in sperm and comparing the fractions of these classes between the groups).
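The classification test in the preceding sentence can be illustrated with a small, dependency-free Python sketch of a two-sided Fisher's exact test. The 2×2 table used here is our reading of the text, not a table given by the authors: it assumes that the 11 samples are the units, that three of the seven African HBB cases carry at least one de novo 20A→T (the text states that three individuals produced it), and that none of the remaining 15 sample-and-gene cases do. The function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Probability that the top-left cell equals k given fixed margins.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    return sum(prob(k) for k in range(lowest, highest + 1)
               if prob(k) <= observed * (1 + 1e-9))


# Assumed table: rows = African HBB cases vs all other cases,
# columns = cases with >0 vs with 0 de novo 20A->T.
p_value = fisher_exact_two_sided(3, 4, 0, 15)
print(round(p_value, 3))
```

Under these assumptions the only table as extreme as the observed one is the observed table itself, so the P value reduces to C(7,3)/C(22,3) = 35/1540, which rounds to the reported 0.023.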
The rate of the HbS mutation in the overall group (Africans and Europeans combined), 2.7×10⁻⁸, is 19.6× higher (P < 2×10⁻⁹, rate 95% CI 1.24×10⁻⁸–5.13×10⁻⁸) than expected from the GWA for this mutation type (Supplemental Text S10), and its rate in the African group specifically, 4.74×10⁻⁸, is ∼35× higher than expected from its GWA (P = 1.2×10⁻¹¹, rate 95% CI 2.17×10⁻⁸–9.0×10⁻⁸; two-sided binomial exact test). In the African group, it is the mutation that deviates the most (Supplemental Table S4) from its GWA among the 12 observable point mutations, where its de novo rate varies significantly across samples (P = 0.0025, multisample proportion test), from 0 to 2.24×10⁻⁷ (the latter rate being ∼163× higher than expected; P = 2.23×10⁻¹⁰, 95% CI 7.27×10⁻⁸–5.23×10⁻⁷, two-sided binomial exact test). Note that the evolutionarily relevant mutation rate depends on the fraction of the mutation in sperm per se, not on whether it repeats because of independent originations or owing to an early appearance followed by duplications during spermatogenesis. The minimal number of independent originations of the HbS mutation is three, given that three individuals produced it de novo, and the corresponding minimal rate of independent occurrence of the HbS mutation in the sperm samples (a rate lower than the actual evolutionarily relevant mutation rate observed) across all individuals is 9.01×10⁻⁹. This rate is still ∼6.5× higher than the genome-wide evolutionarily relevant mutation rate for this mutation type (P = 0.011, 95% rate CI 1.86×10⁻⁹–2.63×10⁻⁸, two-sided binomial exact test).

Table 1. HBB and HBD ROI mutation counts. Counts of de novo mutations identified by MEMDS in DNA from 11 sperm samples, seven from African (AFR) and four from European (EUR) donors. The numbers next to the donor labels refer to the calculated number of haploid individual genomes scanned by MEMDS. Light gray, dark gray, and black cell shading represent mutation counts of 1, 2–4, and ≥5, respectively. Some of the mutations have been observed before in carriers and have common names when they appear in HBB. These are 16C→G, Hb-Gorwihl; 16C→T, Hb-Tyne; 17C→G, Hb-Warwickshire; 17C→T, Hb-Aix-les-Bains; 20A→G, Hb-Lavagna; 20A→T, HbS; 20A→C, Hb-G-Makassar; 22G→C, Hb Bellevue III; and 19_21del or 22_24del, Hb-Leiden. Note that Hb-Leiden can result from deletion of either positions 19–21 or positions 22–24, which include the same GAG sequence, both of which can be enriched and captured by MEMDS. (a) HbS. (b) Hb-Leiden.

Discussion

The data expose an ultrahigh resolution correspondence between de novo mutation rates and past observations of alleles in carriers (Flint et al. 1998; Hardison et al. 2002; Hardison and Miller 2002; Supplemental Text S11; Results), suggesting that these rates contribute to the prevalence of these mutations in populations. This correspondence could not have been predicted from the GWA rates of these mutation types even when adjusting for the local genetic context (Supplemental Text S10, S11). Consideration of the deletions observed clarifies this point. Although past literature featured a single microdeletion rate decreasing with size (Gu and Li 1995; Kondrashov 2003; Lynch 2010), size-based rate variation cannot explain the aforementioned correspondence obtained for same-sized deletions, the higher rate of the Hb-Leiden mutation compared to the smaller deletions, or the extent of rate variation observed.
Thus, the aforementioned correspondence, together with the fact that in these ROIs the rates of some mutations (e.g., those of the HbS and Hb-Leiden mutations) deviate much more than others from their corresponding GWA rates, shows that mutation-specific rates vary not only in the case of large rearrangement mutations (Gu et al. 2008; Zhang et al. 2009) but also in the cases of point mutations and microindels. This rate variation could not have been seen using average-based measures (Kondrashov 2003; Lynch 2010) and establishes the relevance of mutation-specific point mutation and microindel rates to the site frequency spectrum (SFS) (Harpak et al. 2016; Lek et al. 2016; Mathieson and Reich 2017).

The overall point mutation rate in the HBB ROI is significantly higher in the African than in the European group even under a nonparametric comparison, which shows that the difference cannot be attributed to individual- or sample-level variation alone. Thus, it represents a significant population-level difference between the groups. This difference, occurring in an extremely narrow region spanning three codons of great importance for adaptation and genetic disease, is at least two orders of magnitude larger than previously reported differences in GWA mutation rates between continental groups (Harris 2015; Harris and Pritchard 2017). The correspondence between mutation-specific de novo rates and observations of alleles in carriers, as well as this large difference in the overall point mutation rate between populations in a narrow region, establish the importance of measuring mutation rate variation at an ultrahigh resolution.

Potential contributions to mutation rates from gross-level biological or environmental factors, such as age or pesticides, cannot sufficiently explain the results. First, the two populations are similar in age (Supplemental Table S1).
Second, any mutation-specific effect, like the correspondence between de novo rates and observations of alleles in carriers, cannot be explained by such macrolevel factors, because the latter cannot be expected to affect the rates of equivalent mutations, such as 20A→T in HBB versus HBD, differently. Third, the overall point mutation rate difference between the populations is also unlikely to be explained by them, because if on their own such macrolevel factors had affected the ROIs, they should have affected the entire genome similarly, yet GWA differences in point mutation rates between continental groups are smaller than the ROI-specific differences observed here (Harris 2015; Harris and Pritchard 2017). Note that if macrolevel factors affect mutation rates in interaction with mutation-, locus-, individual-, and/or population-specific factors, then such specific factors must be assumed in any case. Thus, rather than suggesting involvement of macrolevel factors, the data suggest a complex picture of mutation rates involving mutation-specific influences.

In addition, although the replication of mutations during spermatogenesis (clonal dependence) may make some contribution to the data, in practice it is insufficient to account for the significant results. First, the significance of the continental difference in the overall point mutation rates in HBB is impervious to any sample-level variation, including clonal dependence, as shown by the nonparametric between-population comparison described in the Results section. Second, the correspondence between mutation rates and observations of alleles in carriers cannot be driven by it.
On the contrary, in the absence of a cellular-level mechanism that induces specific mutations in a population-specific manner in accord with the cellular generation during spermatogenesis, differences in mutation timing during spermatogenesis could only add noise to the patterns observed. Any presence of clonal dependence would therefore only make it more difficult to obtain significance for such patterns and in that sense is conservative with respect to finding a pattern. Thus, more likely, the significance of these patterns is driven by independent originations of the mutations. These independent originations are consistent with mutation-specific rates being influenced by genetic and/or epigenetic factors (Livnat 2013, 2017).

The prevalence of a mutation of heterozygote advantage in a population and of reading-frame conservation in a coding sequence are generally considered to be outcomes of selection. However, here, both the HbS mutation, which provides strong malaria protection in heterozygotes, and the Hb-Leiden mutation, which is an in-frame deletion, are frequent not because of selection but because of frequent de novo origination. Indeed, that the rate of the in-frame Hb-Leiden mutation is much higher than that of all other observed deletions, which are frameshift deletions, shows reading-frame conservation that is not caused by selection (Lek et al. 2016) but rather by mutational phenomena. This observation provides a concrete example of “mutational conservation” (evolutionary conservation caused by mutational reasons), which, if it occurs more broadly, could offer an explanation for the puzzling observation of reading-frame conservation bias in pseudogenes (Zhang and Gerstein 2003).
That the genetic sequences at and adjacent to the ROIs are identical for the two populations and for the two genes, yet the mutation rates vary significantly between the populations and between the genes, suggests that what affects these mutation rates in the germline includes more than this local DNA sequence and in that sense is complex (Livnat 2013, 2017). These results are consistent with the observation that the variation of the mutation rates across loci is partly cryptic (not explained by the local DNA context) (Hodgkinson et al. 2009; Hodgkinson and Eyre-Walker 2011), especially in the case of A↔T transversions (Hodgkinson et al. 2009), which include the HbS mutation type (A→T). Combining the multiple insights discussed, the results suggest that mutation rates are both mutation-specific and influenced in a complex manner by the genetic and/or epigenetic background (Livnat 2013, 2017).

The HBB region spanning three codons is of particular importance for adaptation and genetic disease: it is the site of mutations that provide strong protection against malaria (HbS and HbC, the latter not observable by our method) and/or increase the risk for hematologic disease (Flint et al. 1998; Hardison et al. 2002; Hardison and Miller 2002). Thus, it is of interest that the overall point mutation rate in this region is significantly higher than expected, and that it is significantly higher in the African than in the European population.
These results provide a clear case of a connection between mutation rates and adaptive evolution, thus moving beyond previous literature on the relevance of mutation rates to adaptive evolution and its repeatability (Crow et al. 2009; Dumas et al. 2012; Kratochwil et al. 2019; Kratochwil and Meyer 2019; Lind 2019; Xie et al. 2019).

The results underscore the importance of mapping the mutation rate variation at an ultrahigh resolution. Beyond this general point, several observations on the HbS mutation specifically can be mentioned. First, if one assumes that the HbS rate is the same for both of the continental groups, the data show that it is significantly higher by nearly 20-fold than expected from the GWA for this mutation type, in both Africans and Europeans. Any amount of hypothetical clonal dependence does not change this estimate of the observed evolutionarily relevant mutation rate, because the latter does not depend on the cause of the recurrence of the mutation in the sperm. Even the observed minimal rate of independent HbS originations in sperm is still significantly larger by 6.5× than the evolutionarily relevant GWA rate for this mutation type. Consideration of the local genetic context does not change this conclusion (Supplemental Text S10). Thus, although the classical explanation of the HbS case relied only on selection, even under the most conservative assumptions the overall HbS mutation rate observed here is notably higher than expected.

Second, given the significant continental difference in the overall point mutation rate between the groups, it would be surprising if the HbS mutation specifically did not show a continental effect. Consistent with this, in our samples, using the methodology described, we observe no instances of it in Europeans but nine instances of it in total in Africans, amounting to a rate ∼35× higher than expected from the GWA of this mutation type in the latter.
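As a rough check on the scale of these estimates, the sketch below computes an exact binomial (Clopper-Pearson) 95% interval for a rate estimated from a small count. It rests on two assumptions that the text does not state: that the reported intervals are of the Clopper-Pearson type, and that the number of haploid genomes scanned is the value implied by nine observations at a rate of 4.74×10−8 (about 1.9×10^8, back-calculated here for illustration only, as is the second figure used for the three-origination case). Under those assumptions the output lands close to the intervals quoted in the Results for the African HbS rate and for the minimal independent-origination rate.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summing terms in log space.

    Intended for small k; log C(n, i) is built up incrementally so the sum
    stays accurate even when n is in the hundreds of millions.
    """
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    log_coeff = 0.0  # log C(n, 0)
    total = 0.0
    for i in range(k + 1):
        if i > 0:
            log_coeff += math.log(n - i + 1) - math.log(i)
        total += math.exp(log_coeff + i * log_p + (n - i) * log_q)
    return min(total, 1.0)


def clopper_pearson(k, n, alpha=0.05, iterations=200):
    """Exact two-sided (Clopper-Pearson) interval for a binomial proportion."""

    def bisect(cdf_k, target):
        # binom_cdf(cdf_k, n, p) decreases as p grows; find where it crosses target.
        lo, hi = 0.0, 1.0
        for _ in range(iterations):
            mid = (lo + hi) / 2.0
            if binom_cdf(cdf_k, n, mid) > target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2.0

    lower = 0.0 if k == 0 else bisect(k - 1, 1.0 - alpha / 2.0)
    upper = 1.0 if k == n else bisect(k, alpha / 2.0)
    return lower, upper


# Illustrative inputs only: genome counts back-calculated as count / rate.
N_AFR = round(9 / 4.74e-8)   # nine HbS observations in the African group
N_ALL = round(3 / 9.01e-9)   # three independent originations, all donors

afr_lower, afr_upper = clopper_pearson(9, N_AFR)
min_lower, min_upper = clopper_pearson(3, N_ALL)
print(f"African HbS rate 95% CI: {afr_lower:.3g} to {afr_upper:.3g}")
print(f"Minimal independent rate 95% CI: {min_lower:.3g} to {min_upper:.3g}")
```

Bisection on the binomial tail is used instead of a library quantile function so that the sketch needs only the standard library; a statistics package would normally be the more convenient route.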
A continental difference in the HbS mutation rate also fits the broader correspondence between de novo rates and observations of alleles in populations: HbS is most frequent in Africans and in some other populations in the Asian malaria belt (Flint et al. 1998) and appears de novo in our African but not in our European samples, whereas Hb-Leiden has been observed across the globe (Hardison et al. 2002; Hardison and Miller 2002) and appears de novo in both our African and European samples.

Third, in the African HBB ROI, out of 12 observable point mutations, the HbS mutation has the rate that deviates the most from the corresponding GWA rate (Supplemental Table S4).

Fourth, it is striking that despite at least three independent occurrences of the HbS mutation in the HBB ROI, not a single case of the equivalent 20A→T mutation in the HBD ROI was observed in any donor, African or European. Accordingly, we note that the binary test establishing the significantly higher concentration of the 20A→T mutation in the African HBB ROI as opposed to all other cases (the European HBB ROI or the HBD ROIs), which is impervious to any individual- or sample-level variance including clonal dependence, suggests that the 20A→T mutation arises more frequently where it is of adaptive significance than where it is not, although the data do not suffice to tell whether this effect results from a population-level difference, a locus-based difference, or both.

Knowing that the HbS mutation is advantageous in heterozygotes under malarial pressure, how should we interpret these results? One possibility is that, for a reason unrelated to adaptation, some individuals have a genomic fragility in HBB that generates the HbS mutation at a high rate. On this account, it is merely a coincidence that HbS provides protection against malaria, even more so if that fragility applies more to Africans.
Another possibility is modifier theory (Feldman and Liberman 1986; Altenberg et al. 2017), according to which alleles affecting the mutation rate may be favored by selection under certain conditions (Leigh 1970; Moxon et al. 1994). However, because the benefit of a modifier allele that increases the mutation rate is tied to the excess beneficial mutations it helps to generate, and because mutations are rare, it is normally expected that, for selection to be effective, it must act on a modifier allele that increases the mutation rate across a long enough stretch of the genome with which it remains linked for a long enough period of time, so that many different mutations potentially induced by this allele over space and time are factored into its selective benefit (Hodgkinson and Eyre-Walker 2011; Martincorena and Luscombe 2013; Walsh and Lynch 2018). Thus, modifier theory does not predict an increase in the rate of particular DNA mutations at specific base positions, let alone in sexual, complex organisms, nor the complex genetic and/or epigenetic influences on such mutation rates suggested by the current data (cf. Leigh 1970; Moxon et al. 1994; Altenberg et al. 2017; Walsh and Lynch 2018). On the contrary, the “reduction principle,” the first-order principle in modifier theory, underscores the general difficulty of accounting for increased mutation rates (Feldman and Liberman 1986; Altenberg et al. 2017).

Finally, a recently proposed theory predicted that mutation-specific origination rates are influenced by the complex genetic and epigenetic background, that genetic relatedness in mutational tendencies exists, and that the HbS mutation arises more frequently in Africans than in Europeans (Livnat 2013, 2017).
It holds that novelty in evolution arises from emergent interactions, which are then simplified through the generations by mutational mechanisms while being checked by natural selection (Livnat 2017), one hypothetical example being that A→I RNA editing can mechanistically increase the A→G mutation rate in the corresponding positions (cf. Popitsch et al. 2020). Based on this and other previous work (Livnat and Papadimitriou 2016), we hypothesize that recurring, evolved processes acting on DNA and/or RNA through epigenetic modifications (Klose and Bird 2006), RNA editing (Nishikura 2010), and other mechanisms may lead directly to their own replacement and simplification via DNA mutations that arise in the course of evolution from these processes’ molecular nature, mechanistically linking regulatory activity with structural mutational changes. Whether and by what specific mechanism this “replacement” hypothesis explains the HbS case specifically (alternative decoding of A→I editing [Licht et al. 2019] or other mechanisms) is yet to be investigated. This raises the possibility that a mutation of adaptive value such as the HbS one need not initiate the process of adaptation but can arise later in an evolutionary process where adaptations and mutation-specific rates jointly evolve (Livnat 2013, 2017), and thus studies on the fundamental nature of mutation need to test for not only a short-term response to environmental pressures (Luria and Delbrück 1943; Cairns et al. 1988) but also a long-term one.

Unlike previous methods that could explore only diffuse relationships between long-term selection pressures and the evolution of GWA mutation rates, the present method offers the refined ability needed to explore such relationships, if they exist, at the mutation-specific resolution. Because this method examines the mutation-specific resolution for the first time, it provides only initial estimates of mutation rates, which will require further investigation and refinement. Furthermore, it cannot be applied currently to all mutations, because it requires a special RE for each ROI. However, given the numerous REs available and their short recognition sequences, which imply large representation of these sequences across the genome, it likely applies across many loci and organisms. Therefore, some of the most important tasks now are to examine the high-resolution mutation rate variation across additional loci of interest and to explore the molecular mechanisms responsible.

Methods
For the experimental design and different stages of library preparation, see Supplemental Text S1–S3 and Supplemental Figures S1–S3. All of the oligos for the sperm DNA library preparation described in Supplemental Text S14 were ordered from Integrated DNA Technologies (IDT) with standard desalting purity, unless otherwise mentioned. All enzymes were obtained from New England Biolabs (NEB). Plasmid mini-prep, PCR purification, and agarose gel extraction were performed with QIAGEN kits.

Spike-in plasmid preparation
Four pUC19-based plasmids were generated.
Two (ALP13 and ALP17) were designed to carry the HBB genomic segment from position −203 to +223 relative to the mRNA translation start site, with the Bsu36I restriction site CCTGAGG replaced with TTATGTT and ACGAGAC, respectively; and two others (ALP16 and ALP18) were designed to carry the HBD genomic segment from position −59 to +220 relative to the mRNA translation start site, with the Bsu36I restriction site replaced with TTATGTT and ACGAGAC, respectively. To prepare the spike-in mixture, the four plasmids were linearized by BamHI, mixed in equal amounts, and diluted to 10 fg/µL for the AFR1, AFR3, AFR5, AFR6, AFR7, EUR3, and EUR4 samples and to 5 fg/µL for all other samples.

Collection of sperm samples
Semen samples from Africans were collected in the Assisted Conception Unit of the Lister Hospital and Fertility Centre in Accra, Ghana, following clinical standards. Semen samples from Europeans were purchased from Fairfax, a large US cryobank, with the approvals of the Institutional Review Board of the Noguchi Memorial Institute for Medical Research (NMIMR-IRB 081/16-17) at the University of Ghana, Legon, the Rambam Health Care Center Helsinki Committee, Haifa (0312-16-RMB), and the Israel Ministry of Health (20188768). Donors with a history of cancer or infertility or with high fever in the 3 mo before donation were excluded. Informed consent was obtained from all participants, and personal identifying information was removed and replaced with codes at the source.

DNA extraction from sperm cells
The DNA isolation protocol was modified from Weyrich (2012). A semen sample from a single donor was divided into 500-µL aliquots in multiple screw-capped tubes. The sperm aliquots were washed twice with 70% ethanol to remove seminal plasma.
The remaining cells were rotated overnight at 50°C in a 700-µL lysis buffer (50 mM Tris-HCl [pH 8.0], 100 mM NaCl, 50 mM EDTA, 1% SDS) containing 0.5% Triton X-100 (Fisher BioReagents BP151-100), 50 mM Tris(2-carboxyethyl) phosphine hydrochloride (TCEP; Sigma-Aldrich 646547), and 1.75 mg/mL Proteinase K (Fisher BioReagents BP1700-100). Lysates were centrifuged at 21,000g for 10 min at room temperature, and the supernatants were united in a single tube. DNA purification from the cleared lysate was performed using QIAGEN Blood and Cell Culture DNA Maxi Kit (13362). Specifically, 5 mL lysate were supplemented by 15 mL buffer G2 (800 mM guanidine hydrochloride, 30 mM Tris-HCl [pH 8.0], 30 mM EDTA [pH 8.0], 5% Tween 20, 0.5% Triton X-100), vortexed thoroughly, and allowed to gravity flow through a single Genomic-tip 500/G column pre-equilibrated by 10 mL buffer QBT (750 mM NaCl, 50 mM MOPS [pH 7.0], 15% isopropanol [v/v]). Resin was washed twice by 15 mL Buffer QC (1 M NaCl, 50 mM MOPS [pH 7.0], 15% isopropanol [v/v]), and elution was performed by 15 mL Buffer QF prewarmed to 50°C (1.25 M NaCl, 50 mM Tris-HCl [pH 7.0], 15% isopropanol [v/v]). DNA was precipitated by adding 10.5 mL room temperature isopropanol to the eluate, inverting the tube 10 times, and using a sterile tip to spool and transfer the DNA to a screw-capped tube containing 500 µL buffer EB (10 mM Tris-HCl [pH 8.5]). The DNA was allowed to dissolve overnight at room temperature. For each donor, a small aliquot from the extracted DNA was PCR amplified and Sanger sequenced to verify the exact sequence of the HBB and HBD regions and to confirm that the donors were homozygous for the WT sequence for both ROIs.
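The enzymatic digestion step that follows quotes DNA input both as a mass and as a number of haploid cells (∼264 µg for 80 million haploid cells, with 5% of that amount used for the untreated reaction). The short sketch below shows the arithmetic connecting those figures. The per-genome mass of 3.3 pg is not stated in the text; it is an assumption implied by dividing 264 µg by 80 million, and the function names are hypothetical.

```python
# Assumed conversion factor, implied by "264 µg ≈ 80 million haploid cells".
PG_PER_HAPLOID_GENOME = 264e6 / 80e6  # picograms per haploid genome


def dna_mass_ug(haploid_genomes, pg_per_genome=PG_PER_HAPLOID_GENOME):
    """Mass of DNA (µg) corresponding to a number of haploid genomes."""
    return haploid_genomes * pg_per_genome / 1e6


def haploid_genome_count(mass_ug, pg_per_genome=PG_PER_HAPLOID_GENOME):
    """Number of haploid genomes corresponding to a DNA mass in µg."""
    return mass_ug * 1e6 / pg_per_genome


treated_ug = dna_mass_ug(80_000_000)        # standard Bsu36I-treated input
treated_afr2_ug = dna_mass_ug(60_000_000)   # AFR2 used 60 million cells
untreated_ug = 0.05 * treated_ug            # 5% goes to the untreated reaction
untreated_afr2_ug = 0.05 * treated_afr2_ug
per_well_ug = treated_ug / 96               # input divided equally over 96 wells

print(f"treated: {treated_ug:.1f} µg, untreated: {untreated_ug:.1f} µg")
print(f"AFR2 untreated: {untreated_afr2_ug:.1f} µg, per well: {per_well_ug:.2f} µg")
```

Under this assumed factor, the 5% figures come out at the 13.2 µg and 9.9 µg quoted for the untreated reactions, which suggests the quantities in the text are internally consistent.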
Enzymatic digestion
For the Bsu36I-treated sample (Supplemental Text S1–S3), ∼264 µg sperm DNA, equivalent to 80 million haploid cells (for AFR2, a DNA amount equivalent to 60 million cells was used), were mixed with a plasmid spike-in mixture (0.2 pg for AFR1 and 0.1 pg for other donors) and equally divided in a 96-well plate. Bsu36I digestion was performed overnight at 37°C according to the manufacturer’s instructions using 5 units per well. Then, each well was supplemented by 6 units of HpyCH4III to generate the primary barcode attachment site, and digestion continued for an additional 3 h. For the Bsu36I-untreated reaction, 13.2 µg sperm DNA (and 9.9 µg for AFR2), representing 5% of the DNA amount used for the Bsu36I digest, were mixed with 6 times the volume of plasmid spike-in mixture, aliquoted to five tubes, and incubated overnight with 2 units SalI-HF per tube instead of Bsu36I to allow for similar conditions of DNA digestion without affecting the Bsu36I and HpyCH4III sites. Then, each tube was supplemented by 6 units of HpyCH4III and digestion continued for an additional 3 h, followed by DNA purification.

Primary barcode labeling and linear amplification
Direct barcode labeling and linear amplification of the digested HBB and HBD strands were performed in a single reaction in 96-well plates. Each well contained ∼1 µg of digested DNA, 0.1 µM primary barcode oligo (oligo A) (Supplemental Text S14), and 1 µM of 5′-phosphorothioate-protected primer for linear amplification (oligo B). The reaction was performed with Q5 high-fidelity polymerase according to the manufacturer’s instructions, using the following thermocycler parameters: initial denaturation for 20 sec at 98°C, followed by 16 cycles of 5 sec at 98°C, 15 sec at 68°C, and 20 sec at 72°C.
For each donor, each of the Bsu36I-treated and -untreated samples was labeled by an oligo A with a different Donor Identifier-1 (ID-1) sequence, which was also not shared by samples from other donors, providing each donor and each condition with a unique identifier sequence.

5′-exonuclease treatment
To eliminate strands not protected by 5′-phosphorothioate, following purification, 15 µg DNA aliquots from the post-linearly amplified product of the Bsu36I-treated sample were each incubated at 37°C in the presence of 15 units of Lambda exonuclease, 30 units of T7 exonuclease, and 90 units of RecJF exonuclease in 1× CutSmart buffer for 2.5 h. The post-linearly amplified product of the Bsu36I-untreated sample was incubated under the same conditions with 10 units of Lambda exonuclease, 20 units of T7 exonuclease, and 60 units of RecJF exonuclease.

Secondary barcode labeling and 3′-exonuclease treatment
Following purification, the DNA was aliquoted into a 96-well plate (1 µg per well). A single primer extension reaction was performed using 0.5 µM of the secondary barcode primer (oligo C) and Q5 high-fidelity polymerase according to the manufacturer’s instructions. The following thermocycler parameters were used: initial denaturation for 20 sec at 98°C, followed by a single cycle of 5 sec at 98°C, 15 sec at 68°C, and 40 sec at 72°C.
To remove excess oligo C, immediately after the thermocycler temperature dropped to 16°C, 20 units of thermolabile Exo I were added directly to each well together with the relabeling control primer (oligo D) in a known amount equivalent to 0.66% of the secondary barcode primer. After incubation of 1 h at 37°C, the thermolabile Exo I was heat-inactivated for 1 min at 80°C and the DNA was purified. For each donor, each of the Bsu36I-treated and -untreated samples was labeled by an oligo C with a different Donor Identifier-2 (ID-2) sequence, which was also not shared by samples from other donors, resulting in each donor and each condition having a unique ID-2 sequence.

PCR amplification and sequencing
The first PCR reaction of the dual-barcode-labeled product was performed using oligo E and oligo F1 as primers and Q5 high-fidelity polymerase, according to the manufacturer’s instructions. The following thermocycler parameters were used: initial denaturation for 30 sec at 98°C, followed by 10 cycles for 5 sec at 98°C, for 15 sec at 72°C, for 30 sec at 72°C, and a final extension for 30 sec at 72°C. Amplification products were purified, and the second PCR reaction was performed using 25% of the first PCR product as template, the amplification primers E and F2, and Q5 high-fidelity polymerase according to the manufacturer’s instructions (different F2 primers were used to add a unique Illumina index sequence to each Bsu36I-treated and -untreated sample). The following thermocycler parameters were used: initial denaturation for 30 sec at 98°C, followed by 24 cycles (with the exception of the EUR4 sample, which was amplified by 17 cycles) for 5 sec at 98°C, for 15 sec at 70°C, for 30 sec at 72°C, and a final extension for 1 min at 72°C. PCR products were agarose gel purified and further concentrated by a DNA clean and concentrator kit (Zymo Research).
DNA libraries prepared from the Bsu36I-treated and -untreated samples of the same donor were mixed in equal amounts and paired-end sequenced with 20% PhiX by Illumina MiSeq 300 cycles kit (V2) at the Technion Genome Center (TGC). For each donor, two or three MiSeq runs were performed to reach a minimum of 10 million reads per treatment (specifically, all but AFR5 and EUR3 were sequenced two times), and the resulting FASTQ sequences were joined before the sequence analysis step.

Sequence analysis
Illumina paired-end (PE) reads were merged via PEAR (Zhang et al. 2014) using the default model for the detection of significantly aligned regions and Phred score corrections. Merged sequences were trimmed of Illumina adapters using cutadapt (Martin 2011) and quality filtered by Trimmomatic (Bolger et al. 2014) using a sliding window size of 3 and a Phred quality threshold of 30. Quality-filtered sequences were trimmed to remove the 5′ edge up to position 18, a sequence which includes the 14 bases of the primary barcode and the 4 bases of ID-1, while adding this information to the read’s header. Only sequences with the correct ID-1 and first three bases of HBB or HBD sequences were maintained. Similarly, 9 bp were trimmed from the 3′ edge of the sequences, which include the 5 bases of the secondary barcode and the 4 bases of ID-2, while adding this information to the read’s header. Only sequences with the correct ID-2 were maintained. Trimmed sequences were sorted to HBB or HBD sequence pools based on the bases occupying positions 33–38 of the coding sequence (CGTTAC for HBB and TGTCAA for HBD), allowing one mismatch and frameshifts of up to −3 or +3. Successfully sorted sequences were mapped to either the HBB or HBD reference sequence (obtained by Sanger sequencing aliquots from the matching donor samples) using BWA (Li 2013) (parameters -M -t), and high-quality mutations (Phred score ≥28) were noted.
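The trimming and sorting steps above can be illustrated with a small sketch. It is not the authors' pipeline (which is available in the repository cited under Data access); the function names are hypothetical, the order of barcode and identifier within the 18-base 5′ segment is an assumption, the 9-base 3′ segment is returned unsplit because the text does not give its internal order, and frameshift handling is omitted. The marker 6-mers and the one-mismatch tolerance are taken from the text.

```python
HBB_MARKER = "CGTTAC"  # coding positions 33-38 in HBB (from the text)
HBD_MARKER = "TGTCAA"  # coding positions 33-38 in HBD (from the text)


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def split_read(read, five_len=18, three_len=9, barcode_len=14):
    """Split a merged read into (primary_barcode, id1, insert, three_prime_tag).

    Assumes the primary barcode precedes ID-1 within the first 18 bases.
    """
    if len(read) <= five_len + three_len:
        raise ValueError("read too short to contain both tags and an insert")
    head = read[:five_len]
    insert = read[five_len:len(read) - three_len]
    tail = read[len(read) - three_len:]
    return head[:barcode_len], head[barcode_len:], insert, tail


def classify_marker(sixmer, max_mismatches=1):
    """Assign a 6-mer to 'HBB' or 'HBD', or None if it matches neither.

    The two markers differ at three positions, so a 6-mer within one
    mismatch of one marker cannot also be within one mismatch of the other.
    """
    if hamming(sixmer, HBB_MARKER) <= max_mismatches:
        return "HBB"
    if hamming(sixmer, HBD_MARKER) <= max_mismatches:
        return "HBD"
    return None


# Toy read: 14-base barcode, 4-base ID-1, a short insert, 9-base 3' tag.
toy_read = "A" * 14 + "CCCC" + "GATTACA" + "T" * 9
barcode, id1, insert, tag = split_read(toy_read)
print(barcode, id1, insert, tag)
print(classify_marker("CGTTAC"), classify_marker("TGTCAT"), classify_marker("AAAAAA"))
```

Storing the barcode and identifier alongside the insert, as the text describes doing in the read header, is what later allows reads to be grouped by primary barcode.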
Reads were grouped by their primary barcodes into “families” and processed according to the workflow depicted in Supplemental Figure S9.

Data access
All raw sequencing data generated in this study have been submitted to the NCBI database of Genotypes and Phenotypes (dbGaP; https://www.ncbi.nlm.nih.gov/gap/) under accession number phs002391.v1.p1. For final processed data, see Supplemental Datasheets and Supplemental Text S15. Software is available at GitHub (https://github.com/livnat-lab/HBB_HBD) and as Supplemental Code.

Competing interest statement
The authors declare no competing interests.

Acknowledgments
We thank Marc Feldman for comments on a previous draft; Rami Reshef for infrastructural resources; Mary Otoo and Joshua Adoboe for help with sample collection; Sara Zelig, Alan Templeton, and Nick Pippenger for technical comments; and Kim Weaver for extensive help. This publication was made possible through the support of a grant from the John Templeton Foundation (61129). The opinions expressed in this publication are those of the authors and do not necessarily reflect the views of the John Templeton Foundation.

Author contributions: D.M. and A.L. invented the method and designed the studies; D.M. performed all experiments except for R.S.’s; R.S. processed the EUR4 sample; Y.N. created software tools for data analysis; A.M. created the computational pipeline for mutation calling; E.B. improved the pipeline; Y.N. and A.L. provided statistical tools; M.B.Y., E.K.H., K.L.S., and A.L. obtained IRB and Helsinki approvals; M.B.Y. and E.K.H. collected samples; D.M., Y.N., E.B., A.M., and A.L. analyzed the results; D.M. and A.L. drafted the paper; D.M., Y.N., K.L.S., and A.L. revised the draft; K.L.S. provided general advice; K.L.S. and A.L. acquired funding; A.L. conceived of the project and the replacement hypothesis and supervised the project.
References
Abascal F, Harvey LM, Mitchell E, Lawson AR, Lensing SV, Ellis P, Russell AJ, Alcantara RE, Baez-Ortega A, Wang Y, et al. 2021. Somatic mutation landscapes at single-molecule resolution. Nature 593: 405–410. doi:10.1038/s41586-021-03477-4
Allison AC. 1954. Protection afforded by sickle-cell trait against subtertian malarial infection. Br Med J 1: 290–294. doi:10.1136/bmj.1.4857.290
Altenberg L, Liberman U, Feldman MW. 2017. Unified reduction principle for the evolution of mutation, migration, and recombination. Proc Natl Acad Sci 114: E2392–E2400. doi:10.1073/pnas.1619655114
Arbeithuber B, Makova KD, Tiemann-Boege I. 2016. Artifactual mutations resulting from DNA lesions limit detection levels in ultrasensitive sequencing applications. DNA Res 23: 547–559. doi:10.1093/dnares/dsw038
Blake R, Hess ST, Nicholson-Tuell J. 1992. The influence of nearest neighbors on the rate and pattern of spontaneous point mutations. J Mol Evol 34: 189–200. doi:10.1007/BF00162968
Bolger AM, Lohse M, Usadel B. 2014. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30: 2114–2120. doi:10.1093/bioinformatics/btu170
Bulmer M. 1986. Neighboring base effects on substitution rates in pseudogenes. Mol Biol Evol 3: 322–329. doi:10.1093/oxfordjournals.molbev.a040401
Cairns J, Overbaugh J, Miller S. 1988. The origin of mutants. Nature 335: 142–145. doi:10.1038/335142a0
Campbell CD, Chong JX, Malig M, Ko A, Dumont BL, Han L, Vives L, O’Roak BJ, Sudmant PH, Shendure J, et al. 2012. Estimating the human mutation rate using autozygosity in a founder population. Nat Genet 44: 1277–1281. doi:10.1038/ng.2418
Carlson J, Locke AE, Flickinger M, Zawistowski M, Levy S, Myers RM, Boehnke M, Kang HM, Scott LJ, Li JZ, et al. 2018. Extremely rare variants reveal patterns of germline mutation rate heterogeneity in humans. Nat Commun 9: 3753. doi:10.1038/s41467-018-05936-5
Carter R, Mendis K. 2002. Evolutionary and historical aspects of the burden of malaria. Clin Microbiol Rev 15: 564–594. doi:10.1128/CMR.15.4.564-594.2002
Cavalli-Sforza LL, Feldman MW. 2003. The application of molecular genetic approaches to the study of human evolution. Nat Genet 33: 266–275. doi:10.1038/ng1113
CDC Division of Parasitic Diseases and Malaria. 2019. Where malaria occurs. http://www.cdc.gov/malaria/about/distribution.html.
Conrad DF, Keebler JE, DePristo MA, Lindsay SJ, Zhang Y, Casals F, Idaghdour Y, Hartl CL, Torroja C, Garimella KV, et al. 2011. Variation in genome-wide mutation rates within and between human families. Nat Genet 43: 712–714. doi:10.1038/ng.862
Crow KD, Amemiya CT, Roth J, Wagner GP. 2009. Hypermutability of HoxA13A and functional divergence from its paralog are associated with the origin of a novel developmental feature in zebrafish and related taxa (Cypriniformes). Evolution 63: 1574–1592. doi:10.1111/j.1558-5646.2009.00657.x
Dumas LJ, O’Bleness MS, Davis JM, Dickens CM, Anderson N, Keeney J, Jackson J, Sikela M, Raznahan A, Giedd J, et al. 2012. DUF1220-domain copy number implicated in human brain-size pathology and evolution. Am J Hum Genet 91: 444–454. doi:10.1016/j.ajhg.2012.07.016
Ellegren H, Smith NG, Webster MT. 2003. Mutation rate variation in the mammalian genome. Curr Opin Genet Dev 13: 562–568. doi:10.1016/j.gde.2003.10.008
Feldman MW, Liberman U. 1986. An evolutionary reduction principle for genetic modifiers. Proc Natl Acad Sci 83: 4824–4827. doi:10.1073/pnas.83.13.4824
Feng Z, Smith D, Ellis McKenzie F, Levin S. 2004. Coupling ecology and evolution: malaria and the S-gene across time scales. Math Biosci 189: 1–19. doi:10.1016/j.mbs.2004.01.005
Flint J, Harding RM, Boyce AJ, Clegg JB. 1998. The population genetics of the haemoglobinopathies. Baillière’s Clin Haem 11: 1–51. doi:10.1016/S0950-3536(98)80069-3
Francioli LC, Polak PP, Koren A, Menelaou A, Chun S, Renkens I, Van Duijn CM, Swertz M, Wijmenga C, Van Ommen G, et al. 2015. Genome-wide patterns and properties of de novo mutations in humans. Nat Genet 47: 822–826. doi:10.1038/ng.3292
Gojobori T, Li WH, Graur D. 1982. Patterns of nucleotide substitution in pseudogenes and functional genes. J Mol Evol 18: 360–369. doi:10.1007/BF01733904
Goldmann JM, Wong WS, Pinelli M, Farrah T, Bodian D, Stittrich AB, Glusman G, Vissers LE, Hoischen A, Roach JC, et al. 2016. Parent-of-origin-specific signatures of de novo mutations. Nat Genet 48: 935–939. doi:10.1038/ng.3597
Gu X, Li WH. 1995. The size distribution of insertions and deletions in human and rodent pseudogenes suggests the logarithmic gap penalty for sequence alignment. J Mol Evol 40: 464–473. doi:10.1007/BF00164032
Gu W, Zhang F, Lupski JR. 2008. Mechanisms for human genomic rearrangements. Pathogenetics 1: 4. doi:10.1186/1755-8417-1-4
Haldane JBS. 1949. The rate of mutation of human genes. Hereditas 35: 267–273. doi:10.1111/j.1601-5223.1949.tb03339.x
Hardison R, Miller W. 2002. Welcome to the globin gene server. http://globin.cse.psu.edu/.
Hardison RC, Chui DH, Giardine B, Riemer C, Patrinos GP, Anagnou N, Miller W, Wajcman H. 2002. HbVar: a relational database of human hemoglobin variants and thalassemia mutations at the globin gene server. Hum Mutat 19: 225–233. doi:10.1002/humu.10044
Harpak A, Bhaskar A, Pritchard J. 2016. Mutation rate variation is a primary determinant of the distribution of allele frequencies in humans. PLoS Genet 12: e1006489. doi:10.1371/journal.pgen.1006489
Harris K. 2015. Evidence for recent, population-specific evolution of the human mutation rate. Proc Natl Acad Sci 112: 3439–3444. doi:10.1073/pnas.1418652112
Harris K, Pritchard JK. 2017. Rapid evolution of the human mutation spectrum. eLife 6: e24284. doi:10.7554/eLife.24284
Hartl DL, Clark AG. 2007. Principles of population genetics, 4th ed. Sinauer Associates, Sunderland, MA.
Hodgkinson A, Eyre-Walker A. 2011. Variation in the mutation rate across mammalian genomes. Nat Rev Genet 12: 756–766. doi:10.1038/nrg3098
Hodgkinson A, Ladoukakis E, Eyre-Walker A. 2009. Cryptic variation in the human mutation rate. PLoS Biol 7: e1000027. doi:10.1371/journal.pbio.1000027
Hwang DG, Green P. 2004. Bayesian Markov chain Monte Carlo sequence analysis reveals varying neutral substitution patterns in mammalian evolution. Proc Natl Acad Sci 101: 13994–14001. doi:10.1073/pnas.0404142101
Ingram V. 1957. Gene mutations in human hæmoglobin: the chemical difference between normal and sickle cell hæmoglobin. Nature 180: 326–328. doi:10.1038/180326a0
Inoue K, Dewar K, Katsanis N, Reiter LT, Lander ES, Devon KL, Wyman DW, Lupski JR, Birren B. 2001. The 1.4-Mb CMT1A duplication/HNPP deletion genomic region reveals unique genome architectural features and provides insights into the recent evolution of new genes. Genome Res 11: 1018–1033. doi:10.1101/gr.180401
Jee J, Rasouly A, Shamovsky I, Akivis Y, Steinman SR, Mishra B, Nudler E. 2016. Rates and mechanisms of bacterial mutagenesis from maximum-depth sequencing. Nature 534: 693–696. doi:10.1038/nature18313
Kennedy SR, Schmitt MW, Fox EJ, Kohrn BF, Salk JJ, Ahn EH, Prindle MJ, Kuong KJ, Shen JC, Risques RA, et al. 2014. Detecting ultralow-frequency mutations by duplex sequencing. Nat Protoc 9: 2586–2606. doi:10.1038/nprot.2014.170
Klose RJ, Bird AP. 2006. Genomic DNA methylation: the mark and its mediators. Trends Biochem Sci 31: 89–97. doi:10.1016/j.tibs.2005.12.008
Kondrashov AS. 2003. Direct estimates of human per nucleotide mutation rates at 20 loci causing Mendelian diseases. Hum Mutat 21: 12–27. doi:10.1002/humu.10147
Kratochwil CF, Meyer A. 2019. Fragile DNA contributes to repeated evolution. Genome Biol 20: 39. doi:10.1186/s13059-019-1655-x
Kratochwil CF, Liang Y, Urban S, Torres-Dowdall J, Meyer A. 2019. Evolutionary dynamics of structural variation at a key locus for color pattern diversification in cichlid fishes. Genome Biol Evol 11: 3452–3465. doi:10.1093/gbe/evz261
Kwiatkowski DP. 2005. How malaria has affected the human genome and what human genetics can teach us about malaria. Am J Hum Genet 77: 171–192. doi:10.1086/432519
Leigh EG Jr. 1970. Natural selection and mutability. Am Nat 104: 301–305. doi:10.1086/282663
Lek M, Karczewski KJ, Minikel EV, Samocha KE, Banks E, Fennell T, O’Donnell-Luria AH, Ware JS, Hill AJ, Cummings BB, et al. 2016. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536: 285–291. doi:10.1038/nature19057
Lercher MJ, Williams EJ, Hurst LD. 2001. Local similarity in evolutionary rates extends over whole chromosomes in human-rodent and mouse-rat comparisons: implications for understanding the mechanistic basis of the male mutation bias. Mol Biol Evol 18: 2032–2039. doi:10.1093/oxfordjournals.molbev.a003744
Li H. 2013. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv:1303.3997 [q-bio.GN].
Licht K, Hartl M, Amman F, Anrather D, Janisiw MP, Jantsch MF. 2019. Inosine induces context-dependent recoding and translational stalling. Nucleic Acids Res 47: 3–14. doi:10.1093/nar/gky1163
Lind PA. 2019. Repeatability and predictability in experimental evolution. In Evolution, origin of life, concepts and methods (ed. Pontarotti P), pp. 57–83. Springer, Cham, Switzerland.
Livnat A. 2013. Interaction-based evolution: how natural selection and nonrandom mutation work together. Biol Direct 8: 24. doi:10.1186/1745-6150-8-24
Livnat A. 2017. Simplification, innateness, and the absorption of meaning from context: how novelty arises from gradual network evolution. Evol Biol 44: 145–189. doi:10.1007/s11692-017-9407-x
Livnat A, Papadimitriou C. 2016. Evolution and learning: used together, fused together. A response to Watson and Szathmáry. Trends Ecol Evol 31: 894–896. doi:10.1016/j.tree.2016.10.004
Losos JB. 2017. Improbable destinies: fate, chance, and the future of evolution. Penguin, New York.
Lupski JR. 1998. Genomic disorders: structural features of the genome can lead to DNA rearrangements and human disease traits. Trends Genet 14: 417–422. doi:10.1016/S0168-9525(98)01555-8
Luria SE, Delbrück M. 1943. Mutations of bacteria from virus sensitivity to virus resistance. Genetics 28: 491–511. doi:10.1093/genetics/28.6.491
Lynch M. 2010. Rate, molecular spectrum, and consequences of human mutation. Proc Natl Acad Sci 107: 961–968. doi:10.1073/pnas.0912629107
Martin M. 2011. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet 17: 10–12. doi:10.14806/ej.17.1.200
Martincorena I, Luscombe NM. 2013. Non-random mutation: the evolution of targeted hypermutation and hypomutation. Bioessays 35: 123–130. doi:10.1002/bies.201200150
Matassi G, Sharp PM, Gautier C. 1999. Chromosomal location effects on gene sequence evolution in mammals. Curr Biol 9: 786–791. doi:10.1016/S0960-9822(99)80361-3
Mathieson I, Reich D. 2017. Differences in the rare variant spectrum among human populations. PLoS Genet 13: e1006581. doi:10.1371/journal.pgen.1006581
McClellan J, King MC. 2010. Genetic heterogeneity in human disease. Cell 141: 210–217. doi:10.1016/j.cell.2010.03.032
Michaelson JJ, Shi Y, Gujral M, Zheng H, Malhotra D, Jin X, Jian M, Liu G, Greer D, Bhandari A, et al. 2012. Whole-genome sequencing in autism identifies hot spots for de novo germline mutation. Cell 151: 1431–1442. doi:10.1016/j.cell.2012.11.019
Moxon ER, Rainey PB, Nowak MA, Lenski RE. 1994. Adaptive evolution of highly mutable loci in pathogenic bacteria. Curr Biol 4: 24–33. doi:10.1016/S0960-9822(00)00005-1
Nachman MW, Crowell SL. 2000. Estimate of the mutation rate per nucleotide in humans. Genetics 156: 297–304. doi:10.1093/genetics/156.1.297
Nishikura K. 2010. Functions and regulation of RNA editing by ADAR deaminases. Annu Rev Biochem 79: 321–349. doi:10.1146/annurev-biochem-060208-105251
Pauling L, Itano HA, Singer SJ, Wells IC. 1949. Sickle-cell anemia, a molecular disease. Science 110: 543–548. doi:10.1126/science.110.2865.543
Piel FB, Patil AP, Howes RE, Nyangiri OA, Gething PW, Williams TN, Weatherall DJ, Hay SI. 2010. Global distribution of the sickle cell gene and geographical confirmation of the malaria hypothesis. Nat Commun 1: 104. doi:10.1038/ncomms1104
Popitsch N, Huber CD, Buchumenski I, Eisenberg E, Jantsch M, Von Haeseler A, Gallach M. 2020. A-to-I RNA editing uncovers hidden signals of adaptive genome evolution in animals. Genome Biol Evol 12: 345–357. doi:10.1093/gbe/evaa046
Rahbari R, Wuster A, Lindsay SJ, Hardwick RJ, Alexandrov LB, Al Turki S, Dominiczak A, Morris A, Porteous D, Smith B, et al. 2016. Timing, rates and spectra of human germline mutation. Nat Genet 48: 126. doi:10.1038/ng.3469
Roach JC, Glusman G, Smit AF, Huff CD, Hubley R, Shannon PT, Rowen L, Pant KP, Goodman N, Bamshad M, et al. 2010. Analysis of genetic inheritance in a family quartet by whole-genome sequencing. Science 328: 636–639. doi:10.1126/science.1186802
Salk JJ, Schmitt MW, Loeb LA. 2018. Enhancing the accuracy of next-generation sequencing for detecting rare and subclonal mutations. Nat Rev Genet 19: 269–285. doi:10.1038/nrg.2017.117
Shendure J, Akey JM. 2015. The origins, determinants, and consequences of human mutations. Science 349: 1478–1483. doi:10.1126/science.aaa9119
Steinberg M, Adams JI. 1991. Hemoglobin A2: origin, evolution, and aftermath. Blood 78: 2165–2177. doi:10.1182/blood.V78.9.2165.2165
Veltman JA, Brunner HG. 2012. De novo mutations in human genetic disease. Nat Rev Genet 13: 565–575. doi:10.1038/nrg3241
Vogel F, Motulsky A. 1997. Human genetics: problems and approaches. Springer-Verlag, Berlin.
Walsh B, Lynch M. 2018. Evolution and selection of quantitative traits. Oxford University Press, Oxford, UK.
Weyrich A. 2012. Preparation of genomic DNA from mammalian sperm. Curr Protoc Mol Biol 98: 2–13. doi:10.1002/0471142727.mb0213s98
Wolfe KH, Sharp PM, Li WH. 1989. Mutation rates differ among regions of the mammalian genome. Nature 337: 283–285. doi:10.1038/337283a0
Xie KT, Wang G, Thompson AC, Wucherpfennig JI, Reimchen TE, MacColl AD, Schluter D, Bell MA, Vasquez KM, Kingsley DM. 2019. DNA fragility in the parallel evolution of pelvic reduction in stickleback fish. Science 363: 81–84. doi:10.1126/science.aan1425
Zhang Z, Gerstein M. 2003. Patterns of nucleotide substitution, insertion and deletion in the human genome inferred from pseudogenes. Nucleic Acids Res 31: 5338–5348. doi:10.1093/nar/gkg745
Zhang F, Carvalho CMB, Lupski JR. 2009. Complex human chromosomal and genomic rearrangements. Trends Genet 25: 298–307. doi:10.1016/j.tig.2009.05.005
Zhang J, Kobert K, Flouri T, Stamatakis A. 2014. PEAR: a fast and accurate Illumina Paired-End reAd mergeR. Bioinformatics 30: 614–620. doi:10.1093/bioinformatics/btt593

Received August 17, 2021; accepted in revised form January 10, 2022.

Genome Res. 2022 32: 488-498; originally published online January 14, 2022. Access the most recent version at doi:10.1101/gr.276103.121
Supplemental Material: http://genome.cshlp.org/content/suppl/2022/02/14/gr.276103.121.DC1
References: This article cites 75 articles, 15 of which can be accessed free at: http://genome.cshlp.org/content/32/3/488.full.html#ref-list-1
Open Access: Freely available online through the Genome Research Open Access option.
Creative Commons License: This article, published in Genome Research, is available under a Creative Commons License (Attribution 4.0 International), as described at http://creativecommons.org/licenses/by/4.0/.
© 2022 Melamed et al.; Published by Cold Spring Harbor Laboratory Press