De novo mutation rates at the single-mutation
resolution in a human HBB gene region associated
with adaptation and genetic disease

Daniel Melamed,1,2 Yuval Nov,3 Assaf Malik,4 Michael B. Yakass,5,6 Evgeni Bolotin,1,2 Revital Shemer,7 Edem K. Hiadzi,6 Karl L. Skorecki,8 and Adi Livnat1,2
1Department of Evolutionary and Environmental Biology, University of Haifa, Haifa 3498838, Israel; 2Institute of Evolution, University
of Haifa, Haifa 3498838, Israel; 3Department of Statistics, University of Haifa, Haifa 3498838, Israel; 4Bioinformatics Unit, Faculty of
Natural Sciences, University of Haifa, Haifa 3498838, Israel; 5West African Centre for Cell Biology of Infectious Pathogens (WACCBIP),
Department of Biochemistry, Cell and Molecular Biology, University of Ghana, Legon-Accra 00233, Ghana; 6Assisted Conception
Unit, Lister Hospital and Fertility Centre, Accra CT966, Ghana; 7The Ruth and Bruce Rappaport Faculty of Medicine and Research
Institute, Technion—Israel Institute of Technology, Haifa 3525433, Israel; 8The Azrieli Faculty of Medicine, Bar-Ilan University,
Safed 1311502, Israel

Although it is known that the mutation rate varies across the genome, previous estimates were based on averaging across various numbers of positions. Here, we describe a method to measure the origination rates of target mutations at target base positions and apply it to a 6-bp region in the human hemoglobin subunit beta (HBB) gene and to the identical, paralogous hemoglobin subunit delta (HBD) region in sperm cells from both African and European donors. The HBB region of interest (ROI) includes the site of the hemoglobin S (HbS) mutation, which protects against malaria, is common in Africa, and has served as a classic example of adaptation by random mutation and natural selection. We found a significant correspondence between de novo mutation rates and past observations of alleles in carriers, showing that mutation rates vary substantially in a mutation-specific manner that contributes to the site frequency spectrum. We also found that the overall point mutation rate is significantly higher in Africans than in Europeans in the HBB region studied. Finally, the rate of the 20A→T mutation, called the “HbS mutation” when it appears in HBB, is significantly higher than expected from the genome-wide average for this mutation type. Nine instances were observed in the African HBB ROI, where it is of adaptive significance, representing at least three independent originations; no instances were observed elsewhere. Further studies will be needed to examine mutation rates at the single-mutation resolution across these and other loci and organisms and to uncover the molecular mechanisms responsible.

[Supplemental material is available for this article.]

It is widely known that mutation rates vary across the genome at multiple scales (Hodgkinson and Eyre-Walker 2011; Rahbari et al. 2016; Carlson et al. 2018) and are affected by multiple factors, from the mutation type (Gojobori et al. 1982; Bulmer 1986), to the local genetic context (Gojobori et al. 1982; Bulmer 1986; Blake et al. 1992; Hwang and Green 2004; Rahbari et al. 2016; Carlson et al. 2018), to the general location in the genome (Wolfe et al. 1989; Matassi et al. 1999; Lercher et al. 2001; Ellegren et al. 2003). Although this knowledge is far more advanced now than it was a mere decade ago (Campbell et al. 2012; Michaelson et al. 2012; Francioli et al. 2015; Rahbari et al. 2016; Carlson et al. 2018), it could be enhanced further. In particular, all rate measurements to date have been based on averages of various kinds, such as an average across the genome (Nachman and Crowell 2000; Rahbari et al. 2016), across the instances of a particular motif (Hwang and Green 2004; Carlson et al. 2018), or, in certain cases, across the entire stretch of a gene (Haldane 1949; Vogel and Motulsky 1997; Kondrashov 2003). In contrast, technological limitations have precluded measuring the rates of particular mutations at particular base positions. However, such high-resolution knowledge of mutation rate variation would bear on multiple open questions in genetics and evolution, from the relative importance of mutation rate variation to the site frequency spectrum (SFS) (Harpak et al. 2016; Lek et al. 2016; Mathieson and Reich 2017), to its importance for adaptive evolution and parallelism (Inoue et al. 2001; Crow et al. 2009; Dumas et al. 2012; Losos 2017; Kratochwil et al. 2019; Kratochwil and Meyer 2019; Lind 2019; Xie et al. 2019), to its contribution to recurrent genetic disease and cancer (Lupski 1998; McClellan and King 2010; Veltman and Brunner 2012; Shendure and Akey 2015).

The most precise way of measuring mutation rates, free of biases attributable to past natural selection or random genetic drift events, is offered by de novo mutations: mutations that appeared for the first time in their carrier (Goldmann et al. 2016; Rahbari et al. 2016). These mutations are usually detected by studies comparing the genomes of children to those of their parents, also known as “trio studies” (Roach et al. 2010; Conrad et al. 2011).

Corresponding author: alivnat@univ.haifa.ac.il
Article published online before print. Article, supplemental material, and publication date are at https://www.genome.org/cgi/doi/10.1101/gr.276103.121.
Freely available online through the Genome Research Open Access option.

© 2022 Melamed et al. This article, published in Genome Research, is available
under a Creative Commons License (Attribution 4.0 International), as described
at http://creativecommons.org/licenses/by/4.0/.

Method

488 Genome Research 32:488–498 Published by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/22; www.genome.org

Downloaded from genome.cshlp.org on April 7, 2022 - Published by Cold Spring Harbor Laboratory Press



However, because each individual carries only a small number (e.g., several dozen in humans) of de novo mutations scattered across the genome, the chance of encountering any particular target mutation of interest is minuscule, rendering such studies impractical for measuring the rates of target mutations.

To overcome this barrier, we have developed a method that enables identifying and counting, with high accuracy, ultrarare genetic variants of choice in extremely narrow regions of interest (ROIs) within large populations of cells, such as a single target mutant in 100 million genomes. Because this method has both an error rate lower than the human mutation rate and sufficient yield for the purpose, it enables measuring the frequencies of target mutations of choice in human sperm samples by counting their de novo instances at single-digit resolution. For variants that are not expected to affect sperm fertility and viability (as in the case below), this frequency is the evolutionarily relevant mutation rate in males. Aside from this evolutionary application, ultra-accurate mutation-detection methods are sought after for early detection of cancer, noninvasive prenatal testing, early identification of viruses within hosts, and more (Salk et al. 2018).

As a first target for this method, we chose two sites: a 6-bp region spanning three codons within the human hemoglobin subunit beta (HBB) gene that is of great importance for adaptation and hematologic disease, and the identical, paralogous region within the hemoglobin subunit delta (HBD) gene. The former region includes, among others, the site of the hemoglobin S (HbS) mutation. The most iconic balanced polymorphism mutation (Pauling et al. 1949; Allison 1954; Ingram 1957; Cavalli-Sforza and Feldman 2003; Feng et al. 2004; Hartl and Clark 2007), the HbS mutation is an A to T transversion (GAG→GTG, Glu→Val) in codon 6 of HBB that causes sickle-cell anemia in homozygotes (Pauling et al. 1949) and provides substantial protection against severe malaria in heterozygotes (Allison 1954; Flint et al. 1998; Kwiatkowski 2005; Piel et al. 2010). Malaria, in turn, has been a leading cause of human morbidity and mortality, often causing more than a million deaths per year in the recent past, with Africa bearing the brunt of the disease burden (Carter and Mendis 2002), and thus has possibly been the strongest known agent of selection in humans in recent history (Kwiatkowski 2005). Besides the HbS mutation, many other mutations, both point mutations and indels, are also known at this site, many of which are involved in hematologic illness (Hardison et al. 2002; Hardison and Miller 2002). In contrast to HBB, mutations in HBD have a more limited effect and are not thought to confer resistance to malaria, because HBD’s low expression level means that it accounts for <3% of circulating red blood cell hemoglobin in adults (Steinberg and Adams 1991). Although the population prevalence of the HBB mutations, whether beneficial or detrimental, is normally attributed to natural selection, it has so far not been possible to examine to what degree, if at all, mutational phenomena may also be relevant to their prevalence. To address this gap, we sought to characterize the rates of mutations, including the HbS mutation, in the HBB and HBD ROIs in sperm samples of both African and European donors.

Results

To substantially reduce the false positive rate resulting from PCR amplification or high-throughput sequencing errors, we first extract DNA from the donors’ sperm and then remove the majority of wild-type (WT) ROI molecules from each sample. Specifically for the target sites, we use the restriction enzyme (RE) Bsu36I, which cleaves the WT sequence CCTGAGG at positions 16–22 of HBB and the paralogous positions of HBD while leaving intact the HbS mutant and other mutants at these positions that disrupt the recognition sequence. Besides substantially reducing the false positive rate, this WT depletion has the additional benefit of reducing sequencing costs by the same factor, because it removes the majority of fragments whose sequences are known to be WT (Fig. 1; Supplemental Text; Supplemental Figs. S1–S4).
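To illustrate which mutants survive digestion: Bsu36I is commonly listed with the recognition sequence CCTNAGG (cut as CC^TNAGG), in which the middle base is unspecified. The minimal Python sketch below assumes that site definition and the WT sequence CCTGAGG at positions 16–22 given above, and enumerates the single-base substitutions that abolish the site. Because CCTNAGG is palindromic, scanning one strand suffices. Under these assumptions, substitutions at the degenerate position 19 leave the site intact, whereas all 18 substitutions at the other six positions (including 20A→T) are protected, which is consistent with the 6-bp ROI studied here. The function name `protected_substitutions` is hypothetical.

```python
import re

# Assumed Bsu36I recognition sequence CCTNAGG (N = any base). The site is
# palindromic, so checking one strand is sufficient.
BSU36I_SITE = re.compile(r"CCT[ACGT]AGG")

WT = "CCTGAGG"       # WT sequence at positions 16-22, as stated in the text
FIRST_POSITION = 16  # coordinate of the first base of WT


def protected_substitutions(wt=WT, start=FIRST_POSITION):
    """Return (position, ref, alt) for every single-base substitution that
    destroys the Bsu36I site, i.e., mutants expected to escape digestion."""
    protected = []
    for i, ref in enumerate(wt):
        for alt in "ACGT":
            if alt == ref:
                continue
            mutant = wt[:i] + alt + wt[i + 1:]
            if not BSU36I_SITE.search(mutant):
                protected.append((start + i, ref, alt))
    return protected


hits = protected_substitutions()
print(len(hits))                              # 18
print(sorted({pos for pos, _, _ in hits}))    # [16, 17, 18, 20, 21, 22]
```

Only single-base substitutions are enumerated here; indels such as the Hb-Leiden deletion also disrupt the site but would need a separate check.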

Importantly for the mutation rate calculation, we keep track of the number of WT molecules removed by accurately calculating the protected mutants’ enrichment factor on a per-sample basis. For this purpose, we generate two mixtures, each of which includes, in addition to the DNA studied, known amounts of mock DNA that is resistant to RE digestion (Supplemental Text S2; Supplemental Fig. S2). Next, we apply the same protocol to the two mixtures, except that the RE digestion step is applied to only one of them (Supplemental Text S2; Supplemental Fig. S2). The ratio between the two mixtures of their ratios of sensitive to resistant molecules, as identified at the sequence analysis step, provides the enrichment factor of the protected mutants (Supplemental Text S2; Supplemental Fig. S2). This enrichment factor, multiplied by the number of WT molecules called, plus the small number of mutants called, provides the number of cells analyzed (Supplemental Text S2; Supplemental Fig. S2). We set up the system in such a manner that the calculation of the enrichment factor depends only on quantities that are precisely known, including volume measurements (Supplemental Text S2; Supplemental Fig. S2) and the numbers of WT and mutant molecules called during the barcode-based sequence analysis stage described below.
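The enrichment-factor bookkeeping described above can be sketched in a few lines. This is a minimal illustration under a simplifying assumption that is not from the text: the undigested and digested mixtures are taken to contain sample DNA and mock DNA in identical proportions, so that the only difference between their sensitive-to-resistant ratios is the digestion itself (the actual protocol also relies on precise volume measurements; see Supplemental Text S2). The class and function names and all counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MixtureCounts:
    """Consensus-called molecule counts for one mixture (hypothetical structure)."""
    sensitive: int  # Bsu36I-sensitive ROI molecules (WT)
    resistant: int  # digestion-resistant mock DNA molecules


def enrichment_factor(undigested: MixtureCounts, digested: MixtureCounts) -> float:
    """Ratio of the sensitive:resistant ratios of the undigested and digested mixtures.

    Assumes both mixtures hold sample and mock DNA in identical proportions,
    so the drop in the sensitive:resistant ratio reflects digestion alone."""
    ratio_undigested = undigested.sensitive / undigested.resistant
    ratio_digested = digested.sensitive / digested.resistant
    return ratio_undigested / ratio_digested


def cells_analyzed(ef: float, wt_called: int, mutants_called: int) -> float:
    """Enrichment factor times WT molecules called, plus the mutants called."""
    return ef * wt_called + mutants_called


# Hypothetical counts: digestion lowers the WT:mock ratio from 5 to 0.05.
ef = enrichment_factor(MixtureCounts(50_000, 10_000), MixtureCounts(500, 10_000))
n_cells = cells_analyzed(ef, wt_called=300_000, mutants_called=12)
print(round(ef), round(n_cells))  # 100 30000012
```

Because mock DNA is never cut, it acts as an internal reference: any change in the WT-to-mock ratio between the two mixtures can be attributed to Bsu36I.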

Following this mutation enrichment step, we attach unique barcodes to the DNA fragments to reduce error by consensus sequencing of copies originating from the same original fragment. For this purpose, we build on and improve the maximum depth sequencing (MDS) method (Jee et al. 2016), which allows one to focus on a narrow ROI and whose key idea is to attach the barcodes directly to a cleaved end of one of the two strands of each original target DNA fragment via a DNA polymerase–assisted extension reaction, rather than including the barcode only in the first copy of the DNA by extending the target-specific primer that carries it. In this manner, errors that occur during the first critical copying step are also detected via consensus sequencing of reads sharing the same barcode (Fig. 1; Supplemental Text; Supplemental Fig. S1; Jee et al. 2016). To all of the above, we add multiple innovations that increase sequencing accuracy, handle the large amounts of genomic DNA required, and enable accurate measurement of the Bsu36I enrichment factor per sample, as needed for the mutation rate calculation (Supplemental Figs. S1–S5). We refer to this whole method as mutation enrichment followed by upscaled maximum depth sequencing (MEMDS) (for a complete protocol, see Supplemental Text S1–S9 and Methods).
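The barcode-family logic can be illustrated with a simplified consensus caller. This is a hypothetical sketch, not the MEMDS analysis pipeline: it assumes each read has already been reduced to a (primary barcode, secondary barcode, ROI sequence) tuple, treats reads sharing both barcodes as PCR duplicates of one linear-amplification copy, and calls a family’s sequence only when enough distinct copies agree. The thresholds `min_copies`, `copy_fraction`, and `family_fraction` are illustrative assumptions; the actual filtering criteria are described in the Supplemental Text.

```python
from collections import Counter, defaultdict
from typing import Iterable


def consensus(seqs, min_fraction):
    """Most common sequence if it makes up at least min_fraction of seqs, else None."""
    if not seqs:
        return None
    seq, count = Counter(seqs).most_common(1)[0]
    return seq if count / len(seqs) >= min_fraction else None


def call_families(reads: Iterable[tuple[str, str, str]],
                  min_copies: int = 3,
                  copy_fraction: float = 0.9,
                  family_fraction: float = 0.9) -> dict[str, str]:
    """reads: (primary_barcode, secondary_barcode, roi_sequence) tuples.

    1. Collapse PCR duplicates of each linear copy (same primary + secondary barcode).
    2. Per primary barcode (original fragment), require at least min_copies
       distinct copies and a supermajority of them agreeing on one sequence."""
    copies = defaultdict(list)
    for primary, secondary, seq in reads:
        copies[(primary, secondary)].append(seq)

    families = defaultdict(list)
    for (primary, _secondary), seqs in copies.items():
        copy_seq = consensus(seqs, copy_fraction)
        if copy_seq is not None:
            families[primary].append(copy_seq)

    calls = {}
    for primary, copy_seqs in families.items():
        if len(copy_seqs) < min_copies:
            continue  # too few independent copies to trust the fragment
        seq = consensus(copy_seqs, family_fraction)
        if seq is not None:
            calls[primary] = seq
    return calls


# Hypothetical reads: family AAA carries 20A->T in three independent copies,
# CCC has too few copies, and GGG disagrees across copies.
reads = [
    ("AAA", "x1", "CCTGTGG"), ("AAA", "x1", "CCTGTGG"),
    ("AAA", "x2", "CCTGTGG"), ("AAA", "x3", "CCTGTGG"),
    ("CCC", "y1", "CCTGAGG"), ("CCC", "y2", "CCTGAGG"),
    ("GGG", "z1", "CCTGAGG"), ("GGG", "z2", "CCTGTGG"), ("GGG", "z3", "CCTGAGG"),
]
print(call_families(reads))  # {'AAA': 'CCTGTGG'}
```

Requiring agreement across distinct secondary barcodes reflects the idea that a true mutation present in the original fragment is inherited by every linear copy, whereas a polymerase error usually affects only one copy.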

Finally, following sequence analysis (Supplemental Figs. S6–S10; Supplemental Table S2), the number of appearances of any mutation that confers resistance to the restriction enzyme is counted and divided by the calculated number of cells analyzed, providing the evolutionarily relevant de novo origination rate for each specific mutation in males per donor and per group of donors (Supplemental Figs. S11–S13; Supplemental Table S3). Following previous literature, we ignore G→T and C→T mutations in the barcoded strand (C→A and G→A in the sequenced strand) because they are thought to reflect not lasting mutations but the experimental disruption of an ongoing in vivo process of base damage
and repair, as well as in vitro mutations attributed to guanine oxidation and cytosine deamination (Supplemental Text S8; Supplemental Figs. S12–S14; Arbeithuber et al. 2016; Jee et al. 2016). In addition, we exclude C→A, the complement of G→T, owing to its association with the latter and its frequent appearance in the data (Supplemental Text S8; Supplemental Fig. S12). After the normal loss of ∼65% of material, true positives of mutations other than G→T, C→T, and C→A are identified with a false positive rate (error rate) <2.5×10⁻⁹ per base (Fig. 2). Overall, MEMDS surpasses recent cutting-edge methods in both accuracy (Fig. 2A) and yield (Fig. 2B; see also Supplemental Fig. S11).
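Once mutations are called and the number of cells analyzed is known, each per-mutation rate is a count divided by that number, computed per donor or pooled over a group of donors. The sketch below assumes point-mutation calls are represented as hypothetical (position, ref, alt) tuples in barcoded-strand orientation, drops the three excluded substitution types (G→T, C→T, and C→A) before counting, and pools by summing both counts and cells across donors. Deletions would need their own representation, and all numbers in the example are made up.

```python
from collections import Counter

# Barcoded-strand substitution types excluded in the text: G->T, C->T, and C->A.
EXCLUDED = {("G", "T"), ("C", "T"), ("C", "A")}


def mutation_rates(calls, cells):
    """calls: (position, ref, alt) tuples for one donor or a pooled group;
    cells: number of cells (haploid genomes) analyzed.
    Returns {(position, ref, alt): count / cells} for non-excluded types."""
    counts = Counter(call for call in calls if (call[1], call[2]) not in EXCLUDED)
    return {call: n / cells for call, n in counts.items()}


def pooled_rates(donors):
    """donors: list of (calls, cells) pairs; pools counts and cells across donors."""
    all_calls = [call for calls, _ in donors for call in calls]
    total_cells = sum(cells for _, cells in donors)
    return mutation_rates(all_calls, total_cells)


# Hypothetical example: two donors, one of whose calls is an excluded C->T.
donors = [
    ([(20, "A", "T"), (20, "A", "T"), (16, "C", "T")], 20_000_000),
    ([(20, "A", "T")], 30_000_000),
]
print(mutation_rates(*donors[0]))  # {(20, 'A', 'T'): 1e-07}
print(pooled_rates(donors))        # {(20, 'A', 'T'): 6e-08}
```

Pooling sums counts before dividing, so donors with more cells analyzed carry proportionally more weight than in a simple average of per-donor rates.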

With the help of this method, we examined a total of more than half a billion gene fragments individually taken from the sperm of 12 donors. Because one of the samples was a mixture from two African donors with a total number of cells similar to that of the other African samples, we consider it here as a single sample of mixed African origin, bringing the total to 11 samples, seven from African and four from European donors (Supplemental Table S1). The numbers of cells scanned and de novo mutations observed per person are shown in Table 1.

Average per ROI mutation rates

The average per-base point mutation rates in the HBB and HBD ROIs are 3.3×10⁻⁸ and 2.79×10⁻⁸, respectively, significantly higher by ∼2.6-fold (P<2×10⁻⁸, 95% CI 2.4×10⁻⁸–4.4×10⁻⁸) and ∼2.2-fold (P<6.7×10⁻⁵, 95% CI 1.9×10⁻⁸–4×10⁻⁸; two-sided binomial exact test) than 1.25×10⁻⁸, which we use as an estimate of the genome-wide per-base per-generation point mutation rate (Supplemental Text S10). The average indel rates in these ROIs were 1.1×10⁻⁸ and 4.3×10⁻⁹, respectively, significantly higher by approximately ninefold (P<4.3×10⁻²⁵, 95% CI 8×10⁻⁹–1.5×10⁻⁸) and ∼3.4-fold (P<1.8×10⁻⁴, 95% CI 2.3×10⁻⁹–7.3×10⁻⁹; two-sided binomial exact test) than the expected 1/10 of the point mutation rate (Supplemental Text S10). The average point mutation rate of the HBB ROI is not significantly higher than that of the HBD ROI (P=0.49, two-sided Fisher’s exact test), whereas the average indel rate of the former is significantly higher, by ∼2.6-fold, than that of the latter (P=0.0015, OR 95% CI 1.42–5.01; two-sided Fisher’s exact test).
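The comparisons against 1.25×10⁻⁸ above use a two-sided binomial exact test. A minimal pure-Python version is sketched below. It assumes the two-sided convention of R’s binom.test (summing the probabilities of all outcomes no more likely than the observed count) and assumes that the number of trials is the number of base positions scanned (cells analyzed times observable positions); the exact denominators used here are given in the Supplemental Material. For very large n, outcomes far from both the observed count and the mean are skipped because their contribution is negligible.

```python
import math


def binom_logpmf(k, n, p):
    """log P(X = k) for X ~ Binomial(n, p), via log-gamma to handle very large n."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def binom_test_two_sided(x, n, p):
    """Two-sided exact binomial test of x successes in n trials against rate p (0 < p < 1).

    Sums P(X = k) over every k whose probability does not exceed that of x,
    using a 1e-7 relative tolerance as R's binom.test does. Outcomes more than
    50 SD + 50 outside [min(x, mean), max(x, mean)] are skipped as negligible."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    lo = max(0, int(min(x, mean) - 50 * sd - 50))
    hi = min(n, int(max(x, mean) + 50 * sd + 50))
    cutoff = binom_logpmf(x, n, p) + math.log1p(1e-7)
    total = 0.0
    for k in range(lo, hi + 1):
        log_prob = binom_logpmf(k, n, p)
        if log_prob <= cutoff:
            total += math.exp(log_prob)
    return min(1.0, total)


# Hypothetical: 20 events in 6x10^8 base observations vs. 1.25e-8 per base.
print(binom_test_two_sided(20, 600_000_000, 1.25e-8))
```

Working in log space avoids underflow when n is in the hundreds of millions and p is of order 10⁻⁸.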

Figure 1. Experiment overview. Sperm samples are obtained from world regions with high or low malaria infection burden (malaria impact map adapted from the CDC map) (CDC Division of Parasitic Diseases and Malaria 2019). Whole-genome DNA is extracted, and an amount equivalent to 60–80 million sperm cells per donor is subjected to Bsu36I digestion. Bsu36I cleaves the DNA at multiple sites, including the HBB and HBD ROIs, which carry a specific recognition sequence. The HbS mutation blocks Bsu36I digestion and is thus enriched over the wild-type (WT). A primary barcode is added directly to each antisense DNA strand that carries the HBB or HBD ROI via a DNA polymerase–assisted fill-in reaction. Because each barcode consists of a random sequence of nucleotides, each of the numerous target fragments has its own unique barcode, illustrated by a unique color on the left end of the representation of each barcoded fragment. Multiple single-strand copies are generated directly from each uniquely barcoded target fragment by linear amplification. A secondary barcode composed of a random sequence of nucleotides is added to the other end of each of these copies by a single primer extension reaction, illustrated by a unique color on the right end of each barcoded fragment. Thus, only full-length fragments (i.e., mutant or WT ROI sequences that evaded Bsu36I digestion) carry both the primary and the secondary barcodes and can be amplified by PCR for high-throughput sequencing. At the sequence analysis step, sequencing reads representing the PCR products of the linearly amplified copies are grouped into families (see boxes), where in each family, reads share the same primary barcode sequence. Sporadic sequencing errors or DNA polymerase errors generated during linear or subsequent amplification steps are unlikely to be repeated in multiple copies and are removed. De novo mutations, such as the HbS mutation, are easily identified by their appearance in multiple reads from distinct linear-amplification events. For a complete description of the library preparation protocol, which includes additional steps, see Supplemental Figures S1–S3.


Basic characteristics of mutation rate variation

The variance in the rates of de novo point mutations is higher than expected from the genome-wide average (GWA) rates of these mutations (e.g., Harris 2015; Harris and Pritchard 2017), and their relative rates differ from those expected from the GWA rates (P<10⁻⁶ in an omnibus multinomial test, adjusted for the excluded mutations, compared to the rates of Rahbari et al. 2016), even when the latter are adjusted for the 3-mer, 5-mer, and 7-mer nucleotide contexts (P<10⁻⁵ in all cases, compared to the rates of Carlson et al. 2018). The overall de novo rates of the six observed deletion types are highly nonuniform (P<10⁻⁶, multisample proportion test).

Correspondence between de novo rates and observations of alleles in carriers

The HbS and Hb-Leiden mutations have both been notably observed on multiple different genetic backgrounds in human populations, the former particularly in Africans (Flint et al. 1998; Hardison et al. 2002; Hardison and Miller 2002). Here, they are, respectively, the point mutation with the highest de novo rate in the African HBB ROI and the deletion with the highest de novo rate in any gene and ethnicity. Furthermore, of the 23 potential deletions of up to 3 bp that are observable by our method per ROI, only five (16delC, 17_18delCT, 18_19delTG, 19_21delGAG or the equivalent 22_24delGAG [the Hb-Leiden mutation], and 20delA) have been reported to date in the HbVar database, a large collection of hemoglobin variants (Hardison et al. 2002; Hardison and Miller 2002), all in HBB; and a significantly higher fraction of these deletion types is observed here de novo compared to deletion types not reported in HbVar (Supplemental Text S11). Pooling the HBB and HBD ROIs, given the similarity of the de novo indel types observed between them, this effect is significant both with (P=0.0078, OR 95% CI 2.17–818.08, two-sided Fisher’s exact test) and without (P=0.024, OR 95% CI 1.44–653.93, two-sided Fisher’s exact test) the Hb-Leiden mutation, showing that the correspondence between de novo rates and alleles in populations extends beyond the HbS and Hb-Leiden mutations. Although the same analysis cannot be repeated for the point mutations because of the smaller number of observable mutation types and the confound between synonymous and nonsynonymous mutations, further observations are in the expected direction (Supplemental Text S11). The correspondence observed could not have been predicted from the mutations’ GWA rates, even when adjusting for the genetic context (Supplemental Text S10, S11).

Between-population comparisons

To provide a conservative statistical test of a population-level difference that excludes individual- or sample-level variation alone as an explanation for the result, we compared the per-person overall point mutation rates in the HBB ROI between the African and European groups. These rates were significantly higher in the African than in the European group both with (P=0.0061) and without (P=0.043, two-sided Wilcoxon rank-sum test) counting the HbS mutation. Next, pooling cells from all donors within each population to estimate the overall point mutation rate in the HBB ROI shows it to be significantly higher, by 2.57-fold, in the African than in the European donors (P<0.006, OR 95% CI 1.27–5.49, two-sided Fisher’s exact test). Thus, there is a significant population-level difference between the continental groups in the overall point mutation rate in this narrow ROI that is not attributable to individual- or sample-level variation. In the HBD ROI, by contrast, the number of mutations was not high enough to establish such a difference above and beyond individual- or sample-level variation (P=0.18, two-sided Wilcoxon rank-sum test). Unlike the HBB overall point mutation rate, the overall indel rate did not vary significantly between these groups in either ROI (P=0.35 and P=1, respectively, two-sided Fisher’s exact test).
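With only seven African and four European samples, a rank-sum test on per-person rates can be evaluated exactly by enumerating all C(11, 4) = 330 ways of assigning samples to the two groups. The sketch below is a permutation version of the two-sided Wilcoxon rank-sum test that uses midranks for ties (per-person rates of zero are likely to tie). It is an illustration under stated assumptions rather than the analysis code used here: R’s wilcox.test, for example, switches to a normal approximation when ties are present, so p-values can differ slightly. The example inputs are hypothetical.

```python
from itertools import combinations


def midranks(values):
    """1-based ranks; tied values share the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average of 1-based ranks i+1..j+1
        i = j + 1
    return ranks


def rank_sum_test(a, b):
    """Exact two-sided permutation p-value for the Wilcoxon rank-sum statistic of group a."""
    pooled = list(a) + list(b)
    ranks = midranks(pooled)
    n, m = len(pooled), len(a)
    expected = m * (n + 1) / 2  # mean rank sum of group a under the null
    observed = abs(sum(ranks[:m]) - expected)
    extreme = total = 0
    for idx in combinations(range(n), m):  # every relabeling of m samples as group a
        total += 1
        if abs(sum(ranks[i] for i in idx) - expected) >= observed - 1e-9:
            extreme += 1
    return extreme / total


# Hypothetical per-person HBB point mutation rates.
african = [3.1e-8, 4.5e-8, 2.2e-8, 6.0e-8, 3.8e-8, 0.0, 5.1e-8]
european = [1.0e-8, 0.0, 1.5e-8, 0.0]
print(rank_sum_test(african, european))
```

Because each sample contributes a single rate, this test treats samples rather than cells as the unit of replication, which is what makes it conservative with respect to sample-level variation.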

Position 20 mutation rates

Two particularly notable mutations are the HbS and Hb-Leiden mutations (details below). Considering codons 6 and 7 equivalent with respect to the latter mutation, both mutations can be said to affect position 20. Using the aforementioned conservative test to exclude sample-level variation alone as accounting for the result, the overall per-person point mutation rates at position 20 specifically are significantly higher in the HBB than in the HBD ROI in Africans (P=0.017, two-sided Wilcoxon rank-sum test) but not in Europeans (P=1). In the former, the overall point mutation rate at position 20 pooled across individuals is ∼6.1× higher in HBB than in HBD (P=0.0061, OR 95% CI 1.50–37.14, two-sided Fisher’s exact test). In the case of the overall indel rates at position 20, although the pooled rates are significantly higher in HBB than in HBD for both Africans and Europeans (P=0.044, OR 95% CI 1.03–6.54 and P=0.027, OR 95% CI 1.11–7.02, respectively; two-sided Fisher’s exact tests), sample-level variation cannot be


Figure 2. Accuracy and yield of MEMDS compared with current cutting-edge methods for studying target regions. (A) Under a highly conservative estimate, MEMDS increases accuracy by at least 40-fold compared to duplex sequencing (DS) (Kennedy et al. 2014) and maximum depth sequencing (MDS) (Jee et al. 2016). (B) MEMDS also increases yield per sequenced base (i.e., the number of MEMDS-confirmed bases divided by the number of paired-end sequenced bases) by orders of magnitude over both DS and MDS (Kennedy et al. 2014; Jee et al. 2016). Note that in MEMDS, the yield can be higher than 1 because the mutation enrichment factor is accurately calculated (Supplemental Text S2) and the base identity is known for the ROI sequences that were digested and removed from the final sequencing libraries (they carry the restriction enzyme recognition sequence). Although the accuracy of DS has been improved in the context of sequencing large parts of the genome (Abascal et al. 2021), yield considerations and other limitations preclude applying current DS-based methods to narrow ROIs and target mutations (Kennedy et al. 2014; Supplemental Text S1) with the same efficiency as MEMDS.


excluded as the source of the differences (P=1 and P=0.69 for Africans and Europeans, respectively; two-sided Wilcoxon rank-sum tests).

Rates of the Hb-Leiden mutation

The 3-bp in-frame deletion of either codon 6 or codon 7, called “the Hb-Leiden mutation” when it occurs in HBB, recurs noticeably more often than other mutations (comparing its per-person rates to those of all other deletions combined to exclude sample-level variation, P<0.0005, two-sided Wilcoxon rank-sum test). Pooled across individuals, it appears at rates of 1.11×10⁻⁷ and 3.96×10⁻⁸ in the HBB and HBD ROIs, respectively, ∼88.86× and ∼31.66× higher than the expected indel rate of 1.25×10⁻⁹ (one-tenth of the genome-wide point mutation rate) (P=4.04×10⁻⁵⁸, 95% CI 7.82×10⁻⁸–1.53×10⁻⁷; and P=1.62×10⁻¹³, 95% CI 1.98×10⁻⁸–7.08×10⁻⁸), and the HBB rate is significantly (∼2.81×) higher than the HBD rate (P=0.002, OR 95% CI 1.40–5.63, two-sided Fisher’s exact test).
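Rate confidence intervals such as those above can be obtained by treating a mutation count as binomial over the number of cells (or bases) analyzed. The sketch below computes an exact Clopper–Pearson interval by bisection on the binomial CDF. That this is the interval construction used here is an assumption, as is the hypothetical denominator in the example.

```python
import math


def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p), summing log-space terms (fine for small x)."""
    if x < 0:
        return 0.0
    if x >= n:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for k in range(x + 1):
        total += math.exp(math.lgamma(n + 1) - math.lgamma(k + 1)
                          - math.lgamma(n - k + 1) + k * log_p + (n - k) * log_q)
    return min(1.0, total)


def _bisect(f, target, increasing, iterations=200):
    """Find p in (0, 1) with f(p) = target for a monotone function f."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if (f(mid) < target) == increasing:
            lo = mid  # root lies at or above mid
        else:
            hi = mid  # root lies at or below mid
    return (lo + hi) / 2


def clopper_pearson(x, n, alpha=0.05):
    """Exact (Clopper-Pearson) two-sided 1 - alpha interval for a binomial proportion."""
    # Lower bound solves P(X >= x) = alpha/2, which increases with p.
    lower = 0.0 if x == 0 else _bisect(lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2, True)
    # Upper bound solves P(X <= x) = alpha/2, which decreases with p.
    upper = 1.0 if x == n else _bisect(lambda p: binom_cdf(x, n, p), alpha / 2, False)
    return lower, upper


# Hypothetical: 9 events among 3x10^8 genomes analyzed.
print(clopper_pearson(9, 300_000_000))
```

For counts this small relative to n, the binomial interval is essentially the exact Poisson interval divided by n.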

Rates of the HbS mutation

The 20A→T mutation, called “the HbS mutation” when it appears in the HBB ROI, appears nine times in the African HBB ROI and not at all in the other cases combined (the European HBB ROI and the European and African HBD ROIs) (P=0.023, OR 95% CI 1.5077–Inf; two-sided Fisher’s exact test classifying each individual and gene case as having [>0] or not having [=0] de novo 20A→T in sperm and comparing the fractions of these classes between the groups). The rate of the HbS mutation in the overall group (Africans and Europeans combined), 2.7×10⁻⁸, is 19.6× higher (P<2×10⁻⁹, rate 95% CI 1.24×10⁻⁸–5.13×10⁻⁸) than expected from the GWA for this mutation type (Supplemental Text S10), and its rate in the African group specifically, 4.74×10⁻⁸, is ∼35× higher than expected from its GWA (P=1.2×10⁻¹¹, rate 95% CI 2.17×10⁻⁸–9.0×10⁻⁸; two-sided binomial exact test). In the African group, it is the mutation that deviates the most from its GWA among the 12 observable point mutations (Supplemental Table S4), and its de novo rate varies significantly across samples (P=0.0025, multisample proportion test), from 0 to 2.24×10⁻⁷ (the latter rate being ∼163× faster than expected; P=2.23×10⁻¹⁰, 95% CI 7.27×10⁻⁸–5.23×10⁻⁷, two-sided binomial exact test). Note that the evolutionarily relevant mutation rate depends on the fraction of the mutation in sperm per se, not on whether it repeats because of independent originations or because of an early appearance followed by duplications during spermatogenesis. The minimal number of independent originations of the HbS mutation is three, given that three individuals produced it de novo, and the corresponding minimal rate of independent occurrence of the HbS mutation in the sperm samples across all individuals (a rate lower than the actual evolutionarily relevant mutation rate observed) is 9.01×10⁻⁹. This rate is still ∼6.5× higher than the

Table 1. HBB and HBD ROI mutation counts

Counts of de novo mutations identified by MEMDS in DNA from 11 sperm samples, seven from African (AFR) and four from European (EUR) donors. The numbers next to the donor labels refer to the calculated number of haploid individual genomes scanned by MEMDS. Light gray, dark gray, and black cell shading represent mutation counts of 1, 2–4, and ≥5, respectively. Some of the mutations have been observed before in carriers and have common names when they appear in HBB. These are 16C→G, Hb-Gorwihl; 16C→T, Hb-Tyne; 17C→G, Hb-Warwickshire; 17C→T, Hb-Aix-les-Bains; 20A→G, Hb-Lavagna; 20A→T, HbS; 20A→C, Hb-G-Makassar; 22G→C, Hb Bellevue III; and 19_21del or 22_24del, Hb-Leiden. Note that Hb-Leiden can result from deletion of either positions 19–21 or positions 22–24, which contain the same GAG sequence; both deletions can be enriched and captured by MEMDS.
ᵃHbS.
ᵇHb-Leiden.

Melamed et al.

492 Genome Research
www.genome.org

 Cold Spring Harbor Laboratory Press on April 7, 2022 - Published by genome.cshlp.orgDownloaded from 

http://genome.cshlp.org/lookup/suppl/doi:10.1101/gr.276103.121/-/DC1
http://genome.cshlp.org/lookup/suppl/doi:10.1101/gr.276103.121/-/DC1
http://genome.cshlp.org/lookup/suppl/doi:10.1101/gr.276103.121/-/DC1
http://genome.cshlp.org/
http://www.cshlpress.com


genome-wide evolutionarily relevant mutation rate for this muta-
tion type (P=0.011, 95% rate CI 1.86×10−9–2.63×10−8, two-sided
binomial exact test).

Discussion

The data reveal an ultrahigh-resolution correspondence between de novo mutation rates and past observations of alleles in carriers (Flint et al. 1998; Hardison et al. 2002; Hardison and Miller 2002; Supplemental Text S11; Results), suggesting that these rates contribute to the prevalence of these mutations in populations. This correspondence could not have been predicted from the GWA rates of these mutation types, even when adjusting for the local genetic context (Supplemental Text S10, S11). The deletions observed illustrate this point. Although past literature described a single microdeletion rate that decreases with size (Gu and Li 1995; Kondrashov 2003; Lynch 2010), size-based rate variation cannot explain the correspondence obtained for same-sized deletions, the higher rate of the Hb-Leiden mutation compared with the smaller deletions, or the extent of the rate variation observed. Thus, this correspondence, together with the fact that in these ROIs the rates of some mutations (e.g., the HbS and Hb-Leiden mutations) deviate much more than others from their corresponding GWA rates, shows that mutation-specific rates vary not only for large rearrangement mutations (Gu et al. 2008; Zhang et al. 2009) but also for point mutations and microindels. This rate variation could not have been seen using average-based measures (Kondrashov 2003; Lynch 2010), and it establishes the relevance of mutation-specific point mutation and microindel rates to the site frequency spectrum (SFS) (Harpak et al. 2016; Lek et al. 2016; Mathieson and Reich 2017).

The overall point mutation rate in the HBB ROI is significantly higher in the African than in the European group even under a nonparametric comparison, which shows that the difference cannot be attributed to individual- or sample-level variation alone. Thus, it represents a significant population-level difference between the groups. This difference, occurring in an extremely narrow region spanning three codons of great importance for adaptation and genetic disease, is at least two orders of magnitude larger than previously reported differences in GWA mutation rates between continental groups (Harris 2015; Harris and Pritchard 2017). The correspondence between mutation-specific de novo rates and observations of alleles in carriers, together with this large difference in the overall point mutation rate between populations in a narrow region, establishes the importance of measuring mutation rate variation at an ultrahigh resolution.
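The logic of a donor-level nonparametric comparison can be illustrated with an exact permutation test. The specific nonparametric procedure used in this study is described in Results and is not reproduced here; the sketch below simply assumes one summary rate per donor, so that any within-donor dependence (such as clonal expansion of a mutation during spermatogenesis) cannot inflate the effective sample size. The function name `permutation_pvalue` and the per-donor rates are hypothetical illustrations, not the study's data or its exact test.

```python
import math
from itertools import combinations
from statistics import mean


def permutation_pvalue(group_a, group_b):
    """Exact one-sided permutation test of mean(group_a) > mean(group_b).

    Each element is one donor-level value (e.g., a per-donor mutation rate),
    so dependence among mutations within a donor cannot add pseudo-replicates.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = mean(group_a) - mean(group_b)
    hits = total = 0
    # Enumerate every way of assigning n_a of the pooled donors to group A.
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in idx]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        stat = mean(a) - mean(b)
        # Count relabelings at least as extreme as the observed one;
        # isclose guards against floating-point ties.
        if stat > observed or math.isclose(stat, observed, rel_tol=1e-9):
            hits += 1
        total += 1
    return hits / total


# Hypothetical per-donor rates (NOT the study's values): 7 vs. 4 donors,
# giving C(11, 4) = 330 possible relabelings.
afr = [3.1e-7, 2.8e-7, 4.0e-7, 2.5e-7, 3.6e-7, 2.9e-7, 3.3e-7]
eur = [1.2e-7, 1.5e-7, 0.9e-7, 1.1e-7]
print(permutation_pvalue(afr, eur))
```

Because every hypothetical AFR value exceeds every EUR value, only the observed labeling is as extreme as itself, and the function returns 1/330. That is also the smallest one-sided P-value any donor-level permutation test can reach with seven versus four donors.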

Potential contributions to mutation rates from gross-level biological or environmental factors, such as age or pesticides, cannot sufficiently explain the results. First, the two populations are similar in age (Supplemental Table S1). Second, a mutation-specific effect, such as the correspondence between de novo rates and observations of alleles in carriers, cannot be explained by such macrolevel factors, because these cannot be expected to affect the rates of equivalent mutations (such as 20A→T in HBB vs. HBD) differently. Third, the overall point mutation rate difference between the populations is also unlikely to be explained by them, because if such macrolevel factors had affected the ROIs on their own, they should have affected the entire genome similarly, yet GWA differences in point mutation rates between continental groups are smaller than the ROI-specific differences observed here (Harris 2015; Harris and Pritchard 2017). Note that if macrolevel factors affect mutation rates in interaction with mutation-, locus-, individual-, and/or population-specific factors, then such specific factors must be assumed in any case. Thus, rather than suggesting involvement of macrolevel factors, the data suggest a complex picture of mutation rates involving mutation-specific influences.

In addition, although the replication of mutations during spermatogenesis (clonal dependence) may contribute to the data, in practice it is insufficient to account for the significant results. First, the significance of the continental difference in the overall point mutation rates in HBB is impervious to any sample-level variation, including clonal dependence, as shown by the nonparametric between-population comparison described in Results. Second, clonal dependence cannot drive the correspondence between mutation rates and observations of alleles in carriers. On the contrary, in the absence of a cellular-level mechanism that induces specific mutations in a population-specific manner in accord with the cellular generation during spermatogenesis, differences in mutation timing during spermatogenesis could only add noise to the patterns observed. Any clonal dependence would therefore only make it more difficult to obtain significance for such patterns, and in that sense it is conservative with respect to finding a pattern. Thus, the significance of these patterns is more likely driven by independent originations of the mutations. These independent originations are consistent with mutation-specific rates being influenced by genetic and/or epigenetic factors (Livnat 2013, 2017).

The prevalence in a population of a mutation that confers heterozygote advantage, and reading-frame conservation in a coding sequence, are generally considered to be outcomes of selection. Here, however, both the HbS mutation, which provides strong malaria protection in heterozygotes, and the Hb-Leiden mutation, which is an in-frame deletion, are frequent not because of selection but because of frequent de novo origination. Indeed, that the rate of the in-frame Hb-Leiden mutation is much higher than that of all other observed deletions, which are frameshift deletions, shows reading-frame conservation that is caused not by selection (Lek et al. 2016) but by mutational phenomena. This observation provides a concrete example of “mutational conservation”: evolutionary conservation caused by mutational processes, which, if it occurs more broadly, could offer an explanation for the puzzling observation of a reading-frame conservation bias in pseudogenes (Zhang and Gerstein 2003).

That the genetic sequences at and adjacent to the ROIs are identical for the two populations and for the two genes, yet the mutation rates vary significantly between the populations and between the genes, suggests that what affects these mutation rates in the germline includes more than this local DNA sequence and in that sense is complex (Livnat 2013, 2017). These results are consistent with the observation that the variation of mutation rates across loci is partly cryptic (not explained by the local DNA context) (Hodgkinson et al. 2009; Hodgkinson and Eyre-Walker 2011), especially in the case of A↔T transversions (Hodgkinson et al. 2009), which include the HbS mutation type (A→T). Combining the multiple insights discussed, the results suggest that mutation rates are both mutation-specific and influenced in a complex manner by the genetic and/or epigenetic background (Livnat 2013, 2017).

The HBB region spanning three codons is of particular importance for adaptation and genetic disease: it is the site of mutations that provide strong protection against malaria (HbS and HbC, the latter not observable by our method) and/or increase the risk for hematologic disease (Flint et al. 1998; Hardison et al. 2002; Hardison and Miller 2002). Thus, it is of interest that the overall point mutation rate in this region is significantly higher than expected, and that it is significantly higher in the African than in the European population. These results provide a clear case of a connection between mutation rates and adaptive evolution, moving beyond previous literature on the relevance of mutation rates to adaptive evolution and its repeatability (Crow et al. 2009; Dumas et al. 2012; Kratochwil et al. 2019; Kratochwil and Meyer 2019; Lind 2019; Xie et al. 2019).

The results underscore the importance of mapping mutation rate variation at an ultrahigh resolution. Beyond this general point, several observations specific to the HbS mutation are worth noting. First, if one assumes that the HbS rate is the same for both continental groups, the data show that it is significantly higher, by nearly 20-fold, than expected from the GWA for this mutation type, in both Africans and Europeans. No amount of hypothetical clonal dependence changes this estimate of the observed evolutionarily relevant mutation rate, because the latter does not depend on the cause of the recurrence of the mutation in the sperm. Even the observed minimal rate of independent HbS originations in sperm is still significantly higher, by 6.5×, than the evolutionarily relevant GWA rate for this mutation type. Consideration of the local genetic context does not change this conclusion (Supplemental Text S10). Thus, although the classical explanation of the HbS case relied only on selection, even under the most conservative assumptions the overall HbS mutation rate observed here is notably higher than expected.
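The fold differences and P-values quoted for HbS compare an observed count of mutant molecules, k, among n scanned haploid genomes with the count expected under a genome-wide average (GWA) rate p. A minimal sketch of that comparison follows, assuming a standard exact two-sided binomial test with the "minimum likelihood" convention (the one used by R's `binom.test`: sum the probabilities of all outcomes no more likely than the observed one). The function names and the numbers in the usage line are hypothetical, and the study's own calculations (Supplemental Text S10) may differ in details such as how n is derived.

```python
import math


def binom_logpmf(j, n, p):
    """log P(X = j) for X ~ Binomial(n, p) with 0 < p < 1, via log-gamma to avoid overflow."""
    return (math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
            + j * math.log(p) + (n - j) * math.log1p(-p))


def tail_sum(start, step, n, p, rel_stop=1e-17):
    """Sum P(X = j) from j = start in direction step (+1 or -1), moving away from the mode."""
    total, j = 0.0, start
    while 0 <= j <= n:
        term = math.exp(binom_logpmf(j, n, p))
        total += term
        if term == 0.0 or term < total * rel_stop:
            break  # terms only shrink further from the mode
        j += step
    return total


def binom_test_two_sided(k, n, p, rel_err=1 + 1e-7):
    """Exact two-sided P: sum of P(X = j) over every j with P(X = j) <= P(X = k)."""
    mean_count = n * p
    if k == mean_count:
        return 1.0
    log_cut = binom_logpmf(k, n, p) + math.log(rel_err)

    def below(j):
        return binom_logpmf(j, n, p) <= log_cut

    if k > mean_count:
        upper = tail_sum(k, +1, n, p)
        # The pmf is non-decreasing on [0, floor(np)]; find the last j there below the cut.
        lo, hi = 0, math.floor(mean_count)
        if not below(lo):
            return min(1.0, upper)
        while lo < hi:
            mid = (lo + hi + 1) // 2
            lo, hi = (mid, hi) if below(mid) else (lo, mid - 1)
        return min(1.0, upper + tail_sum(lo, -1, n, p))

    lower = tail_sum(k, -1, n, p)
    # The pmf is non-increasing on [ceil(np), n]; find the first j there below the cut.
    lo, hi = math.ceil(mean_count), n
    if not below(hi):
        return min(1.0, lower)
    while lo < hi:
        mid = (lo + hi) // 2
        lo, hi = (lo, mid) if below(mid) else (mid + 1, hi)
    return min(1.0, lower + tail_sum(lo, +1, n, p))


def fold_enrichment(k, n, p):
    """Observed rate k / n relative to an expected (e.g., GWA) rate p."""
    return (k / n) / p


# Hypothetical numbers for illustration only (not the study's counts).
k, n, p_gwa = 6, 10**8, 2e-9
print(f"fold = {fold_enrichment(k, n, p_gwa):.1f}, P = {binom_test_two_sided(k, n, p_gwa):.2e}")
```

Working in log space and summing only the tails keeps the test practical for n in the hundreds of millions, where the expected count n·p is well below 1 and almost all of the probability mass sits at k = 0.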

Second, given the significant continental difference in the overall point mutation rate between the groups, it would be surprising if the HbS mutation specifically did not show a continental effect. Consistent with this, in our samples, using the methodology described, we observe no instances of it in Europeans but nine instances in total in Africans, amounting to a rate ∼35× higher than expected from the GWA of this mutation type in the latter. Further consistent with a continental difference in the HbS mutation rate, and with the broader correspondence between de novo rates and observations of alleles in populations, HbS is most frequent in Africans and in some other populations in the Asian malaria belt (Flint et al. 1998) and appears de novo in our African but not in our European samples, whereas Hb-Leiden has been observed across the globe (Hardison et al. 2002; Hardison and Miller 2002) and appears de novo in both our African and European samples.

Third, in the African HBB ROI, out of 12 observable point mutations, the HbS mutation has the rate that deviates the most from the corresponding GWA rate (Supplemental Table S4).

Fourth, it is striking that despite at least three independent occurrences of the HbS mutation in the HBB ROI, not a single case of the equivalent 20A→T mutation in the HBD ROI was observed in any donor, African or European. Accordingly, we note that the binary test establishing the significantly higher concentration of the 20A→T mutation in the African HBB ROI as opposed to all other cases (the European HBB ROI or the HBD ROIs), which is impervious to any individual- or sample-level variance including clonal dependence, suggests that the 20A→T mutation arises more frequently where it is of adaptive significance than where it is not, although the data do not suffice to tell whether this effect results from a population-level difference, a locus-based difference, or both.

Knowing that the HbS mutation is advantageous in heterozygotes under malarial pressure, how should we interpret these results? One possibility is that, for a reason unrelated to adaptation, some individuals have a genomic fragility in HBB that generates the HbS mutation at a high rate. In that case, it is merely a coincidence that HbS provides protection against malaria, even more so if that fragility applies more to Africans.

Another possibility is modifier theory (Feldman and Liberman 1986; Altenberg et al. 2017), according to which alleles affecting the mutation rate may be favored by selection under certain conditions (Leigh 1970; Moxon et al. 1994). However, because the benefit of a modifier allele that increases the mutation rate is tied to the excess beneficial mutations it helps to generate, and because mutations are rare, it is normally expected that, for selection to be effective, it must act on a modifier allele that increases the mutation rate across a long enough stretch of the genome with which it remains linked for a long enough period of time, so that many different mutations potentially induced by this allele over space and time are factored into its selective benefit (Hodgkinson and Eyre-Walker 2011; Martincorena and Luscombe 2013; Walsh and Lynch 2018). Thus, modifier theory does not predict an increase in the rate of particular DNA mutations at specific base positions, let alone in sexual, complex organisms, nor the complex genetic and/or epigenetic influences on such mutation rates suggested by the current data (cf. Leigh 1970; Moxon et al. 1994; Altenberg et al. 2017; Walsh and Lynch 2018). On the contrary, the “reduction principle” (the first-order principle in modifier theory) underscores the general difficulty of accounting for increased mutation rates (Feldman and Liberman 1986; Altenberg et al. 2017).

Finally, a recently proposed theory predicted that mutation-specific origination rates are influenced by the complex genetic and epigenetic background, that genetic relatedness in mutational tendencies exists, and that the HbS mutation arises more frequently in Africans than in Europeans (Livnat 2013, 2017). It holds that novelty in evolution arises from emergent interactions, which are then simplified through the generations by mutational mechanisms while being checked by natural selection (Livnat 2017), one hypothetical example being that A→I RNA editing can mechanistically increase the A→G mutation rate at the corresponding positions (cf. Popitsch et al. 2020). Based on this and other previous work (Livnat and Papadimitriou 2016), we hypothesize that recurring, evolved processes acting on DNA and/or RNA through epigenetic modifications (Klose and Bird 2006), RNA editing (Nishikura 2010), and other mechanisms may lead directly to their own replacement and simplification via DNA mutations that arise in the course of evolution from these processes’ molecular nature, mechanistically linking regulatory activity with structural mutational changes, although whether and by what specific mechanism this “replacement” hypothesis explains the HbS case specifically (alternative decoding of A→I editing [Licht et al. 2019] or other mechanisms) is yet to be investigated. This raises the possibility that a mutation of adaptive value such as HbS need not initiate the process of adaptation but can arise later in an evolutionary process in which adaptations and mutation-specific rates jointly evolve (Livnat 2013, 2017); thus, studies on the fundamental nature of mutation need to test not only for a short-term response to environmental pressures (Luria and Delbrück 1943; Cairns et al. 1988) but also for a long-term one.

Unlike previous methods, which could explore only diffuse relationships between long-term selection pressures and the evolution of GWA mutation rates, the present method offers the refined ability needed to explore such relationships, if they exist, at the mutation-specific resolution. Because this method is the first to examine mutation rates at the mutation-specific resolution, it provides only initial estimates, which will require further investigation and refinement. Furthermore, it cannot currently be applied to all mutations, because it requires a specific RE for each ROI. However, given the numerous REs available and their short recognition sequences, which imply large representation of these sequences across the genome, it likely applies across many loci and organisms. Therefore, some of the most important tasks now are to examine high-resolution mutation rate variation across additional loci of interest and to explore the molecular mechanisms responsible.

Methods

For the experimental design and the different stages of library preparation, see Supplemental Text S1–S3 and Supplemental Figures S1–S3. All oligos for the sperm DNA library preparation described in Supplemental Text S14 were ordered from Integrated DNA Technologies (IDT) with standard desalting purity, unless otherwise mentioned. All enzymes were obtained from New England Biolabs (NEB). Plasmid miniprep, PCR purification, and agarose gel extraction were performed with QIAGEN kits.

Spike-in plasmid preparation

Four pUC19-based plasmids were generated. Two (ALP13 and ALP17) were designed to carry the HBB genomic segment from position −203 to +223 relative to the mRNA translation start site, with the Bsu36I restriction site CCTGAGG replaced with TTATGTT and ACGAGAC, respectively; the other two (ALP16 and ALP18) were designed to carry the HBD genomic segment from position −59 to +220 relative to the mRNA translation start site, with the Bsu36I restriction site replaced with TTATGTT and ACGAGAC, respectively. To prepare the spike-in mixture, the four plasmids were linearized with BamHI, mixed in equal amounts, and diluted to 10 fg/µL for the AFR1, AFR3, AFR5, AFR6, AFR7, EUR3, and EUR4 samples and to 5 fg/µL for all other samples.

Collection of sperm samples

Semen samples from African donors were collected at the Assisted Conception Unit of the Lister Hospital and Fertility Centre in Accra, Ghana, following clinical standards. Semen samples from European donors were purchased from Fairfax, a large US cryobank. Samples were obtained with the approvals of the Institutional Review Board of the Noguchi Memorial Institute for Medical Research at the University of Ghana, Legon (NMIMR-IRB 081/16-17), the Rambam Health Care Center Helsinki Committee, Haifa (0312-16-RMB), and the Israel Ministry of Health (20188768). Donors with a history of cancer or infertility, or with high fever in the 3 months before donation, were excluded. Informed consent was obtained from all participants, and personal identifying information was removed and replaced with codes at the source.

DNA extraction from sperm cells

The DNA isolation protocol was modified from Weyrich (2012). A semen sample from a single donor was divided into 500-µL aliquots in multiple screw-capped tubes. The sperm aliquots were washed twice with 70% ethanol to remove seminal plasma. The remaining cells were rotated overnight at 50°C in 700 µL of lysis buffer (50 mM Tris-HCl [pH 8.0], 100 mM NaCl, 50 mM EDTA, 1% SDS) containing 0.5% Triton X-100 (Fisher BioReagents BP151-100), 50 mM Tris(2-carboxyethyl)phosphine hydrochloride (TCEP; Sigma-Aldrich 646547), and 1.75 mg/mL Proteinase K (Fisher BioReagents BP1700-100). Lysates were centrifuged at 21,000g for 10 min at room temperature, and the supernatants were pooled in a single tube. DNA was purified from the cleared lysate using the QIAGEN Blood and Cell Culture DNA Maxi Kit (13362). Specifically, 5 mL of lysate was supplemented with 15 mL of buffer G2 (800 mM guanidine hydrochloride, 30 mM Tris-HCl [pH 8.0], 30 mM EDTA [pH 8.0], 5% Tween 20, 0.5% Triton X-100), vortexed thoroughly, and allowed to flow by gravity through a single Genomic-tip 500/G column pre-equilibrated with 10 mL of buffer QBT (750 mM NaCl, 50 mM MOPS [pH 7.0], 15% isopropanol [v/v]). The resin was washed twice with 15 mL of Buffer QC (1 M NaCl, 50 mM MOPS [pH 7.0], 15% isopropanol [v/v]), and elution was performed with 15 mL of Buffer QF prewarmed to 50°C (1.25 M NaCl, 50 mM Tris-HCl [pH 7.0], 15% isopropanol [v/v]). DNA was precipitated by adding 10.5 mL of room-temperature isopropanol to the eluate, inverting the tube 10 times, and using a sterile tip to spool the DNA and transfer it to a screw-capped tube containing 500 µL of buffer EB (10 mM Tris-HCl [pH 8.5]). The DNA was allowed to dissolve overnight at room temperature. For each donor, a small aliquot of the extracted DNA was PCR amplified and Sanger sequenced to verify the exact sequence of the HBB and HBD regions and to confirm that the donor was homozygous for the WT sequence at both ROIs.

Enzymatic digestion

For the Bsu36I-treated sample (Supplemental Text S1–S3), ∼264 µg of sperm DNA, equivalent to 80 million haploid cells (for AFR2, a DNA amount equivalent to 60 million cells was used), was mixed with a plasmid spike-in mixture (0.2 pg for AFR1 and 0.1 pg for other donors) and divided equally among the wells of a 96-well plate. Bsu36I digestion was performed overnight at 37°C according to the manufacturer’s instructions, using 5 units per well. Each well was then supplemented with 6 units of HpyCH4III to generate the primary barcode attachment site, and digestion continued for an additional 3 h. For the Bsu36I-untreated reaction, 13.2 µg of sperm DNA (9.9 µg for AFR2), representing 5% of the DNA amount used for the Bsu36I digest, was mixed with 6 times the volume of plasmid spike-in mixture, aliquoted into five tubes, and incubated overnight with 2 units of SalI-HF per tube instead of Bsu36I, to allow for similar conditions of DNA digestion without affecting the Bsu36I and HpyCH4III sites. Each tube was then supplemented with 6 units of HpyCH4III, and digestion continued for an additional 3 h, followed by DNA purification.
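The DNA inputs above are tied together by the mass of a haploid human genome: 264 µg for 80 million haploid cells implies about 3.3 pg per genome, and the Bsu36I-untreated reaction uses 5% of the treated input. The small Python sketch below reproduces the 13.2 µg and 9.9 µg figures from those two facts. The 3.3 pg constant is inferred from the stated numbers rather than given in the text, and the function names are illustrative.

```python
PG_PER_UG = 1e6
# Inferred from the Methods: 264 µg of sperm DNA ≈ 80 million haploid genomes (≈3.3 pg each).
PG_PER_HAPLOID_GENOME = 264 * PG_PER_UG / 80e6
UNTREATED_FRACTION = 0.05  # Bsu36I-untreated reaction: 5% of the Bsu36I-treated input


def dna_mass_ug(n_haploid_genomes, pg_per_genome=PG_PER_HAPLOID_GENOME):
    """Mass of DNA (µg) corresponding to a number of haploid genomes."""
    return n_haploid_genomes * pg_per_genome / PG_PER_UG


def haploid_genomes(mass_ug, pg_per_genome=PG_PER_HAPLOID_GENOME):
    """Number of haploid genomes contained in a DNA mass (µg)."""
    return mass_ug * PG_PER_UG / pg_per_genome


for donor, cells in [("most donors", 80e6), ("AFR2", 60e6)]:
    treated = dna_mass_ug(cells)
    untreated = UNTREATED_FRACTION * treated
    print(f"{donor}: {treated:.1f} µg treated, {untreated:.1f} µg untreated")
```

Run as is, it prints 264.0 µg and 13.2 µg for the standard input and 198.0 µg and 9.9 µg for AFR2, matching the amounts stated above.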

Primary barcode labeling and linear amplification

Direct barcode labeling and linear amplification of the digested HBB and HBD strands were performed in a single reaction in 96-well plates. Each well contained ∼1 µg of digested DNA, 0.1 µM primary barcode oligo (oligo A) (Supplemental Text S14), and 1 µM of 5′-phosphorothioate-protected primer for linear amplification (oligo B). The reaction was performed with Q5 high-fidelity polymerase according to the manufacturer’s instructions, using the following thermocycler parameters: initial denaturation for 20 sec at 98°C, followed by 16 cycles of 5 sec at 98°C, 15 sec at 68°C, and 20 sec at 72°C. For each donor, each of the Bsu36I-treated and -untreated samples was labeled by an oligo A with a different Donor Identifier-1 (ID-1) sequence, which was also not shared by samples from other donors, providing each donor and each condition with a unique identifier sequence.

5′-exonuclease treatment

To eliminate strands not protected by 5′-phosphorothioate, 15 µg DNA aliquots of the purified, linearly amplified product of the Bsu36I-treated sample were each incubated at 37°C for 2.5 h with 15 units of Lambda exonuclease, 30 units of T7 exonuclease, and 90 units of RecJf exonuclease in 1× CutSmart buffer. The linearly amplified product of the Bsu36I-untreated sample was incubated under the same conditions with 10 units of Lambda exonuclease, 20 units of T7 exonuclease, and 60 units of RecJf exonuclease.

Secondary barcode labeling and 3′-exonuclease treatment

Following purification, the DNA was aliquoted into a 96-well plate (1 µg per well). A single primer extension reaction was performed using 0.5 µM of the secondary barcode primer (oligo C) and Q5 high-fidelity polymerase according to the manufacturer’s instructions, with the following thermocycler parameters: initial denaturation for 20 sec at 98°C, followed by a single cycle of 5 sec at 98°C, 15 sec at 68°C, and 40 sec at 72°C. To remove excess oligo C, immediately after the thermocycler temperature dropped to 16°C, 20 units of thermolabile Exo I were added directly to each well together with the relabeling control primer (oligo D), in a known amount equivalent to 0.66% of the secondary barcode primer. After incubation for 1 h at 37°C, the thermolabile Exo I was heat-inactivated for 1 min at 80°C, and the DNA was purified. For each donor, each of the Bsu36I-treated and -untreated samples was labeled by an oligo C with a different Donor Identifier-2 (ID-2) sequence, which was also not shared by samples from other donors, so that each donor and each condition had a unique ID-2 sequence.

PCR amplification and sequencing

The first PCR of the dual-barcode-labeled product was performed using oligo E and oligo F1 as primers and Q5 high-fidelity polymerase, according to the manufacturer’s instructions, with the following thermocycler parameters: initial denaturation for 30 sec at 98°C, followed by 10 cycles of 5 sec at 98°C, 15 sec at 72°C, and 30 sec at 72°C, and a final extension for 30 sec at 72°C. Amplification products were purified, and the second PCR was performed using 25% of the first PCR product as template, amplification primers E and F2, and Q5 high-fidelity polymerase according to the manufacturer’s instructions (different F2 primers were used to add a unique Illumina index sequence to each Bsu36I-treated and -untreated sample). The following thermocycler parameters were used: initial denaturation for 30 sec at 98°C, followed by 24 cycles (17 cycles for the EUR4 sample) of 5 sec at 98°C, 15 sec at 70°C, and 30 sec at 72°C, and a final extension for 1 min at 72°C. PCR products were agarose gel purified and further concentrated with a DNA Clean & Concentrator kit (Zymo Research). DNA libraries prepared from the Bsu36I-treated and -untreated samples of the same donor were mixed in equal amounts and paired-end sequenced with 20% PhiX on an Illumina MiSeq using a 300-cycle (v2) kit at the Technion Genome Center (TGC). For each donor, two or three MiSeq runs were performed to reach a minimum of 10 million reads per treatment (all samples except AFR5 and EUR3 were sequenced twice), and the resulting FASTQ sequences were joined before the sequence analysis step.

Sequence analysis

Illumina paired-end (PE) reads were merged with PEAR (Zhang et al. 2014) using the default model for the detection of significantly aligned regions and Phred score corrections. Illumina adapters were trimmed from the merged sequences using cutadapt (Martin 2011), and the sequences were quality filtered with Trimmomatic (Bolger et al. 2014) using a sliding window size of 3 and a Phred quality threshold of 30.
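The quality filtering itself was done by Trimmomatic's sliding-window step (window 3, threshold 30). To show what that criterion does, here is a simplified pure-Python illustration, assuming Phred+33 quality encoding. It truncates the read at the start of the first window whose mean quality falls below the threshold; Trimmomatic's own implementation may retain a few more bases from that window, so this is a sketch of the idea rather than a substitute for the tool.

```python
def phred_scores(qual, offset=33):
    """Decode a FASTQ quality string (Phred+33 assumed) into integer scores."""
    return [ord(ch) - offset for ch in qual]


def sliding_window_trim(seq, qual, window=3, threshold=30):
    """Scan 5'->3' and truncate at the first window whose mean Phred score < threshold.

    Simplified illustration of a sliding-window quality trim, not Trimmomatic itself.
    """
    scores = phred_scores(qual)
    for start in range(len(scores) - window + 1):
        if sum(scores[start:start + window]) / window < threshold:
            return seq[:start], qual[:start]
    return seq, qual


# 'I' encodes Phred 40 and '#' encodes Phred 2 under Phred+33.
print(sliding_window_trim("ACGTACGT", "IIIII###"))
```

Here the window starting at the fourth base (qualities 40, 40, 2) is the first with a mean below 30, so the sketch keeps only the first three bases and prints `('ACG', 'III')`.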

Quality-filtered sequences were trimmed to remove the first 18 bases at the 5′ end, which comprise the 14 bases of the primary barcode and the 4 bases of ID-1, and this information was added to the read’s header. Only sequences with the correct ID-1 and the correct first three bases of the HBB or HBD sequence were retained. Similarly, 9 bases were trimmed from the 3′ end, comprising the 5 bases of the secondary barcode and the 4 bases of ID-2, and this information was added to the read’s header. Only sequences with the correct ID-2 were retained. Trimmed sequences were sorted into HBB or HBD pools based on the bases occupying positions 33–38 of the coding sequence (CGTTAC for HBB and TGTCAA for HBD), allowing one mismatch and frameshifts of up to −3 or +3. Successfully sorted sequences were mapped to either the HBB or the HBD reference sequence (obtained by Sanger sequencing of aliquots from the matching donor samples) using BWA (Li 2013) (parameters -M -t), and high-quality mutations (Phred score ≥28) were recorded. Reads were grouped by their primary barcodes into “families” and processed according to the workflow depicted in Supplemental Figure S9.
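The tag-stripping, pool-sorting, and family-grouping steps described above can be sketched in Python. This is a hypothetical illustration, not the released pipeline (https://github.com/livnat-lab/HBB_HBD). It assumes that the 14-nt primary barcode precedes ID-1 at the 5′ end and that the 5-nt secondary barcode precedes ID-2 at the 3′ end. It also assumes a known offset (`motif_start`) at which CDS positions 33–38 fall in the trimmed read. Finally, it omits the first-three-bases check, read mapping, and the Phred ≥28 mutation filter.

```python
from collections import defaultdict

# Tag lengths are from the Methods; their order within each end is an assumption.
PRIMARY_BC_LEN, ID1_LEN = 14, 4      # 5' end: primary barcode, then ID-1 (assumed order)
SECONDARY_BC_LEN, ID2_LEN = 5, 4     # 3' end: secondary barcode, then ID-2 (assumed order)
# Bases at CDS positions 33-38 used to sort reads into gene pools.
POOL_MOTIFS = {"HBB": "CGTTAC", "HBD": "TGTCAA"}


def hamming(a, b):
    """Count mismatches between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def strip_tags(seq, id1, id2):
    """Return (primary_barcode, insert) if both donor IDs match, else None."""
    head = PRIMARY_BC_LEN + ID1_LEN      # 18 nt removed from the 5' end
    tail = SECONDARY_BC_LEN + ID2_LEN    # 9 nt removed from the 3' end
    if len(seq) <= head + tail:
        return None
    if seq[PRIMARY_BC_LEN:head] != id1 or seq[len(seq) - ID2_LEN:] != id2:
        return None
    return seq[:PRIMARY_BC_LEN], seq[head:len(seq) - tail]


def assign_pool(insert, motif_start, max_mismatch=1, max_shift=3):
    """Return 'HBB' or 'HBD' if exactly one motif matches within +/- max_shift, else None."""
    matches = set()
    for pool, motif in POOL_MOTIFS.items():
        for shift in range(-max_shift, max_shift + 1):
            start = motif_start + shift
            window = insert[start:start + len(motif)] if start >= 0 else ""
            if len(window) == len(motif) and hamming(window, motif) <= max_mismatch:
                matches.add(pool)
                break
    return matches.pop() if len(matches) == 1 else None


def group_families(reads, id1, id2, motif_start):
    """Group accepted reads by gene pool and by primary barcode ("families")."""
    families = defaultdict(lambda: defaultdict(list))
    for seq in reads:
        parsed = strip_tags(seq, id1, id2)
        if parsed is None:
            continue  # wrong donor ID or read too short
        barcode, insert = parsed
        pool = assign_pool(insert, motif_start)
        if pool is not None:
            families[pool][barcode].append(insert)
    return families
```

Requiring exactly one motif to match keeps ambiguous reads out of both pools; the two motifs differ at three of six positions, so a single allowed mismatch cannot make a read at the expected offset match both. In the actual study, each family is then processed according to the workflow in Supplemental Figure S9, which is not reproduced here.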

Data access

All raw sequencing data generated in this study have been submitted to the NCBI database of Genotypes and Phenotypes (dbGaP; https://www.ncbi.nlm.nih.gov/gap/) under accession number phs002391.v1.p1. For final processed data, see Supplemental Datasheets and Supplemental Text S15. Software is available at GitHub (https://github.com/livnat-lab/HBB_HBD) and as Supplemental Code.

Competing interest statement

The authors declare no competing interests.

Acknowledgments

We thank Marc Feldman for comments on a previous draft; Rami Reshef for infrastructural resources; Mary Otoo and Joshua Adoboe for help with sample collection; Sara Zelig, Alan Templeton, and Nick Pippenger for technical comments; and Kim Weaver for extensive help. This publication was made possible through the support of a grant from the John Templeton Foundation (61129). The opinions expressed in this publication are those of the authors and do not necessarily reflect the views of the John Templeton Foundation.

Author contributions: D.M. and A.L. invented the method and designed the studies; D.M. performed all experiments except for R.S.’s; R.S. processed the EUR4 sample; Y.N. created software tools for data analysis; A.M. created the computational pipeline for mutation calling; E.B. improved the pipeline; Y.N. and A.L. provided statistical tools; M.B.Y., E.K.H., K.L.S., and A.L. obtained IRB and Helsinki approvals; M.B.Y. and E.K.H. collected samples; D.M., Y.N., E.B., A.M., and A.L. analyzed the results; D.M. and A.L. drafted the paper; D.M., Y.N., K.L.S., and A.L. revised the draft; K.L.S. provided general advice; K.L.S. and A.L. acquired funding; A.L. conceived of the project and the replacement hypothesis and supervised the project.

References

Abascal F, Harvey LM, Mitchell E, Lawson AR, Lensing SV, Ellis P, Russell AJ, Alcantara RE, Baez-Ortega A, Wang Y, et al. 2021. Somatic mutation landscapes at single-molecule resolution. Nature 593: 405–410. doi:10.1038/s41586-021-03477-4



Allison AC. 1954. Protection afforded by sickle-cell trait against subtertian malarial infection. Br Med J 1: 290–294. doi:10.1136/bmj.1.4857.290

Altenberg L, Liberman U, Feldman MW. 2017. Unified reduction principle for the evolution of mutation, migration, and recombination. Proc Natl Acad Sci 114: E2392–E2400. doi:10.1073/pnas.1619655114

Arbeithuber B, Makova KD, Tiemann-Boege I. 2016. Artifactual mutations resulting from DNA lesions limit detection levels in ultrasensitive sequencing applications. DNA Res 23: 547–559. doi:10.1093/dnares/dsw038

Blake R, Hess ST, Nicholson-Tuell J. 1992. The influence of nearest neighbors on the rate and pattern of spontaneous point mutations. J Mol Evol 34: 189–200. doi:10.1007/BF00162968

Bolger AM, Lohse M, Usadel B. 2014. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30: 2114–2120. doi:10.1093/bioinformatics/btu170

Bulmer M. 1986. Neighboring base effects on substitution rates in pseudogenes. Mol Biol Evol 3: 322–329. doi:10.1093/oxfordjournals.molbev.a040401

Cairns J, Overbaugh J, Miller S. 1988. The origin of mutants. Nature 335: 142–145. doi:10.1038/335142a0

Campbell CD, Chong JX, Malig M, Ko A, Dumont BL, Han L, Vives L, O’Roak BJ, Sudmant PH, Shendure J, et al. 2012. Estimating the human mutation rate using autozygosity in a founder population. Nat Genet 44: 1277–1281. doi:10.1038/ng.2418

Carlson J, Locke AE, Flickinger M, Zawistowski M, Levy S, Myers RM, Boehnke M, Kang HM, Scott LJ, Li JZ, et al. 2018. Extremely rare variants reveal patterns of germline mutation rate heterogeneity in humans. Nat Commun 9: 3753. doi:10.1038/s41467-018-05936-5

Carter R, Mendis K. 2002. Evolutionary and historical aspects of the burden of malaria. Clin Microbiol Rev 15: 564–594. doi:10.1128/CMR.15.4.564-594.2002

Cavalli-Sforza LL, Feldman MW. 2003. The application of molecular genetic approaches to the study of human evolution. Nat Genet 33: 266–275. doi:10.1038/ng1113

CDC Division of Parasitic Diseases and Malaria. 2019. Where malaria occurs. http://www.cdc.gov/malaria/about/distribution.html.

Conrad DF, Keebler JE, DePristo MA, Lindsay SJ, Zhang Y, Casals F, Idaghdour Y, Hartl CL, Torroja C, Garimella KV, et al. 2011. Variation in genome-wide mutation rates within and between human families. Nat Genet 43: 712–714. doi:10.1038/ng.862

Crow KD, Amemiya CT, Roth J, Wagner GP. 2009. Hypermutability of HoxA13A and functional divergence from its paralog are associated with the origin of a novel developmental feature in zebrafish and related taxa (Cypriniformes). Evolution 63: 1574–1592. doi:10.1111/j.1558-5646.2009.00657.x

Dumas LJ, O’Bleness MS, Davis JM, Dickens CM, Anderson N, Keeney J, Jackson J, Sikela M, Raznahan A, Giedd J, et al. 2012. DUF1220-domain copy number implicated in human brain-size pathology and evolution. Am J Hum Genet 91: 444–454. doi:10.1016/j.ajhg.2012.07.016

Ellegren H, Smith NG, Webster MT. 2003. Mutation rate variation in the mammalian genome. Curr Opin Genet Dev 13: 562–568. doi:10.1016/j.gde.2003.10.008

Feldman MW, Liberman U. 1986. An evolutionary reduction principle for genetic modifiers. Proc Natl Acad Sci 83: 4824–4827. doi:10.1073/pnas.83.13.4824

Feng Z, Smith D, Ellis McKenzie F, Levin S. 2004. Coupling ecology and evolution: malaria and the S-gene across time scales. Math Biosci 189: 1–19. doi:10.1016/j.mbs.2004.01.005

Flint J, Harding RM, Boyce AJ, Clegg JB. 1998. The population genetics of the haemoglobinopathies. Baillière’s Clin Haem 11: 1–51. doi:10.1016/S0950-3536(98)80069-3

Francioli LC, Polak PP, Koren A, Menelaou A, Chun S, Renkens I, Van Duijn CM, Swertz M, Wijmenga C, Van Ommen G, et al. 2015. Genome-wide patterns and properties of de novo mutations in humans. Nat Genet 47: 822–826. doi:10.1038/ng.3292

Gojobori T, Li WH, Graur D. 1982. Patterns of nucleotide substitution in pseudogenes and functional genes. J Mol Evol 18: 360–369. doi:10.1007/BF01733904

Goldmann JM, Wong WS, Pinelli M, Farrah T, Bodian D, Stittrich AB, Glusman G, Vissers LE, Hoischen A, Roach JC, et al. 2016. Parent-of-origin-specific signatures of de novo mutations. Nat Genet 48: 935–939. doi:10.1038/ng.3597

Gu X, Li WH. 1995. The size distribution of insertions and deletions in human and rodent pseudogenes suggests the logarithmic gap penalty for sequence alignment. J Mol Evol 40: 464–473. doi:10.1007/BF00164032

Gu W, Zhang F, Lupski JR. 2008. Mechanisms for human genomic rearrangements. Pathogenetics 1: 4. doi:10.1186/1755-8417-1-4

Haldane JBS. 1949. The rate of mutation of human genes. Hereditas 35: 267–273. doi:10.1111/j.1601-5223.1949.tb03339.x

Hardison R, Miller W. 2002. Welcome to the globin gene server. http://globin.cse.psu.edu/.

Hardison RC, Chui DH, Giardine B, Riemer C, Patrinos GP, Anagnou N, Miller W, Wajcman H. 2002. HbVar: a relational database of human hemoglobin variants and thalassemia mutations at the globin gene server. Hum Mutat 19: 225–233. doi:10.1002/humu.10044

Harpak A, Bhaskar A, Pritchard J. 2016. Mutation rate variation is a primary determinant of the distribution of allele frequencies in humans. PLoS Genet 12: e1006489. doi:10.1371/journal.pgen.1006489

Harris K. 2015. Evidence for recent, population-specific evolution of the human mutation rate. Proc Natl Acad Sci 112: 3439–3444. doi:10.1073/pnas.1418652112

Harris K, Pritchard JK. 2017. Rapid evolution of the human mutation spectrum. eLife 6: e24284. doi:10.7554/eLife.24284

Hartl DL, Clark AG. 2007. Principles of population genetics, 4th ed. Sinauer Associates, Sunderland, MA.

Hodgkinson A, Eyre-Walker A. 2011. Variation in the mutation rate across mammalian genomes. Nat Rev Genet 12: 756–766. doi:10.1038/nrg3098

Hodgkinson A, Ladoukakis E, Eyre-Walker A. 2009. Cryptic variation in the human mutation rate. PLoS Biol 7: e1000027. doi:10.1371/journal.pbio.1000027

Hwang DG, Green P. 2004. Bayesian Markov chain Monte Carlo sequence analysis reveals varying neutral substitution patterns in mammalian evolution. Proc Natl Acad Sci 101: 13994–14001. doi:10.1073/pnas.0404142101

Ingram V. 1957. Gene mutations in human hæmoglobin: the chemical difference between normal and sickle cell hæmoglobin. Nature 180: 326–328. doi:10.1038/180326a0

Inoue K, Dewar K, Katsanis N, Reiter LT, Lander ES, Devon KL, Wyman DW, Lupski JR, Birren B. 2001. The 1.4-Mb CMT1A duplication/HNPP deletion genomic region reveals unique genome architectural features and provides insights into the recent evolution of new genes. Genome Res 11: 1018–1033. doi:10.1101/gr.180401

Jee J, Rasouly A, Shamovsky I, Akivis Y, Steinman SR, Mishra B, Nudler E. 2016. Rates and mechanisms of bacterial mutagenesis from maximum-depth sequencing. Nature 534: 693–696. doi:10.1038/nature18313

Kennedy SR, Schmitt MW, Fox EJ, Kohrn BF, Salk JJ, Ahn EH, Prindle MJ, Kuong KJ, Shen JC, Risques RA, et al. 2014. Detecting ultralow-frequency mutations by duplex sequencing. Nat Protoc 9: 2586–2606. doi:10.1038/nprot.2014.170

Klose RJ, Bird AP. 2006. Genomic DNA methylation: the mark and its mediators. Trends Biochem Sci 31: 89–97. doi:10.1016/j.tibs.2005.12.008

Kondrashov AS. 2003. Direct estimates of human per nucleotide mutation rates at 20 loci causing Mendelian diseases. Hum Mutat 21: 12–27. doi:10.1002/humu.10147

Kratochwil CF, Meyer A. 2019. Fragile DNA contributes to repeated evolution. Genome Biol 20: 39. doi:10.1186/s13059-019-1655-x

Kratochwil CF, Liang Y, Urban S, Torres-Dowdall J, Meyer A. 2019. Evolutionary dynamics of structural variation at a key locus for color pattern diversification in cichlid fishes. Genome Biol Evol 11: 3452–3465. doi:10.1093/gbe/evz261

Kwiatkowski DP. 2005. How malaria has affected the human genome and what human genetics can teach us about malaria. Am J Hum Genet 77: 171–192. doi:10.1086/432519

Leigh EG Jr. 1970. Natural selection and mutability. Am Nat 104: 301–305. doi:10.1086/282663

Lek M, Karczewski KJ, Minikel EV, Samocha KE, Banks E, Fennell T, O’Donnell-Luria AH, Ware JS, Hill AJ, Cummings BB, et al. 2016. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536: 285–291. doi:10.1038/nature19057

Lercher MJ, Williams EJ, Hurst LD. 2001. Local similarity in evolutionary rates extends over whole chromosomes in human-rodent and mouse-rat comparisons: implications for understanding the mechanistic basis of the male mutation bias. Mol Biol Evol 18: 2032–2039. doi:10.1093/oxfordjournals.molbev.a003744

Li H. 2013. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv:1303.3997 [q-bio.GN].

Licht K, Hartl M, Amman F, Anrather D, Janisiw MP, Jantsch MF. 2019. Inosine induces context-dependent recoding and translational stalling. Nucleic Acids Res 47: 3–14. doi:10.1093/nar/gky1163

Lind PA. 2019. Repeatability and predictability in experimental evolution. In Evolution, origin of life, concepts and methods (ed. Pontarotti P), pp. 57–83. Springer, Cham, Switzerland.

Livnat A. 2013. Interaction-based evolution: how natural selection and nonrandom mutation work together. Biol Direct 8: 24. doi:10.1186/1745-6150-8-24

Livnat A. 2017. Simplification, innateness, and the absorption of meaning from context: how novelty arises from gradual network evolution. Evol Biol 44: 145–189. doi:10.1007/s11692-017-9407-x

Livnat A, Papadimitriou C. 2016. Evolution and learning: used together, fused together. A response to Watson and Szathmáry. Trends Ecol Evol 31: 894–896. doi:10.1016/j.tree.2016.10.004

Losos JB. 2017. Improbable destinies: fate, chance, and the future of evolution. Penguin, New York.

Lupski JR. 1998. Genomic disorders: structural features of the genome can lead to DNA rearrangements and human disease traits. Trends Genet 14: 417–422. doi:10.1016/S0168-9525(98)01555-8

Luria SE, Delbrück M. 1943. Mutations of bacteria from virus sensitivity to virus resistance. Genetics 28: 491–511. doi:10.1093/genetics/28.6.491

Lynch M. 2010. Rate, molecular spectrum, and consequences of human mutation. Proc Natl Acad Sci 107: 961–968. doi:10.1073/pnas.0912629107

Martin M. 2011. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet 17: 10–12. doi:10.14806/ej.17.1.200

Martincorena I, Luscombe NM. 2013. Non-random mutation: the evolution of targeted hypermutation and hypomutation. Bioessays 35: 123–130. doi:10.1002/bies.201200150

Matassi G, Sharp PM, Gautier C. 1999. Chromosomal location effects on gene sequence evolution in mammals. Curr Biol 9: 786–791. doi:10.1016/S0960-9822(99)80361-3

Mathieson I, Reich D. 2017. Differences in the rare variant spectrum among human populations. PLoS Genet 13: e1006581. doi:10.1371/journal.pgen.1006581

McClellan J, King MC. 2010. Genetic heterogeneity in human disease. Cell 141: 210–217. doi:10.1016/j.cell.2010.03.032

Michaelson JJ, Shi Y, Gujral M, Zheng H, Malhotra D, Jin X, Jian M, Liu G, Greer D, Bhandari A, et al. 2012. Whole-genome sequencing in autism identifies hot spots for de novo germline mutation. Cell 151: 1431–1442. doi:10.1016/j.cell.2012.11.019

Moxon ER, Rainey PB, Nowak MA, Lenski RE. 1994. Adaptive evolution of highly mutable loci in pathogenic bacteria. Curr Biol 4: 24–33. doi:10.1016/S0960-9822(00)00005-1

Nachman MW, Crowell SL. 2000. Estimate of the mutation rate per nucleotide in humans. Genetics 156: 297–304. doi:10.1093/genetics/156.1.297

Nishikura K. 2010. Functions and regulation of RNA editing by ADAR deaminases. Annu Rev Biochem 79: 321–349. doi:10.1146/annurev-biochem-060208-105251

Pauling L, Itano HA, Singer SJ, Wells IC. 1949. Sickle-cell anemia, a molecular disease. Science 110: 543–548. doi:10.1126/science.110.2865.543

Piel FB, Patil AP, Howes RE, Nyangiri OA, Gething PW, Williams TN, Weatherall DJ, Hay SI. 2010. Global distribution of the sickle cell gene and geographical confirmation of the malaria hypothesis. Nat Commun 1: 104. doi:10.1038/ncomms1104

Popitsch N, Huber CD, Buchumenski I, Eisenberg E, Jantsch M, Von Haeseler A, Gallach M. 2020. A-to-I RNA editing uncovers hidden signals of adaptive genome evolution in animals. Genome Biol Evol 12: 345–357. doi:10.1093/gbe/evaa046

Rahbari R, Wuster A, Lindsay SJ, Hardwick RJ, Alexandrov LB, Al Turki S, Dominiczak A, Morris A, Porteous D, Smith B, et al. 2016. Timing, rates and spectra of human germline mutation. Nat Genet 48: 126. doi:10.1038/ng.3469

Roach JC, Glusman G, Smit AF, Huff CD, Hubley R, Shannon PT, Rowen L, Pant KP, Goodman N, Bamshad M, et al. 2010. Analysis of genetic inheritance in a family quartet by whole-genome sequencing. Science 328: 636–639. doi:10.1126/science.1186802

Salk JJ, Schmitt MW, Loeb LA. 2018. Enhancing the accuracy of next-generation sequencing for detecting rare and subclonal mutations. Nat Rev Genet 19: 269–285. doi:10.1038/nrg.2017.117

Shendure J, Akey JM. 2015. The origins, determinants, and consequences of human mutations. Science 349: 1478–1483. doi:10.1126/science.aaa9119

Steinberg M, Adams JI. 1991. Hemoglobin A2: origin, evolution, and aftermath. Blood 78: 2165–2177. doi:10.1182/blood.V78.9.2165.2165

Veltman JA, Brunner HG. 2012. De novo mutations in human genetic disease. Nat Rev Genet 13: 565–575. doi:10.1038/nrg3241

Vogel F, Motulsky A. 1997. Human genetics: problems and approaches. Springer-Verlag, Berlin.

Walsh B, Lynch M. 2018. Evolution and selection of quantitative traits. Oxford University Press, Oxford, UK.

Weyrich A. 2012. Preparation of genomic DNA from mammalian sperm. Curr Protoc Mol Biol 98: 2–13. doi:10.1002/0471142727.mb0213s98

Wolfe KH, Sharp PM, Li WH. 1989. Mutation rates differ among regions of the mammalian genome. Nature 337: 283–285. doi:10.1038/337283a0

Xie KT, Wang G, Thompson AC, Wucherpfennig JI, Reimchen TE, MacColl AD, Schluter D, Bell MA, Vasquez KM, Kingsley DM. 2019. DNA fragility in the parallel evolution of pelvic reduction in stickleback fish. Science 363: 81–84. doi:10.1126/science.aan1425

Zhang Z, Gerstein M. 2003. Patterns of nucleotide substitution, insertion and deletion in the human genome inferred from pseudogenes. Nucleic Acids Res 31: 5338–5348. doi:10.1093/nar/gkg745

Zhang F, Carvalho CMB, Lupski JR. 2009. Complex human chromosomal and genomic rearrangements. Trends Genet 25: 298–307. doi:10.1016/j.tig.2009.05.005

Zhang J, Kobert K, Flouri T, Stamatakis A. 2014. PEAR: a fast and accurate Illumina Paired-End reAd mergeR. Bioinformatics 30: 614–620. doi:10.1093/bioinformatics/btt593

Received August 17, 2021; accepted in revised form January 10, 2022.

Genome Res. 2022 32: 488–498, originally published online January 14, 2022
Access the most recent version at doi:10.1101/gr.276103.121

Daniel Melamed, Yuval Nov, Assaf Malik, et al.
De novo mutation rates at the single-mutation resolution in a human HBB gene region associated with adaptation and genetic disease

Supplemental Material: http://genome.cshlp.org/content/suppl/2022/02/14/gr.276103.121.DC1

References: This article cites 75 articles, 15 of which can be accessed free at http://genome.cshlp.org/content/32/3/488.full.html#ref-list-1

Open Access: Freely available online through the Genome Research Open Access option.

Creative Commons License: This article, published in Genome Research, is available under a Creative Commons License (Attribution 4.0 International), as described at http://creativecommons.org/licenses/by/4.0/.

© 2022 Melamed et al.; Published by Cold Spring Harbor Laboratory Press
