Genomic Investigations for Cassava Biofortification with Pro-Vitamin A Carotenoids in Kenya
Publisher
University of Ghana
Abstract
Cassava (Manihot esculenta Crantz) is an important crop that serves as a source of food and income
for about 800 million people in the world. The crop is popular because it grows in degraded
soils, tolerates extreme weather, requires few farm inputs, and can be harvested
piecemeal when small quantities are needed. Although cassava is rich in starch, it is deficient
in other nutrients such as pro-vitamin A carotenoids. It has been noted that 68% of Kenyan children
from communities that solely depend on cassava have vitamin A deficiency (VAD). To address
this challenge, the Kenya Agricultural and Livestock Research Organization (KALRO) joined
global efforts to biofortify cassava with pro-vitamin A carotenoids (pVAC). A genomic
investigation was carried out to support cassava biofortification in Kenya. A population of 94
pVAC cassava genotypes was genotyped using SNP markers and phenotyped for beta-carotene
content using high-performance liquid chromatography (HPLC). Population structure, genetic
diversity and linkage disequilibrium (LD) in the population were determined using R statistical software.
Parametric genomic prediction models were developed using Bayesian generalized linear
regression (BGLR) models. The prediction ability of these models was determined before and after
adding significant SNPs from a random-marker-effect genome-wide association study (GWAS)
model to the genomic prediction model. Similarly, non-parametric machine learning (ML) models
for pVAC and their ensemble were developed and their prediction ability determined. In addition,
this study investigated genomic regions, SNP marker superior alleles and candidate genes
controlling pVAC in cassava roots using multi-locus and haplotype-based GWAS models. The
results indicated that polymorphism information content (PIC) varied from 0.10 to 0.38, with an
average of 0.24, implying that the markers used in this study were informative. The range of the
minor allele frequency (MAF) was 0.05 to 0.50, with an average of 0.20. This indicates that the
genotype quality of the markers was reliable. Population structure results indicated that the
population was structured into three ancestral sub-populations. According to statistics on genetic
diversity, the observed heterozygosity (Ho) had a mean of 0.30 with a range of 0.21 to 0.38, while
the genetic diversity (GD), also known as expected heterozygosity (He), had a mean of 0.29 and
ranged between 0.10 and 0.50. This result revealed moderate diversity in the
study population. The coefficient of inbreeding (F) varied from -0.28 to 0.28, with an average of
0.02, indicating that the population was close to random mating overall. According to analysis
of molecular variance (AMOVA), 96.05% of genetic variability existed within the sub-populations
while 3.95% existed between the sub-populations. The results showed that at an LD threshold of
R² = 0.2, LD decayed at 613.072 kb, while at R² = 0.1, LD decayed at 1786.714 kb. This result
suggested that genomic analyses can be effectively carried out using a minimum of 1,256 markers
well distributed across the genome. The addition of significant SNPs from GWAS enhanced the
BGLR genomic prediction models. Prediction improved when more significant SNPs were included as
fixed effects in the genomic prediction (GP) models. Non-parametric ML models were
shown to be accurate in predicting individuals’ genomic value. ML model prediction power was
greatly increased by feature selection and the use of model ensembles. The excellent prediction
power of the Random Forest (RF), Extreme Gradient Boosting (XGBOOST), K-Nearest
Neighbors (KNN), and ensemble models makes them suitable for the last stage of choosing a
potential cassava variety. The fixed marker effect GWAS model identified two significant SNPs
while the random marker effect GWAS models identified five significant SNPs associated with
beta-carotene. Haplotype-based GWAS identified 15 genomic regions on 10 chromosomes
associated with the trait. This indicated that among the studied GWAS models, haplotype-based
GWAS had the highest statistical power. New causative variants on chromosomes 03, 05, 06, 08, 09, 10,
11 and 18 that are strongly associated with pVAC were discovered through this study. The
investigation found alleles C, G, C, T, and G on chromosomes 01, 03, 04, 09, and 14, respectively,
to be superior alleles. These superior alleles can be leveraged for rapid biofortification of cassava
with pVAC through a gene-pyramiding breeding strategy. This study uncovered a total of 20
candidate genes controlling carotenoid content in cassava roots. Five of the identified candidate
genes (Manes.01G124200, Manes.01G001200, Manes.05G193700, Manes.08G037100, and
Manes.03G084700) are involved in carotenoid anabolism in cassava roots. This is
the first report of the latter three genes in genomic regions associated
with pVAC. On the other hand, the remaining 15 genes were involved in carotenoid catabolism.
Identification of these candidate genes facilitates informed selection targeting the development of
novel cassava varieties with high nutritional value. Results from this research contribute directly
to the ongoing effort at KALRO to biofortify cassava with high pVAC.
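
The marker statistics quoted in the abstract (MAF, Ho, He, PIC, F) and the marker-number estimate drawn from LD decay can be illustrated with a minimal Python sketch. It is not the thesis workflow, which used R. The 0/1/2 dosage encoding, the function names, and the genome size of roughly 770,000 kb are assumptions introduced here for illustration; the abstract does not state a genome size. PIC is computed with the standard two-allele form of the Botstein formula.

```python
import math


def snp_summary(dosages):
    """Summary statistics for one biallelic SNP.

    dosages: per-individual genotype codes, 0/1/2 = copies of the
    alternate allele (hypothetical encoding; missing calls removed).
    """
    n = len(dosages)
    p = sum(dosages) / (2 * n)           # alternate-allele frequency
    q = 1.0 - p                          # reference-allele frequency
    maf = min(p, q)                      # minor allele frequency
    he = 1.0 - (p ** 2 + q ** 2)         # expected heterozygosity (2pq)
    ho = sum(1 for d in dosages if d == 1) / n   # observed heterozygosity
    pic = he - 2 * (p ** 2) * (q ** 2)   # PIC for two alleles
    # Inbreeding coefficient; undefined for a monomorphic marker.
    f = 1.0 - ho / he if he > 0 else float("nan")
    return {"maf": maf, "ho": ho, "he": he, "pic": pic, "f": f}


def min_markers(genome_kb, ld_decay_kb):
    """Markers needed if one marker is placed per LD-decay window."""
    return math.ceil(genome_kb / ld_decay_kb)


if __name__ == "__main__":
    # Toy genotypes for four individuals, not data from the thesis.
    print(snp_summary([0, 1, 1, 2]))
    # Assumed genome size (kb) divided by the reported decay distance (kb).
    print(min_markers(770_000, 613.072))
```

Under the assumed genome size, dividing by the reported decay distance of 613.072 kb at R² = 0.2 gives 1,256 windows, which agrees with the minimum marker number reported above. For a biallelic SNP, PIC cannot exceed 0.375, which is consistent with the reported upper bound of 0.38.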
Description
PhD. Plant Breeding
