Genomic Investigations for Cassava Biofortification with Pro-Vitamin A Carotenoids in Kenya

Publisher

University of Ghana

Abstract

Cassava (Manihot esculenta Crantz) is an important crop that serves as a source of food and income for about 800 million people worldwide. The crop is popular because it grows in degraded soils, tolerates extreme weather, requires few farm inputs, and can be harvested piecemeal when only small quantities are needed. Although cassava is rich in starch, it is deficient in other nutrients such as pro-vitamin A carotenoids. It has been noted that 68% of Kenyan children from communities that depend solely on cassava have vitamin A deficiency (VAD). To address this challenge, the Kenya Agricultural and Livestock Research Organization (KALRO) joined global efforts to biofortify cassava with pro-vitamin A carotenoids (pVAC). Genomic investigations were carried out to support cassava biofortification in Kenya. A population of 94 pVAC cassava genotypes was genotyped using SNP markers and phenotyped for beta-carotene content using high-performance liquid chromatography (HPLC). Population structure, genetic diversity and linkage disequilibrium (LD) in the population were determined using R statistical software. Parametric genomic prediction (GP) models were developed using Bayesian generalized linear regression (BGLR). The prediction ability of these models was determined before and after adding significant SNPs from a random marker effect genome-wide association study (GWAS) model to the genomic prediction model. Similarly, non-parametric machine learning (ML) models for pVAC and their ensemble were developed and their prediction ability determined. In addition, this study investigated genomic regions, superior SNP marker alleles and candidate genes controlling pVAC in cassava roots using multi-locus and haplotype-based GWAS models. The results indicated that polymorphic information content (PIC) varied from 0.10 to 0.38, with an average of 0.24, implying that the markers used in this study were informative.
The minor allele frequency (MAF) ranged from 0.05 to 0.50, with an average of 0.20, indicating that the genotype quality of the markers was reliable. Population structure analysis indicated that the population comprised three ancestral sub-populations. The observed heterozygosity (Ho) had a mean of 0.30 and ranged from 0.21 to 0.38, while the genetic diversity (GD), also known as expected heterozygosity (He), had a mean of 0.29 and ranged from 0.10 to 0.50. This result revealed moderate diversity in the study population. The coefficient of inbreeding (F) varied from -0.28 to 0.28, with an average of 0.02, indicating that the population was generally mating at random. According to analysis of molecular variance (AMOVA), 96.05% of the genetic variability existed within sub-populations while 3.95% existed between them. Linkage disequilibrium (LD) decayed at 613.072 kb at a threshold of R² = 0.2 and at 1786.714 kb at R² = 0.1. This result suggested that genomic analyses can be carried out effectively using a minimum of 1,256 markers well distributed across the genome. Adding significant SNPs from GWAS enhanced the BGLR genomic prediction models, and the models improved further as more significant SNPs were included as fixed effects. Non-parametric ML models were shown to be accurate in predicting the genomic value of individuals. Feature selection and the use of an ensemble of models greatly increased the prediction power of the ML models. The excellent prediction power of the Random Forest (RF), Extreme Gradient Boosting (XGBoost), K-Nearest Neighbors (KNN), and ensemble models makes them suitable for the final stage of selecting a potential cassava variety.
The fixed marker effect GWAS model identified two significant SNPs, while the random marker effect GWAS models identified five significant SNPs associated with beta-carotene. Haplotype-based GWAS identified 15 genomic regions on 10 chromosomes associated with the trait, indicating that, among the GWAS models studied, haplotype-based GWAS had high statistical power. New causative variants on chromosomes 03, 05, 06, 08, 09, 10, 11 and 18 that are strongly associated with pVAC were discovered in this study. The investigation found alleles C, G, C, T, and G on chromosomes 01, 03, 04, 09, and 14, respectively, to be superior alleles. These superior alleles can be leveraged for rapid biofortification of cassava with pVAC through a gene-pyramiding breeding strategy. This study uncovered a total of 20 candidate genes controlling carotenoid content in cassava roots. Five of the identified candidate genes (Manes.01G124200, Manes.01G001200, Manes.05G193700, Manes.08G037100, and Manes.03G084700) are involved in carotenoid anabolism in cassava roots. This is the first report of the latter three genes in genomic regions associated with pVAC. The remaining 15 genes were involved in carotenoid catabolism. Identification of these candidate genes facilitates informed selection targeting the development of novel cassava varieties with high nutritional value. Results from this research contribute directly to the ongoing effort to biofortify cassava with high pVAC at KALRO.
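
The per-marker summary statistics reported in the abstract (MAF, Ho, He, PIC and F) can be illustrated with a minimal Python sketch for a single biallelic SNP. It is not the thesis pipeline, which used R; the function names, the 0/1/2 genotype coding and the textbook formulas used here are assumptions made for illustration. The `markers_needed` helper shows one plausible way a minimum marker count could follow from the LD decay distance, and the genome size of roughly 770 Mb passed to it in the usage note is likewise an assumption that does not appear in the abstract.

```python
import math


def snp_stats(genotypes):
    """Summary statistics for one biallelic SNP.

    genotypes: per-individual counts of the alternate allele (0, 1 or 2);
    None marks a missing call. This coding is an assumption of the sketch.
    """
    calls = [g for g in genotypes if g is not None]
    n = len(calls)
    if n == 0:
        raise ValueError("no genotype calls for this marker")

    p = sum(calls) / (2 * n)  # alternate allele frequency
    q = 1.0 - p               # reference allele frequency

    maf = min(p, q)                               # minor allele frequency
    he = 2 * p * q                                # expected heterozygosity
    ho = sum(1 for g in calls if g == 1) / n      # observed heterozygosity
    pic = 1 - (p**2 + q**2) - 2 * p**2 * q**2     # PIC, biallelic form
    f = 1 - ho / he if he > 0 else float("nan")   # inbreeding coefficient

    return {"maf": maf, "ho": ho, "he": he, "pic": pic, "f": f}


def markers_needed(genome_size_kb, ld_decay_kb):
    """Markers required to place one marker per LD block (hypothetical helper)."""
    return math.ceil(genome_size_kb / ld_decay_kb)
```

For example, `snp_stats([0, 1, 1, 2])` describes a marker with both alleles at frequency 0.5, where He reaches its maximum of 0.5 and PIC its maximum of 0.375 for a biallelic locus; these bounds agree with the upper ends of the ranges quoted in the abstract. Under the assumed genome size, `markers_needed(770_000, 613.072)` evaluates to 1256.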

Description

PhD. Plant Breeding
