Tsindi et al. CABI Agriculture and Bioscience (2023) 4:15
https://doi.org/10.1186/s43170-023-00158-2

RESEARCH Open Access

Analysis of population structure and genetic diversity in a Southern African soybean collection based on single nucleotide polymorphism markers

A. Tsindi1,2, J. S. Y. Eleblu1, E. Gasura3, H. Mushoriwa4, P. Tongoona1, E. Y. Danquah1, L. Mwadzingeni2, M. Zikhali2, E. Ziramba2, G. Mabuyaye2 and J. Derera5*

Abstract
Soybean is an emerging strategic crop for nutrition, food security, and livestock feed in Africa, but improvement of its productivity is hampered by low genetic diversity. There is a need to broaden the tropical germplasm base through incorporation and introgression of temperate germplasm in Southern African breeding programs. Therefore, this study was conducted to determine the population structure and molecular diversity among 180 temperate and 30 tropical soybean accessions using single nucleotide polymorphism (SNP) markers. The results revealed very low levels of molecular diversity among the 210 lines, with implications for the breeding strategy. A low fixation index (FST) value of 0.06 was observed, indicating low genetic differences among populations. This suggests high genetic exchange among different lines due to global germplasm sharing. Inference based on three tools, namely the Evanno method, silhouette plots and a UPGMA phylogenetic tree, showed the existence of three sub-populations. The UPGMA tree showed that the first sub-cluster is composed of three genotypes and the second cluster has two genotypes, while the rest of the genotypes constituted the third cluster. The third cluster revealed low variation among most genotypes. Negligible differences were observed among some of the lines, such as Tachiyukata and Yougestu, indicating sharing of common parental backgrounds.
However, large phenotypic differences were observed among the accessions, suggesting that there is potential for their utilization in the breeding programs. Rapid phenotyping revealed grain yield potential ranging from one to five tons per hectare for the 200 non-genetically modified accessions. Findings from this study will inform the crossing strategy for the subtropical soybean breeding programs. Innovation strategies for improving genetic variability in the germplasm collection, such as investments in pre-breeding, increasing the geographic sources of introductions and exploitation of mutation breeding, are recommended to enhance genetic gain.

Keywords Glycine max, Molecular diversity, Phenotyping, Population structure, SNP, Soybean

*Correspondence: J. Derera, j.derera@cgiar.org
Full list of author information is available at the end of the article

© The Author(s) 2023. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
Introduction

Soybean is an important, nutritious crop used worldwide for food, feed and industrial oils. Its high utility is explained by its high protein content of about 40% and an oil content that reaches or exceeds 20% in some genotypes (Bellaloui et al. 2010; Orf 2010). In 2019, worldwide production was over 300 million metric tons, produced on 120 million hectares of land (FAOSTAT 2021), which translates to a global average yield of 2.5 tons per hectare. Production is dominated by a few countries. The world’s leading soybean producers are Brazil, the United States of America, Argentina and China. Africa contributes only 0.9% of total world production (FAOSTAT 2021), which is negligible and does not match the regional demand for soybean products. The major producers in Africa are South Africa, Nigeria, Ghana, Uganda, Ethiopia, Zambia, Malawi and Zimbabwe. All these countries fail to meet their national demand. As a result, Africa imports soybean.

There is a need to develop varieties that are highly productive and adapted to the tropical and subtropical ecologies in Africa. Efforts are underway to identify such varieties through the regional soybean breeding network that employs the Pan African Trials (PAT) under the leadership of the Soybean Innovation Lab (SIL), in collaboration with the International Institute of Tropical Agriculture (IITA), national public programs and private seed companies (https://www.soybeaninnovationlab.illinois.edu/). The PAT shows a generally low level of productivity due to limited genetic improvement. However, genetic improvement efforts are challenged by the narrow genetic base of soybean (Cornelious and Sneller 2002; Lee et al. 2014; Li et al. 2013) owing to several domestication bottlenecks (Gwinner et al. 2017; Hyten et al. 2007; Rafalski 2002).

The baseline genetic diversity of the soybean germplasm pool and introductions should be established in order to devise a viable breeding strategy. Genetic improvement of any crop rests upon the diversity present within and among the breeding populations (Biyeu et al. 2010). Knowledge of genetic variability helps in the selection of parental lines to be used when making crosses, the establishment of core collections and enhanced utilization of the germplasm in breeding programs (Abebe et al. 2021; Bandillo et al. 2017). While there is limited diversity among cultivars within country or regional breeding programs because of sharing of common parents (Gwinner et al. 2017; Hahn and Würschum 2014; Tiwari et al. 2019), introduction of exotic germplasm plays a crucial role in widening the genetic base from which parents can be selected to make bi-parental crosses.

The tropical and subtropical soybean breeding programs in Africa utilize temperate germplasm to improve local varieties. The major sources of soybean germplasm lines and populations for Africa have been China, Japan, Korea and the USA (Grieshop and Fahey 2001; Jeong et al. 2019a, b), where greater genetic diversity has been reported (Oliveira et al. 2010). China and the USA maintain large collections in their gene banks. There are about thirty thousand accessions in the Chinese gene bank, while the USDA gene banks contain about 15 000 accessions (Liu et al. 2017). This germplasm cannot be used directly in the breeding programs in Africa. There is a need to characterize the germplasm before crossing is done. For example, 93% of the Chinese germplasm accessions are primitive cultivars but highly diverse (Chen and Nelson 2005). These collections are important sources of favorable alleles which can enhance breeding in Africa. However, when such introductions are to be used for breeding purposes, they need to be screened for their usefulness (Jeong et al. 2019a, b; Li et al. 2014) to inform the breeding strategy.

A survey of the literature indicates that germplasm diversity characterization can be conducted following two approaches. Both morphological or phenotypic and molecular genetic diversity studies have been used to assess variation in soybean (Abebe et al. 2021; Bandillo et al. 2015; Chander et al. 2021; Malik et al. 2011; Jeong et al. 2019a, b; Ma et al. 2006; Nawaz et al. 2021; Ojo et al. 2012; Valliyodan et al. 2021; Wang et al. 2012; Mihaljević et al. 2020). The advantages and limitations of both approaches have been discussed.

While morphological or phenotypic methods have been successful in discriminating soybean genotypes, their efficiency is compromised by complications caused by genotype by environment interaction (GxE) effects. GxE masks genotypic differences among the germplasm entries. The high levels of GxE effects require that genotypes are evaluated at many sites. However, due to the exorbitant costs of conducting multi-location trials, only a few sites are often used, resulting in low resolution due to few data points. There are also challenges of waiting for a long time to get results. The length of the cycle from seed to seed is a hindrance as it is time consuming, labor intensive and costly (Chander et al. 2021; Nadeem et al. 2018). As a result, use of molecular markers has increased. They are not affected by GxE interactions, are not growth-stage specific and are abundant within the genome (Nadeem et al. 2018). Although molecular markers were initially expensive, there have been improvements, such as the invention of single nucleotide polymorphism (SNP) DNA markers and their amenability to automation, that have brought the cost per data point to a very competitive level compared to phenotypic data. Currently, SNPs are among the most widely used markers (Zhu et al. 2003; Edwards et al. 2007; Nadeem et al. 2018).

SNPs are the markers of choice for molecular diversity studies. SNP markers have been successfully used for diversity studies in several crops including soybean (Abebe et al. 2021; Chander et al. 2021; Liu et al. 2017), cowpea (Fatokun et al. 2018; Qin et al. 2016; Sodedji et al. 2021), pigeon pea (Yang et al. 2006; Zavinon et al. 2020) and common bean (Blair et al. 2013; Cortés et al. 2011; Nemlı et al. 2017). Assessment of the genetic diversity among elite lines and varieties developed by IITA using SNPs revealed high diversity within the germplasm and grouped the germplasm into three clusters based on genetic relatedness (Abebe et al. 2021). Similarly, a broad genetic base among tropical soybean lines, with a genetic diversity index of 0.414 using SNP markers, has been reported (Chander et al. 2021). However, previous studies cited low genetic diversity among the germplasm from Brazil, China, Europe and North America. Low genetic diversity was reported among Brazilian (Gwinner et al. 2017), USA and Chinese germplasm (Liu et al. 2017). Central European lines were reported to be closely related to the Swiss and Canadian lines, but distantly related to the Chinese lines (Hahn and Würschum 2014). These findings suggest the need for breeders to know the molecular diversity in the germplasm to guide breeding strategies.

Improvement of soybean varieties for adaptation and productivity ranks quite high on the product profile for the Southern Africa region. Early maturity in response to climate change, which has rendered growing seasons short, is one of the important traits for soybean lines for deployment in sub-Saharan Africa (Ziervogel et al. 2014). This requires sourcing of exotic germplasm with the favorable alleles for early maturity. Temperate germplasm is less sensitive to latitude, which is a major determinant of flowering and maturity time in soybean. The soybean breeding programs in Africa have collected both temperate and tropical germplasm for utilization in breeding. However, the levels of molecular diversity in this collection have not been established. The present study was therefore conducted to assess the population structure and genetic diversity of the temperate and tropical soybean accessions using SNP markers.

Materials and methods

Plant material and sampling

A collection of public (belonging to government or national research institutions) and private (from private institutions) germplasm, comprising 210 lines from South Africa (10), Malawi (1), Zimbabwe (19) and the USA (180), was used for the study. All the genotypes were planted in plastic sleeves in a screen house in 2019. The 10 genotypes from South Africa were planted in South Africa while the other 200 were planted in Zimbabwe. An average of six leaf discs was sampled from a single plant of each genotype at 3 weeks after emergence using the LGC Genomics plant sample collection kit. The leaf discs were placed in 96-well plates and sealed with perforated strip caps. A desiccant sachet was placed on top of the sealed tubes and a rack lid was fixed on top. The samples were placed in a sealable bag and shipped to LGC Genomics, Germany, for genotyping using the targeted genotyping-by-sequencing (SeqSNP) method.

Rapid phenotypic screening

A total of 200 non-genetically modified accessions (temperate and tropical) were planted in Zimbabwe. The ten accessions from South Africa could not be evaluated in Zimbabwe because they are genetically modified (they contain the Roundup-ready herbicide resistance trait). The rapid screening was conducted at the Rattray Arnold Research Station (RARS) (17°38′60″ S, 31°14′24″ E), near Harare. Rapid phenotypic screening for yield was done in an observation trial without replication, in two-row plots which were 1.5 m long with a spacing of 0.45 m between rows and 0.05 m within rows. Grain yield was recorded from the whole plot at maturity.

DNA extraction, SNP marker genotyping and data pre-processing

DNA extraction was done using magnetic bead chemistry (sbeadex™ mini plant kit from LGC, Biosearch Technologies, Berlin, Germany) on a KingFisher Flex. SNP marker genotyping was performed using SeqSNP, a targeted genotyping-by-sequencing service offered by LGC, which allows for genotyping of SNPs and small insertions/deletions using a single primer enrichment technology (LGC Bioscience Technologies 2019). In order to design a SeqSNP assay, a total of 500 informative markers were selected from a panel of 1 082 markers in the LGC database (https://www.biosearchtech.com/products/pcr-kits-and-reagents/genotyping-assays/kasp-genotyping-chemistry/kasp-snp-libraries/soybean-genotyping-library), which were designed from an original set of 1 536 SNP markers, the “Universal Soy Linkage Panel” (USLP 1.0) described in Hyten et al. 2010. These SNP markers were selected based on their even distribution throughout each of the 20 consensus linkage groups, and for optimum allele frequency in diverse germplasm. The physical start and end positions of the markers, used to construct a BED file for sequencing, were taken from the Soybase database (https://www.soybase.org/) with Williams 82 as the reference genome.

The total number of targets that passed design was 496, covered by a total of 984 oligo probes, i.e. about 1.98 oligo probes per target. The total number of targets which passed the quality criteria, that is, those that were successfully genotyped in at least 85% of all samples, was 485 (97.8%). NextSeq 500 sequencing was performed, with the number of pre-processed reads being 35 397 796, which is approximately 168 561 reads per sample. The percentage of reads effectively used in genotyping was 83.4% and the average effective target SNP coverage was 283x. The SNP genotyping pipeline and settings involved diploid genotyping with a minimum coverage of 8 reads per sample and locus using FreeBayes (Garrison & Marth 2012). A total of 437 (87.1%) of the targets were polymorphic; 98.5% of all calls were homozygous and 1.5% heterozygous. The proportion of missing data was 1.4%.

Demultiplexing of all library groups was done using the Illumina bcl2fastq 2.17.1.14 software. One or two mismatches or Ns were allowed in the barcode read when barcode distances between all libraries on the lane allowed for it. Sequencing adapter remnants were then clipped from all reads. Reads with a final length < 65 bases were discarded. Quality trimming of the adapter-clipped Illumina reads was performed by removing reads containing Ns and trimming reads at the 3′ end to get a minimum average Phred quality score of 30 over a window of ten bases. Reads with a final length < 65 bases were discarded. FastQC reports were then created for all FASTQ files. A summary of read counts for all samples was then generated.

Data analysis

Quality-trimmed reads were aligned against the target genome using Bowtie2, followed by variant discovery and genotyping of samples with FreeBayes V1.0.2–16 (https://github.com/ekg/freebayes#readme). Ploidy was set at 2 and genotypes were filtered for a minimum coverage of 8 reads. SNP marker diversity and profile were analyzed using the PowerMarker and GenAlEx software. SNP data quality was checked by filtering, where SNPs with a call rate greater than 90% were retained and those with a minor allele frequency (MAF) of < 0.05 were discarded. The polymorphic information content (PIC), observed heterozygosity (Ho), expected heterozygosity (He), allele frequency and Shannon Information Index (I) were computed in PowerMarker (Liu and Muse 2005) and GenAlEx (Peakall and Smouse 2012).

Genetic diversity analyses were conducted using the R software. The genotypes were subjected to silhouette plot analysis in R Statistics version 3.5.1 (Team R Core 2015) to determine the probable number of clusters formed. Coefficients of similarity showing genetic distances among the soybean lines (matrix of similarities) were calculated in R Statistics following the Gower’s distance model (Gower 1971). The similarity matrix was then used to group the soybean genotypes using the Unweighted Pair Group Method using Arithmetic average (UPGMA) algorithm in R Statistics (Team R Core 2015), giving an annotated phylogenetic tree (Rambaut 2016). The 30 tropical and 180 temperate genotypes were isolated and subjected to diversity analysis, and a dendrogram was drawn in R Statistics separately for each group of genotypes.

Population structure analysis was performed using the Bayesian clustering approach in STRUCTURE v2.3.4 (Porras-Hurtado et al. 2012). The structure analysis was run using an admixture model with a burn-in period of 5 000 and 50 000 Markov-chain Monte Carlo replications. The number of clusters (k) was set to range from 1 to 10 with 3 iterations. The output from STRUCTURE was then imported to Structure Harvester (Earl and VonHoldt 2012) to visualize the delta K value, which forms a distinct peak, using the Evanno method. Analysis of molecular variance (AMOVA) was done using GenAlEx (Peakall and Smouse 2012) to determine the variance components and the molecular diversity between and within populations. Bases were coded A = 1, C = 2, G = 3, T = 4 and missing data 0. Clone identification was also done in GenAlEx. The Nei’s nucleotide distance and the fixation index (FST) were also computed. The fixation index is a measure of genetic variation that can be explained by population structure and ranges from 0 (identical) to 1 (completely different with no common alleles shared) (Mohammadi and Prasanna 2003), calculated as:

$$F_{ST} = \frac{\delta_s^2}{p(1 - p)}$$

where $\delta_s^2$ is the variance in the frequency of the allele between different subpopulations, weighted by the sizes of the subpopulations, and $p$ is the average frequency of an allele in the total population.

Results

Phenotypic yield data

The yield data showed that the tropical lines yielded more than the temperate genotypes in Zimbabwe. The top ten performing genotypes were all tropical genotypes while all the bottom ten were temperate genotypes (Table 1). The frequency distribution of the performance data of the genotypes is shown in Fig. 1. Only 15 genotypes gave a yield above 4000 kg/ha and these were mainly of tropical origin. Out of the 49 genotypes which yielded between 3000 and 4000 kg/ha, 46 are of temperate origin. Most of the genotypes (70) were in the yield range of 2000–3000 kg/ha while no genotype gave a yield below 1000 kg/ha (Fig. 1).

Table 1 Top ten and bottom ten yield data for the soybean genotypes evaluated in Zimbabwe

Rank and trial statistics          Genotype name   Adaptation   Grain yield (kg/ha)
Top ten performing genotypes       Saga            Tropical     4817.09
                                   Safari          Tropical     4761.03
                                   Serenade        Tropical     4734.82
                                   Saxon           Tropical     4730.93
                                   Mwenezi         Tropical     4501.17
                                   Solitaire       Tropical     4387.07
                                   Spike           Tropical     4375.53
                                   S722-6-1E       Tropical     4266.46
                                   S1440-5-2E      Tropical     4265.78
                                   Squire          Tropical     4243.98
Bottom ten performing genotypes    Ozark           Temperate    1138.72
                                   NC-Tinius       Temperate    1119.99
                                   Spencer         Temperate    1112.50
                                   UI.San          Temperate    1107.63
                                   HF93-035        Temperate    1086.65
                                   HF93-083        Temperate    1075.04
                                   Defiance        Temperate    1052.57
                                   Clifford        Temperate    1042.83
                                   LN83-2356       Temperate    1011.36
                                   UA 4805         Temperate    1010.24
Statistics                         Mean                         2552.00
                                   SE mean                      64.84
                                   STD                          917.00
                                   P value                      < 0.001

Fig. 1 Frequency distribution of 200 non-genetically modified soybean genotypes for grain yield (x-axis: yield in kg/ha; y-axis: number of genotypes; bar values: <1000, 0; 1000-2000, 66; 2000-3000, 70; 3000-4000, 49; >4000, 15)

SNP marker diversity and profile

After filtering, 403 SNP markers remained with minor allele frequency > 0.05. The SNP marker profiles are presented in Table 2. The average minor allele frequency was 0.24. The number of alleles ranged from 1 to 3 with an average of 1.88. The Shannon Information Index ranged from 0.03 to 0.98 with a mean of 0.45. The mean expected heterozygosity (He) was 0.31, whilst the mean observed heterozygosity was 0.02. The mean polymorphic information content (PIC) was 0.24.

Table 2 SNP marker diversity for genotyping 210 diverse temperate and tropical soybean lines

                                         Mean   Min    Max
Major allele frequency                   0.76   0.00   1.00
Minor allele frequency (MAF)             0.24   0.05   0.50
Expected heterozygosity (He)             0.31   0.00   0.94
Observed heterozygosity (Ho)             0.02   0.00   1.00
Polymorphic information content (PIC)    0.24   0.01   0.37
Allele number                            1.88   1.00   3.00
Shannon information index                0.45   0.03   0.98

Population structure

The silhouette plots showed that considering two clusters produces one genotype with a negative silhouette value (Fig. 2a). When three clusters were considered, all the genotypes fitted perfectly into the three clusters (Fig. 2b). Having more clusters produced several genotypes with negative values on the silhouette plots. Therefore, three clusters were the best fit for grouping all the genotypes (Fig. 2b). In the first cluster, 205 individuals were identified, whilst clusters two and three had three and two lines, respectively. The average genetic distances (GD) were 0.28, 0.11 and 0.13 for Clusters 1, 2 and 3, respectively.

Fig. 2 Silhouette plots showing the number of possible clusters formed from 210 genotyped soybean lines: a considering 2 clusters; b considering 3 clusters

According to the Gower’s genetic distances calculated in R Statistics, all the 210 genotypes were also grouped into three clusters, as shown in the phylogenetic tree drawn using UPGMA cluster analysis (Fig. 3). The first cluster consisted of three temperate genotypes, Nitchuu 47, Tara and Tousan, while the second cluster consisted of two lines, namely Forrest and Fowler. The five genotypes in clusters one and two are all from the USA. The third cluster consisted of 205 genotypes. The genotypes in this cluster consisted of all tropical genotypes from Zimbabwe, South Africa and Malawi, and several temperate genotypes from the USA. There were genotypes which had short genetic distances between them (Fig. 3), such as Pudou 426 and Usada Zairai (0.02), Yougestu and Tachiyukata (0.02), UI. San and IC. San (0.05), Saga and Santee (0.07), and Stanza and Mwenezi (0.08). Most of the lines from Zimbabwe fitted in the third cluster. Three of the South African genotypes clustered together. Several USA genotypes also clustered close to each other.

Fig. 3 UPGMA phylogenetic tree showing three clusters for all the 210 soybean lines drawn using the Gower’s similarity distances

When only tropical lines were analysed, three clusters were formed, where all the Zimbabwean lines clustered together in the first cluster, while all the South African lines also clustered together in the second cluster (Fig. 4). The third cluster had Tikolore, the only line from Malawi. Sister lines clustered close to each other, for example S1440-5-1E and S1440-5-2E, as well as LDC-5-3 and LDC-5-9. The shortest genetic distances existed between Stanza and Mwenezi (0.08) and between Solitaire and Pan 1867 (0.09). The greatest genetic distances were observed between Tikolore and Stanza (0.24), Tikolore and Mwenezi (0.17) and Tikolore and Serenade (0.12).

Fig. 4 Dendrogram showing clustering of the 30 tropical soybean lines

A UPGMA phylogenetic tree for temperate genotypes only is shown in Fig. 5. While this tree shows three clusters for these lines, the same lines that clustered close together when all 210 lines were included (including temperate lines) still clustered close to each other when only these temperate lines were used in the analysis. Most of the LD lines clustered together, just as when the temperate lines and tropical lines were used. Moreover, lines like Benning and Bingnan, Yougestu and Tachiyutaka, and IC-San and UI-San clustered close to each other with short genetic distances of 0.08, 0.02 and 0.05, respectively.

Fig. 5 UPGMA phylogenetic tree showing clusters of the 180 temperate soybean lines only

The Evanno method was used to reveal the optimum k value for the genotyped soybean lines in STRUCTURE Harvester. The delta k (∆k) curve shows that k peaked at 3, with a mean ln likelihood of -46516.5 and a variance of ln likelihood of 3407.0, meaning that a total of three clusters or subpopulations contributed to the total variation in the soybean lines under study (Fig. 6).

Fig. 6 Graph showing the best k value using the Evanno method

Population structure was constructed to reveal the architecture within the population. In agreement with the Evanno method, three subpopulations were recognised (Fig. 7). Each of the colors (red, green and blue) in the population structure represents a cluster. The lines Fowler and Forrest (188 and 180, respectively) clustered close to each other, while these also clustered closely to Tousan (102), Tara (147) and Nutchu 47, which were in another cluster according to the UPGMA. Several other genotypes consisted of genomes made up of at least two of the subpopulations (Fig. 7).

Fig. 7 Population structure of the 210 soybean lines

Duplications

Clone analysis was done in GenAlEx to identify duplications. Table 3 shows the results. Two groups of duplicates were identified. Pudou-426 and Usada-Zairai were identified as duplicates, while Tachiyukata and Yougestu were also identified as duplicates. The duplicate groups were labeled A and B, respectively.

Table 3 Duplications of the soybean lines derived from clone analysis

Sample No   Sample         Pop   Number of duplications   Label of duplication
160         Pudou-426      2     2                        A
125         Usada-Zairai   2     0                        A
55          Tachiyukata    2     2                        B
7           Yougestu       2     0                        B

Genetic diversity among soybean lines

Analysis of molecular variance (AMOVA) was performed using GenAlEx for the three subpopulations identified in STRUCTURE. The AMOVA showed that the total variation within the population can be partitioned into among-population and within-population sources, accounting for 4% and 96% of the total variation, respectively (Table 4). The FST value of 0.06 was low.

Table 4 Analysis of molecular variance (AMOVA) for the 210 soybean lines

Source        df    SS          MS       Est. Var   %     FST
Among pops    2     194.021     97.010   3.451      4     0.06
Within pops   207   16574.232   80.069   80.069     96
Total         209   16768.252   80.230   83.520     100

Table 5 shows genetic variability among and within populations and the fixation index (FST) for the soybean lines. The Nei’s net nucleotide distance ranged from 0.06 between cluster 1 and cluster 2 to 0.12 between cluster 2 and cluster 3. Cluster 1 and cluster 3 had a nucleotide distance of 0.09. This means that clusters 2 and 3 were furthest apart, whereas clusters 1 and 2 were closer to each other. The least within-population variation was recorded in cluster 3, with an expected heterozygosity (He) of 0.21, whilst cluster 2 had the highest within-population variation of 0.31. The fixation index (FST) values were 0.06 (cluster 1), 0.29 (cluster 2) and 0.02 (cluster 3). Cluster 3 had the lowest genetic variance proportion of 0.02 (Table 5).

Discussion

Phenotypic yield data

The results showed that the tropical lines yielded more than the temperate lines, which indicates that the tropical lines are well adapted to the Zimbabwean environment. This is usually expected, especially when lines are introduced from a different region with different environmental conditions in terms of rainfall, latitude, altitude and temperature. While the temperate genotypes yielded less than the tropical ones, 46 temperate genotypes performed relatively well, yielding above 3000 kg/ha, indicating their potential utility for tropical and subtropical breeding programs. These accessions can be utilized in soybean breeding programs for introgression of important traits, such as rust resistance and phenotypic maturity date, if screened for such traits, as this would reduce linkage drag effects on productivity (Abebe et al. 2021).

SNP marker diversity and profile

The SNPs used were quite informative and desirable for differentiating the soybean genotypes under study. The allelic number ranging from 1 to 3 can be attributed to the crop being self-pollinated, which is consistent with previous reports of low allelic diversity and heterozygosity levels for soybean (Abebe et al. 2021; Wright 1921). The mean minor allele frequency (MAF) of 0.24, which is above 0, reflects that the SNPs were informative. MAF values measure the ability of markers to discriminate genotypes. With SNP markers, due to their bi-allelic nature, a value above 0 is considered informative or discriminating. In the present study, 60% of the markers had a MAF between 0.3 and 0.5, which is comparable to values reported for soybean in previous studies (Chander et al. 2021; Abebe et al. 2021). The mean PIC value of 0.24 also indicates that the markers were informative. Considering the bi-allelic nature of SNPs, where the PIC cannot exceed 0.5 (Singh et al. 2013), the PIC values obtained in this study were desirable for differentiating the 210 soybean genotypes. Similar results were reported in soybean by Abebe et al. (2021), who reported a mean PIC value of 0.25 among elite lines developed by the IITA. In other self-pollinated crops, Singh et al. (2013) reported a mean PIC value of 0.23 in rice. The observed heterozygosity (Ho) of 0.02 was lower than the expected heterozygosity (He) in this study. This implies a high possibility of inbreeding and fixation at most of the loci (Nawaz et al. 2021). Overall, the SNPs used in this study were informative and discriminating, hence they can be recommended for diversity studies in other soybean populations.

Population structure and genetic diversity

The study was effective in determining the population structure and level of diversity in the germplasm collection. There was consistency in the outcome from the silhouette plots, UPGMA and the Evanno method in STRUCTURE used to discriminate the 210 soybean genotypes into clusters based on genetic similarity. The silhouette plots grouped the genotypes into three clusters perfectly, indicating that this was the effective number of clusters which could be formed from the germplasm used in this study. Silhouette plots are generally used to visualize how well the data points belong to their cluster. The silhouette scores, which range from -1 to 1, measure how similar an object is to its own cluster compared to other clusters (Menardi 2011; Pant et al. 2008; Rousseeuw 1987; Thinsungnoen et al. 2015). This finding was confirmed by two additional tools used in the study.

The Unweighted Pair Group Method using Arithmetic average (UPGMA) produced a phylogenetic tree with three populations, which corroborated the findings from the silhouette plots and the Evanno method. While five genotypes from the USA (Nitchuu 47, Tara, Tousan, Forrest and Fowler) were grouped in clusters one and two, all other genotypes were grouped in the third cluster. The genotypes included in the third cluster were from different sources: the USA, Zimbabwe, South Africa and Malawi. This means that there was limited molecular variation among the genotypes used in this study. This could be attributed to exchange of genetic material across the different breeding programs in the Southern Africa region and external sources from other regions, such as Asia and America. An analysis of seed shipments indicates that there is a lot of germplasm exchange between the soybean breeding programs in Southern Africa and the USA. This implies that the soybean lines were derived from shared backgrounds and were selected for the same market requirements, leading to utilization of the same set of alleles. According to the literature and actual pedigree analysis of this germplasm set, most soybean lines were developed from a narrow genetic base derived from a few ancestral lines. A survey of the literature indicates extensive utilization of external germplasm from different countries, such as China, Japan and Korea (Abebe et al. 2021; Bruce et al. 2019; Jeong et al. 2019a, b; Kim et al. 2014). It is a standard and recommended industry practice for breeders to continuously incorporate and integrate external germplasm in their breeding programs.

According to the phylogenetic tree of the 210 genotypes and a separate analysis of the tropical lines only, the Zimbabwean and South African lines each clustered together, separately from one another. These lines were bred to satisfy the same market requirements with common trait preferences and common allelic constitutions. Several other genotypes clustered close to each other in accordance with their origin, adding credence to the possibility of utilizing a common genetic background in breeding programs. Similar results of soybean genotypes that clustered in accordance with their place of origin have been reported (Lee et al. 2014; Liu et al. 2017). This has also been reported for other legume crops, such as cowpea (Fatokun et al. 2018; Sodedji et al. 2021), and for sesame (Basak et al. 2019). In the analysis involving tropical lines only, Tikolore was classified alone in its own cluster, showing its potential for use in the tropical breeding programs for introgression of important traits.

Duplications indicate a high level of genetic similarity (Makore et al. 2021). This was revealed in this study and is consistent with the findings from the phylogenetic tree, which shows low genetic distances between some lines. Seemingly, the observations of duplications and minimal genetic distances indicate that there are introductions that were given different names by different breeders.

Table 5 Allele-frequency divergence among populations (Nei’s net nucleotide distance) and within populations (expected

The results from the analysis of molecular variance (AMOVA) support the possibility of high gene flow, as shown by the variation among populations, which accounted for just 4% of the total variation, whilst within-population variation was about 96% of the total variation. The FST value of 0.06 indicated that there is low genetic difference among populations, suggesting high gene exchange. This observation is consistent with the literature. Wang et al. (2012) reported that most populations were exhibiting the effects of genetic bottlenecks. Basak et al. (2019) also reported similar results in sesame. Abebe et al.
(2021) cited moderate genetic variation and heterozygosity) and Fixation Index (FST) for 210 soybean lines that 11% of the total variation was attributed to among Population Nei’s nucleotide distance Expected F clusters and 71% was due to individual genotypes and ST Heterozygosity an FST value of 0.11 in soybean. Generally, low F val-Cluster 2 Cluster 3 STues close to 0 indicate that subpopulations are similar 1 0.06 0.09 0.30 0.06 in almost all alleles or there is little divergence within 2 – 0.12 0.31 0.29 the population, whilst FST value of 1 means the subpop- 3 – – 0.21 0.02 ulation is fixed at all alleles (Basak et  al. 2019; Moham- madi and Prasanna 2003). In the current studies, the low FST values has an implication in breeding in that little Tsindi et al. CABI Agriculture and Bioscience (2023) 4:15 Page 12 of 14 improvement can be done through simple hybridization Competing interests in some traits of economic importance, for example yield. The authors declare that they have no competing interests. However, the low diversity can be utilized in conserva- Author details tion of such important traits by crossing the related gen- 1 West Africa Centre for Crop Improvement, College of Basic and Applied Sci- otypes. For example, crossing genotypes within cluster 3 ences, University of Ghana, PMB 30, Legon, Accra, Ghana. 2 Seed Co Limited, Rattray Arnold Research Station, Chisipite, P. O. Box CH142, Harare, Zimbabwe. to maintain high yields in some of the genotypes while 3 University of Zimbabwe, MT Pleasant, P. O. Box MP167, Harare, Zimbabwe. taking advantage of some rare or minor alleles found in 4 International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), other genotypes. Minor alleles that can be leveraged on Matopos Research Station, P.O. Box 776, Bulawayo, Zimbabwe. 5 International Institute of Tropical Agriculture (IITA), PMB 5320, Ibadan 200001, Nigeria. in such germplasm could be for earliness found in most USA genotypes. 
Genotypes from cluster 2 and 3 can be Received: 3 February 2023 Accepted: 22 May 2023 hybridized for improved varieties although the improve- ment has a certain ceiling because of the low genetic vari- ation within the whole germplasm used in this study. References Abebe AT, Kolawole AO, Unachukwu N, Chigeza G, Tefera H, Gedil M. Assess- ment of diversity in tropical soybean (Glycine max (L.) Merr.) varieties Conclusions and recommendations and elite breeding lines using single nucleotide polymorphism markers. The SNP markers used were informative and displayed Plant Genet Resour Charact Util. 2021;19(1):20–8. https:// doi.o rg/ 10. 1017/ S1479 262121 00003 4. high discrimination capacity, hence the results from this Bandillo N, Jarquin D, Song Q, Nelson R, Cregan P, Specht J, Lorenz A. A study were useful for molecular characterization of this population structure and genome-wide association analysis on the USDA soybean collection in Southern Africa. The 210 germ- soybean germplasm collection. Plant Genome. 2015. https:// doi. org/ 10. 3835/ plant genome 2015.0 4.0 024. plasm lines were consistently grouped into three clusters Bandillo NB, Anderson JE, Kantar MB, Stupar RM, Specht JE, Graef GL, Lorenz using three tools. Low molecular diversity was evident. AJ. Dissecting the genetic basis of local adaptation in soybean. Sci Rep. These findings have serious implications for the breed- 2017;7(1):1–12. https://d oi.o rg/1 0. 1038/ s41598- 017-1 7342-w. Basak M, Uzun B, Yol E. Genetic diversity and population structure of the ing programs that aim to improve soybean varieties by Mediterranean sesame core collection with use of genome-wide SNPs utilizing this germplasm collection. Innovation strate- developed by double digest RAD-Seq. PLoS ONE. 2019;14(10):1–15. gies for improving variability in the germplasm collec- https://d oi. org/ 10.1 371/ journ al. pone.0 2237 57. 
Bellaloui N, Bruns HA, Gillen AM, Abbas HK, Zablotowicz RM, Mengistu A, Paris tion, such as investments in pre-breeding, increasing the RL. Soybean seed protein, oil, fatty acids, and mineral composition as geographic sources of introductions and exploitation of influenced by soybean-corn rotation. Agric Sci. 2010;1(3):102–9. https:// mutation breeding would be recommended to enhance doi. org/1 0. 4236/ as.2 010.1 3013. Biyeu K, Ratnaparkhe MB, Kole C. Genetics, genomics and breeding of soy- genetic gain. bean. New Hampshire: CRC Press; 2010. p. 1–18. Blair MW, Cortés AJ, Penmetsa RV, Farmer A, Carrasquilla-Garcia N, Cook DR. Acknowledgements A high-throughput SNP marker system for parental polymorphism The authors would like to acknowledge DAAD for funding the research and screening, and diversity analysis in common bean (Phaseolus vulgaris Seed Co for the provision of the experimental stations for this study. L.). Theor Appl Genet. 2013;126(2):535–48. https:// doi. org/1 0.1 007/ s00122- 012- 1999-z. Author contributions Bruce RW, Torkamaneh D, Grainger C, Belzile F, Eskandari M, Rajcan I. Genome- AT conceptualization of the research, field work, data analysis, writing of the wide genetic diversity is maintained through decades of soybean original draft, reviewing and editing of the final manuscript, EG data analysis, breeding in Canada. Theor Appl Genet. 2019. https://d oi. org/ 10. 1007/ reviewing and editing, HM reviewing and editing, JFYE supervision, review- s00122- 019- 03408-y. ing and editing, PT supervision, reviewing and editing, EYD supervision and Chander S, Garcia-Oliveira AL, Gedil M, Shah T, Otusanya GO, Asiedu R, Chigeza reviewing, LM reviewing and editing, MZ selection of SNP markers, reviewing G. Genetic diversity and population structure of soybean lines adapted to and editing, EZ reviewing and editing, JD supervision, reviewing and editing. sub-saharan africa using single nucleotide polymorphism (Snp) markers. 
All authors read and approved the final manuscript. Agronomy. 2021. https:// doi.o rg/1 0. 3390/a grono my110 30604. Chen Y, Nelson RL. Relationship between origin and genetic diversity in Chi- Funding nese soybean germplasm. Crop Sci. 2005;45(4):1645–52. https:// doi. org/ The Research was funded by German Academic Exchange Service (DAAD) as 10.2 135/ crops ci2004.0 071. part of the PhD funding. Core TR. RStudio: Integrated development for R. RStudio, Inc., Boston. 2015. http:// www. rstud io.c om/. Accessed 15 Sept 2021. Availability of data and materials Cornelious BK, Sneller CH. Yield and molecular diversity of soybean lines The datasets used and/or analysed during the current study are available from derived from crosses of Northern and Southern Elite parents. Crop Sci. the corresponding author on reasonable request. 2002;42:642–7. Cortés AJ, Chavarro MC, Blair MW. SNP marker diversity in common bean Declarations (Phaseolus vulgaris L.). Theor Appl Genet. 2011;123(5):827–45. https:// doi. org/1 0. 1007/ s00122-0 11-1 630-8. Ethics approval and consent to participate Earl DA, VonHoldt BM. STRUC TUR E HARVESTER: a website and program for Not applicable. visualizing STRU CTU RE output and implementing the Evanno method. Conserv Genet Resour. 2012;4(2):359–61. https:// doi. org/1 0. 1007/ Consent for publication s12686- 011- 9548-7. Not applicable. Edwards D, Forster JW, Chagné D, Batley J. What are SNPs? Assoc Mapp Plants. 2007. https:// doi. org/ 10.1 007/ 978-0- 387- 36011-9_3. T sindi et al. CABI Agriculture and Bioscience (2023) 4:15 Page 13 of 14 FAOSTAT. Food and agriculture data. 2021. http:// www.f ao. org/ faost at/ en/# Makore F, Gasura E, Souta C, Mazarura U, Derera J, Zikhali M, Kamutando CN, data/ QC. Accessed 21 May 2022. Magorokosho C, Dari S. 
Molecular characterization of a farmer-preferred Fatokun C, Girma G, Abberton M, Gedil M, Unachukwu N, Oyatomi O, Yusuf maize landrace population from a multiple-stress-prone subtropical M, Rabbi I, Boukar O. Genetic diversity and population structure of a lowland environment. Biodiversitas. 2021;22(2):769–77. https:// doi.o rg/ mini-core subset from the world cowpea (Vigna unguiculata (L.) Walp.) 10.1 3057/ biodiv/ d2202 30. germplasm collection. Sci Rep. 2018;8(1):1–10. https:// doi. org/ 10. 1038/ Malik MFA, Ashraf M, Qureshi AS, Khan MR. Investigation and comparison of s41598- 018- 34555-9. some morphological traits of the soybean populations using cluster Garrison E, Marth G. Haplotype-based variant detection from short-read analysis. Pak J Bot. 2011;43(2):1249–55. sequencing. 2012. http://a rxiv. org/ abs/ 1207. 3907. Accessed 10 Apr 2019. Menardi G. Density-based Silhouette diagnostics for clustering methods. Stat Gower JC. A general coefficient of similarity and some of its properties. Biom- Comput. 2011;21(3):295–308. https://d oi. org/1 0.1 007/s 11222-0 10- 9169-0. etrics. 1971;27(4):857–74. Mohammadi SA, Prasanna BM. Analysis of genetic diversity in crop plants— Grieshop CM, Fahey GC Jr. Comparison of quality characteristics of soy- salient statistical tools and considerations. Crop Sci. 2003;43(4):1235–48. beans from Brazil, China, and the United States. J Agric Food Chem. https:// doi. org/1 0. 2135/ crops ci2003. 1235. 2001;49:2669–73. https:// doi.o rg/ 10.1 021/ jf0014 009. Nadeem MA, Nawaz MA, Shahid MQ, Doğan Y, Comertpay G, Yıldız M, Gwinner R, Alemu Setotaw T, Pasqual M, Dos Santos JB, Zuffo AM, Zambiazzi Hatipoğlu R, Ahmad F, Alsaleh A, Labhane N, Özkan H, Chung G, Baloch EV, Bruzi AT. Genetic diversity in Brazilian soybean germplasm. Crop Breed FS. DNA molecular markers in plant breeding: current status and recent Appl Biotechnol. 2017;17(4):373–81. https:// doi. org/ 10. 1590/1 984- 70332 advancements in genomic selection and genome editing. 
Biotechnol 017v1 7n4a56. Biotechnol Equip. 2018;32(2):261–85. https:// doi.o rg/1 0. 1080/ 131028 18. Hahn V, Würschum T. Molecular genetic characterization of Central European 2017.1 4004 01. soybean breeding germplasm. Plant Breed. 2014;133(6):748–55. https:// Nawaz MA, Lin X, Chan TF, Lam HM, Baloch FS, Ali MA, Golokhvast KS, Yang doi. org/1 0. 1111/ pbr. 12212. SH, Chung G. Genetic architecture of wild soybean (Glycine soja Sieb. Hyten DL, Choi IY, Song Q, Shoemaker RC, Nelson RL, Costa JM, Specht JE, and Zucc.) populations originating from different East Asian regions. Cregan PB. Highly variable patterns of linkage disequilibrium in multiple Genet Resour Crop Evol. 2021;68(4):1577–88. https:// doi. org/1 0. 1007/ soybean populations. Genetics. 2007;175(4):1937–44. https:// doi.o rg/1 0. s10722- 020- 01087-z. 1534/g enet ics. 106. 069740. Nemlı S, Kaygisiz Aşçioğul T, Ateş D, Eşıyok D, Tanyolaç MB. Diversity and Hyten Dl, Choi I, Song Q, Specht JE, Carter TE, Shoemaker RC, Hwang EY, Matu- genetic analysis through DArTseq in common bean (Phaseolus vulgaris kumalli LK, Cregan PB. A high density integrated genetic linkage map of L.) germplasm from Turkey. Turkish J Agric For. 2017;41(5):389–404. soybean and the development of a 1536 universal soy linkage panel for https:// doi. org/ 10.3 906/t ar-1 707-8 9. quantitative trait locus mapping. Crop Sci. 2010;50:960–8. Ojo DK, Ajayi AO, Oduwaye OA. Genetic relationships among soybean acces- Jeong N, Kim KS, Jeong S, Kim JY, Park SK, Lee JS, Jeong SC, Kang ST, Ha BK, sions based on morphological and RAPDs techniques. J Trop Agric Sci. Kim DY, Kim N, Moon JK, Choi MS. Korean soybean core collection: geno- 2012;35(2):237–48. typic and phenotypic diversity population structure and genome-wide Oliveira MF, Nelson RL, Geraldi IO, Cruz CD, de Toledo JFF. Establishing a soy- association study. PLoS ONE. 2019a;14(10):1–16. https:// doi.o rg/ 10.1 371/ bean germplasm core collection. Field Crop Res. 2010;119(2–3):277–89. 
journ al.p one. 022407 4. https:// doi.o rg/1 0.1 016/j. fcr. 2010.0 7.0 21. Jeong SC, Moon JK, Park SK, Kim MS, Lee K, Lee SR, Jeong N, Choi MS, Kim N, Orf J. Introduction. In: Biyeu K, Ratnaparkhe MB, Kole C, editors. Genetics, Kang ST, Park E. Genetic diversity patterns and domestication origin of gonomics and breeding of soybean. New Hampshire: CRC Press; 2010. soybean. Theor Appl Genet. 2019b;132(4):1179–93. https://d oi. org/1 0. p. 1–18. 1007/s 00122-0 18- 3271-7. Pant M, Radha T, Singh VP. Particle swarm optimization using Gaussian inertia Kim KH, Lee S, Seo MJ, Lee GA, Ma KH, Jeong SC, Lee SH, Park EH, Kwon YU, weight. Proceedings—international conference on computational Moon JK. Genetic diversity and population structure of wild soybean intelligence and multimedia applications, ICCIMA 2007, 2008; 1, 97–102. (Glycine soja Sieb. and Zucc.) accessions in Korea. Plant Genet Resour https://d oi.o rg/ 10. 1109/ ICCIMA. 2007. 328. Charact Util. 2014;12:48–51. https:// doi. org/1 0.1 017/S 1479 262114 0002 39. Peakall R, Smouse PE. GenAlEx 6.5: genetic analysis in Excel. Population Lee G-A, Choi Y-M, Yi J-Y, Chung J-W, Lee M-C, Ma K-H, Lee S, Cho J, Lee J-R. genetic software for teaching and research—an update. Bioinformatics. Genetic diversity and population structure of korean soybean collection 2012;28:2537–9. Using 75 microsatellite markers. Korean J Crop Sci. 2014;59(4):492–7. Porras-Hurtado L, Ruiz Y, Santos C, Phillips C, Carracedo Á, Lareu MV. An over- https:// doi.o rg/ 10. 7740/ kjcs. 2014. 59.4.4 92. view of STRU CTUR E: Applications, parameter settings, and supporting LGC Bioscience Technologies. SeqSNP targeted GBS as alternative for array software. Front Genet. 2012;4(MAY):1–13. https://d oi.o rg/1 0. 3389/f gene. genotyping in routine breeding programs. 2019. https:// biose arch-c dn. 2013.0 0098. 
azure edge.n et/ asset sv6/s eqsnp-t gbs-a ltern ative- genoty ping-r outi ne- Qin J, Shi A, Xiong H, Mou B, Motes D, Lu W, Miller JC, Scheuring DC, Nzaramba breed ing-p rogr ams.p df. Accessed 12 Feb 2020. MN, Weng Y, Yang W. Population structure analysis and association map- Li Y, Zhao S-C, Ma J-X, Li D, Yan L, Li J, Qi X, Guo X, Zhang L, He W, Chang R, ping of seed antioxidant content in USDA cowpea (Vigna unguiculata Liang Q, Guo Y, Ye C, Wang X, Tao Y, Guan R, Wang J, Liu Y, Jin L, Zhang L. Walp.) core collection using SNPs. Can J Plant Sci. 2016;96(6):1026–36. X, Liu Z, Zhang L, Chen J, Wang K, Nielsen R, Li R, Chen P, Li W, Reif J, https://d oi.o rg/1 0.1 139/ cjps- 2016-0 090. Purugganan M, Wang J, Zhang M, Wang J, Qiu L-J. Molecular footprints Rafalski A. Applications of single nucleotide polymorphisms in crop genetics. of domestication and improvement in soybean revealed by whole Curr Opin Plant Biol. 2002;5(2):94–100. https:// doi.o rg/ 10. 1016/ S1369- genome re-sequencing. BMC Genomics. 2013. https://d oi.o rg/ 10. 1186/ 5266(02) 00240-6. 1471- 2164- 14- 579. Rambaut A. FigTree: molecular evolution, phylogenetics and epidemiology. Li YH, Reif JC, Jackson SA, Ma YS, Chang RZ, Qiu LJ. Detecting SNPs underlying 2016. http:// tree. bio.e d. ac.u k/ softwa re/ figtre e/. Accessed 15 Sept 2021. domestication-related traits in soybean. BMC Plant Biol. 2014;14(1):1–8. Rousseeuw PJ. Silhouettes: a graphical aid to the interpretation and validation https://d oi.o rg/ 10.1 186/ s12870-0 14-0 251-1. of cluster analysis. J Comput Appl Math. 1987;20:53–65. https://d oi.o rg/ Liu K, Muse SV. PowerMarker: an integrated analysis environment for genetic 10.1 016/0 377- 0427(87) 90125-7. marker analysis. Bioinformatics. 2005;21:2128–9. https:// doi. org/ 10. 1093/ Singh N, Choudhury DR, Singh AK, Kumar S, Srinivasan K, Tyagi RK, Singh NK, bioin forma tics/ bti282. Singh R. 
Comparison of SSR and SNP markers in estimation of genetic Liu Z, Li H, Wen Z, Fan X, Li Y, Guan R, Guo Y, Wang S, Wang D, Qiu L. Com- diversity and population structure of Indian rice varieties. PLoS ONE. parison of genetic diversity between Chinese and American soybean 2013;8(12):1–14. https:// doi.o rg/ 10. 1371/ journ al.p one.0 08413 6. (Glycine max (L.)) accessions revealed by high-density SNPs. Front Plant Sodedji FAK, Agbahoungba S, Agoyi EE, Kafoutchoni MK, Choi J, Nguetta Sci. 2017. https:// doi. org/1 0. 3389/ fpls. 2017. 02014. SPA, Assogbadjo AE, Kim HY. Diversity, population structure, and linkage Ma YS, Wang WH, Wang LX, Ma FM, Wang PW, Chang RZ, Qiu LJ. Genetic diver- disequilibrium among cowpea accessions. Plant Genome. 2021. https:// sity of soybean and the establishment of a core collection focused on doi.o rg/ 10.1 002/t pg2. 20113. resistance to soybean cyst nematode. J Integr Plant Biol. 2006;48(6):722– Thinsungnoen T, Kaoungku N, Durongdumronchai P, Kerdprasop K, Kerd- 31. https:// doi.o rg/1 0. 1111/j.1 744-7 909. 2006. 00256.x. prasop N. The Clustering Validity with Silhouette and Sum of Squared Tsindi et al. CABI Agriculture and Bioscience (2023) 4:15 Page 14 of 14 Errors. In proceedings of the 3rd international conferance on industrial application engineering. Japan: The Institute of Industrial applications Engeineers. 2015; 44–51. https:// doi. org/ 10.1 2792/ iciae 2015. 012 Tiwari S, Tripathi N, Tsuji K, Tantwai K. Genetic diversity and population struc- ture of Indian soybean (Glycine max (L.) Merr.) as revealed by microsatel- lite markers. Physiol Mol Biol Plants. 2019;25(4):953–64. https:// doi. org/ 10. 1007/ s12298-0 19- 00682-4. Valliyodan B, Brown AV, Wang J, Patil G, Liu Y, Otyama PI, Nelson RT, Vuong T, Song Q, Musket TA, Wagner R, Marri P, Reddy S, Sessions A, Wu X, Grant D, Bayer PE, Roorkiwal M, Varshney RK, Liu X, Edwards D, Xu D, Joshi T, Cannon SB, Nguyen HT. 
Genetic variation among 481 diverse soybean accessions, inferred from genomic re-sequencing. Sci Data. 2021;8(1):1–9. https:// doi. org/ 10. 1038/ s41597-0 21- 00834-w. Wang Y, Guo J, Liu Y, Wang Y, Chen J, Li Y, Huang H, Qiu L. Population structure of the wild soybean (Glycine soja) in China: Implications from microsatel- lite analyses. Ann Bot. 2012;110(4):777–85. https:// doi. org/ 10.1 093/ aob/ mcs142. Wright S. Systems of mating. II. The effects of inbreeding on the genetic com- position of a population. Genetics. 1921;6:124–43. Yang S, Pang W, Ash G, Harper J, Carling J, Wenzl P, Huttner E, Zong X, Kilian A. Low level of genetic diversity in cultivated Pigeonpea compared to its wild relatives is revealed by diversity arrays technology. Theor Appl Genet. 2006;113(4):585–95. https:// doi. org/ 10.1 007/ s00122- 006- 0317-z. Zavinon F, Adoukonou-Sagbadja H, Keilwagen J, Lehnert H, Ordon F, Perovic D. Genetic diversity and population structure in Beninese pigeon pea [Cajanus cajan (L.) Huth] landraces collection revealed by SSR and genome wide SNP markers. Genet Resour Crop Evol. 2020;67(1):191–208. https:// doi. org/ 10. 1007/ s10722-0 19- 00864-9. Zhu YL, Song QJ, Hyten DL, Van Tassell CP, Matukumalli LK, Grimm DR, Hyatt SM, Fickus EW, Young ND, Cregan PB. Single-nucleotide polymorphisms in soybean. Genetics. 2003;163(3):1123–34. https://d oi. org/ 10. 1093/ genet ics/ 163.3. 1123. Ziervogel G, New M, van Garderen EA, Midgley G, Taylor A, Hamann R, Stuart- Hill S, Myers J, Warburton M. Climate change impacts and adaptation in South Africa. Wiley Interdiscip Rev Clim Ch. 2014. https:// doi.o rg/ 10. 1002/ wcc. 295. Žulj Mihaljević M, Šarčević H, Lovrić A, Andrijanić Z, Sudarić A, Jukić G, Pejić I. Genetic diversity of European commercial soybean [Glycine max (L.) Merr.] germplasm revealed by SSR markers. Genet Resour Crop Evol. 2020;67(6):1587–600. https:// doi.o rg/1 0. 1007/ s10722- 020- 00934-3. 
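Supplementary sketch: marker informativeness statistics

The Discussion interprets MAF, PIC and the gap between observed and expected heterozygosity for bi-allelic SNPs. As an illustration only, the minimal Python sketch below shows how these quantities relate for a single bi-allelic locus. It assumes the commonly used two-allele PIC formula (PIC = 1 − p² − q² − 2p²q²) and Wright's fixation index F = 1 − Ho/He; the study's own software may compute these differently. The function names and the He value used in the example are hypothetical and are not estimates from this study.

```python
def biallelic_stats(maf):
    """Return (expected heterozygosity, PIC) for one bi-allelic SNP.

    maf is the minor allele frequency, so it must lie in [0, 0.5].
    Assumes the standard two-allele PIC formula; this is an assumption,
    not necessarily what the authors' software implements.
    """
    if not 0.0 <= maf <= 0.5:
        raise ValueError("minor allele frequency must be between 0 and 0.5")
    q = maf          # minor allele frequency
    p = 1.0 - maf    # major allele frequency
    he = 1.0 - (p ** 2 + q ** 2)        # expected heterozygosity, equals 2pq
    pic = he - 2.0 * (p ** 2) * (q ** 2)
    return he, pic


def inbreeding_coefficient(ho, he):
    """Wright's fixation index F = 1 - Ho/He for a locus or a locus average."""
    if he <= 0.0:
        raise ValueError("expected heterozygosity must be positive")
    return 1.0 - ho / he


if __name__ == "__main__":
    # A locus at the reported mean MAF, and a locus with equally frequent alleles.
    for maf in (0.24, 0.5):
        he, pic = biallelic_stats(maf)
        print(f"MAF={maf:.2f}  He={he:.4f}  PIC={pic:.4f}")
    # Ho = 0.02 is the value reported in the text; He = 0.30 is illustrative only.
    print(f"F={inbreeding_coefficient(0.02, 0.30):.3f}")
```

Under this formula the PIC of a single bi-allelic locus peaks at 0.375 when both alleles are equally frequent, which lies within the upper bound of 0.5 cited in the text. A fixation index close to 1 is what an observed heterozygosity far below the expected value would produce, in line with the inbreeding interpretation given in the Discussion.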
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
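Supplementary sketch: checking the AMOVA partition in Table 4

The Discussion states that variation among populations accounted for about 4% of the total and variation within populations for about 96%. The short Python sketch below recomputes the mean squares and percentage shares from the df, SS and estimated variance values printed in Table 4. It is a consistency check under the assumption that each percentage is the estimated variance component divided by the sum of the two components; the function name is hypothetical, and the sketch does not attempt to reproduce the reported FST.

```python
def amova_partition(ss_among, df_among, ss_within, df_within,
                    var_among, var_within):
    """Recompute mean squares and variance shares for a two-level AMOVA table.

    Assumes percentage = variance component / (among + within) * 100.
    """
    total_var = var_among + var_within
    return {
        "ms_among": ss_among / df_among,
        "ms_within": ss_within / df_within,
        "total_var": total_var,
        "pct_among": 100.0 * var_among / total_var,
        "pct_within": 100.0 * var_within / total_var,
    }


# Values copied from Table 4 (210 soybean lines, three inferred populations).
table4 = amova_partition(
    ss_among=194.021, df_among=2,
    ss_within=16574.232, df_within=207,
    var_among=3.451, var_within=80.069,
)

if __name__ == "__main__":
    for key, value in table4.items():
        print(f"{key}: {value:.3f}")
```

The recomputed shares round to the 4% and 96% quoted in the text, and the mean squares agree with the MS column of Table 4 to within rounding.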