Tsindi et al. CABI Agriculture and Bioscience (2023) 4:15
https://doi.org/10.1186/s43170-023-00158-2
RESEARCH Open Access
Analysis of population structure and genetic 
diversity in a Southern African soybean 
collection based on single nucleotide 
polymorphism markers
A. Tsindi1,2, J. S. Y. Eleblu1, E. Gasura3, H. Mushoriwa4, P. Tongoona1, E. Y. Danquah1, L. Mwadzingeni2, M. Zikhali2, 
E. Ziramba2, G. Mabuyaye2 and J. Derera5*  
Abstract 
Soybean is an emerging strategic crop for nutrition, food security, and livestock feed in Africa, but improvement of its productivity is hampered by low genetic diversity. There is a need to broaden the tropical germplasm base through incorporation and introgression of temperate germplasm in Southern African breeding programs. Therefore, this study was conducted to determine the population structure and molecular diversity among 180 temperate and 30 tropical soybean accessions using single nucleotide polymorphism (SNP) markers. The results revealed very low levels of molecular diversity among the 210 lines, with implications for the breeding strategy. A low fixation index (FST) value of 0.06 was observed, indicating low genetic differentiation among populations. This suggests high genetic exchange among different lines due to global germplasm sharing. Inference based on three tools, namely the Evanno method, silhouette plots and a UPGMA phylogenetic tree, showed the existence of three sub-populations. The UPGMA tree showed that the first cluster is composed of three genotypes, the second cluster has two genotypes, while the rest of the genotypes constituted the third cluster. The third cluster revealed low variation among most genotypes. Negligible differences were observed among some of the lines, such as Tachiyukata and Yougestu, indicating sharing of common parental backgrounds. However, large phenotypic differences were observed among the accessions, suggesting that there is potential for their utilization in breeding programs. Rapid phenotyping revealed grain yield potential ranging from one to five tons per hectare for the 200 non-genetically modified accessions. Findings from this study will inform the crossing strategy for subtropical soybean breeding programs. Innovation strategies for improving genetic variability in the germplasm collection, such as investments in pre-breeding, increasing the geographic sources of introductions and exploitation of mutation breeding, are recommended to enhance genetic gain.
Keywords Glycine max, Molecular diversity, Phenotyping, Population structure, SNP, Soybean
*Correspondence:
J. Derera
j.derera@cgiar.org
Full list of author information is available at the end of the article
© The Author(s) 2023. Open Access  This article is licensed under a Creative Commons Attribution 4.0 International License, which 
permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the 
original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or 
other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line 
to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory 
regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this 
licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
Introduction
Soybean is an important nutritious crop used worldwide for food, feed and industrial oils. Its high utility is explained by a protein content of about 40% and an oil content that reaches or exceeds 20% for some genotypes (Bellaloui et al. 2010; Orf 2010). In 2019, worldwide production was over 300 million metric tons produced on 120 million hectares of land (FAOSTAT 2021), which translates to a global average yield of 2.5 tons per hectare. Production is dominated by a few countries. The world's leading soybean producers are Brazil, United States of America, Argentina and China. Africa contributes only 0.9% to the total world production (FAOSTAT 2021), which is negligible and does not match the regional demand for soybean products. The major producers are South Africa, Nigeria, Ghana, Uganda, Ethiopia, Zambia, Malawi and Zimbabwe. All these countries fail to meet their national demand. As a result, Africa imports soybean.
There is a need to develop varieties that are highly productive and adapted to the tropical and subtropical ecologies in Africa. Efforts are underway to identify such varieties through the regional soybean breeding network that employs the Pan African Trials (PAT) under the leadership of the Soybean Innovation Lab (SIL), in collaboration with the International Institute of Tropical Agriculture (IITA), national public programs and private seed companies (https://www.soybeaninnovationlab.illinois.edu/). The PAT shows a generally low level of productivity due to limited genetic improvement. However, genetic improvement efforts are challenged by the narrow genetic base of soybean (Cornelious and Sneller 2002; Lee et al. 2014; Li et al. 2013) owing to several domestication bottlenecks (Gwinner et al. 2017; Hyten et al. 2007; Rafalski 2002).
The baseline genetic diversity of the soybean germplasm pool and introductions should be established in order to devise a viable breeding strategy. Genetic improvement of any crop rests upon the diversity present within and among the breeding populations (Biyeu et al. 2010). Knowledge of genetic variability helps in selection of parental lines to be used when making crosses, establishment of core collections and enhanced utilization of the germplasm in breeding programs (Abebe et al. 2021; Bandillo et al. 2017). While there is limited diversity among cultivars within country or regional breeding programs because of sharing of common parents (Gwinner et al. 2017; Hahn and Würschum 2014; Tiwari et al. 2019), introduction of exotic germplasm plays a crucial role in widening the genetic base from which parents can be selected for use in bi-parental crosses.
The tropical and subtropical soybean breeding programs in Africa utilize temperate germplasm to improve local varieties. The major sources of soybean germplasm lines and populations for Africa have been China, Japan, Korea and USA (Grieshop and Fahey 2001; Jeong et al. 2019a, b), where greater genetic diversity has been reported (Oliveira et al. 2010). China and the USA maintain large collections in their gene banks. There are about thirty thousand accessions in the Chinese gene bank, while the USDA gene banks contain about 15,000 accessions (Liu et al. 2017). This germplasm cannot be directly used in the breeding programs in Africa. There is a need to characterize the germplasm before crossing is done. For example, 93% of the Chinese germplasm accessions are primitive cultivars but highly diverse (Chen and Nelson 2005). These collections are important sources of favorable alleles which can enhance breeding in Africa. However, when such introductions are to be used for breeding purposes, they need to be screened for their usefulness (Jeong et al. 2019a, b; Li et al. 2014) so that the results can inform the breeding strategy.
A survey of the literature indicates that germplasm diversity characterization can be conducted following two approaches. Both morphological or phenotypic and molecular genetic diversity studies have been used to assess variation in soybean (Abebe et al. 2021; Bandillo et al. 2015; Chander et al. 2021; Malik et al. 2011; Jeong et al. 2019a, b; Ma et al. 2006; Nawaz et al. 2021; Ojo et al. 2012; Valliyodan et al. 2021; Wang et al. 2012; Mihaljević et al. 2020). The advantages and limitations of both approaches have been discussed.
While morphological or phenotypic methods have been successful for discriminating soybean genotypes, their efficiency is compromised by complications caused by genotype by environment interaction (GxE) effects. GxE masks genotypic differences among the germplasm entries. High levels of GxE require that genotypes are evaluated at many sites. However, due to the exorbitant costs of conducting multi-location trials, only a few sites are often used, resulting in low resolution due to few data points. There are also challenges of waiting for a long time to get results. The length of the cycle from seed to seed is a hindrance as it is time consuming, labor intensive and costly (Chander et al. 2021; Nadeem et al. 2018). As a result, use of molecular markers has increased. They are not affected by GxE interactions, are not growth-stage specific and are abundant within the genome (Nadeem et al. 2018). Although molecular markers were initially expensive, improvements such as the development of single nucleotide polymorphism (SNP) DNA markers and their amenability to automation have brought the cost per data point to a very competitive level compared to phenotypic data.
Currently, SNPs are among the most widely used markers (Zhu et al. 2003; Edwards et al. 2007; Nadeem et al. 2018).
SNPs are the markers of choice for molecular diversity studies. SNP markers have been successfully used for diversity studies in several crops including soybean (Abebe et al. 2021; Chander et al. 2021; Liu et al. 2017), cowpea (Fatokun et al. 2018; Qin et al. 2016; Sodedji et al. 2021), pigeon pea (Yang et al. 2006; Zavinon et al. 2020) and common bean (Blair et al. 2013; Cortés et al. 2011; Nemlı et al. 2017). Assessment of the genetic diversity among elite lines and varieties developed by IITA using SNPs revealed high diversity within the germplasm and grouped the germplasm into three clusters based on genetic relatedness (Abebe et al. 2021). Similarly, a broad genetic base among tropical soybean lines, with a genetic diversity index of 0.414 using SNP markers, has been reported (Chander et al. 2021). However, previous studies cited low genetic diversity among the germplasm from Brazil, China, Europe and North America. Low genetic diversity was reported among Brazilian (Gwinner et al. 2017), USA and Chinese germplasm (Liu et al. 2017). Central European lines were reported to be closely related to the Swiss and Canadian lines, but distantly related to the Chinese (Hahn and Würschum 2014). These findings suggest the need for breeders to know the molecular diversity in the germplasm to guide breeding strategies.
Improvement of soybean varieties for adaptation and productivity ranks quite high on the product profile for the Southern Africa region. Early maturity in response to climate change, which has shortened growing seasons, is one of the important traits for soybean lines for deployment in sub-Saharan Africa (Ziervogel et al. 2014). This requires sourcing of exotic germplasm with the favorable alleles for early maturity. Temperate germplasm is less sensitive to latitude, which is a major determinant of flowering and maturity time in soybean. The soybean breeding programs in Africa have collected both temperate and tropical germplasm for utilization in breeding. However, the level of molecular diversity in this collection has not been established. The present study was therefore conducted to assess the population structure and genetic diversity of the temperate and tropical soybean accessions using SNP markers.
Materials and methods
Plant material and sampling
A public (belonging to government or national research institutions) and private (from private institutions) germplasm collection comprising 210 lines from South Africa (10), Malawi (1), Zimbabwe (19), and USA (180) was used for the study. All the genotypes were planted in plastic sleeves in a screen house in 2019. The 10 genotypes from South Africa were planted in South Africa while the other 200 were planted in Zimbabwe. An average of six leaf discs was sampled from a single plant of each genotype at 3 weeks after emergence using the LGC Genomics plant sample collection kit. The leaf discs were placed in 96-well plates and sealed with perforated strip caps. A desiccant sachet was placed on top of the sealed tubes and a rack lid was fixed on top. The samples were placed in a sealable bag and shipped to LGC Genomics, Germany, for genotyping using the targeted genotyping-by-sequencing (SeqSNP) method.
Rapid phenotypic screening
A total of 200 non-genetically modified accessions (temperate and tropical) were planted in Zimbabwe. The ten accessions from South Africa could not be evaluated in Zimbabwe because they are genetically modified (they contain the Roundup-ready herbicide resistance trait). The rapid screening was conducted at the Rattray Arnold Research Station (RARS) (17°38′60″ S, 31°14′24″ E), near Harare. Rapid phenotypic screening for yield was done in an observation trial without replication in two-row plots which were 1.5 m long, with a spacing of 0.45 m between rows and 0.05 m within rows. Grain yield was recorded from the whole plot at maturity.
DNA extraction, SNP marker genotyping and data pre-processing
DNA extraction was done using magnetic bead chemistry (sbeadex™ mini plant kit from LGC, Biosearch Technologies, Berlin, Germany) on KingFisher Flex. SNP marker genotyping was performed using SeqSNP, a targeted genotyping-by-sequencing service offered by LGC, which allows for genotyping of SNPs and small insertions/deletions using a single primer enrichment technology (LGC Bioscience Technologies 2019). In order to design a SeqSNP assay, a total of 500 informative markers were selected from a panel of 1 082 markers in the LGC database (https://www.biosearchtech.com/products/pcr-kits-and-reagents/genotyping-assays/kasp-genotyping-chemistry/kasp-snp-libraries/soybean-genotyping-library), which were designed from an original set of 1 536 SNP markers, the "Universal Soy Linkage Panel" (USLP 1.0) described in Hyten et al. 2010. These SNP markers were selected based on their even distribution throughout each of the 20 consensus linkage groups, and for optimum allele frequency in diverse germplasm. The physical start and end positions of the markers for the construction of a BED file for use in sequencing were taken from the Soybase database (https://www.soybase.org/) with Williams 82 as the reference genome.
The total number of targets that passed design was 496 covered by a total of 984 oligo probes, i.e. the number
of oligo probes per target being ~1.98. The total number of targets which passed the quality criteria, that is, those that were successfully genotyped in at least 85% of all samples, was 485 (97.8%). NextSeq 500 sequencing was performed, yielding 35 397 796 pre-processed reads, which is approximately 168 561 reads per sample. The percentage of reads effectively used in genotyping was 83.4% and the average effective target SNP coverage was 283x. The SNP genotyping pipeline and settings involved diploid genotyping with a minimum coverage of 8 reads per sample and locus using FreeBayes (Garrison & Marth 2012). A total of 437 (87.1%) of the targets were polymorphic; 98.5% of all calls were homozygous and 1.5% heterozygous. Missing data amounted to 1.4%.
Demultiplexing of all library groups was done using the Illumina bcl2fastq 2.17.1.14 software. One or two mismatches or Ns were allowed in the barcode read when barcode distances between all libraries on the lane allowed for it. Sequencing adapter remnants were then clipped from all reads. Reads with final length < 65 bases were discarded. Quality trimming of adapter-clipped Illumina reads was performed to remove reads containing Ns and to trim reads at the 3′ end to get a minimum average Phred quality score of 30 over a window of ten bases. Reads with final length < 65 bases were discarded. FastQC reports for all FASTQ files were then created. Summary tables giving the read counts for all samples at a glance were then generated.
Data analysis
Alignment of quality-trimmed reads against the target genome using Bowtie2 was followed by variant discovery and genotyping of samples with Freebayes V1.0.2–16 (https://github.com/ekg/freebayes#readme). Ploidy was set at 2 and genotypes were filtered for a minimum coverage of 8 reads. SNP marker diversity and profile were analyzed using the PowerMarker and GenAlEx software. SNP data quality was checked by filtering, where SNPs with call rate greater than 90% were retained and those with minor allele frequency (MAF) of < 0.05 were discarded. The polymorphic information content (PIC), observed heterozygosity (Ho), expected heterozygosity (He), allele frequency and Shannon Information Index (I) were computed in PowerMarker (Liu and Muse 2005) and GenAlEx (Peakall and Smouse 2012).
Genetic diversity analyses were conducted using the R software. The genotypes were subjected to silhouette plot analysis in R version 3.5.1 (R Core Team 2015) to determine the probable number of clusters formed. Coefficients of similarity showing genetic distances among the soybean lines (matrix of similarities) were calculated in R following the Gower's distance model (Gower 1971). The similarity matrix was then used to group the soybean genotypes using the Unweighted Pair Group Method using Arithmetic average (UPGMA) algorithm in R (R Core Team 2015), giving an annotated phylogenetic tree (Rambaut 2016). The 30 tropical and 180 temperate genotypes were also separated and subjected to diversity analysis, and a dendrogram was drawn in R separately for each group of genotypes.
Population structure analysis was performed using the Bayesian clustering approach in STRUCTURE v2.3.4 (Porras-Hurtado et al. 2012). STRUCTURE was run using an admixture model with a burn-in period of 5 000 and 50 000 Markov-chain Monte Carlo replications. The number of clusters (k) was set to range from 1 to 10 with 3 iterations. The output from STRUCTURE was then imported into Structure Harvester (Earl and VonHoldt 2012) to visualize the delta K value, which forms a distinct peak, using the Evanno method. Analysis of molecular variance (AMOVA) was done using GenAlEx (Peakall and Smouse 2012) to determine the variance components and the molecular diversity between and within populations. Bases were coded A = 1, C = 2, G = 3, T = 4 and missing data 0. Clone identification was also done in GenAlEx. Nei's nucleotide distance and the fixation index (FST) were also computed. The fixation index is a measure of genetic variation that can be explained by population structure and ranges from 0 (identical) to 1 (completely different with no common alleles shared) (Mohammadi and Prasanna 2003), calculated as:

$$F_{ST} = \frac{\delta_s^2}{\bar{p}\,(1 - \bar{p})}$$

where $\delta_s^2$ is the variance in the frequency of the allele between different subpopulations, weighted by the sizes of the subpopulations, and $\bar{p}$ is the average frequency of an allele in the total population.
Results
Phenotypic yield data
The yield data showed that the tropical lines yielded more than the temperate genotypes in Zimbabwe. The top ten performing genotypes were all tropical genotypes while all the bottom ten were temperate genotypes (Table 1). The frequency distribution of the yield data is shown in Fig. 1. Only 15 genotypes yielded above 4000 kg/ha and these were mainly of tropical origin. Out of the 49 genotypes which yielded between 3000 and 4000 kg/ha, 46 are of temperate origin. The largest class (70 genotypes) was in the yield range of 2000–3000 kg/ha, while no genotype yielded below 1000 kg/ha (Fig. 1).
Table 1 Top ten and bottom ten yield data for the soybean genotypes evaluated in Zimbabwe

Rank and trial statistics          Genotype name   Adaptation   Grain yield (kg/ha)
Top ten performing genotypes       Saga            Tropical     4817.09
                                   Safari          Tropical     4761.03
                                   Serenade        Tropical     4734.82
                                   Saxon           Tropical     4730.93
                                   Mwenezi         Tropical     4501.17
                                   Solitaire       Tropical     4387.07
                                   Spike           Tropical     4375.53
                                   S722-6-1E       Tropical     4266.46
                                   S1440-5-2E      Tropical     4265.78
                                   Squire          Tropical     4243.98
Bottom ten performing genotypes    Ozark           Temperate    1138.72
                                   NC-Tinius       Temperate    1119.99
                                   Spencer         Temperate    1112.50
                                   UI.San          Temperate    1107.63
                                   HF93-035        Temperate    1086.65
                                   HF93-083        Temperate    1075.04
                                   Defiance        Temperate    1052.57
                                   Clifford        Temperate    1042.83
                                   LN83-2356       Temperate    1011.36
                                   UA 4805         Temperate    1010.24
Statistics                         Mean                         2552.00
                                   SE mean                      64.84
                                   STD                          917.00
                                   P value                      < 0.001
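
The trial statistics in Table 1 can be cross-checked with a few lines of arithmetic. The sketch below assumes that the standard error of the mean was obtained as STD divided by the square root of the 200 accessions evaluated; the source does not state the formula or the n used, so both are assumptions, and the function and variable names are illustrative only.

```python
import math

# Values copied from Table 1 (kg/ha).
MEAN_YIELD = 2552.00
STD_YIELD = 917.00
REPORTED_SE = 64.84

# Assumption: n is the 200 non-genetically modified accessions in the trial.
N_ACCESSIONS = 200


def standard_error(std, n):
    """Standard error of the mean: standard deviation over sqrt(n)."""
    return std / math.sqrt(n)


se = standard_error(STD_YIELD, N_ACCESSIONS)
cv_percent = 100.0 * STD_YIELD / MEAN_YIELD  # coefficient of variation

print(f"SE = {se:.2f} kg/ha (Table 1 reports {REPORTED_SE})")
print(f"CV = {cv_percent:.1f}%")
```

Under that assumption the computed standard error agrees with the tabulated value to two decimals, which suggests the summary statistics refer to all 200 accessions rather than only the 20 listed.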
Fig. 1 Frequency distribution of 200 non-genetically modified soybean genotypes for grain yield. [Bar chart; x-axis: yield (kg/ha); y-axis: number of genotypes. Class counts: <1000, 0; 1000-2000, 66; 2000-3000, 70; 3000-4000, 49; >4000, 15.]

Table 2 SNP marker diversity for genotyping 210 diverse temperate and tropical soybean lines

Statistic                                  Mean   Min    Max
Major allele frequency                     0.76   0.00   1.00
Minor allele frequency (MAF)               0.24   0.05   0.50
Expected heterozygosity (He)               0.31   0.00   0.94
Observed heterozygosity (Ho)               0.02   0.00   1.00
Polymorphic information content (PIC)      0.24   0.01   0.37
Allele number                              1.88   1.00   3.00
Shannon information index                  0.45   0.03   0.98
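
The marker quality filter and the per-marker statistics summarised in Table 2 can be illustrated with a small sketch. The study used PowerMarker and GenAlEx; the code below is not their implementation. It assumes the textbook definitions (He = 1 − Σp², PIC in the Botstein form, Shannon index with natural logarithms) and a hypothetical input format in which each call is a two-letter diploid genotype and `"N"` marks a missing call. The thresholds (call rate above 90%, MAF of at least 0.05) are those stated in the text.

```python
import math
from collections import Counter

MISSING = "N"  # hypothetical code for a missing genotype call


def marker_stats(calls):
    """Summary statistics for one SNP marker.

    calls: list of diploid genotype strings such as 'AA' or 'AG',
    with MISSING for samples that were not genotyped.
    """
    typed = [g for g in calls if g != MISSING]
    if not typed:
        raise ValueError("marker has no genotype calls")
    counts = Counter(allele for g in typed for allele in g)
    total = sum(counts.values())
    p = sorted((c / total for c in counts.values()), reverse=True)

    he = 1.0 - sum(x * x for x in p)  # expected heterozygosity
    # PIC = He minus the sum over allele pairs of 2 * p_i^2 * p_j^2.
    pic = he - sum(
        2.0 * p[i] ** 2 * p[j] ** 2
        for i in range(len(p))
        for j in range(i + 1, len(p))
    )
    return {
        "call_rate": len(typed) / len(calls),
        # Second most common allele; 0.0 for a monomorphic marker.
        "maf": p[1] if len(p) > 1 else 0.0,
        "he": he,
        "ho": sum(g[0] != g[1] for g in typed) / len(typed),
        "pic": pic,
        "shannon": -sum(x * math.log(x) for x in p),
    }


def passes_qc(stats, min_call_rate=0.90, min_maf=0.05):
    """Keep markers with call rate > 90% and MAF of at least 0.05."""
    return stats["call_rate"] > min_call_rate and stats["maf"] >= min_maf


# Toy marker: 6 AA, 3 GG and 1 AG line (invented data, not from the study).
toy = marker_stats(["AA"] * 6 + ["GG"] * 3 + ["AG"])
print(toy, passes_qc(toy))
```

For a bi-allelic marker these definitions cap He at 0.5 and PIC at 0.375, so per-marker maxima above those bounds can only arise at loci where a third allele was called.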
SNP marker diversity and profile
After filtering, 403 SNP markers remained with minor allele frequency > 0.05. The SNP marker profiles are presented in Table 2. The average minor allele frequency was 0.24. The number of alleles ranged from 1 to 3 with an average of 1.88. The Shannon information index ranged from 0.03 to 0.98 with a mean of 0.45. The mean expected heterozygosity (He) was 0.31, whilst the mean observed heterozygosity (Ho) was 0.02. The mean polymorphic information content (PIC) was 0.24.
Population structure
The silhouette plots showed that considering two clusters produces one genotype with a negative silhouette value (Fig. 2a). When three clusters were considered, all the genotypes fitted into the three clusters without negative values (Fig. 2b). Having more clusters produced several genotypes with negative values on the silhouette plots.
Fig. 2 Silhouette plots showing the number of possible clusters formed from 210 genotyped soybean lines: (a) considering two clusters; (b) considering three clusters
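
The silhouette criterion used here can be made concrete with a short sketch. It is not the R code used in the study; it assumes the standard silhouette definition, s = (b − a) / max(a, b), where a is a line's mean distance to its own cluster and b is its mean distance to the nearest other cluster, and it uses a simple mismatch proportion as the distance. The profiles and cluster labels are invented toy data.

```python
def mismatch_distance(a, b):
    """Share of markers at which two genotype profiles differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def silhouette_values(profiles, labels):
    """Silhouette value per line; needs at least two clusters.

    profiles: list of equal-length marker strings.
    labels:   list of cluster labels, one per profile.
    """
    n = len(profiles)
    dist = [[mismatch_distance(profiles[i], profiles[j]) for j in range(n)]
            for i in range(n)]
    clusters = sorted(set(labels))
    values = []
    for i in range(n):
        own = [j for j in range(n) if labels[j] == labels[i] and j != i]
        if not own:
            values.append(0.0)  # usual convention for singleton clusters
            continue
        a = sum(dist[i][j] for j in own) / len(own)
        b = min(
            sum(dist[i][j] for j in range(n) if labels[j] == c)
            / labels.count(c)
            for c in clusters if c != labels[i]
        )
        denom = max(a, b)
        values.append(0.0 if denom == 0 else (b - a) / denom)
    return values


# Toy data: two pairs of near-identical lines (not from the study).
toy_profiles = ["AAGG", "AAGC", "TTCC", "TTCA"]
good_fit = silhouette_values(toy_profiles, [0, 0, 1, 1])
poor_fit = silhouette_values(toy_profiles, [0, 1, 0, 1])
print(good_fit)
print(poor_fit)
```

A negative value flags a line that sits closer, on average, to another cluster than to its own, which is the signal the text uses to reject the two-cluster and the more-than-three-cluster solutions.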
Therefore, three clusters were the best fit for grouping all the genotypes (Fig. 2b). In the first cluster, 205 individuals were identified, whilst clusters two and three had three and two lines, respectively. The average genetic distances (GD) were 0.28, 0.11 and 0.13 for clusters 1, 2 and 3, respectively.
According to the Gower's genetic distances calculated in R, all the 210 genotypes were also grouped into three clusters as shown in the phylogenetic tree drawn using UPGMA cluster analysis (Fig. 3). The first cluster consisted of three temperate genotypes, Nitchuu 47, Tara and Tousan, while the second cluster consisted of two lines, namely Forrest and Fowler. The five genotypes in clusters one and two are all from USA. The third cluster consisted of 205 genotypes. The genotypes in this cluster consisted of all tropical genotypes from Zimbabwe, South Africa and Malawi, and several temperate genotypes from the USA. There were genotypes which had short genetic distances between them (Fig. 3), such as Pudou 426 and Usada Zairai (0.02), Yougestu and Tachiyukata (0.02), UI. San and IC. San (0.05), Saga and Santee (0.07), and Stanza and Mwenezi (0.08). Most of the lines from Zimbabwe fell in the third cluster. Three of the South African genotypes clustered together. Several USA genotypes also clustered close to each other.
When only tropical lines were analysed, three clusters were formed, where all the Zimbabwean lines clustered together in the first cluster, while all the South African lines also clustered together in the second cluster (Fig. 4). The third cluster had Tikolore, the only line from Malawi. Sister lines clustered close to each other, for example S1440-5-1E and S1440-5-2E, as well as LDC-5-3 and LDC-5-9. The shortest genetic distances existed between Stanza and Mwenezi (0.08) and between Solitaire and Pan 1867 (0.09). The greatest genetic distances were observed between Tikolore and Stanza (0.24), Tikolore and Mwenezi (0.17) and Tikolore and Serenade (0.12).
A UPGMA phylogenetic tree for temperate genotypes only is shown in Fig. 5. While this tree shows three clusters for these lines, the same lines that clustered close together when all 210 lines (temperate and tropical) were included still clustered close to each other when only the temperate lines were used in the analysis. Most of the LD lines clustered together just as they did when both the temperate and tropical lines were used. Moreover, lines like Benning and Bingnan, Yougestu and Tachiyutaka and IC-San and
UI-San clustered close to each other with short genetic distances of 0.08, 0.02 and 0.05, respectively.

Fig. 3 UPGMA phylogenetic tree showing three clusters for all the 210 soybean lines drawn using the Gower's similarity distances

The Evanno method was used to reveal the optimum k value for the genotyped soybean lines in STRUCTURE Harvester. The delta k (∆k) curve shows that k peaked at 3, with a mean ln likelihood of -46516.5 and a variance of ln likelihood of 3407.0, meaning that a total of three clusters or subpopulations contributed to the total variation in the soybean lines under study (Fig. 6).
Population structure was constructed to reveal the architecture within the population. In agreement with the Evanno method, three subpopulations were recognised (Fig. 7). Each of the colors (red, green and blue) in the population structure represents one cluster. The lines Fowler and Forrest (188 and 180, respectively) clustered close to each other, and these also clustered closely to Tousan (102), Tara (147) and Nutchu 47, which were in another cluster according to the UPGMA. Several other genotypes consisted of genomes made of at least two of the subpopulations (Fig. 7).
Duplications
Clone analysis was done in GenAlEx to identify duplications. Table 3 shows the results. Two groups of duplicates were identified. Pudou-426 and Usada-Zairai were identified as duplicates, while Tachiyukata and Yougestu were also identified as duplicates. The duplicate groups were labeled as A and B, respectively.
Genetic diversity among soybean lines
Analysis of molecular variance (AMOVA) was performed using GenAlEx for the three subpopulations identified in STRUCTURE. The AMOVA showed that total variation within the population can be partitioned into among- and within-population sources, accounting for 4% and 96% of the total variation, respectively (Table 4). The FST value of 0.06 was low.
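
The fixation index reported above follows the definition given under Data analysis: the size-weighted variance of an allele's frequency among subpopulations divided by p̄(1 − p̄), where p̄ is the allele's mean frequency in the total population. The sketch below implements that single-allele formula only. It is not the GenAlEx AMOVA procedure, and the allele frequencies in the example are invented for illustration; only the subpopulation sizes (205, 3 and 2) come from the text.

```python
def fst_single_allele(freqs, sizes):
    """FST for one allele from subpopulation frequencies and sizes.

    freqs: frequency of the allele in each subpopulation.
    sizes: number of individuals in each subpopulation (used as weights).
    """
    total = sum(sizes)
    p_bar = sum(p * n for p, n in zip(freqs, sizes)) / total
    if p_bar in (0.0, 1.0):
        raise ValueError("allele is fixed or absent in the total population")
    var_s = sum(n * (p - p_bar) ** 2 for p, n in zip(freqs, sizes)) / total
    return var_s / (p_bar * (1.0 - p_bar))


# Boundary cases matching the 0-to-1 range described in the text.
identical = fst_single_allele([0.3, 0.3], [10, 20])    # no differentiation
fixed_apart = fst_single_allele([0.0, 1.0], [50, 50])  # no shared alleles

# Hypothetical frequencies for three clusters of 205, 3 and 2 lines.
example = fst_single_allele([0.76, 0.50, 0.90], [205, 3, 2])
print(identical, fixed_apart, round(example, 4))
```

Because the variance is weighted by subpopulation size, two very small clusters next to one large cluster move the index only slightly, which is one way to read a low overall FST alongside three detected subpopulations.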
Fig. 4 Dendrogram showing clustering of the 30 tropical soybean lines
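
The dendrograms in this study were built in R from Gower distances with UPGMA. As a rough illustration of what that procedure does, the sketch below implements both steps in plain Python. It assumes categorical markers coded as in the text (A = 1, C = 2, G = 3, T = 4, missing = 0), one code per marker, and treats the Gower dissimilarity for such data as the share of mismatches among markers typed in both lines. The line names and profiles are invented; none of this reproduces the authors' R code.

```python
def gower_categorical(a, b, missing=0):
    """Gower dissimilarity for categorical markers.

    Counts mismatches only over markers typed in both lines.
    """
    pairs = [(x, y) for x, y in zip(a, b) if x != missing and y != missing]
    if not pairs:
        raise ValueError("no markers typed in both lines")
    return sum(x != y for x, y in pairs) / len(pairs)


def upgma(names, profiles):
    """Return the merge history as (members_a, members_b, distance).

    The reported distance is the average between-cluster distance at
    which the two clusters join (not halved into a branch length).
    """
    clusters = {i: (name,) for i, name in enumerate(names)}
    dist = {
        (i, j): gower_categorical(profiles[i], profiles[j])
        for i in clusters for j in clusters if i < j
    }
    merges = []
    next_id = len(names)
    while len(clusters) > 1:
        (i, j), d = min(dist.items(), key=lambda item: item[1])
        merges.append((clusters[i], clusters[j], d))
        ni, nj = len(clusters[i]), len(clusters[j])
        updated = {}
        for k in clusters:
            if k in (i, j):
                continue
            dik = dist[(min(i, k), max(i, k))]
            djk = dist[(min(j, k), max(j, k))]
            # Size-weighted mean keeps every original pair equally weighted.
            updated[(k, next_id)] = (ni * dik + nj * djk) / (ni + nj)
        dist = {pair: v for pair, v in dist.items()
                if i not in pair and j not in pair}
        dist.update(updated)
        merged = clusters.pop(i) + clusters.pop(j)
        clusters[next_id] = merged
        next_id += 1
    return merges


# Hypothetical lines and marker codes (0 = missing call).
toy_names = ["L1", "L2", "L3", "L4"]
toy_codes = [
    [1, 1, 3, 3, 2],
    [1, 1, 3, 3, 4],
    [4, 4, 2, 3, 2],
    [4, 4, 2, 0, 2],
]
toy_merges = upgma(toy_names, toy_codes)
for members_a, members_b, height in toy_merges:
    print(members_a, members_b, round(height, 4))
```

Skipping markers with a missing call in either line means a pair such as L3 and L4 can reach a distance of zero even when one profile is incomplete, which is worth keeping in mind when reading near-zero distances as duplicates.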
Table 5 shows genetic variability among and within populations and the fixation index (FST) for the soybean lines. Nei's net nucleotide distance ranged from 0.06 between cluster 1 and cluster 2 to 0.12 between cluster 2 and cluster 3. Cluster 1 and cluster 3 had a nucleotide distance of 0.09. This means that clusters 2 and 3 were furthest apart, whereas clusters 1 and 2 were closer to each other. The least within-population variation was recorded in cluster 3 with an expected heterozygosity (He) of 0.21, whilst cluster 2 had the highest within-population variation of 0.31. The fixation index (FST) values were 0.06 (cluster 1), 0.29 (cluster 2) and 0.02 (cluster 3). Cluster 3 had the lowest genetic variance proportion of 0.02 (Table 5).
Discussion
Phenotypic yield data
The results showed that the tropical lines yielded more than the temperate lines, which indicates that the tropical lines are well adapted to the Zimbabwean environment. This is usually expected, especially when lines are introduced from a different region with different environmental conditions in terms of rainfall, latitude, altitude and temperatures. While the temperate genotypes yielded less than the tropical, 46 temperate genotypes yielded relatively well, above 3000 kg/ha, indicating their potential utility for tropical and subtropical breeding programs. These accessions can be utilized in soybean breeding programs for introgression of important traits, such as rust resistance and phenotypic maturity date, if screened for such traits, as this would reduce linkage drag effects on productivity (Abebe et al. 2021).
SNP marker diversity and profile
The SNPs used were quite informative and desirable for differentiating the soybean genotypes under study. The allelic number ranging from 1 to 3 can be attributed to the crop being self-pollinated, which is consistent with previous reports of low allelic diversity and heterozygosity levels for soybean (Abebe et al. 2021; Wright 1921). The mean minor allele frequency (MAF) of 0.24, which is well above 0, indicates that the SNPs were informative. MAF values measure the ability of markers to discriminate genotypes. Because SNP markers are bi-allelic, a value above 0 is considered informative or discriminating. In the present study, 60% of the markers had a MAF between 0.3 and 0.5, which is comparable to values reported on soybean in previous studies (Chander et al. 2021; Abebe et al. 2021). The mean PIC value of 0.24 also indicates that the markers were informative. Considering the bi-allelic nature of SNPs, where the PIC cannot exceed 0.5 (Singh et al. 2013), the PIC values obtained in this study were desirable
for differentiating the 210 soybean genotypes. Similar results were reported in soybean by Abebe et al. (2021), who reported a mean PIC value of 0.25 among elite lines developed by the IITA. In other self-pollinated crops, Singh et al. (2013) reported a mean PIC value of 0.23 in rice. The observed heterozygosity (Ho) of 0.02 was lower than the expected heterozygosity (He) in this study. This implies a high likelihood of inbreeding and fixation at most of the loci (Nawaz et al. 2021). Overall, the SNPs used in this study were informative and discriminating, hence they can be recommended for diversity studies in other soybean populations.

Fig. 5 UPGMA phylogenetic tree showing clusters of the 180 temperate soybean lines only

Population structure and genetic diversity
The study was effective for determining the population structure and level of diversity in the germplasm collection. There was consistency in the outcome from the silhouette plots, UPGMA and the Evanno method in STRUCTURE used to discriminate the 210 soybean genotypes into clusters based on genetic similarity. The silhouette
plots grouped the genotypes into three clusters perfectly, indicating that this was the effective number of clusters that could be formed from the germplasm used in this study. Silhouette plots are generally used to visualize how well the data points belong to their clusters. The silhouette scores, which range from -1 to 1, measure how similar an object is to its own cluster compared to other clusters (Menardi 2011; Pant et al. 2008; Rousseeuw 1987; Thinsungnoen et al. 2015). This finding was confirmed by two additional tools used in the study.
The Unweighted Pair Group Method using Arithmetic average (UPGMA) produced a phylogenetic tree with three populations, which corroborated the findings from the silhouette plots and the Evanno method. While five genotypes from the USA (Nitchuu 47, Tara, Tousan, Forrest and Fowler) were grouped in clusters one and two, all other genotypes were grouped in the third cluster. The genotypes included in the third cluster were from different sources: the USA, Zimbabwe, South Africa and Malawi. This means that there was limited molecular variation among the genotypes used in this study. This could be attributed to exchange of genetic material across the different breeding programs in the Southern Africa region and external sources from other regions, such as Asia and America. An analysis of seed shipments indicates that there is a lot of germplasm exchange between the soybean breeding programs in Southern Africa and the USA. This implies that the soybean lines were derived from shared backgrounds and were selected for the same market requirements, leading to utilization of the same set of alleles. According to the literature and actual pedigree analysis of this germplasm set, most soybean lines were developed from a narrow genetic base derived from a few ancestral lines. A survey of the literature indicates extensive utilization of external germplasm from different countries, such as China, Japan and Korea (Abebe et al. 2021; Bruce et al. 2019; Jeong et al. 2019a, b; Kim et al. 2014). It is a standard and recommended industry practice for breeders to continuously incorporate and integrate external germplasm in their breeding programs.
According to the phylogenetic tree of the 210 genotypes and a separate analysis of the tropical lines only, Zimbabwean and South African lines clustered together in a separate group. These lines were bred to satisfy the same market requirements with common trait preferences and common allelic constitutions. Several other genotypes clustered close to each other in accordance with their origin, adding credence to the possibility of utilizing common genetic background in breeding programs. Similar results of soybean genotypes that clustered in accordance with the place of origin have been reported (Lee et al. 2014; Liu et al. 2017). This has also been reported for other crops, such as cowpea (Fatokun et al. 2018; Sodedji et al. 2021) and sesame (Basak et al. 2019). In the analysis involving tropical lines only, Tikolore was classified alone in its own cluster, showing its potential for use in the tropical breeding programs for introgression of important traits.
Duplications show a high level of genetic similarity (Makore et al. 2021), which was revealed in this study and is consistent with the findings from the phylogenetic tree that shows low genetic distances between some lines. Seemingly, the observations of duplications and minimal genetic distances indicate that there are introductions that were given different names by different breeders.
The results from analysis of molecular variance (AMOVA) support the possibility of high gene flow, as shown by the variation among populations that accounted for just 4% of the total variation, whilst within-population variation was about 96% of the total variation. The FST value of 0.06 indicated that there is low genetic difference among populations, suggesting high gene exchange. This observation is consistent with the literature. Wang et al. (2012) reported that most populations were exhibiting the effects of genetic bottlenecks.
Fig. 6 Graph showing the best k value using the Evanno method
Table 3 Duplications of the soybean lines derived from clone analysis
Sample No  Sample        Pop  Number of duplications  Label of duplication
160        Pudou-426     2    2                       A
125        Usada-Zairai  2    0                       A
55         Tachiyukata   2    2                       B
7          Yougestu      2    0                       B
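The silhouette score described in this section (Rousseeuw 1987) can be illustrated with a minimal sketch. For each sample, a is the mean distance to the other members of its own cluster and b is the smallest mean distance to the members of any other cluster, and the score is (b - a) / max(a, b). The distance matrix and cluster labels below are hypothetical toy values, not data from this study, and the function name is only an illustrative choice.

```python
def silhouette_scores(dist, labels):
    """Return s(i) = (b - a) / max(a, b) for every sample.

    a: mean distance from sample i to the other members of its own cluster.
    b: smallest mean distance from sample i to the members of any other cluster.
    Assumes at least two clusters are present.
    """
    n = len(labels)
    clusters = sorted(set(labels))
    scores = []
    for i in range(n):
        own = [j for j in range(n) if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)  # convention for a single-member cluster
            continue
        a = sum(dist[i][j] for j in own) / len(own)
        b = min(
            sum(dist[i][j] for j in range(n) if labels[j] == c) / labels.count(c)
            for c in clusters
            if c != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


# Hypothetical pairwise genetic distances for five lines in two clusters.
labels = [0, 0, 1, 1, 1]
dist = [
    [0.0, 0.1, 0.8, 0.9, 0.7],
    [0.1, 0.0, 0.7, 0.8, 0.9],
    [0.8, 0.7, 0.0, 0.2, 0.1],
    [0.9, 0.8, 0.2, 0.0, 0.2],
    [0.7, 0.9, 0.1, 0.2, 0.0],
]

scores = silhouette_scores(dist, labels)
mean_score = sum(scores) / len(scores)
print([round(s, 3) for s in scores], round(mean_score, 3))
```

Scores near 1 indicate that a line sits well inside its assigned cluster, scores near 0 indicate a line on the boundary between clusters, and negative scores suggest a possible misassignment.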
Fig. 7 Population structure of the 210 soybean lines
Table 4 Analysis of molecular variance (AMOVA) for the 210 soybean lines
Source       df   SS         MS      Est. Var  %    FST
Among pops   2    194.021    97.010  3.451     4    0.06
Within pops  207  16574.232  80.069  80.069    96
Total        209  16768.252  80.230  83.520    100
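The arithmetic behind Table 4 can be retraced with a short sketch. It assumes the usual AMOVA layout, in which each mean square is the sum of squares divided by its degrees of freedom and the percentage column is each estimated variance component divided by their total. The numbers are copied from the table; the variable names are illustrative only.

```python
# Values copied from Table 4.
ss = {"among": 194.021, "within": 16574.232}   # sums of squares
df = {"among": 2, "within": 207}               # degrees of freedom
est_var = {"among": 3.451, "within": 80.069}   # estimated variance components

# Mean squares: SS / df (about 97.01 among and 80.07 within populations).
ms = {k: ss[k] / df[k] for k in ss}

# Share of the total estimated variance carried by each source.
total_var = sum(est_var.values())
pct = {k: 100 * v / total_var for k, v in est_var.items()}

# Among-population fraction of the total estimated variance.
among_share = est_var["among"] / total_var

print(ms, total_var, pct, among_share)
```

The among-population share works out to roughly 4% and the within-population share to roughly 96%, matching the percentage column. As a fraction, the among-population share is about 0.04, slightly below the FST of 0.06 listed in the table; the estimator used for that column is not stated here, so the tabulated value is taken as reported.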
Table 5 Allele-frequency divergence among populations (Nei's net nucleotide distance) and within populations (expected heterozygosity) and fixation index (FST) for 210 soybean lines
Population  Nei's nucleotide distance  Expected        FST
            Cluster 2    Cluster 3     heterozygosity
1           0.06         0.09          0.30            0.06
2           –            0.12          0.31            0.29
3           –            –             0.21            0.02
Basak et al. (2019) also reported similar results in sesame. Abebe et al. (2021) reported moderate genetic variation in soybean, with 11% of the total variation attributed to differences among clusters, 71% due to individual genotypes, and an FST value of 0.11. Generally, low FST values close to 0 indicate that subpopulations are similar in almost all alleles or there is little divergence within the population, whilst an FST value of 1 means that the subpopulations are fixed for different alleles (Basak et al. 2019; Mohammadi and Prasanna 2003). In the current study, the low FST values have an implication for breeding in that little
improvement can be done through simple hybridization in some traits of economic importance, for example yield. However, the low diversity can be utilized in conservation of such important traits by crossing the related genotypes. For example, genotypes within cluster 3 can be crossed to maintain high yields in some of the genotypes while taking advantage of rare or minor alleles found in other genotypes. Minor alleles that could be leveraged in such germplasm include those for earliness, found in most USA genotypes. Genotypes from clusters 2 and 3 can be hybridized for improved varieties, although the improvement has a certain ceiling because of the low genetic variation within the whole germplasm used in this study.
Conclusions and recommendations
The SNP markers used were informative and displayed high discrimination capacity, hence the results from this study were useful for molecular characterization of this soybean collection in Southern Africa. The 210 germplasm lines were consistently grouped into three clusters using three tools. Low molecular diversity was evident. These findings have serious implications for the breeding programs that aim to improve soybean varieties by utilizing this germplasm collection. Innovation strategies for improving variability in the germplasm collection, such as investments in pre-breeding, increasing the geographic sources of introductions and exploitation of mutation breeding, are recommended to enhance genetic gain.
Acknowledgements
The authors would like to acknowledge DAAD for funding the research and Seed Co for the provision of the experimental stations for this study.
Author contributions
AT: conceptualization of the research, field work, data analysis, writing of the original draft, reviewing and editing of the final manuscript; EG: data analysis, reviewing and editing; HM: reviewing and editing; JFYE: supervision, reviewing and editing; PT: supervision, reviewing and editing; EYD: supervision and reviewing; LM: reviewing and editing; MZ: selection of SNP markers, reviewing and editing; EZ: reviewing and editing; JD: supervision, reviewing and editing. All authors read and approved the final manuscript.
Funding
The research was funded by the German Academic Exchange Service (DAAD) as part of the PhD funding.
Availability of data and materials
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.
Declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Author details
1 West Africa Centre for Crop Improvement, College of Basic and Applied Sciences, University of Ghana, PMB 30, Legon, Accra, Ghana.
2 Seed Co Limited, Rattray Arnold Research Station, Chisipite, P. O. Box CH142, Harare, Zimbabwe.
3 University of Zimbabwe, Mt Pleasant, P. O. Box MP167, Harare, Zimbabwe.
4 International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Matopos Research Station, P.O. Box 776, Bulawayo, Zimbabwe.
5 International Institute of Tropical Agriculture (IITA), PMB 5320, Ibadan 200001, Nigeria.
Received: 3 February 2023   Accepted: 22 May 2023
References
Abebe AT, Kolawole AO, Unachukwu N, Chigeza G, Tefera H, Gedil M. Assessment of diversity in tropical soybean (Glycine max (L.) Merr.) varieties and elite breeding lines using single nucleotide polymorphism markers. Plant Genet Resour Charact Util. 2021;19(1):20–8. https://doi.org/10.1017/S1479262121000034.
Bandillo N, Jarquin D, Song Q, Nelson R, Cregan P, Specht J, Lorenz A. A population structure and genome-wide association analysis on the USDA soybean germplasm collection. Plant Genome. 2015. https://doi.org/10.3835/plantgenome2015.04.0024.
Bandillo NB, Anderson JE, Kantar MB, Stupar RM, Specht JE, Graef GL, Lorenz AJ. Dissecting the genetic basis of local adaptation in soybean. Sci Rep. 2017;7(1):1–12. https://doi.org/10.1038/s41598-017-17342-w.
Basak M, Uzun B, Yol E. Genetic diversity and population structure of the Mediterranean sesame core collection with use of genome-wide SNPs developed by double digest RAD-Seq. PLoS ONE. 2019;14(10):1–15. https://doi.org/10.1371/journal.pone.0223757.
Bellaloui N, Bruns HA, Gillen AM, Abbas HK, Zablotowicz RM, Mengistu A, Paris RL. Soybean seed protein, oil, fatty acids, and mineral composition as influenced by soybean-corn rotation. Agric Sci. 2010;1(3):102–9. https://doi.org/10.4236/as.2010.13013.
Biyeu K, Ratnaparkhe MB, Kole C. Genetics, genomics and breeding of soybean. New Hampshire: CRC Press; 2010. p. 1–18.
Blair MW, Cortés AJ, Penmetsa RV, Farmer A, Carrasquilla-Garcia N, Cook DR. A high-throughput SNP marker system for parental polymorphism screening, and diversity analysis in common bean (Phaseolus vulgaris L.). Theor Appl Genet. 2013;126(2):535–48. https://doi.org/10.1007/s00122-012-1999-z.
Bruce RW, Torkamaneh D, Grainger C, Belzile F, Eskandari M, Rajcan I. Genome-wide genetic diversity is maintained through decades of soybean breeding in Canada. Theor Appl Genet. 2019. https://doi.org/10.1007/s00122-019-03408-y.
Chander S, Garcia-Oliveira AL, Gedil M, Shah T, Otusanya GO, Asiedu R, Chigeza G. Genetic diversity and population structure of soybean lines adapted to sub-Saharan Africa using single nucleotide polymorphism (SNP) markers. Agronomy. 2021. https://doi.org/10.3390/agronomy11030604.
Chen Y, Nelson RL. Relationship between origin and genetic diversity in Chinese soybean germplasm. Crop Sci. 2005;45(4):1645–52. https://doi.org/10.2135/cropsci2004.0071.
Core TR. RStudio: Integrated development for R. RStudio, Inc., Boston. 2015. http://www.rstudio.com/. Accessed 15 Sept 2021.
Cornelious BK, Sneller CH. Yield and molecular diversity of soybean lines derived from crosses of Northern and Southern Elite parents. Crop Sci. 2002;42:642–7.
Cortés AJ, Chavarro MC, Blair MW. SNP marker diversity in common bean (Phaseolus vulgaris L.). Theor Appl Genet. 2011;123(5):827–45. https://doi.org/10.1007/s00122-011-1630-8.
Earl DA, VonHoldt BM. STRUCTURE HARVESTER: a website and program for visualizing STRUCTURE output and implementing the Evanno method. Conserv Genet Resour. 2012;4(2):359–61. https://doi.org/10.1007/s12686-011-9548-7.
Edwards D, Forster JW, Chagné D, Batley J. What are SNPs? Assoc Mapp Plants. 2007. https://doi.org/10.1007/978-0-387-36011-9_3.
FAOSTAT. Food and agriculture data. 2021. http://www.fao.org/faostat/en/#data/QC. Accessed 21 May 2022.
Fatokun C, Girma G, Abberton M, Gedil M, Unachukwu N, Oyatomi O, Yusuf M, Rabbi I, Boukar O. Genetic diversity and population structure of a mini-core subset from the world cowpea (Vigna unguiculata (L.) Walp.) germplasm collection. Sci Rep. 2018;8(1):1–10. https://doi.org/10.1038/s41598-018-34555-9.
Garrison E, Marth G. Haplotype-based variant detection from short-read sequencing. 2012. http://arxiv.org/abs/1207.3907. Accessed 10 Apr 2019.
Gower JC. A general coefficient of similarity and some of its properties. Biometrics. 1971;27(4):857–74.
Grieshop CM, Fahey GC Jr. Comparison of quality characteristics of soybeans from Brazil, China, and the United States. J Agric Food Chem. 2001;49:2669–73. https://doi.org/10.1021/jf0014009.
Gwinner R, Alemu Setotaw T, Pasqual M, Dos Santos JB, Zuffo AM, Zambiazzi EV, Bruzi AT. Genetic diversity in Brazilian soybean germplasm. Crop Breed Appl Biotechnol. 2017;17(4):373–81. https://doi.org/10.1590/1984-70332017v17n4a56.
Hahn V, Würschum T. Molecular genetic characterization of Central European soybean breeding germplasm. Plant Breed. 2014;133(6):748–55. https://doi.org/10.1111/pbr.12212.
Hyten DL, Choi IY, Song Q, Shoemaker RC, Nelson RL, Costa JM, Specht JE, Cregan PB. Highly variable patterns of linkage disequilibrium in multiple soybean populations. Genetics. 2007;175(4):1937–44. https://doi.org/10.1534/genetics.106.069740.
Hyten DL, Choi I, Song Q, Specht JE, Carter TE, Shoemaker RC, Hwang EY, Matukumalli LK, Cregan PB. A high density integrated genetic linkage map of soybean and the development of a 1536 universal soy linkage panel for quantitative trait locus mapping. Crop Sci. 2010;50:960–8.
Jeong N, Kim KS, Jeong S, Kim JY, Park SK, Lee JS, Jeong SC, Kang ST, Ha BK, Kim DY, Kim N, Moon JK, Choi MS. Korean soybean core collection: genotypic and phenotypic diversity population structure and genome-wide association study. PLoS ONE. 2019a;14(10):1–16. https://doi.org/10.1371/journal.pone.0224074.
Jeong SC, Moon JK, Park SK, Kim MS, Lee K, Lee SR, Jeong N, Choi MS, Kim N, Kang ST, Park E. Genetic diversity patterns and domestication origin of soybean. Theor Appl Genet. 2019b;132(4):1179–93. https://doi.org/10.1007/s00122-018-3271-7.
Kim KH, Lee S, Seo MJ, Lee GA, Ma KH, Jeong SC, Lee SH, Park EH, Kwon YU, Moon JK. Genetic diversity and population structure of wild soybean (Glycine soja Sieb. and Zucc.) accessions in Korea. Plant Genet Resour Charact Util. 2014;12:48–51. https://doi.org/10.1017/S1479262114000239.
Lee G-A, Choi Y-M, Yi J-Y, Chung J-W, Lee M-C, Ma K-H, Lee S, Cho J, Lee J-R. Genetic diversity and population structure of Korean soybean collection using 75 microsatellite markers. Korean J Crop Sci. 2014;59(4):492–7. https://doi.org/10.7740/kjcs.2014.59.4.492.
LGC Bioscience Technologies. SeqSNP targeted GBS as alternative for array genotyping in routine breeding programs. 2019. https://biosearch-cdn.azureedge.net/assetsv6/seqsnp-tgbs-alternative-genotyping-routine-breeding-programs.pdf. Accessed 12 Feb 2020.
Li Y, Zhao S-C, Ma J-X, Li D, Yan L, Li J, Qi X, Guo X, Zhang L, He W, Chang R, Liang Q, Guo Y, Ye C, Wang X, Tao Y, Guan R, Wang J, Liu Y, Jin L, Zhang X, Liu Z, Zhang L, Chen J, Wang K, Nielsen R, Li R, Chen P, Li W, Reif J, Purugganan M, Wang J, Zhang M, Wang J, Qiu L-J. Molecular footprints of domestication and improvement in soybean revealed by whole genome re-sequencing. BMC Genomics. 2013. https://doi.org/10.1186/1471-2164-14-579.
Li YH, Reif JC, Jackson SA, Ma YS, Chang RZ, Qiu LJ. Detecting SNPs underlying domestication-related traits in soybean. BMC Plant Biol. 2014;14(1):1–8. https://doi.org/10.1186/s12870-014-0251-1.
Liu K, Muse SV. PowerMarker: an integrated analysis environment for genetic marker analysis. Bioinformatics. 2005;21:2128–9. https://doi.org/10.1093/bioinformatics/bti282.
Liu Z, Li H, Wen Z, Fan X, Li Y, Guan R, Guo Y, Wang S, Wang D, Qiu L. Comparison of genetic diversity between Chinese and American soybean (Glycine max (L.)) accessions revealed by high-density SNPs. Front Plant Sci. 2017. https://doi.org/10.3389/fpls.2017.02014.
Ma YS, Wang WH, Wang LX, Ma FM, Wang PW, Chang RZ, Qiu LJ. Genetic diversity of soybean and the establishment of a core collection focused on resistance to soybean cyst nematode. J Integr Plant Biol. 2006;48(6):722–31. https://doi.org/10.1111/j.1744-7909.2006.00256.x.
Makore F, Gasura E, Souta C, Mazarura U, Derera J, Zikhali M, Kamutando CN, Magorokosho C, Dari S. Molecular characterization of a farmer-preferred maize landrace population from a multiple-stress-prone subtropical lowland environment. Biodiversitas. 2021;22(2):769–77. https://doi.org/10.13057/biodiv/d220230.
Malik MFA, Ashraf M, Qureshi AS, Khan MR. Investigation and comparison of some morphological traits of the soybean populations using cluster analysis. Pak J Bot. 2011;43(2):1249–55.
Menardi G. Density-based Silhouette diagnostics for clustering methods. Stat Comput. 2011;21(3):295–308. https://doi.org/10.1007/s11222-010-9169-0.
Mohammadi SA, Prasanna BM. Analysis of genetic diversity in crop plants: salient statistical tools and considerations. Crop Sci. 2003;43(4):1235–48. https://doi.org/10.2135/cropsci2003.1235.
Nadeem MA, Nawaz MA, Shahid MQ, Doğan Y, Comertpay G, Yıldız M, Hatipoğlu R, Ahmad F, Alsaleh A, Labhane N, Özkan H, Chung G, Baloch FS. DNA molecular markers in plant breeding: current status and recent advancements in genomic selection and genome editing. Biotechnol Biotechnol Equip. 2018;32(2):261–85. https://doi.org/10.1080/13102818.2017.1400401.
Nawaz MA, Lin X, Chan TF, Lam HM, Baloch FS, Ali MA, Golokhvast KS, Yang SH, Chung G. Genetic architecture of wild soybean (Glycine soja Sieb. and Zucc.) populations originating from different East Asian regions. Genet Resour Crop Evol. 2021;68(4):1577–88. https://doi.org/10.1007/s10722-020-01087-z.
Nemlı S, Kaygisiz Aşçioğul T, Ateş D, Eşıyok D, Tanyolaç MB. Diversity and genetic analysis through DArTseq in common bean (Phaseolus vulgaris L.) germplasm from Turkey. Turkish J Agric For. 2017;41(5):389–404. https://doi.org/10.3906/tar-1707-89.
Ojo DK, Ajayi AO, Oduwaye OA. Genetic relationships among soybean accessions based on morphological and RAPDs techniques. J Trop Agric Sci. 2012;35(2):237–48.
Oliveira MF, Nelson RL, Geraldi IO, Cruz CD, de Toledo JFF. Establishing a soybean germplasm core collection. Field Crop Res. 2010;119(2–3):277–89. https://doi.org/10.1016/j.fcr.2010.07.021.
Orf J. Introduction. In: Biyeu K, Ratnaparkhe MB, Kole C, editors. Genetics, genomics and breeding of soybean. New Hampshire: CRC Press; 2010. p. 1–18.
Pant M, Radha T, Singh VP. Particle swarm optimization using Gaussian inertia weight. Proceedings: international conference on computational intelligence and multimedia applications, ICCIMA 2007, 2008; 1, 97–102. https://doi.org/10.1109/ICCIMA.2007.328.
Peakall R, Smouse PE. GenAlEx 6.5: genetic analysis in Excel. Population genetic software for teaching and research: an update. Bioinformatics. 2012;28:2537–9.
Porras-Hurtado L, Ruiz Y, Santos C, Phillips C, Carracedo Á, Lareu MV. An overview of STRUCTURE: Applications, parameter settings, and supporting software. Front Genet. 2012;4(MAY):1–13. https://doi.org/10.3389/fgene.2013.00098.
Qin J, Shi A, Xiong H, Mou B, Motes D, Lu W, Miller JC, Scheuring DC, Nzaramba MN, Weng Y, Yang W. Population structure analysis and association mapping of seed antioxidant content in USDA cowpea (Vigna unguiculata L. Walp.) core collection using SNPs. Can J Plant Sci. 2016;96(6):1026–36. https://doi.org/10.1139/cjps-2016-0090.
Rafalski A. Applications of single nucleotide polymorphisms in crop genetics. Curr Opin Plant Biol. 2002;5(2):94–100. https://doi.org/10.1016/S1369-5266(02)00240-6.
Rambaut A. FigTree: molecular evolution, phylogenetics and epidemiology. 2016. http://tree.bio.ed.ac.uk/software/figtree/. Accessed 15 Sept 2021.
Rousseeuw PJ. Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J Comput Appl Math. 1987;20:53–65. https://doi.org/10.1016/0377-0427(87)90125-7.
Singh N, Choudhury DR, Singh AK, Kumar S, Srinivasan K, Tyagi RK, Singh NK, Singh R. Comparison of SSR and SNP markers in estimation of genetic diversity and population structure of Indian rice varieties. PLoS ONE. 2013;8(12):1–14. https://doi.org/10.1371/journal.pone.0084136.
Sodedji FAK, Agbahoungba S, Agoyi EE, Kafoutchoni MK, Choi J, Nguetta SPA, Assogbadjo AE, Kim HY. Diversity, population structure, and linkage disequilibrium among cowpea accessions. Plant Genome. 2021. https://doi.org/10.1002/tpg2.20113.
Thinsungnoen T, Kaoungku N, Durongdumronchai P, Kerdprasop K, Kerdprasop N. The Clustering Validity with Silhouette and Sum of Squared
Errors. In proceedings of the 3rd international conference on industrial application engineering. Japan: The Institute of Industrial Applications Engineers. 2015; 44–51. https://doi.org/10.12792/iciae2015.012
Tiwari S, Tripathi N, Tsuji K, Tantwai K. Genetic diversity and population structure of Indian soybean (Glycine max (L.) Merr.) as revealed by microsatellite markers. Physiol Mol Biol Plants. 2019;25(4):953–64. https://doi.org/10.1007/s12298-019-00682-4.
Valliyodan B, Brown AV, Wang J, Patil G, Liu Y, Otyama PI, Nelson RT, Vuong T, Song Q, Musket TA, Wagner R, Marri P, Reddy S, Sessions A, Wu X, Grant D, Bayer PE, Roorkiwal M, Varshney RK, Liu X, Edwards D, Xu D, Joshi T, Cannon SB, Nguyen HT. Genetic variation among 481 diverse soybean accessions, inferred from genomic re-sequencing. Sci Data. 2021;8(1):1–9. https://doi.org/10.1038/s41597-021-00834-w.
Wang Y, Guo J, Liu Y, Wang Y, Chen J, Li Y, Huang H, Qiu L. Population structure of the wild soybean (Glycine soja) in China: Implications from microsatellite analyses. Ann Bot. 2012;110(4):777–85. https://doi.org/10.1093/aob/mcs142.
Wright S. Systems of mating. II. The effects of inbreeding on the genetic composition of a population. Genetics. 1921;6:124–43.
Yang S, Pang W, Ash G, Harper J, Carling J, Wenzl P, Huttner E, Zong X, Kilian A. Low level of genetic diversity in cultivated Pigeonpea compared to its wild relatives is revealed by diversity arrays technology. Theor Appl Genet. 2006;113(4):585–95. https://doi.org/10.1007/s00122-006-0317-z.
Zavinon F, Adoukonou-Sagbadja H, Keilwagen J, Lehnert H, Ordon F, Perovic D. Genetic diversity and population structure in Beninese pigeon pea [Cajanus cajan (L.) Huth] landraces collection revealed by SSR and genome wide SNP markers. Genet Resour Crop Evol. 2020;67(1):191–208. https://doi.org/10.1007/s10722-019-00864-9.
Zhu YL, Song QJ, Hyten DL, Van Tassell CP, Matukumalli LK, Grimm DR, Hyatt SM, Fickus EW, Young ND, Cregan PB. Single-nucleotide polymorphisms in soybean. Genetics. 2003;163(3):1123–34. https://doi.org/10.1093/genetics/163.3.1123.
Ziervogel G, New M, van Garderen EA, Midgley G, Taylor A, Hamann R, Stuart-Hill S, Myers J, Warburton M. Climate change impacts and adaptation in South Africa. Wiley Interdiscip Rev Clim Ch. 2014. https://doi.org/10.1002/wcc.295.
Žulj Mihaljević M, Šarčević H, Lovrić A, Andrijanić Z, Sudarić A, Jukić G, Pejić I. Genetic diversity of European commercial soybean [Glycine max (L.) Merr.] germplasm revealed by SSR markers. Genet Resour Crop Evol. 2020;67(6):1587–600. https://doi.org/10.1007/s10722-020-00934-3.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in pub-
lished maps and institutional affiliations.