The innovative RTM-GWAS procedure provides a relatively thorough detection of QTL and their multiple alleles for germplasm population characterization, gene network identification, and genomic selection strategy innovation in plant breeding. Previous genome-wide association studies (GWAS) have concentrated on finding a handful of major quantitative trait loci (QTL), but plant breeders are interested in revealing the whole-genome QTL-allele constitution of breeding materials/germplasm (in which tremendous historical allelic variation has accumulated) for genome-wide improvement. To meet this requirement, two innovations were suggested for GWAS: first, grouping tightly linked sequential SNPs into linkage disequilibrium blocks (SNPLDBs) to form markers with multi-allelic haplotypes; and second, utilizing a two-stage association analysis for QTL identification, in which the markers are preselected by a single-locus model and then subjected to stepwise regression under a multi-locus multi-allele model. Our proposed GWAS procedure is characterized as a novel restricted two-stage multi-locus multi-allele GWAS (RTM-GWAS, https://github.com/njau-sri/rtm-gwas). The Chinese soybean germplasm population (CSGP), composed of 1024 accessions with 36,952 SNPLDBs (generated from 145,558 SNPs, with reduced linkage disequilibrium decay distance), was used to demonstrate the power and efficiency of RTM-GWAS. Using the CSGP marker information, simulation studies demonstrated that RTM-GWAS achieved the highest QTL detection power and efficiency compared with the previous procedures, especially under large sample size and high trait heritability conditions. A relatively thorough detection of QTL with their multiple alleles was achieved by RTM-GWAS compared with the linear mixed model method on 100-seed weight in CSGP. A QTL-allele matrix (402 alleles of 139 QTL × 1024 accessions) was established as a compact form of the population genetic constitution.
The 100-seed weight QTL-allele matrix was used for genetic characterization, candidate gene prediction, and genomic selection for optimal crosses in the germplasm population.
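The first innovation described above, grouping tightly linked sequential SNPs into blocks whose haplotypes act as multi-allelic markers, can be illustrated with a minimal sketch. This is not the RTM-GWAS implementation: the greedy adjacent-pair rule, the r² threshold of 0.8, the toy genotypes, and the function names (`r_squared`, `group_into_blocks`, `block_haplotypes`) are all assumptions made for illustration, and the published software may partition blocks differently.

```python
from statistics import mean

def r_squared(a, b):
    """Squared Pearson correlation between two 0/1-coded SNP vectors."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    return cov * cov / (va * vb)

def group_into_blocks(snps, threshold=0.8):
    """Greedily merge each SNP into the current block when its r^2 with the
    preceding SNP meets the (assumed) threshold; otherwise start a new block."""
    blocks = [[0]]
    for j in range(1, len(snps)):
        if r_squared(snps[j - 1], snps[j]) >= threshold:
            blocks[-1].append(j)
        else:
            blocks.append([j])
    return blocks

def block_haplotypes(snps, block):
    """Return one haplotype string per accession for the SNPs in a block."""
    n_accessions = len(snps[0])
    return ["".join(str(snps[j][i]) for j in block) for i in range(n_accessions)]

# Hypothetical toy data: 4 SNPs (rows) x 6 homozygous accessions (columns).
snps = [
    [0, 0, 1, 1, 0, 1],
    [0, 0, 1, 1, 0, 1],  # identical to the first SNP, so r^2 = 1
    [1, 0, 1, 0, 1, 0],
    [0, 1, 0, 1, 1, 1],
]

blocks = group_into_blocks(snps)
haps = block_haplotypes(snps, blocks[0])
print(blocks)  # [[0, 1], [2], [3]]
print(haps)
```

In this toy case the first two SNPs collapse into one block with two haplotype alleles ("00" and "11"), which is the sense in which a block can carry more alleles than a single bi-allelic SNP.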
Summary: Soybean (Glycine max) is a major contributor to world oilseed production. Its seed oil content has been increased through soybean domestication and improvement. However, the genes underlying the selection are largely unknown. The present contribution analyzed the expression patterns of genes in the seed oil quantitative trait loci with strong selective sweep signals, then used association, functional study and population genetics to reveal a sucrose efflux transporter gene, GmSWEET39, that controls soybean seed oil content and has been under selection. GmSWEET39 is highly expressed in soybean seeds and encodes a plasma membrane‐localized protein. Its expression level is positively correlated with soybean seed oil content. Variation in its promoter and coding sequence gives rise to different natural alleles of this gene. The GmSWEET39 allelic effects on total oil content were confirmed in the seeds of soybean recombinant inbred lines, transgenic Arabidopsis, and transgenic soybean hairy roots. The frequencies of its superior alleles increased from wild soybean to cultivated soybean, and are much higher in released soybean cultivars. The findings herein suggest that the sequence variation in GmSWEET39 affects its relative expression and oil content in soybean seeds, and that GmSWEET39 has been selected to increase seed oil content during soybean domestication and improvement.
Utilizing an innovative GWAS in CSLRP, 44 QTL with 199 alleles contributing 72.2% of the SIFC variation were detected and organized into a QTL-allele matrix for cross design and gene annotation. The seed isoflavone content (SIFC) of soybeans is of great importance to health care. The Chinese soybean landrace population (CSLRP), as a genetic reservoir, was studied for its whole-genome quantitative trait loci (QTL) system of the SIFC using an innovative restricted two-stage multi-locus genome-wide association study procedure (RTM-GWAS). A sample of 366 landraces was tested under four environments and sequenced using the RAD-seq (restriction-site-associated DNA sequencing) technique to obtain 116,769 single nucleotide polymorphisms (SNPs), which were then organized into 29,119 SNP linkage disequilibrium blocks (SNPLDBs) for GWAS. The detected 44 QTL with 199 alleles on 16 chromosomes (explaining 72.2% of the total phenotypic variation), together with the allele effects (92 positive and 107 negative) in the CSLRP, were organized into a QTL-allele matrix showing the SIFC population genetic structure. Beyond the differentiation shown by genome-wide markers, additional differentiation among eco-regions was found for the SIFC. All accessions carried both positive and negative alleles, implying a great potential for recombination within the population. The optimal crosses were predicted from the matrices, showing transgressive potentials in the CSLRP. From the detected QTL system, 55 candidate genes related to 11 biological processes were χ²-tested as an SIFC candidate gene system. The present study explored the genome-wide SIFC QTL/gene system with the innovative RTM-GWAS and found the potentials of the QTL-allele matrix in optimal cross design and population genetic and genomic studies, which may provide a solution to match the breeding by design strategy at both QTL and gene levels in breeding programs.
A representative sample comprising 366 accessions from the Chinese soybean landrace population (CSLRP) was tested under four growth environments for determination of the whole-genome quantitative trait loci (QTLs) system of the 100-seed weight trait (ranging from 4.59 g to 40.35 g) through genome-wide association study (GWAS). A total of 116,769 single nucleotide polymorphisms (SNPs) were identified and organized into 29,121 SNP linkage disequilibrium blocks (SNPLDBs) to fit the property of multiple alleles/haplotypes per locus in germplasm. An innovative two-stage GWAS was conducted using a single-locus model for shrinking the marker number followed by a multiple-loci model utilizing stepwise regression for the whole-genome QTL identification. In total, 98.45% of the phenotypic variance (PV) was accounted for by four large-contribution major QTLs (36.33%), 51 small-contribution major QTLs (43.24%), and a number of unmapped minor QTLs (18.88%), with the QTL×environment variance representing only 1.01% of the PV. The number of alleles per QTL ranged from two to 10. A total of 263 alleles along with the respective allele effects were estimated and organized into a 263×366 matrix, giving the compact genetic constitution of the CSLRP. Differentiations among the ecoregion matrices were found. No landrace had alleles which were all positive or all negative, indicating a hidden potential for recombination. The optimal crosses within and among ecoregions were predicted, and showed great transgressive potential. From the QTL system, 39 candidate genes were annotated, of which 26 were involved with the gene ontology categories of biological process, cellular component, and molecular function, indicating that diverse genes are involved in directing the 100-seed weight.
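How a QTL-allele matrix can point to crosses with transgressive potential may be easier to see in a small sketch. The abstract does not state how the optimal crosses were predicted, so everything here is an assumption: the allele effects and genotypes are invented toy values, the model is purely additive with homozygous parents and freely recombining QTL, and `genotypic_value` and `best_recombinant` are hypothetical helper names.

```python
from itertools import combinations

# Hypothetical allele effects for three QTL (toy values, not from the study).
effects = {
    "q1": {"a1": -1.0, "a2": 0.5, "a3": 1.5},
    "q2": {"a1": -0.5, "a2": 0.5},
    "q3": {"a1": -2.0, "a2": 2.0},
}

# QTL-allele matrix in dictionary form: the allele each accession carries.
genotypes = {
    "acc1": {"q1": "a3", "q2": "a1", "q3": "a1"},
    "acc2": {"q1": "a1", "q2": "a2", "q3": "a2"},
    "acc3": {"q1": "a2", "q2": "a2", "q3": "a1"},
}

def genotypic_value(acc):
    """Sum of allele effects carried by one accession (additive assumption)."""
    return sum(effects[q][allele] for q, allele in genotypes[acc].items())

def best_recombinant(p1, p2):
    """Upper bound for a homozygous progeny line of p1 x p2: at every QTL take
    the better of the two parental alleles (assumes free recombination)."""
    return sum(
        max(effects[q][genotypes[p1][q]], effects[q][genotypes[p2][q]])
        for q in effects
    )

# Rank all possible crosses by their best attainable recombinant.
ranked = sorted(
    combinations(genotypes, 2),
    key=lambda pair: best_recombinant(*pair),
    reverse=True,
)

for p1, p2 in ranked:
    print(p1, p2, best_recombinant(p1, p2))
```

In this toy matrix no accession carries only positive alleles, yet the cross between acc1 and acc2 can in principle stack the best allele at every QTL and exceed both parents, which is the kind of transgressive potential the abstract refers to.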
Soybean is one of the most important oil crops in the world. Revealing the molecular basis of seed oil synthesis and exploring its key candidate genes are of great significance for soybean improvement. In this study, we found that oil accumulation rates and gene expression levels changed dynamically during soybean seed development. The expression levels of genes in metabolic pathways such as carbon fixation, photosynthesis, glycolysis, and fatty acid biosynthesis were significantly up-regulated during the rapid accumulation of oil in developing soybean seeds. Through weighted correlation network analysis, we identified six co-expression modules associated with soybean seed oil content, of which the pink module was the most positively correlated network (r = 0.83, p = 7 × 10⁻⁴). Through the integration of differential expression and co-expression analysis, we predicted 124 candidate genes potentially affecting soybean seed oil content, including seven genes in the lipid metabolism pathway, two genes involved in glycolysis, one gene in sucrose metabolism, and 12 transcription factor genes, as well as genes in other categories. Among these, three genes (GmABI3b, GmNFYA and GmFAD2-1B) have been shown to control oil and fatty acid content in soybean seeds, and the other newly identified candidate genes would broaden our understanding of the molecular basis of oil accumulation in soybean seeds.