Recent substantial advances in high-throughput field phenotyping have provided plant breeders with affordable and efficient tools for evaluating large numbers of genotypes for important agronomic traits at early growth stages. Nevertheless, using the large datasets generated by high-throughput phenotyping tools such as hyperspectral reflectance in cultivar development programs remains challenging because it requires intensive knowledge of computational and statistical analysis. In this study, the robustness of three common machine learning (ML) algorithms, multilayer perceptron (MLP), support vector machine (SVM), and random forest (RF), was evaluated for predicting soybean (Glycine max) seed yield using hyperspectral reflectance. To this end, hyperspectral reflectance data covering the whole spectrum from 395 to 1005 nm were collected at the R4 and R5 growth stages on 250 soybean genotypes grown in four environments. Recursive feature elimination (RFE) was used to reduce the dimensionality of the hyperspectral reflectance data and to select the variables with the largest importance values. The results indicated that R5 is the more informative stage for measuring hyperspectral reflectance to predict seed yield. The 395 nm reflectance band was also identified as the highest-ranked band for predicting soybean seed yield. Using either the full or the selected variables as inputs, the ML algorithms were evaluated individually and in combination, using the ensemble–stacking (E–S) method, to predict soybean yield. Among the individually tested algorithms, RF had the highest performance, with a yield classification accuracy of 84%. Therefore, with RF selected as the metaClassifier for the E–S method, the prediction accuracy increased to 0.93 using all variables and 0.87 using selected variables, showing the success of E–S as an ensemble technique.
This study demonstrated that soybean breeders could implement the E–S algorithm, using either the full or the selected spectral reflectance, to select high-yielding soybean genotypes from among a large number of genotypes at early growth stages.
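The recursive feature elimination step described in this abstract can be illustrated with a minimal sketch. It assumes a plain least-squares model as the ranking estimator, which is a stand-in and not necessarily the estimator the authors used. The three "bands" and the yield values are made-up toy numbers, and the function names (`rfe`, `ols_coefs`, `solve`, `standardize`) are hypothetical.

```python
from statistics import mean, pstdev


def standardize(column):
    """Scale one feature to zero mean and unit variance."""
    m, s = mean(column), pstdev(column)
    return [(v - m) / s for v in column]


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting keeps the elimination numerically stable.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols_coefs(cols, y):
    """Least-squares coefficients for centred data via the normal equations."""
    k = len(cols)
    xtx = [[sum(p * q for p, q in zip(cols[i], cols[j])) for j in range(k)]
           for i in range(k)]
    xty = [sum(p * q for p, q in zip(cols[i], y)) for i in range(k)]
    return solve(xtx, xty)


def rfe(X, y, n_keep):
    """Refit repeatedly, dropping the feature with the smallest |coefficient|."""
    cols = {j: standardize([row[j] for row in X]) for j in range(len(X[0]))}
    y_centred = [v - mean(y) for v in y]
    active = list(cols)
    while len(active) > n_keep:
        coefs = ols_coefs([cols[j] for j in active], y_centred)
        weakest = min(range(len(active)), key=lambda i: abs(coefs[i]))
        del active[weakest]
    return active


# Toy data (assumption): yield depends on band 0 strongly, band 2 weakly,
# and not at all on band 1.
X = [[1.0, 5.0, 2.0], [2.0, 3.0, 1.0], [3.0, 6.0, 4.0],
     [4.0, 2.0, 3.0], [5.0, 4.0, 6.0], [6.0, 1.0, 5.0]]
y = [3.0 * r[0] + 0.5 * r[2] for r in X]

print(rfe(X, y, n_keep=2))
print(rfe(X, y, n_keep=1))
```

With real hyperspectral data the estimator would more plausibly be a tree ensemble or another model with built-in importance scores; the elimination loop itself stays the same.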
Soybean [Glycine max (L.) Merrill] seed oil is the primary global source of edible oil and a major renewable and sustainable feedstock for biodiesel production. Therefore, increasing the relative oil concentration in soybean is desirable; however, that goal is complex due to the quantitative nature of the oil concentration trait and possible effects on major agronomic traits such as seed yield or protein concentration. The objectives of the present study were to examine the relationship between seed oil concentration and important agronomic and seed quality traits, including seed yield, 100-seed weight, protein concentration, plant height, and days to maturity, and to identify oil quantitative trait loci (QTL) that are co-localized with QTL for the traits evaluated. A population of 203 F4:6 recombinant inbred lines, derived from a cross between the moderately high-oil soybean genotypes OAC Wallace and OAC Glencoe, was developed and grown across multiple environments in Ontario, Canada, in 2009 and 2010. Of the 11 QTL associated with seed oil concentration in the population, which were detected using either single-factor ANOVA or multiple QTL mapping methods, six were co-localized with QTL for protein concentration, four for seed yield, two for 100-seed weight, one for days to maturity, and one for plant height. The oil-beneficial allele of the QTL tagged by marker Sat_020 was positively associated with seed protein concentration. The oil-favorable alleles of markers Satt001 and GmDGAT2B were positively correlated with seed yield. In addition, significant two-way epistatic interactions, in which one of the interacting markers was solely associated with seed oil concentration, were identified for the selected traits in this study. The number of significant epistatic interactions was seven for yield, four for days to maturity, two for 100-seed weight, one for protein concentration, and one for plant height.
The molecular markers identified in this study as associated with oil-related QTL, which also have positive effects on other important traits such as seed yield and protein concentration, could be used in soybean marker-assisted breeding programs aimed at developing cultivars with either higher seed yield and oil concentration or higher seed protein and oil concentration per hectare. Alternatively, selecting complementary parents with greater breeding values due to positive epistatic interactions could lead to the development of higher-oil soybean cultivars.
Soybean seed is a major source of oil for human consumption worldwide and the main renewable feedstock for biodiesel production in North America. Increasing seed oil concentration in soybean [Glycine max (L.) Merrill] with no or minimal impact on protein concentration could be accelerated by exploiting quantitative trait loci (QTL) or gene-specific markers. Oil concentration in soybean is a polygenic trait that is regulated by many genes with mostly small effects and is negatively associated with protein concentration. The objectives of this study were to discover and validate oil QTL in two recombinant inbred line (RIL) populations derived from crosses between three moderately high-oil soybean cultivars, OAC Wallace, OAC Glencoe, and RCAT Angora. The RIL populations were grown across several environments over 2 years in Ontario, Canada. In a population of 203 F(3:6) RILs from a cross of OAC Wallace and OAC Glencoe, a total of 11 genomic regions on nine different chromosomes were identified as associated with oil concentration using multiple QTL mapping and single-factor ANOVA. The percentage of the phenotypic variation accounted for by each QTL ranged from 4 to 11 %. Of the five QTL that were tested in a population of 211 F(3:5) RILs from the cross RCAT Angora × OAC Wallace, a "trait-based" bidirectional selective genotyping analysis validated four (80 %). In addition, a total of seven two-way epistatic interactions were identified for oil concentration in this study. The QTL and epistatic interactions identified in this study could be used in marker-assisted introgression aimed at pyramiding high-oil alleles in soybean cultivars to increase oil concentration for biodiesel as well as edible oil applications.
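As a rough illustration of how a single-factor ANOVA relates one marker to oil concentration, and of how the "percentage of phenotypic variation accounted for" can be read off as the between-genotype share of the total sum of squares, consider the following sketch. The marker calls and oil values are invented toy numbers, and `single_marker_anova` is a hypothetical helper, not the authors' software.

```python
from statistics import mean


def single_marker_anova(genotypes, phenotypes):
    """One-way ANOVA of a trait on marker genotype classes.

    Returns the F statistic and R^2, the fraction of phenotypic
    variation explained by the marker.
    """
    groups = {}
    for g, p in zip(genotypes, phenotypes):
        groups.setdefault(g, []).append(p)

    grand_mean = mean(phenotypes)
    ss_between = sum(len(v) * (mean(v) - grand_mean) ** 2
                     for v in groups.values())
    ss_within = sum((p - mean(v)) ** 2
                    for v in groups.values() for p in v)

    df_between = len(groups) - 1
    df_within = len(phenotypes) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    r_squared = ss_between / (ss_between + ss_within)
    return f_stat, r_squared


# Toy RILs (assumption): allele "A" or "B" at one marker, oil in percent.
marker = ["A", "A", "A", "A", "B", "B", "B", "B"]
oil = [20.0, 21.0, 19.0, 20.0, 22.0, 23.0, 21.0, 22.0]

f_stat, r_squared = single_marker_anova(marker, oil)
print(f"F = {f_stat:.2f}, variation explained = {100 * r_squared:.1f} %")
```

Turning the F statistic into a p-value requires the F distribution, which is left out here to keep the sketch dependency-free. The toy effect is deliberately large; the QTL in the abstract each explained 4 to 11 % of the variation.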
Recent advances in high-throughput field phenotyping, combined with sophisticated big data analysis methods, have provided plant breeders with unprecedented tools for better prediction of important agronomic traits, such as yield and fresh biomass (FBIO), at early growth stages. This study aimed to demonstrate the potential use of 35 selected hyperspectral vegetation indices (HVI), collected at the R5 growth stage, for predicting soybean seed yield and FBIO. Two artificial intelligence algorithms, ensemble-bagging (EB) and deep neural network (DNN), were used to predict soybean seed yield and FBIO using HVI. With HVI as input variables, coefficients of determination (R2) of 0.76 and 0.77 for yield and 0.91 and 0.89 for FBIO were obtained using DNN and EB, respectively. In this study, we also used a hybrid DNN-SPEA2 to estimate the optimum HVI values in soybeans with maximized yield and FBIO production. In addition, to identify the most informative HVI for predicting yield and FBIO, the recursive feature elimination wrapper method was used, and the top-ranking HVI were found to be associated with the red (670 nm) and near-infrared (800 nm) regions. Overall, this study introduced the hybrid DNN-SPEA2 as a robust mathematical tool for optimizing and using informative HVI to estimate soybean seed yield and FBIO at early growth stages, which soybean breeders can employ to discriminate superior genotypes in large breeding populations.
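To make the idea of a hyperspectral vegetation index concrete, the sketch below computes a normalized-difference index of the familiar NDVI form from the two regions the abstract highlights, red near 670 nm and near-infrared near 800 nm. Which of the 35 indices the study actually ranked highest is not stated here, so the formula choice, the helper names, and the reflectance values are all assumptions for illustration.

```python
def nearest_band(spectrum, target_nm):
    """Return the measured wavelength closest to the requested one."""
    return min(spectrum, key=lambda nm: abs(nm - target_nm))


def normalized_difference(spectrum, red_nm=670, nir_nm=800):
    """(NIR - red) / (NIR + red), using the nearest available bands."""
    red = spectrum[nearest_band(spectrum, red_nm)]
    nir = spectrum[nearest_band(spectrum, nir_nm)]
    return (nir - red) / (nir + red)


# Toy plot spectrum (assumption): wavelength in nm -> reflectance (0 to 1).
plot_spectrum = {551: 0.10, 669: 0.05, 801: 0.45, 900: 0.48}

index_value = normalized_difference(plot_spectrum)
print(round(index_value, 3))
```

Sensors rarely sample exactly 670 or 800 nm, which is why the sketch looks up the nearest measured band instead of indexing the wavelength directly.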
Improving genetic yield potential in major food-grade crops such as soybean (Glycine max L.) is the most sustainable way to address growing global food demand and the associated food security concerns. Yield is a complex trait that depends on several related variables called yield components. In this study, the five most important yield component traits in soybean were measured using a panel of 250 genotypes grown in four environments. These traits were the number of nodes per plant (NP), number of non-reproductive nodes per plant (NRNP), number of reproductive nodes per plant (RNP), number of pods per plant (PP), and the ratio of number of pods to number of nodes per plant (P/N). These data were used for predicting the total soybean seed yield using the Multilayer Perceptron (MLP), Radial Basis Function (RBF), and Random Forest (RF) machine learning (ML) algorithms, individually and collectively through an ensemble method based on a bagging strategy (E-B). The RBF algorithm, with the highest Coefficient of Determination (R2) value of 0.81 and the lowest Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) values of 148.61 kg.ha-1 and 185.31 kg.ha-1, respectively, was the most accurate algorithm and was therefore selected as the metaClassifier for the E-B algorithm. Using the E-B algorithm, we were able to increase the prediction accuracy by improving the values of R2, MAE, and RMSE by 0.1, 0.24 kg.ha-1, and 0.96 kg.ha-1, respectively. Furthermore, for the first time in this study, we combined the E-B with the genetic algorithm (GA) to model the optimum values of yield components in an ideotype genotype in which yield is maximized. The results provided a better understanding of the relationships between soybean yield and its components, which can be used for selecting parental lines and designing promising crosses for developing cultivars with improved genetic yield potential.
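The three accuracy measures reported in this abstract follow standard definitions, sketched below for reference. The observed and predicted yields are toy values in kg per hectare chosen for easy arithmetic, not data from the study, and the function names are illustrative.

```python
from math import sqrt
from statistics import mean


def mae(observed, predicted):
    """Mean absolute error, in the units of the trait."""
    return mean(abs(o - p) for o, p in zip(observed, predicted))


def rmse(observed, predicted):
    """Root mean square error, in the units of the trait."""
    return sqrt(mean((o - p) ** 2 for o, p in zip(observed, predicted)))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    grand_mean = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - grand_mean) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


# Toy yields (assumption), kg per hectare.
observed = [3000.0, 3200.0, 3400.0, 3600.0]
predicted = [3100.0, 3100.0, 3500.0, 3500.0]

print(mae(observed, predicted))
print(rmse(observed, predicted))
print(r_squared(observed, predicted))
```

MAE and RMSE improve as they fall while R2 improves as it rises, which is worth keeping in mind when reading an "improvement" quoted for all three at once.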