A Benchmarking Between Deep Learning, Support Vector Machine and Bayesian Threshold Best Linear Unbiased Prediction for Predicting Ordinal Traits in Plant Breeding

Montesinos‐López, Osval A.; Martín-Vallejo, Javier; Crossa, José; Gianola, Daniel; Hernandez-Suarez, Carlos; Montesinos‐López, Abelardo; Juliana, Philomin; Singh, Ravi P.

doi:10.1534/g3.118.200998

Cited by 115 publications

(121 citation statements)

References 42 publications

Supporting

Mentioning

115

Contrasting


“…DL is relatively straightforward to implement (https://keras.io/whyuse-keras/), but optimum performance depends on an adequate hyperparameter choice, which is not trivial and requires considerable computational resources (Young et al, 2015; Chan et al, 2018). Although previous, limited evidence does not show a consistent advantage of DL over penalized linear methods for genomic prediction (GP) purposes (González-Recio et al, 2014; Ma et al, 2017; Bellot et al, 2018; Montesinos-López et al, 2018a; Montesinos-López et al, 2018b; Montesinos-López et al, 2019a), more efforts are needed to fully understand the behavior and potential constraints and capabilities of DL in GP scenarios.…”

Section: Introduction (mentioning)

confidence: 99%

Exploring Deep Learning for Complex Trait Genomic Prediction in Polyploid Outcrossing Species

et al. 2020



“…It is risky to make sweeping statements arguing in favor of a specific treatment of data as outcomes are heavily dependent on the biological architecture of the traits considered, and on the data structure as well. The picture emerging from two decades of experience in genome-enabled prediction in the fields of animal and plant breeding is that is largely futile to categorize methods in terms of expected predictive performance using broad criteria, in view of the large variability of performance with respect to data structure for any given prediction machine (Morota and Gianola 2014; Gianola and Rosa 2015; Momen et al 2018; Montesinos-López et al 2019 a,b,c,d; Azodi et al 2019).…”

Section: Results (mentioning)

confidence: 99%

“…However, it was found that MBL was better than MT Bayesian BLUP for the two pine tree traits. After almost two decades of genome-enabled prediction it is now clear that no universally best prediction machine exists (Gianola et al 2011; Heslot 2012; de los Campos et al 2013; Momen et al 2018; Bellot et al 2018; Montesinos-López et al 2018a, b, c, d) even when non-parametric or deep learning techniques are brought into the comparisons.…”

Section: Results (mentioning)

confidence: 99%

A multiple-trait Bayesian Lasso for genome-enabled analysis and prediction of complex traits

Gianola, Fernando 2019

Preprint

Self Cite


Abstract: A multiple-trait Bayesian LASSO (MBL) for genome-based analysis and prediction of quantitative traits is presented and applied to two real data sets. The data-generating model is a multivariate linear Bayesian regression on possibly a huge number of molecular markers, and with a Gaussian residual distribution posed. Each (one per marker) of the T × 1 vectors of regression coefficients (T: number of traits) is assigned the same T-variate Laplace prior distribution, with a null mean vector and unknown scale matrix Σ. The multivariate prior reduces to that of the standard univariate Bayesian LASSO when T = 1. The covariance matrix of the residual distribution is assigned a multivariate Jeffreys prior and Σ is given an inverse-Wishart prior. The unknown quantities in the model are learned using a Markov chain Monte Carlo sampling scheme constructed using a scale-mixture of normal distributions representation. MBL is demonstrated in a bivariate context employing two publicly available data sets using a bivariate genomic best linear unbiased prediction model (GBLUP) for benchmarking results. The first data set is one where wheat grain yields in two different environments are treated as distinct traits. The second data set comes from genotyped Pinus trees, with each individual measured for two traits, rust bin and gall volume. In MBL, the bivariate marker effects are shrunk differentially, i.e., "short" vectors are more strongly shrunk towards the origin than in GBLUP; conversely, "long" vectors are shrunk less. A predictive comparison was carried out as well where, in wheat, the comparators of MBL were bivariate GBLUP and bivariate Bayes Cπ, a variable selection procedure. A training-testing layout was used, with 100 random reconstructions of training and testing sets. For the wheat data, all methods produced similar predictions.
In … The preceding is the density of a double exponential (DE) distribution with null mean, parameter … and variance …. As mentioned earlier, Tibshirani (1996) and Park and Casella (2008) used the DE distribution as conditional (given …) prior for regression coefficients in the BL, a member of the "Bayesian Alphabet" (Gianola et al. 2009). Gianola et al. (2018) assigned the DE distribution to residuals of a linear model for the purpose of attenuating outliers and Li et al. (2015) used the MLAP distribution for the residuals in a "robust" linear regression model for QTL mapping. MLAP is therefore an interesting candidate prior for multi-trait marker effects in a multiple trait generalization of the Bayesian LASSO (MBL). A zero-mean MLAP distribution has a … IW; the kernel of the density is often written as …, where … Hastings algorithm tailored for making draws from the distribution having density (29). A brief description of the procedure follows. … Laplace distribution. First, six independent chains of 1500 ...


“…A large number of researches had tried to apply single machine learning methods in genomic prediction [11,14,37,38]. However, the single machine learning methods in the most previous studies only performed well on several traits [13,14,38,39]. Therefore, we proposed a new strategy to utilize machine learning methods in genomic prediction.…”

Section: Discussion (mentioning)

confidence: 99%

“…Ogutu et al compared the prediction accuracy of random forest (RF), boosting and support vector machine (SVM) with rrBLUP in simulated dataset, in which rrBLUP outperformed the three machine learning methods [13]. Montesinos-López et al compared the prediction performance of multi-layer prediction, support vector machine with the Bayesian threshold genomic best linear unbiased prediction (TGBLUP) and believed that the reliability of two machine learning methods was comparable to TGBLUP, in some case, outperformed TGBLUP [14]. Even though the achievement of ML in GS had not been fantastic, the breeders still had the confidence in exploration of ML because of its outstanding performance in other majors.…”

Section: Introduction (mentioning)

confidence: 99%

A stacking ensemble learning framework for genomic prediction

Liang, Chang, et al. 2020

Preprint


Background: Machine learning (ML) is perhaps the most useful for the interpretation of large genomic datasets. However, the performance of a single machine learning method in genomic selection (GS) was unsatisfactory in existing research. To improve the genomic predictions, we constructed a stacking ensemble learning framework (SELF) integrating three machine learning methods to predict genomic estimated breeding values (GEBVs). Results: We evaluated the prediction ability of SELF by three real datasets and compared the prediction accuracy of SELF, base learners, GBLUP and BayesB. For each trait, SELF performed better than base learners, which included support vector regression (SVR), kernel ridge regression (KRR) and elastic net (ENET). The prediction accuracy of SELF had an average 7.70% improvement compared with GBLUP in three datasets. Except for the milk fat percentage (MFP) traits of the German Holstein dairy cattle dataset, SELF was more robust than BayesB in the remaining traits. Conclusions: In this study, we applied a stacking ensemble learning framework (SELF) to genomic prediction and it performed much better than GBLUP and BayesB in three real datasets with different genetic architecture. Therefore, we believed SELF had the potential to be promoted to estimate GEBVs in other animals and plants.

