Models based on additive marker effects and on epistatic interactions can be translated into genomic relationship models. This equivalence allows to perform predictions based on complex gene interaction models and reduces computational effort significantly. In the theory of genome-assisted prediction, the equivalence of a linear model based on independent and identically normally distributed marker effects and a model based on multivariate Gaussian distributed breeding values with genomic relationship as covariance matrix is well known. In this work, we demonstrate equivalences of marker effect models incorporating epistatic interactions and corresponding mixed models based on relationship matrices and show how to exploit these equivalences computationally for genome-assisted prediction. In particular, we show how models with epistatic interactions of higher order (e.g., three-factor interactions) translate into linear models with certain covariance matrices and demonstrate how to construct epistatic relationship matrices for the linear mixed model, if we restrict the model to interactions defined a priori. We illustrate the practical relevance of our results with a publicly available data set on grain yield of wheat lines growing in four different environments. For this purpose, we select important interactions in one environment and use this knowledge on the network of interactions to increase predictive ability of grain yield under other environmental conditions. Our results provide a guide for building relationship matrices based on knowledge on the structure of trait-related gene networks.
Background Several conventional genomic Bayesian (or no Bayesian) prediction methods have been proposed including the standard additive genetic effect model for which the variance components are estimated with mixed model equations. In recent years, deep learning (DL) methods have been considered in the context of genomic prediction. The DL methods are nonparametric models providing flexibility to adapt to complicated associations between data and output with the ability to adapt to very complex patterns. Main body We review the applications of deep learning (DL) methods in genomic selection (GS) to obtain a meta-picture of GS performance and highlight how these tools can help solve challenging plant breeding problems. We also provide general guidance for the effective use of DL methods including the fundamentals of DL and the requirements for its appropriate use. We discuss the pros and cons of this technique compared to traditional genomic prediction approaches as well as the current trends in DL applications. Conclusions The main requirement for using DL is the quality and sufficiently large training data. Although, based on current literature GS in plant and animal breeding we did not find clear superiority of DL in terms of prediction power compared to conventional genome based prediction models. Nevertheless, there are clear evidences that DL algorithms capture nonlinear patterns more efficiently than conventional genome based. Deep learning algorithms are able to integrate data from different sources as is usually needed in GS assisted breeding and it shows the ability for improving prediction accuracy for large plant breeding data. It is important to apply DL to large training-testing data sets.
""Sparse testing" refers to reduced multi-environment breeding trials in which not all genotypes of interest are grown in each environment. Using genomic-enabled prediction and a model embracing genotype × environment interaction (GE), the non-observed genotype-in-environment combinations can be predicted. Consequently, the overall costs can be reduced and the testing capacities can be increased. The accuracy of predicting the unobserved data depends on different factors including (1) how many genotypes overlap between environments, (2) in how many environments each genotype is grown, and (3) which prediction method is used. In this research, we studied the predictive ability obtained when using a fixed number of plots and different sparse testing designs. The considered designs included the extreme cases of (1) no overlap of genotypes between environments, and (2) complete overlap of the genotypes between environments. In the latter case, the prediction set fully consists of genotypes that have not been tested at all. Moreover, we gradually go from one extreme to the other considering (3) intermediates between the two previous cases with varying numbers of different or non-overlapping (NO)/overlapping (O) genotypes. The empirical study is built upon two different maize hybrid data sets consisting of different genotypes crossed to two different testers (T1 and T2) and each data set was analyzed separately. For each set, phenotypic records on yield from three different environments are available. Three different prediction models were implemented, two main effects models (M1 and M2), and a model (M3) including the genotype-by-environment interaction term (GE). The results showed that the genome-based model including GE (M3) captured more phenotypic variation than the models that did not include this component. Also, M3 provided higher prediction accuracy than models M1 and M2 for the different allocation scenarios. Reducing the size of the calibration sets decreased the prediction accuracy under all allocation designs with M3being the less affected model; however, using the genome-enabled models (i.e., M2 and M3) the predictive ability is recovered when more genotypes are tested across environments. Our results indicate that a substantial part of the testing resources can be saved when using genome-based models including GE for optimizing sparse testing designs.
BackgroundEpistasis marker effect models incorporating products of marker values as predictor variables in a linear regression approach (extended GBLUP, EGBLUP) have been assessed as potentially beneficial for genomic prediction, but their performance depends on marker coding. Although this fact has been recognized in literature, the nature of the problem has not been thoroughly investigated so far.ResultsWe illustrate how the choice of marker coding implicitly specifies the model of how effects of certain allele combinations at different loci contribute to the phenotype, and investigate coding-dependent properties of EGBLUP. Moreover, we discuss an alternative categorical epistasis model (CE) eliminating undesired properties of EGBLUP and show that the CE model can improve predictive ability. Finally, we demonstrate that the coding-dependent performance of EGBLUP offers the possibility to incorporate prior experimental information into the prediction method by adapting the coding to already available phenotypic records on other traits.ConclusionBased on our results, for EGBLUP, a symmetric coding {−1,1} or {−1,0,1} should be preferred, whereas a standardization using allele frequencies should be avoided. Moreover, CE can be a valuable alternative since it does not possess the undesired theoretical properties of EGBLUP. However, which model performs best will depend on characteristics of the data and available prior information. Data from previous experiments can for instance be incorporated into the marker coding of EGBLUP.Electronic supplementary materialThe online version of this article (doi:10.1186/s12859-016-1439-1) contains supplementary material, which is available to authorized users.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.