We developed an integrated R library called BWGS to enable easy computation of Genomic Estimates of Breeding Values (GEBV) for genomic selection. BWGS, for BreedWheat Genomic Selection, was developed in the framework of a cooperative private-public partnership project called Breedwheat (https://breedwheat.fr) and relies on existing R libraries, all freely available from CRAN servers. The two main functions run 1) replicated random cross-validations within a training set of genotyped and phenotyped lines and 2) GEBV prediction for a set of genotyped-only lines. Options are available for 1) missing data imputation, 2) marker and training set selection and 3) genomic prediction with 15 different methods, either parametric or semi-parametric. The usefulness and efficiency of BWGS are illustrated using a population of wheat lines from a real breeding programme. Adjusted yield data from historical trials (highly unbalanced design) were used for testing the options of BWGS. In total, 760 candidate lines with adjusted phenotypes and genotypes for 47 839 robust SNPs were used. With a simple desktop computer, we obtained results comparable to previously published results on wheat genomic selection. As predicted by theory, the factors that most influence predictive ability, for a given trait of moderate heritability, are the size of the training population and a minimum number of markers for capturing the information of every QTL. Missing data up to 40%, if randomly distributed, do not degrade predictive ability once imputed, and up to 80% randomly distributed missing data are still acceptable once imputed with the Expectation-Maximization method of the rrBLUP package. It is worth noting that selecting the markers most associated with the trait does improve predictive ability, compared with using the whole set of markers, but only when marker selection is made on the whole population. When marker selection is made only on the sampled training set, this advantage nearly disappears, since it was clearly due to overfitting. Few differences are observed between the 15 prediction models with this dataset. Although non-parametric methods that are supposed to capture non-additive effects have slightly better predictive accuracy, differences remain small. Finally, the GEBV from the 15 prediction models are all highly correlated with each other. These results are encouraging for an efficient use of genomic selection in applied breeding programmes, and BWGS is a simple and powerful toolbox to apply in breeding programmes or training activities.
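The overfitting effect described above (marker selection on the whole population versus on the training set only) can be illustrated with a minimal Python sketch. It is not BWGS code and does not use any of its prediction methods: the simulated data are pure noise, the predictor is a crude correlation-weighted marker score, and every name (`top_markers`, `cv_predictive_ability`, `demo`) is hypothetical. Because the markers carry no real signal, any apparent predictive ability in the first setting comes from information leaking from the validation lines into the marker selection step.

```python
import random


def pearson(x, y):
    """Pearson correlation between two equal-length lists (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def top_markers(geno, pheno, rows, k):
    """Indices of the k markers most correlated with the trait, using only `rows`."""
    y = [pheno[i] for i in rows]
    scored = []
    for j in range(len(geno[0])):
        column = [geno[i][j] for i in rows]
        scored.append((abs(pearson(column, y)), j))
    scored.sort(reverse=True)
    return [j for _, j in scored[:k]]


def cv_predictive_ability(geno, pheno, k, folds, select_on_all, rng):
    """Random k-fold cross-validation; returns cor(predicted, observed)."""
    order = list(range(len(pheno)))
    rng.shuffle(order)
    # Leaky variant: markers are chosen once, using validation lines too.
    preselected = top_markers(geno, pheno, order, k) if select_on_all else None
    predicted = [0.0] * len(pheno)
    for f in range(folds):
        test = order[f::folds]
        held_out = set(test)
        train = [i for i in order if i not in held_out]
        # Honest variant: markers are chosen inside the fold, on training lines only.
        markers = preselected if select_on_all else top_markers(geno, pheno, train, k)
        y_train = [pheno[i] for i in train]
        # Marker weights are always estimated on the training lines only.
        weights = [pearson([geno[i][j] for i in train], y_train) for j in markers]
        for i in test:
            predicted[i] = sum(w * geno[i][j] for w, j in zip(weights, markers))
    return pearson(predicted, pheno)


def demo(seed=1, n_lines=60, n_markers=300, k=20, folds=5):
    """Simulate noise-only data and compare the two selection strategies."""
    rng = random.Random(seed)
    geno = [[rng.choice((-1, 0, 1)) for _ in range(n_markers)] for _ in range(n_lines)]
    pheno = [rng.gauss(0.0, 1.0) for _ in range(n_lines)]  # independent of markers
    leaky = cv_predictive_ability(geno, pheno, k, folds, True, random.Random(seed))
    honest = cv_predictive_ability(geno, pheno, k, folds, False, random.Random(seed))
    return leaky, honest


if __name__ == "__main__":
    leaky, honest = demo()
    print(f"selection on whole population: {leaky:.2f}")
    print(f"selection on training set only: {honest:.2f}")
```

With noise-only data, the honest estimate should hover around zero while the leaky one is inflated, which is the pattern the abstract attributes to overfitting.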
Background: Association studies are of great interest for identifying genes that explain trait variation since, unlike classical QTL analyses, they deal with more than just a few alleles. They are usually performed using collections that represent a wide range of variability but that may present a genetic substructure. The aim of this paper is to demonstrate that association studies can be performed using synthetic varieties obtained after several panmictic generations. This demonstration is based on an example of association between the gibberellic acid insensitive gene (GAI) polymorphism and leaf length polymorphism in 'Herbie', a synthetic variety of perennial ryegrass.
Methods: Leaf growth parameters, consisting of leaf length, maximum leaf elongation rate (LERmax) and leaf elongation duration (LED), were evaluated in spring and autumn on 216 plants of Herbie with three replicates. For each plant, a sequence of 370 bp in GAI was analysed for polymorphism.
Results: The genetic effect was highly significant for all traits. Broad-sense heritabilities were higher for leaf length and LERmax (about 0.7 in each period and 0.5 considering both periods) than for LED (about 0.4 in each period and 0.3 considering both periods). GAI was highly polymorphic, with an average of 12 bp between two consecutive SNPs and 39 haplotypes, of which 9 were more frequent. Linkage disequilibrium declined rapidly with distance, with r² values lower than 0.2 beyond 150 bp. Sequence polymorphism of GAI explained 8-14% of leaf growth parameter variation. A single SNP explained 4% of the phenotypic variance of leaf length in both periods, which represents a difference of 33 mm on an average of 300 mm.
Conclusions: Synthetic varieties in which linkage disequilibrium declines rapidly with distance are suitable for association studies using the "candidate gene" approach. GAI polymorphism was found to be associated with leaf length polymorphism, which was more correlated with LERmax than with LED in Herbie. GAI is a good candidate to explain leaf length variation in other plant material.
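For readers unfamiliar with the r² statistic quoted above, the following minimal Python sketch computes the standard r² measure of linkage disequilibrium between two biallelic loci from phased haplotypes, using D = p_AB - p_A p_B and r² = D² / (p_A (1 - p_A) p_B (1 - p_B)). The function name `ld_r2` and the 0/1 allele coding are assumptions made for illustration; the abstract does not state how the authors computed their values.

```python
def ld_r2(haplotypes):
    """r^2 between two biallelic loci.

    `haplotypes` is a list of (allele_a, allele_b) pairs, one per phased
    haplotype, with alleles coded 0 or 1 (an illustrative convention).
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n          # frequency of allele 1 at locus A
    p_b = sum(b for _, b in haplotypes) / n          # frequency of allele 1 at locus B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                             # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    return d * d / denom


# Complete association between the two loci.
linked = [(0, 0), (0, 0), (1, 1), (1, 1)]
# All four haplotype classes equally frequent: no association.
unlinked = [(0, 0), (0, 1), (1, 0), (1, 1)]
print(ld_r2(linked), ld_r2(unlinked))
```

Computing this value for every SNP pair and plotting it against the physical distance between the SNPs gives the kind of LD decay pattern the abstract summarises.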
Synthetic varieties obtained after three to four panmictic generations are variable and not structured, and so can be used for association studies. The pattern of linkage disequilibrium (LD) decay determines whether a genome scan or a candidate gene approach can be used for an association study between genotype and phenotype. Our goal was to evaluate the effect of the number of parents used to build the synthetic varieties on the pattern of LD decay. LD was investigated in the gibberellic acid insensitive gene (GAI) region in three synthetic varieties of perennial ryegrass (Lolium perenne L.) chosen for their contrasting numbers of parents in the initial polycrosses. Results were compared with those obtained from a core collection. STS and SSR markers were used to evaluate variation, structure and LD in each variety. As expected, the variability of the varieties increased with the number of parents, almost up to the variability of the core collection. No structure was observed in the varieties. Significant LD was observed up to 1.6 Mb in a variety derived from six related parents and not above 174 kb in a variety derived from 336 parents. These results suggest that a candidate gene approach can be used when varieties have a large number of parents, and that a genome scan approach can be envisaged in specific regions when varieties have a low number of parents. Nevertheless, we strongly recommend estimating the pattern of LD decay in the population and in the genomic region studied before performing an association study.
Key message: Phenomic selection accurately predicts heading date and grain yield of wheat breeding lines. Combining spectra from different environments and optimising the training set maximise the predictive ability of the method.
Phenomic selection (PS) is a recent breeding approach similar to genomic selection (GS), except that genotyping is replaced by near-infrared (NIR) spectroscopy. PS can potentially account for non-additive effects and has the major advantage of being low cost and high throughput. Factors influencing GS predictive abilities have been intensively studied, but little is known about PS. We tested and compared the abilities of PS and GS to predict grain yield and heading date from several datasets of bread wheat lines corresponding to the first or second years of trial evaluation from two breeding companies and one research institute in France. We evaluated several factors affecting PS predictive abilities, including the possibility of combining spectra collected in different environments. A simple H-BLUP model predicted both traits with predictive ability from 0.26 to 0.62 and with efficient computation time. Our results showed that the environment in which lines are grown had a crucial impact on the predictive ability obtained from the spectra acquired there, and that this impact was specific to the trait considered. Models combining NIR spectra from different environments were the best PS models and were at least as accurate as GS in most of the datasets. Furthermore, a GH-BLUP model combining genotyping and NIR spectra was the best model of all (predictive ability from 0.31 to 0.73). We also demonstrated that, as for GS, the size and composition of the training set have a crucial impact on predictive ability. PS could therefore replace or complement GS for efficient wheat breeding programs.
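The abstract does not define the H-BLUP model. A common reading, assumed here, is that H is a similarity matrix between lines built from their NIR spectra and used in a BLUP model the same way a genomic relationship matrix is used in G-BLUP. The minimal Python sketch below shows one way such a matrix could be built: each wavelength is centred and scaled across lines, and H is the cross-product divided by the number of wavelengths. The function name `spectra_kernel` and the toy absorbance values are hypothetical, and the authors' actual preprocessing may differ.

```python
from statistics import mean, pstdev


def spectra_kernel(spectra):
    """Line-by-line similarity matrix from NIR spectra.

    `spectra` is a list of rows, one per line, each holding the absorbance
    at the same ordered set of wavelengths. Assumption: H = S S' / p, where
    S holds the centred and scaled spectra and p is the number of
    informative wavelengths.
    """
    scaled_columns = []
    for column in zip(*spectra):
        centre, spread = mean(column), pstdev(column)
        if spread == 0:
            continue  # a wavelength that does not vary carries no information
        scaled_columns.append([(value - centre) / spread for value in column])
    p = len(scaled_columns)
    if p == 0:
        raise ValueError("no variable wavelength in the spectra")
    rows = list(zip(*scaled_columns))
    return [[sum(a * b for a, b in zip(ri, rj)) / p for rj in rows] for ri in rows]


# Toy example: 4 lines measured at 3 wavelengths (made-up values).
toy_spectra = [
    [0.1, 0.5, 0.9],
    [0.2, 0.4, 1.0],
    [0.4, 0.1, 0.7],
    [0.3, 0.3, 0.8],
]
H = spectra_kernel(toy_spectra)
for row in H:
    print([round(value, 2) for value in row])
```

Under the same assumption, spectra from several environments could be combined by concatenating their wavelength columns before building H, and a GH-BLUP model would include both a genomic and a spectral similarity matrix.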