Genotyping‐by‐sequencing (GBS) is an alternative genotyping method to single‐nucleotide polymorphism (SNP) arrays that has received considerable attention in the plant breeding community. In this study we use simulation to quantify the potential of low‐coverage GBS and imputation for cost‐effective genomic selection in biparental segregating populations. The simulations comprised a range of scenarios where SNP array or GBS data were used to train the genomic selection model, to predict breeding values, or both. The GBS data were generated with sequencing coverages (x) from 4x to 0.01x. The data were used either nonimputed or imputed by the AlphaImpute program. The size of the training and prediction sets was either held fixed or was increased by reducing sequencing coverage per individual. The results show that nonimputed 1x GBS data provided prediction accuracy and bias comparable to those of the SNP array data and, under the return‐on‐investment measure used, outperformed the SNP array data. Imputation allowed for further reduction in sequencing coverage, to as low as 0.1x with 10,000 markers or 0.01x with 100,000 markers. The results suggest that using such data in biparental families gave up to 5.63 times higher return on investment than using the SNP array data. Reduction of sequencing coverage per individual and imputation can be leveraged to genotype larger training sets to increase prediction accuracy and larger prediction sets to increase selection intensity, both of which allow for higher response to selection and higher return on investment.
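To make the notion of sequencing coverage concrete, the following is a minimal sketch of how low-coverage GBS reads at a biallelic locus might be simulated. It is not the simulation procedure used in the study: the Poisson model for read depth, the function names, and the uniform choice of genotypes are all assumptions made for illustration.

```python
import math
import random


def poisson(rng, lam):
    """Draw a Poisson(lam) variate with Knuth's method (adequate for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_gbs_reads(genotype, coverage, rng):
    """Return (ref_reads, alt_reads) for one biallelic locus.

    genotype: number of alternative alleles carried (0, 1 or 2).
    coverage: expected number of reads at the locus (assumed Poisson mean).
    """
    depth = poisson(rng, coverage)
    # Each read samples one of the two alleles at random.
    alt = sum(rng.random() < genotype / 2 for _ in range(depth))
    return depth - alt, alt


def fraction_without_reads(coverage, n_loci=20000, seed=1):
    """Share of loci that receive no reads at all at the given coverage."""
    rng = random.Random(seed)
    missing = 0
    for _ in range(n_loci):
        genotype = rng.choice((0, 1, 2))  # hypothetical genotype distribution
        ref, alt = simulate_gbs_reads(genotype, coverage, rng)
        if ref + alt == 0:
            missing += 1
    return missing / n_loci


if __name__ == "__main__":
    for x in (4.0, 1.0, 0.1, 0.01):
        print(f"{x}x coverage: {fraction_without_reads(x):.3f} of loci have no reads")
```

Under this assumed Poisson model, the expected share of loci without any read is exp(-x), so at 0.01x roughly 99% of loci are unobserved in a given individual. This illustrates why imputation becomes essential at the lowest coverages.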
Genomic selection has great potential to increase the efficiency of plant breeding, but its implementation is hindered by the high costs of collecting the necessary data. In this study we evaluated the potential of accurate within‐family imputation for enabling cost‐effective genomic selection. We have simulated a breeding program with inbred parents and their segregating progeny distributed among families, of which some were used as a training set and some were used as a prediction set. Parents were genotyped at high density (20,000 markers), while progeny were genotyped at high or low density (500, 200, 100, or 50 markers) and imputed. Low‐density markers were chosen to segregate within each family separately. The assumed low‐density genotyping costs accounted for this assumption. Six sets of scenarios were analyzed in which imputation was leveraged to maximize cost effectiveness of genomic selection by (i) decreasing the genotyping costs, (ii) increasing selection intensity by genotyping more individuals at fewer markers, or (iii) increasing prediction accuracy by genotyping more phenotyped individuals at fewer markers. The results show that, with a constant size of the training and prediction sets, the prediction accuracy was unimpaired when at least 200 low‐density markers were used. However, the return on investment was maximal (5.67 times that of the baseline scenario) when only 50 low‐density markers were used because that enabled maximal reduction in the genotyping costs and only minimal reduction in the prediction accuracy. Increasing either the training set or prediction set further increased the return on investment when imputed genotypes were used, but not when the true high‐density genotypes were used. The results show how plant breeding programs can implement genomic selection in a cost‐effective way.
Background: This paper describes a combined heuristic and hidden Markov model (HMM) method to accurately impute missing genotypes in livestock datasets. Genomic selection in breeding programs requires high-density genotyping of many individuals, making algorithms that economically generate this information crucial. There are two common classes of imputation methods, heuristic methods and probabilistic methods, the latter being largely based on hidden Markov models. Heuristic methods are robust, but fail to impute markers in regions where the thresholds of heuristic rules are not met, or the pedigree is inconsistent. Hidden Markov models are probabilistic methods which typically do not require specific family structures or pedigree information, making them very flexible, but they are computationally expensive and, in some cases, less accurate.
Results: We implemented a new hybrid imputation method that combined heuristic and HMM methods, AlphaImpute and MaCH, and compared the computation time and imputation accuracy of the three methods. AlphaImpute was the fastest, followed by the hybrid method and then the HMM. The computation time of the hybrid method and the HMM increased linearly with the number of iterations used in the hidden Markov model. However, with the number of template haplotypes, the computation time of the hybrid method increased almost linearly and that of the HMM quadratically. The hybrid method was the most accurate imputation method for low-density panels when pedigree information was missing, especially if minor allele frequency was also low. The accuracy of the hybrid method and the HMM increased with the number of template haplotypes. The imputation accuracy of all three methods increased with the marker density of the low-density panels. Excluding the pedigree information reduced imputation accuracy for the hybrid method and AlphaImpute. Finally, the imputation accuracy of the three methods decreased with decreasing minor allele frequency.
Conclusions: The hybrid heuristic and probabilistic imputation method is able to impute all markers for all individuals in a population, as the HMM does. The hybrid method is usually more accurate and never significantly less accurate than a purely heuristic method or a purely probabilistic method and is faster than a standard probabilistic method.
Electronic supplementary material: The online version of this article (doi:10.1186/s12711-017-0300-y) contains supplementary material, which is available to authorized users.