Background Compared to medium-density single nucleotide polymorphism (SNP) data, high-density SNP data contain abundant genetic variants and provide more information for the genetic evaluation of livestock, but it has been shown that they do not confer any advantage for genomic prediction and heritability estimation. One possible reason is the uneven distribution of the linkage disequilibrium (LD) along the genome, i.e., LD heterogeneity among regions. The aim of this study was to effectively use genome-wide SNP data for genomic prediction and heritability estimation by using models that control LD heterogeneity among regions. Methods The LD-adjusted kinship (LDAK) and LD-stratified multicomponent (LDS) models were used to control LD heterogeneity among regions and were compared with the classical model that has no such control. Simulated and real traits of 2000 dairy cattle individuals with imputed high-density (770K) SNP data were used. Five types of phenotypes were simulated, which were controlled by very strongly, strongly, moderately, weakly and very weakly tagged causal variants, respectively. The performances of the models with high- and medium-density (50K) panels were compared to verify that the models that controlled LD heterogeneity among regions were more effective with high-density data. Results Compared to the medium-density panel, the use of the high-density panel did not improve and even decreased prediction accuracies and heritability estimates from the classical model for both simulated and real traits. Compared to the classical model, LDS effectively improved the accuracy of genomic predictions and unbiasedness of heritability estimates, regardless of the genetic architecture of the trait. LDAK applies only to traits that are mainly controlled by weakly tagged causal variants, but is still less effective than LDS for this type of trait. Compared with the classical model, LDS improved prediction accuracy by about 13% for simulated phenotypes and by 0.3 to ~ 10.7% for real traits with the high-density panel, and by ~ 1% for simulated phenotypes and by − 0.1 to ~ 6.9% for real traits with the medium-density panel. Conclusions Grouping SNPs based on regional LD to construct the LD-stratified multicomponent model can effectively eliminate the adverse effects of LD heterogeneity among regions, and greatly improve the efficiency of high-density SNP data for genomic prediction and heritability estimation.
The size of reference population is an important factor affecting genomic prediction. Thus, combining different populations in genomic prediction is an attractive way to improve prediction ability. However, combining multireference population roughly cannot increase the prediction accuracy as well as expected in pig. This may be due to different linkage disequilibrium (LD) pattern differences between population. In this study, we used the imputed whole-genome sequencing (WGS) data to construct LD-based haplotypes for genomic prediction in combined population to explore the impact of different single-nucleotide polymorphism (SNP) densities, variant representation (SNPs or haplotype alleles), and reference population size on the prediction accuracy for reproduction traits. Our results showed that genomic best linear unbiased prediction (GBLUP) using the WGS data can improve prediction accuracy in multi-population but not within-population. Not only the genomic prediction accuracy of the haplotype method using 80 K chip data in multi-population but also GBLUP for the multi-population (3.4–5.9%) was higher than that within-population (1.2–4.3%). More importantly, we have found that using the haplotype method based on the WGS data in multi-population has better genomic prediction performance, and our results showed that building haploblock in this scenario based on low LD threshold (r2 = 0.2–0.3) produced an optimal set of variables for reproduction traits in Yorkshire pig population. Our results suggested that whether the use of the haplotype method based on the chip data or GBLUP (individual SNP method) based on the WGS data were beneficial for genomic prediction in multi-population, while simultaneously combining the haplotype method and WGS data was a better strategy for multi-population genomic evaluation.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.