Low-pass sequencing (sequencing a genome to an average depth less than 1× coverage) combined with genotype imputation has been proposed as an alternative to genotyping arrays for trait mapping and calculation of polygenic scores. To empirically assess the relative performance of these technologies for different applications, we performed low-pass sequencing (targeting coverage levels of 0.5× and 1×) and array genotyping (using the Illumina Global Screening Array [GSA]) on 120 DNA samples derived from African- and European-ancestry individuals that are part of the 1000 Genomes Project. We then imputed both the sequencing data and the genotyping array data to the 1000 Genomes Phase 3 haplotype reference panel using a leave-one-out design. We evaluated overall imputation accuracy from these different assays as well as overall power for GWAS from imputed data and computed polygenic risk scores for coronary artery disease and breast cancer using previously derived weights. We conclude that low-pass sequencing plus imputation, in addition to providing a substantial increase in statistical power for genome-wide association studies, provides increased accuracy for polygenic risk prediction at effective coverages of ∼0.5× and higher compared to the Illumina GSA.
Low-pass sequencing (sequencing a genome to an average depth of less than 1× coverage) combined with genotype imputation has been proposed as an alternative to genotyping arrays for trait mapping and calculation of polygenic scores; however, the current literature is largely limited to simulation- and downsampling-based approaches. To empirically assess the relative performance of these technologies for different applications, we performed low-pass sequencing (targeting coverage levels of 0.5× and 1×) and array genotyping (using the Illumina Global Screening Array) on 120 DNA samples derived from African- and European-ancestry individuals that are part of the 1000 Genomes Project. We then imputed both the sequencing data and the genotyping array data to the 1000 Genomes Phase 3 haplotype reference panel using a leave-one-out design. First, we evaluated overall imputation accuracy from these different assays as measured by genotype concordance; we introduce the concept of effective coverage, which accounts for evenness of sequencing, and show that this metric is a better predictor of imputation accuracy than nominal mapped coverage for low-pass sequencing data. Next, we evaluated overall power for genome-wide association studies (GWAS) as measured by the squared correlation between imputed and true genotypes. In the African individuals, at common variants (>5% minor allele frequency), imputation r² averaged 0.83 for the array data and ranged from 0.89 to 0.95 for the low-pass sequencing data, corresponding to an effective 7-15% increase in GWAS discovery power. For the same variants in the European individuals, imputation r² averaged 0.91 for the array data and ranged from 0.92 to 0.96 for the low-pass sequencing data, corresponding to an effective 1-6% increase in GWAS discovery power. Finally, we computed polygenic risk scores for breast cancer and coronary artery disease from the different assays.
We observed consistently lower measurement error for risk scores computed from low-pass sequencing data above an effective coverage of ∼0.5×. The mean squared error of the array-based estimates was three to four times that of the estimates from samples sequenced at an effective coverage of ∼1.2× for coronary artery disease, with qualitatively similar results for breast cancer. We conclude that low-pass sequencing plus imputation, in addition to providing a substantial increase in statistical power for genome-wide association studies, provides increased accuracy for polygenic risk prediction at effective coverages of ∼0.5× and higher.
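The link between imputation r² and GWAS discovery power quoted above can be illustrated with a short sketch. It relies on the standard approximation that an association test on an imputed variant behaves like a test on true genotypes with the sample size scaled by r², so the relative gain between two assays is the ratio of their r² values minus one. The function names `imputation_r2` and `relative_power_gain` are hypothetical, and reading the abstract's percentages as simple ratios of the reported mean r² values is an assumption, not something the abstract states.

```python
def imputation_r2(true_genotypes, imputed_dosages):
    """Squared Pearson correlation between true genotypes (0/1/2 allele
    counts) and imputed dosages at a single variant.

    Hypothetical helper for illustration; not the authors' pipeline.
    """
    n = len(true_genotypes)
    if n != len(imputed_dosages) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 samples")

    mean_t = sum(true_genotypes) / n
    mean_d = sum(imputed_dosages) / n
    cov = sum((t - mean_t) * (d - mean_d)
              for t, d in zip(true_genotypes, imputed_dosages))
    var_t = sum((t - mean_t) ** 2 for t in true_genotypes)
    var_d = sum((d - mean_d) ** 2 for d in imputed_dosages)

    # r2 is undefined when either vector has no variation (monomorphic site).
    if var_t == 0 or var_d == 0:
        return float("nan")
    return (cov * cov) / (var_t * var_d)


def relative_power_gain(r2_new, r2_baseline):
    """Fractional change in effective GWAS sample size when moving from an
    assay with mean imputation r2 `r2_baseline` to one with `r2_new`.

    Assumes effective sample size scales linearly with r2.
    """
    if r2_baseline <= 0:
        raise ValueError("baseline r2 must be positive")
    return r2_new / r2_baseline - 1.0


if __name__ == "__main__":
    # Mean r2 values reported in the abstract for common variants.
    for label, array_r2, seq_low, seq_high in [
        ("African", 0.83, 0.89, 0.95),
        ("European", 0.91, 0.92, 0.96),
    ]:
        low = relative_power_gain(seq_low, array_r2) * 100
        high = relative_power_gain(seq_high, array_r2) * 100
        print(f"{label}: {low:.1f}% to {high:.1f}%")
```

Under this assumption the African-ancestry values give a gain of about 7.2% to 14.5% and the European-ancestry values about 1.1% to 5.5%, which is roughly in line with the 7-15% and 1-6% ranges reported.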