2022
DOI: 10.1101/2022.10.19.512867
Preprint

Accurate rare variant phasing of whole-genome and whole-exome sequencing data in the UK Biobank

Abstract: The UK Biobank performed whole-genome sequencing (WGS) and whole-exome sequencing (WES) across hundreds of thousands of individuals, allowing researchers to study the effects of both common and rare variants. Haplotype phasing distinguishes the two inherited copies of each chromosome into haplotypes and unlocks novel analyses at the haplotype level. In this work, we describe a new phasing method, SHAPEIT5, that accurately and rapidly phases large sequencing datasets, and illustrate its key features on the UK B…
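One example of a haplotype-level question that phased data can answer is whether two heterozygous variants carried by the same individual sit on the same haplotype (in cis) or on different haplotypes (in trans). The sketch below assumes phased genotypes in VCF `GT` notation, where `|` marks a phased call (`0|1` means the alternate allele is on the second haplotype) and `/` an unphased one. The function names are hypothetical; this illustrates what phased output enables and is not part of SHAPEIT5.

```python
from typing import Optional, Tuple


def parse_phased_gt(gt: str) -> Optional[Tuple[int, int]]:
    """Parse a diploid VCF GT string such as '0|1'.

    Returns (allele on haplotype 1, allele on haplotype 2) for a phased call
    ('|' separator), or None for an unphased ('/') or missing call.
    """
    if "|" not in gt:
        return None  # unphased: alleles are known, but not which copy carries them
    left, right = gt.split("|")
    if "." in (left, right):
        return None  # missing allele
    return int(left), int(right)


def alt_haplotype(gt: str) -> Optional[int]:
    """Index (0 or 1) of the haplotype carrying the ALT allele of a
    biallelic heterozygous call, or None if the call is not phased."""
    alleles = parse_phased_gt(gt)
    if alleles is None:
        return None
    if sorted(alleles) != [0, 1]:
        raise ValueError(f"expected a biallelic heterozygous call, got {gt!r}")
    return alleles.index(1)


def cis_or_trans(gt_a: str, gt_b: str) -> str:
    """Classify two heterozygous variants carried by the same individual.

    'cis'     -> both ALT alleles lie on the same haplotype
    'trans'   -> the ALT alleles lie on different haplotypes
    'unknown' -> at least one call is unphased or missing
    """
    hap_a = alt_haplotype(gt_a)
    hap_b = alt_haplotype(gt_b)
    if hap_a is None or hap_b is None:
        return "unknown"
    return "cis" if hap_a == hap_b else "trans"


if __name__ == "__main__":
    print(cis_or_trans("0|1", "0|1"))  # cis
    print(cis_or_trans("0|1", "1|0"))  # trans
    print(cis_or_trans("0/1", "0|1"))  # unknown
```

Restricting the sketch to biallelic heterozygous calls keeps it short; multi-allelic sites or filters on phasing confidence would need extra handling.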

Cited by 35 publications (64 citation statements). References: 60 publications.
“…For the SNP array data, we applied the standard QC recommended by the original authors [24], and phased the data using SHAPEIT4 [25], resulting in 976,754 haplotypes and a total of 670,741 SNPs. For the whole genome sequencing data available on the UK Biobank research analysis platform [1], we used data recently processed and phased by the SHAPEIT5 authors [26], for a total of 300,238 haplotypes and 13,780,193 bi-allelic SNPs and indels on chromosome 20. For the UK Biobank WGS dataset, we applied our method independently to 13 regions of at least 4 megabases and 4 centimorgans on chromosome 20.…”
Section: Results (mentioning), confidence: 99%
“…Rare variants, which are of the greatest interest in Mendelian diseases, are also challenging to phase using population-based approaches given the small numbers of shared haplotypes from which to make phasing estimates in the population. Recent methods have shown accurate phasing of rare variants using genome sequencing data [25][26][27], but they rely on a large genome reference panel. In our work, there were limited numbers of genome sequences available for use in a population-based phasing approach.…”
Section: Discussion (mentioning), confidence: 99%
“…To demonstrate the benefits of using sequenced biobanks for lcWGS imputation, we phased the recent release of the UK Biobank (UKB) WGS data [3,4] using SHAPEIT5 [5] and created a UKB reference panel of 280,238 haplotypes and 582,534,516 markers (Supplementary Note S1). We used the UKB panel to impute lcWGS samples with GLIMPSE2 and other recently released imputation methods: GLIMPSE1 [1] and QUILT v1.0.4 [2].…”
Section: Main (mentioning), confidence: 99%
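Each diploid individual contributes two haplotypes per autosome, so the haplotype counts quoted in these citation statements convert directly into numbers of individuals. A minimal arithmetic check, assuming autosomal data (the helper name is hypothetical):

```python
def individuals_from_haplotypes(n_haplotypes: int) -> int:
    """Number of diploid individuals represented by n_haplotypes (autosomes)."""
    if n_haplotypes % 2:
        raise ValueError("a diploid autosomal panel has an even number of haplotypes")
    return n_haplotypes // 2


# Haplotype counts as quoted in the citation statements on this page
quoted = {
    "SNP array, phased with SHAPEIT4": 976_754,
    "WGS chromosome 20, phased with SHAPEIT5": 300_238,
    "UKB reference panel used with GLIMPSE2": 280_238,
}

for label, n in quoted.items():
    print(f"{label}: {n:,} haplotypes -> {individuals_from_haplotypes(n):,} individuals")
```

This gives 488,377, 150,119 and 140,119 individuals, respectively.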