2022
DOI: 10.1093/bioinformatics/btac413

XSI—a genotype compression tool for compressive genomics in large biobanks

Abstract: Motivation: Generation of genotype data has been growing exponentially over the last decade. With the large size of recent datasets comes a storage and computational burden with ever increasing costs. To reduce this burden we propose XSI, a file format with reduced storage footprint that also allows computation on the compressed data and we show how this can improve future analyses. Results: We show that XSI allows for a file s…

Cited by 12 publications (7 citation statements)
References 30 publications
“…49, 50, 51], and many methods have been suggested to improve on these properties. Most approaches balance compression with performance on particular types of queries, typically using a command line interface (CLI) and outputting VCF text [50, 51, 52, 53, 54, 55, 56, 57, 58, 59]. Several specialised algorithms for compressing the genotype matrix (i.e., just the genotype calls without additional VCF information) have been proposed [60, 61, 62, 63, 64, 65] most notably the Positional Burrows–Wheeler Transform (PBWT) [66].…”
Section: Results (mentioning)
confidence: 99%
“…16, 17). To cope with the large numbers of rare variants, SHAPEIT5 uses a sparse data representation for rare variants: only genotypes carrying at least one copy of the minor allele are stored in memory and considered for computation, thereby discarding all genotypes being homozygous for the major allele 21, 22. SHAPEIT5 phases each rare heterozygous genotype conditioning on a small number of informative haplotypes (Fig.…”
Section: Results (mentioning)
confidence: 99%
“…Once the first phasing stage is completed at common variants, the resulting haplotypes are used in a second stage as a scaffold onto which rare variants (MAF < 0.1%) are phased one after another. To cope with the large numbers of rare variants, SHAPEIT5 uses a sparse data representation: only genotypes carrying at least one copy of the minor allele are stored in memory and considered for computation, thereby discarding all genotypes being homozygous for the major allele 15,19,20. SHAPEIT5 phases each rare heterozygous genotype conditioning on a small number of informative haplotypes in the dataset (Figure 1b).…”
Section: Results (mentioning)
confidence: 99%
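
The sparse representation described in the SHAPEIT5 statements above (keep only genotypes with at least one copy of the minor allele, drop those homozygous for the major allele) can be illustrated with a minimal Python sketch. This is not SHAPEIT5's or XSI's actual code or file layout; the names `SparseVariant`, `to_sparse` and `to_dense` are hypothetical, and the sketch assumes diploid, biallelic genotypes coded 0/1.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Assumption: diploid, biallelic genotypes; alleles coded 0 (REF) or 1 (ALT).
Genotype = Tuple[int, int]


@dataclass
class SparseVariant:
    """Hypothetical container for one variant stored sparsely."""
    n_samples: int
    major: int                       # allele that is most frequent at this site
    carriers: Dict[int, Genotype]    # sample index -> genotype, minor-allele carriers only


def to_sparse(genotypes: List[Genotype]) -> SparseVariant:
    """Keep only genotypes carrying at least one copy of the minor allele."""
    alt_count = sum(a + b for a, b in genotypes)
    # There are 2 * n alleles in total, so ALT is major if it exceeds n copies.
    major = 1 if alt_count > len(genotypes) else 0
    minor = 1 - major
    carriers = {i: g for i, g in enumerate(genotypes) if minor in g}
    return SparseVariant(len(genotypes), major, carriers)


def to_dense(sv: SparseVariant) -> List[Genotype]:
    """Rebuild the full genotype vector; absent samples are homozygous major."""
    hom_major = (sv.major, sv.major)
    return [sv.carriers.get(i, hom_major) for i in range(sv.n_samples)]


# A rare variant: 997 homozygous-major samples and 3 carriers.
dense = [(0, 0)] * 997 + [(0, 1), (1, 0), (1, 1)]
sparse = to_sparse(dense)
assert len(sparse.carriers) == 3
assert to_dense(sparse) == dense
```

For a rare variant the dictionary holds only a handful of entries instead of one per sample, which is the memory saving the quoted passages refer to; the encoding is lossless because every omitted sample is known to be homozygous for the major allele.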