2022
DOI: 10.1093/bioinformatics/btab870

Pairs and Pairix: a file format and a tool for efficient storage and retrieval for Hi-C read pairs

Abstract: Summary. As the amount of three-dimensional chromosomal interaction data continues to increase, storing and accessing such data efficiently becomes paramount. We introduce Pairs, a block-compressed text file format for storing paired genomic coordinates from Hi-C data, and Pairix, an open-source C application to index and query Pairs files. Pairix (also available in Python and R) extends the functionalities of Tabix to paired coordinates data. We have also developed PairsQC, a collapsible HTML…
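To make the "Tabix for paired coordinates" idea concrete, here is a minimal in-memory sketch. It assumes the seven-column layout readID, chr1, pos1, chr2, pos2, strand1, strand2 with `#`-prefixed header lines, and a query string of the form `chrA:start-end|chrB:start-end`, which is assumed here for illustration. `ToyPairIndex` and `parse_region` are hypothetical names: the real Pairix index works over block-compressed files on disk, whereas this toy only shows the lookup logic of grouping records by chromosome pair, binary-searching on pos1, then filtering on pos2.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, NamedTuple, Tuple


class Pair(NamedTuple):
    read_id: str
    chr1: str
    pos1: int
    chr2: str
    pos2: int
    strand1: str
    strand2: str


def parse_pairs(lines: Iterable[str]) -> Iterator[Pair]:
    """Yield records from .pairs text, skipping '#' header lines and blank lines.

    Assumes the column order readID chr1 pos1 chr2 pos2 strand1 strand2.
    """
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        yield Pair(f[0], f[1], int(f[2]), f[3], int(f[4]), f[5], f[6])


def parse_region(text: str) -> Tuple[str, int, int]:
    """Parse 'chrom:start-end' into (chrom, start, end); the range is inclusive."""
    chrom, span = text.split(":")
    start, end = span.split("-")
    return chrom, int(start), int(end)


class ToyPairIndex:
    """Hypothetical in-memory 2D index (not Pairix's on-disk index format).

    Records are grouped into (chr1, chr2) blocks and sorted by pos1 within each
    block, so a query is a binary search on pos1 followed by a filter on pos2.
    """

    def __init__(self, pairs: Iterable[Pair]) -> None:
        grouped: Dict[Tuple[str, str], List[Pair]] = defaultdict(list)
        for p in pairs:
            grouped[(p.chr1, p.chr2)].append(p)
        self.blocks: Dict[Tuple[str, str], List[Pair]] = {}
        self.starts: Dict[Tuple[str, str], List[int]] = {}
        for key, recs in grouped.items():
            recs.sort(key=lambda rec: (rec.pos1, rec.pos2))
            self.blocks[key] = recs
            self.starts[key] = [rec.pos1 for rec in recs]

    def query(self, region: str) -> List[Pair]:
        """Return pairs inside 'chrA:s1-e1|chrB:s2-e2' (query syntax assumed)."""
        left, right = region.split("|")
        c1, s1, e1 = parse_region(left)
        c2, s2, e2 = parse_region(right)
        recs = self.blocks.get((c1, c2), [])
        starts = self.starts.get((c1, c2), [])
        lo, hi = bisect_left(starts, s1), bisect_right(starts, e1)
        return [p for p in recs[lo:hi] if s2 <= p.pos2 <= e2]


example = [
    "## pairs format v1.0\n",
    "#columns: readID chr1 pos1 chr2 pos2 strand1 strand2\n",
    "r1\tchr1\t100\tchr1\t5000\t+\t-\n",
    "r2\tchr1\t250\tchr2\t800\t-\t+\n",
    "r3\tchr1\t900\tchr1\t1200\t+\t+\n",
]
index = ToyPairIndex(parse_pairs(example))
print([p.read_id for p in index.query("chr1:1-1000|chr1:1000-6000")])  # prints ['r1', 'r3']
```

Sorting each chromosome-pair block by the first coordinate is what lets a one-dimensional binary search narrow the candidates before the second coordinate is checked; a file-backed index applies the same idea to compressed blocks instead of Python lists.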


Cited by 16 publications (13 citation statements); references 13 publications.
“…The ENCODE Hi-C pipeline has been developed with the Aiden lab using their Juicer suite of software tools 38, with some updates to mapping parameters and chimeric read handling. There are essentially five steps in the pipeline (Fig 8A); mapping (with bwa-mem) and filtering plus Pairix 39 to form a set of contacts, or pairs file. The genome is then binned into 14 resolutions (between 10bp and 2.5Mbp) by Juicer to form contact matrix (.hic) files.…”
Section: The ENCODE Hi-C Pipeline (mentioning; confidence: 99%)
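The binning step described in this statement can be sketched as counting contacts per pair of genomic bins at several bin sizes. This is a simplified illustration, not Juicer's implementation: it assumes each contact is reduced to `(chr1, pos1, chr2, pos2)` and binned by integer division, it produces sparse in-memory counts rather than a `.hic` file, and it applies no normalization. The function names and the resolutions in the example are hypothetical and are not the ENCODE list of 14 resolutions.

```python
from collections import Counter
from typing import Dict, Iterable, List, Sequence, Tuple

# Simplified contact record: (chr1, pos1, chr2, pos2), e.g. taken from a pairs file.
Contact = Tuple[str, int, str, int]


def bin_contacts(contacts: Iterable[Contact], resolution: int) -> Counter:
    """Count contacts per (chr1, bin1, chr2, bin2) cell for one bin size in bp."""
    if resolution <= 0:
        raise ValueError("resolution must be a positive number of base pairs")
    counts: Counter = Counter()
    for chr1, pos1, chr2, pos2 in contacts:
        counts[(chr1, pos1 // resolution, chr2, pos2 // resolution)] += 1
    return counts


def multi_resolution(contacts: Iterable[Contact], resolutions: Sequence[int]) -> Dict[int, Counter]:
    """Build one sparse contact matrix per resolution from the same contacts."""
    materialized: List[Contact] = list(contacts)  # reuse the data across resolutions
    return {r: bin_contacts(materialized, r) for r in resolutions}


contacts = [("chr1", 150, "chr1", 2_050), ("chr1", 199, "chr1", 2_999), ("chr1", 5_000, "chr2", 10)]
matrices = multi_resolution(contacts, [1_000, 10_000])
print(matrices[1_000][("chr1", 0, "chr1", 2)])  # prints 2
```

Coarser resolutions merge neighbouring cells, so the same contacts yield fewer, larger counts; this is why a multi-resolution file can serve both genome-wide and fine-scale views.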
“…The modular structure of pairtools and its usage of the .pairs format 28 already make it useful in many pipelines. pairtools is used in the 4DN pipeline (standard Hi-C) 3, the PORE-C pipeline (multi-way Hi-C) 75, HI-CAR nf-core pipeline (open-chromatin-associated contacts) 76, and iMARGI pipelines (RNA-DNA contacts) 49,77.…”
Section: Discussion (mentioning; confidence: 99%)
“…Parse detects such cases, and merges the alignments, “rescuing” the true contact pair (Supplementary Figure 1d). The output of parse adheres to the standard format .pairs 28, discussed below.…”
Section: Essential Building Blocks for 3C+ Pair Processing (mentioning; confidence: 99%)
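The last statement notes that parse writes standard .pairs output. The sketch below covers only that serialization side, not the detection or rescue of chimeric alignments. It assumes the seven-column layout with a `## pairs format v1.0` first line and a `#columns:` header. It also assumes two common conventions: each record is flipped so that (chr1, pos1) precedes (chr2, pos2) (upper triangle), and the body is sorted by chr1, chr2, pos1, pos2 using a caller-supplied chromosome order. `format_pairs`, `flip_to_upper_triangle` and `chrom_order` are hypothetical names, and the exact sort order an indexer expects should be checked against its documentation.

```python
from typing import Dict, Iterable, Sequence, Tuple

# Seven-column record: (readID, chr1, pos1, chr2, pos2, strand1, strand2).
Record = Tuple[str, str, int, str, int, str, str]

HEADER = [
    "## pairs format v1.0",
    "#columns: readID chr1 pos1 chr2 pos2 strand1 strand2",
]


def flip_to_upper_triangle(rec: Record, rank: Dict[str, int]) -> Record:
    """Swap sides so (chr1, pos1) precedes (chr2, pos2); strands move with their side."""
    read_id, c1, p1, c2, p2, s1, s2 = rec
    if (rank[c1], p1) > (rank[c2], p2):
        return (read_id, c2, p2, c1, p1, s2, s1)
    return rec


def format_pairs(records: Iterable[Record], chrom_order: Sequence[str]) -> str:
    """Render records as .pairs text: header lines, then flipped and sorted body lines."""
    rank = {chrom: i for i, chrom in enumerate(chrom_order)}
    flipped = [flip_to_upper_triangle(r, rank) for r in records]
    # Sort by chromosome pair first, then by positions (chr1, chr2, pos1, pos2).
    flipped.sort(key=lambda r: (rank[r[1]], rank[r[3]], r[2], r[4]))
    body = ["\t".join(str(field) for field in r) for r in flipped]
    return "\n".join(HEADER + body) + "\n"


records = [
    ("r1", "chr2", 500, "chr1", 100, "+", "-"),
    ("r2", "chr1", 300, "chr1", 50, "-", "-"),
]
print(format_pairs(records, ["chr1", "chr2"]), end="")
```

Flipping before sorting guarantees that the same contact is always written the same way regardless of which mate was read first, so every contact between two loci lands in one chromosome-pair block.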
“…The ENCODE Hi-C pipeline has been developed with the Aiden lab using their Juicer suite of software tools 39, with some updates to mapping parameters and chimeric read handling. There are essentially five steps in the pipeline (Fig 8A); mapping (with bwa-mem) and filtering plus Pairix 40 to form a set of contacts, or pairs file. The genome is then binned into 14 resolutions (between 10bp and 2.5Mbp) by Juicer to form contact matrix (.hic) files.…”
Section: The ENCODE Hi-C Pipeline (mentioning; confidence: 99%)