2022
DOI: 10.1101/2022.11.21.517407
Preprint

excluderanges: exclusion sets for T2T-CHM13, GRCm39, and other genome assemblies

Abstract: Summary: Exclusion regions are sections of reference genomes with abnormal pileups of short sequencing reads. Removing reads that overlap them improves biological signal, and these benefits are most pronounced in differential analysis settings. Several labs have created exclusion region sets, available primarily through ENCODE and GitHub. However, the variety of exclusion sets creates uncertainty about which sets to use. Furthermore, gap regions (e.g., centromeres, telomeres, short arms) create additional considerations in …

Cited by 3 publications (5 citation statements)
References 16 publications
“…The initial bam file was then filtered for a minimum MAPQ of 30. Subsequently, PCR duplicates and reads mapping to the mitochondrial chromosome or within regions from a unified forbidden list from ENCODE and 78, 79. Finally, only correctly mated mapped reads were retained.…”
Section: Methods
Citation type: mentioning
confidence: 99%
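
The filtering order quoted above can be sketched in plain Python. This is only an illustration under stated assumptions, not the citing authors' pipeline: the `Read` record, `build_index`, `overlaps`, and `filter_reads` are hypothetical names, the mitochondrial chromosome is assumed to be called `chrM`, coordinates are assumed to be 0-based and half-open as in BED files, and the quoted sentence is read as meaning that duplicates, mitochondrial reads, and reads in exclusion regions were removed.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import NamedTuple


class Read(NamedTuple):
    """Hypothetical minimal read record (not a real BAM parser)."""
    chrom: str
    start: int            # 0-based, inclusive
    end: int              # 0-based, exclusive
    mapq: int
    is_duplicate: bool
    is_proper_pair: bool


def build_index(regions):
    """Group (chrom, start, end) regions by chromosome, sorted and merged."""
    by_chrom = defaultdict(list)
    for chrom, start, end in regions:
        by_chrom[chrom].append((start, end))
    merged = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        out = [intervals[0]]
        for s, e in intervals[1:]:
            if s <= out[-1][1]:          # overlapping or touching: extend
                out[-1] = (out[-1][0], max(out[-1][1], e))
            else:
                out.append((s, e))
        merged[chrom] = out
    return merged


def overlaps(index, chrom, start, end):
    """True if [start, end) overlaps any indexed region on chrom."""
    intervals = index.get(chrom, [])
    # Last merged interval that starts before `end`.
    i = bisect_right(intervals, (end - 1, float("inf"))) - 1
    return i >= 0 and intervals[i][1] > start


def filter_reads(reads, exclusion, min_mapq=30, mito="chrM"):
    """Apply the quoted filters in order; thresholds are assumptions."""
    index = build_index(exclusion)
    kept = []
    for r in reads:
        if r.mapq < min_mapq:                        # minimum MAPQ
            continue
        if r.is_duplicate or r.chrom == mito:        # duplicates, mitochondrial
            continue
        if overlaps(index, r.chrom, r.start, r.end): # exclusion regions
            continue
        if not r.is_proper_pair:                     # correctly mated only
            continue
        kept.append(r)
    return kept
```

In practice this step is usually done with dedicated alignment tooling rather than hand-written loops; the sketch only makes the order and logic of the filters explicit.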
“…Issue-prone regions of the genome for each build were defined based on both official issue reports from the consortium that produced the assembly (hg19 and hg38: Genome Reference Consortium 9, 43; CHM13: Telomere2Telomere Consortium 28), and region blacklists generated by independent sources 77, 78. The hg19 and hg38 exclusion regions (previously “blacklisted regions”) were defined by ENCODE as difficult-to-sequence regions with tendencies towards high multimapping rates or high mapping variability 77.…”
Section: Methods
Citation type: mentioning
confidence: 99%
“…Bed files delineating the ENCODE blacklisted regions for hg19 and hg38 were accessed for this study on January 23, 2023. As no official ENCODE exclusion list for CHM13 was available at the time of this publication, the corresponding blacklisted regions for CHM13 were obtained as a bed file on February 21, 2023 from excluderanges, a bioconductor package for tracking problematic genomic regions across genome assemblies 78, 79.…”
Section: Methods
Citation type: mentioning
confidence: 99%
“…For example, an enrichment of assay for transposase-accessible chromatin with sequencing (ATAC-seq) peaks near certain genes may indicate a regulatory relationship (Lee et al 2020), and enrichment of genome-wide association study (GWAS) single nucleotide polymorphisms (SNPs) near tissue-specific ATAC-seq peaks may suggest mechanisms underlying the GWAS trait. Such analyses rely on specifying a null distribution, where one strategy is to uniformly shuffle one set of the genomic ranges in the genome, possibly considering a set of excluded regions where ranges should not be placed (Ogata et al 2022). However, uniformly distributed null sets will not exhibit the clumping property common with genomic regions.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
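
The uniform-shuffle strategy described in the statement above can be illustrated with a small rejection-sampling sketch. It is an assumption-laden example rather than the method of any cited package: `shuffle_ranges` and `overlaps_any` are hypothetical names, ranges are `(chrom, start, end)` tuples with 0-based half-open coordinates, and a placement is simply redrawn whenever it overlaps an excluded region.

```python
import random


def overlaps_any(chrom, start, end, excluded):
    """True if [start, end) on chrom overlaps any excluded (chrom, start, end)."""
    return any(c == chrom and s < end and start < e for c, s, e in excluded)


def shuffle_ranges(ranges, chrom_sizes, excluded=(), seed=0, max_tries=1000):
    """Place each range uniformly at random in the genome, keeping its width
    and redrawing any placement that overlaps an excluded region."""
    rng = random.Random(seed)
    shuffled = []
    for _, start, end in ranges:
        width = end - start
        # Chromosomes long enough to hold the range, weighted by the number
        # of valid start positions so every placement is equally likely.
        candidates = [c for c in chrom_sizes if chrom_sizes[c] >= width]
        if not candidates:
            raise ValueError(f"no chromosome can hold a range of width {width}")
        weights = [chrom_sizes[c] - width + 1 for c in candidates]
        for _ in range(max_tries):
            chrom = rng.choices(candidates, weights=weights)[0]
            new_start = rng.randint(0, chrom_sizes[chrom] - width)
            if not overlaps_any(chrom, new_start, new_start + width, excluded):
                shuffled.append((chrom, new_start, new_start + width))
                break
        else:
            raise RuntimeError("could not place range outside excluded regions")
    return shuffled
```

Because each range is placed independently, the resulting null set is spread uniformly across the allowed genome, which is exactly why, as the statement notes, it does not reproduce the clumping seen in real genomic regions.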