Allo: Accurate allocation of multi-mapped reads enables regulatory element analysis at repeats

Morrissey, Alexis; Shi, Jeffrey; James, Daniela Q.; Mahony, Shaun

doi:10.1101/2023.09.12.556916

2023

Preprint

Abstract: Transposable elements (TEs) and other repetitive regions have been shown to contain gene regulatory elements, including transcription factor binding sites. Unfortunately, regulatory elements harbored by repeats have proven difficult to characterize using short-read sequencing assays such as ChIP-seq or ATAC-seq. Most regulatory genomics analysis pipelines discard "multi-mapped" reads that align equally well to multiple genomic locations. Since multi-mapped reads arise predominantly from repeats, current analys…
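The "discard multi-mapped reads" step that the abstract describes can be illustrated with a minimal sketch. It assumes alignments are already available as hypothetical `(read_id, chromosome, position, alignment_score)` tuples (real pipelines usually work from BAM records instead), and it treats a read as multi-mapped when more than one of its alignments ties for the best score. This illustrates only the conventional filtering step, not Allo's allocation method.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical record layout: (read_id, chromosome, position, alignment_score).
Alignment = Tuple[str, str, int, int]
Locus = Tuple[str, int]


def split_unique_and_multimapped(
    alignments: List[Alignment],
) -> Tuple[Dict[str, Locus], Dict[str, List[Locus]]]:
    """Separate uniquely mapped reads from multi-mapped reads.

    A read counts as multi-mapped when more than one of its alignments
    ties for the best alignment score.
    """
    # Group every alignment under its read name.
    by_read: Dict[str, List[Tuple[str, int, int]]] = defaultdict(list)
    for read_id, chrom, pos, score in alignments:
        by_read[read_id].append((chrom, pos, score))

    unique: Dict[str, Locus] = {}
    multi: Dict[str, List[Locus]] = {}
    for read_id, alns in by_read.items():
        best = max(score for _, _, score in alns)
        best_loci = [(chrom, pos) for chrom, pos, score in alns if score == best]
        if len(best_loci) == 1:
            unique[read_id] = best_loci[0]
        else:
            # Equally good locations: the reads a typical pipeline throws away.
            multi[read_id] = best_loci
    return unique, multi
```

A conventional pipeline would keep only `unique` and drop `multi`. Because multi-mapped reads arise predominantly from repeats, the dropped reads are where repeat-derived signal is lost. How Allo allocates them is not described in this excerpt, so the sketch stops at the split.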

References 39 publications

Cited by 1 publication

Alevin-fry-atac enables rapid and memory frugal mapping of single-cell ATAC-seq data using virtual colors for accurate genomic pseudoalignment

Singh, Khan, Patro

2024

Preprint

Ultrafast mapping of short reads to transcriptomic and metagenomic references via lightweight mapping techniques such as pseudoalignment has been shown to substantially accelerate several types of analyses without much loss in accuracy compared to alignment-based approaches. Applying pseudoalignment to large reference sequences, such as the genome, is not trivial, however, because the references or "targets" (i.e., chromosomes) are large and contain repetitive sequences. A k-mer can therefore match multiple locations within a single reference, which can lead to false-positive mappings and incorrect reference assignments when the colors across a read's k-mer matches are aggregated. Even when a read is assigned to the appropriate reference, frequent k-mer multi-matches within that reference can prevent determination of the read's correct approximate position, which is often critical in applications that map short reads to the genome. We propose a new, modified pseudoalignment scheme that partitions each reference into "virtual colors". These are overlapping bins of fixed maximal extent on the reference sequences that the pseudoalignment algorithm treats as distinct "colors". A mapped k-mer is assigned a virtual color ID that encodes the combination of the reference and the within-reference bin in which the k-mer occurs. When the k-mers across a read are aggregated, the intersection is performed on virtual colors instead of the original colors (references) to determine the compatible set of targets (bins). The virtual colors can then be mapped back to the original references to provide the final mappings.
The projection of the original reference sequences into virtual color space, and the corresponding modifications to the pseudoalignment procedure, can be applied dynamically at program invocation without any modification of the underlying index. This makes setting and adjusting instance-appropriate parameters efficient and straightforward, and makes the approach widely applicable. We apply this modified pseudoalignment procedure to process and map single-cell ATAC-seq data in our new tool, alevin-fry-atac, and compare alevin-fry-atac to both Chromap and Cell Ranger ATAC. Alevin-fry-atac is highly scalable: when using 32 threads, it is approximately 1.78 times faster than Chromap (the second-fastest approach) while using approximately one-third of the memory and mapping slightly more reads. The peaks and clusters generated from alevin-fry-atac show high concordance with those obtained from both Chromap and the Cell Ranger ATAC pipeline, demonstrating that virtual color-enhanced pseudoalignment directly to the genome provides a fast, memory-frugal, and accurate alternative to existing approaches for single-cell ATAC-seq processing. The development of alevin-fry-atac brings single-cell ATAC-seq processing into a unified ecosystem with single-cell RNA-seq processing (via alevin-fry), working toward a truly open alternative to many of the varied capabilities of Cell Ranger. Furthermore, our modified pseudoalignment approach should be easily applicable and extendable to other genome-centric mapping-based tasks and modalities such as standard DNA-seq, DNase-seq, ChIP-seq and Hi-C.
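The virtual-color idea in this abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not alevin-fry-atac's implementation: the index is modeled as a list of `(reference, position)` hits per k-mer, bins are controlled by hypothetical `bin_size` and `overlap` parameters, and only a k-mer's start position is used to assign it to bins. Because bins are computed from the stored hits at query time, the "index" itself is never modified, which mirrors the property the abstract emphasizes.

```python
from typing import List, Optional, Set, Tuple

Hit = Tuple[str, int]           # (reference name, 0-based k-mer start position)
VirtualColor = Tuple[str, int]  # (reference name, bin index within that reference)


def bins_for_position(pos: int, bin_size: int, overlap: int) -> List[int]:
    """Indices of all bins containing `pos`.

    Bin i spans [i * stride, i * stride + bin_size), where stride = bin_size - overlap,
    so neighbouring bins share `overlap` positions.
    """
    if not 0 <= overlap < bin_size:
        raise ValueError("require 0 <= overlap < bin_size")
    if pos < 0:
        raise ValueError("pos must be non-negative")
    stride = bin_size - overlap
    first = max(0, -(-(pos - bin_size + 1) // stride))  # ceil((pos - bin_size + 1) / stride)
    last = pos // stride
    return list(range(first, last + 1))


def virtual_colors(hits: List[Hit], bin_size: int, overlap: int) -> Set[VirtualColor]:
    """Project one k-mer's (reference, position) hits into virtual-color space."""
    return {(ref, b) for ref, pos in hits for b in bins_for_position(pos, bin_size, overlap)}


def map_read(kmer_hits: List[List[Hit]], bin_size: int, overlap: int) -> Set[VirtualColor]:
    """Intersect virtual colors across a read's k-mers to find compatible bins."""
    compatible: Optional[Set[VirtualColor]] = None
    for hits in kmer_hits:
        if not hits:
            continue  # k-mers with no index hits place no constraint on the read
        colors = virtual_colors(hits, bin_size, overlap)
        compatible = colors if compatible is None else compatible & colors
    return compatible if compatible is not None else set()


def to_references(colors: Set[VirtualColor]) -> Set[str]:
    """Map compatible virtual colors back to the original references."""
    return {ref for ref, _ in colors}
```

For example, with `bin_size=1000` and `overlap=200`, suppose a read's three k-mers hit chr1 at positions 100, 105 and 110, but hit chr2 at scattered positions 5000, 120 and 7000. Intersecting plain reference colors leaves both chr1 and chr2, whereas intersecting virtual colors leaves only bin 0 of chr1, so `to_references` reports chr1 alone. The bin overlap presumably keeps a read that straddles a bin boundary from losing its shared bin; choosing it to be at least the read length is an assumption of this sketch.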
