AtacWorks: A deep convolutional neural network toolkit for epigenomics

Lal, A.; Chiang, Zachary; Yakovenko, Nikolai; Duarte, Fabiana M.; Israeli, Johnny; Buenrostro, Jason D.

doi:10.1101/829481

Cited by 8 publications

(26 citation statements)

References 33 publications

“…In line with the previous assessment of a single punch from the mouse SSp, cell type separation can be distinct for major cell types when leveraging larger peak sets than the limited number that can be called on small cell count datasets. This supports the assertion that computational improvements to enable peak calling on low cell count datasets can substantially boost analytical power [29].…”

Section: Spatial Trajectories Of Single-cell Atac-seq In the Human Co… (supporting)

confidence: 77%

Spatially mapped single-cell chromatin accessibility

Thornton, Mulqueen, Torkenczy et al. 2019

Preprint

High-throughput single-cell genomic assays resolve the heterogeneity of cell states in complex tissues; however, the spatial orientation within the network of interconnected cells is lost. As cell localization is a necessary dimension in understanding complex tissues and disease states, we present a novel method for highly scalable, spatially resolved single-cell profiling of chromatin states. We use high-density multiregional sampling to perform single-cell combinatorial indexing on Microbiopsies Assigned to Positions for the Assay for Transposase Accessible Chromatin (sciMAP-ATAC) to produce single-cell data of equivalent quality to non-spatial single-cell ATAC-seq. We apply sciMAP-ATAC in the adult mouse cortex to discriminate cortical layering of glutamatergic neurons and establish the spatial ordering of single cells within intact tissue. We then leverage this spatially oriented cell dataset by combining it with non-spatially resolved whole brain sci-ATAC-seq data and assess layer-specific marker gene chromatin accessibility and transcription factor motif enrichment. Using sciMAP-ATAC seq, we identify sets of regulatory elements that spatially vary in the cortex, which include canonical layer-specific markers and previously unannotated putative regulatory elements.

“…We also achieve nearly linear scaling with our implementation on multiple sockets of Intel® Xeon® Cascade/Cooper/Ice Lake CPUs. We demonstrate that our execution on multiple CPU sockets is significantly faster than the published results for DGX-1 [20] box with 8 V100s [2] without any loss of accuracy. For a fair comparison with DGX-1 box, we use CPU systems with similar power envelope.…”

Section: Introduction (mentioning)

confidence: 73%

“…Subsequently, we scale our experiments by increasing the number of CPU sockets, dataset size, and ATAC-seq signal track size. We show multi-socket CPU scaling results and compare them with multi-GPU results published in [2].…”

Section: Results (mentioning)

confidence: 99%

Accelerating Identification of Chromatin Accessibility from noisy ATAC-seq Data using Modern CPUs

Chaudhary, Misra, Kalamkar et al. 2021

Preprint

Identifying accessible chromatin regions is a fundamental problem in epigenomics with ATAC-seq being a commonly used assay. Exponential rise in single cell ATAC-seq experiments has made it critical to accelerate processing of ATAC-seq data. ATAC-seq data can have a low signal-to-noise ratio for various reasons including low coverage or low cell count. To denoise and identify accessible chromatin regions from noisy ATAC-seq data, use of deep learning on 1D data - using large filter sizes, long tensor widths, and/or dilation - has recently been proposed. Here, we present ways to accelerate the end-to-end training performance of these deep learning based methods using CPUs. We evaluate our approach on the recently released AtacWorks toolkit. Compared to an Nvidia DGX-1 box with 8 V100 GPUs, we get up to 2.27x speedup using just 16 CPU sockets. To achieve this, we build an efficient 1D dilated convolution layer and demonstrate reduced precision (BFloat16) training.

“…Our work demonstrates the feasibility of training such complex models (thousands of free parameters) on limited data sets (hundreds rather than thousands of samples), and we have tested that it can handle other data sets of much larger size (tens of thousands of samples, data not shown). It stands in contrast to other available implementations of neural networks for regulatory genomics, which are targeted to modeling epigenomic (39, 59, 60) and cistromic (36, 38) data, or do not explicitly model the dependence of sequence function on cellular descriptors such as TF levels (61). This feature allows CoNSEPT to make predictions for varying cellular conditions.…”

Section: Discussion (mentioning)

confidence: 99%

Deciphering enhancer sequence using thermodynamics-based models and convolutional neural networks

Dibaeinia, Sinha 2021

Preprint

Deciphering the sequence-function relationship encoded in enhancers holds the key to interpreting non-coding variants and understanding mechanisms of transcriptomic variation. Several quantitative models exist for predicting enhancer function and underlying mechanisms; however, there has been no systematic comparison of these models characterizing their relative strengths and shortcomings. Here, we interrogated a rich data set of neuroectodermal enhancers in Drosophila, representing cis- and trans- sources of expression variation, with a suite of biophysical and machine learning models. We performed rigorous comparisons of thermodynamics-based models implementing different mechanisms of activation, repression, and cooperativity. Moreover, we developed a convolutional neural network (CNN) model, called CoNSEPT, that learns enhancer "grammar" in an unbiased manner. CoNSEPT is the first general-purpose CNN tool for predicting enhancer function in varying conditions, and we show that such complex models can suggest interpretable mechanisms. We found model-based evidence for mechanisms previously established for the studied system, including cooperative activation and short-range repression. The data also favored one hypothesized activation mechanism over another and suggested an intriguing role for a direct, distance-independent repression mechanism. Our modeling shows that while fundamentally different models can yield similar fits to data, they vary in their utility for mechanistic inference. CoNSEPT is freely available at: https://github.com/PayamDiba/CoNSEPT.
