2021
DOI: 10.1101/2021.09.08.459495
Preprint

scBasset: Sequence-based modeling of single cell ATAC-seq using convolutional neural networks

Abstract: Single cell ATAC-seq (scATAC) shows great promise for studying cellular heterogeneity in epigenetic landscapes, but there remain significant challenges in the analysis of scATAC data due to the inherent high dimensionality and sparsity. Here we introduce scBasset, a sequence-based convolutional neural network method to model scATAC data. We show that by leveraging the DNA sequence information underlying accessibility peaks and the expressiveness of a neural network model, scBasset achieves state-of-the-art per…
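Sequence-based CNNs in the Basset family usually read each peak's DNA as a one-hot matrix with one row per base and one column per nucleotide. The sketch below shows that encoding under the assumption that scBasset uses a similar input; the helper name `one_hot` and the choice to map ambiguous bases such as `N` to an all-zero row are illustrative, not taken from the paper.

```python
# Hypothetical helper, not scBasset's code: one-hot encode a DNA sequence
# as a (length x 4) matrix with columns in A, C, G, T order.
BASES = "ACGT"


def one_hot(seq):
    """Return a list of len(seq) rows of 0/1 values, one column per base.

    Lower-case input is accepted; ambiguous bases (e.g. N) become all-zero rows.
    """
    return [[1 if base == b else 0 for b in BASES] for base in seq.upper()]


if __name__ == "__main__":
    for row in one_hot("ACGTN"):
        print(row)
```

A full peak would be encoded the same way, giving a matrix whose length equals the peak width; the CNN then scans this matrix with learned filters.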

Cited by 7 publications (10 citation statements)
References 38 publications
“…The neighbor score varies between 0 when both representations disagree completely to 1 when both representations are identical (see Methods and Figure 1B). This evaluation has been previously used in [18, 19] and is currently the standard for evaluating modality alignment tasks in recent community benchmarks such as https://openproblems.bio/.…”
Section: Results · mentioning · confidence: 99%
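The quoted neighbor score compares how well two representations of the same cells agree. A common way to build such a score is to average, over cells, the fraction of each cell's k nearest neighbours that are shared between the two representations. The sketch below implements that k-NN overlap under that assumption; the citing paper's exact definition is given in its Methods and may differ, and the function names are hypothetical.

```python
import math


def knn(points, i, k):
    """Indices of the k nearest neighbours of points[i] (Euclidean), excluding i."""
    dists = sorted((math.dist(points[i], p), j) for j, p in enumerate(points) if j != i)
    return {j for _, j in dists[:k]}  # ties broken by index for determinism


def neighbor_score(rep_a, rep_b, k=2):
    """Mean fraction of k-NN shared between two embeddings of the same cells.

    Assumed definition: 1.0 when every cell has identical neighbours in both
    representations, 0.0 when no cell shares any neighbour.
    """
    if len(rep_a) != len(rep_b):
        raise ValueError("both representations must cover the same cells")
    n = len(rep_a)
    overlaps = [len(knn(rep_a, i, k) & knn(rep_b, i, k)) / k for i in range(n)]
    return sum(overlaps) / n


if __name__ == "__main__":
    a = [(0, 0), (0, 1), (5, 5), (5, 6)]
    b = [(0, 0), (5, 5), (0, 1), (5, 6)]  # same cells, neighbours rearranged
    print(neighbor_score(a, a, k=1), neighbor_score(a, b, k=1))
```

Comparing an embedding with itself yields the maximum score, while the rearranged embedding `b` shares no nearest neighbour with `a` and yields the minimum.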
“…Overall, looking at the available datasets and prior biological knowledge, it appears reasonable and the best we can have without proper labels. We can also note that this approach has already been successfully used in evaluating scATAC-seq pipelines in [19, 18].…”
Section: Discussion · mentioning · confidence: 99%
“…Following their default variable peak selection, peaks accessible in fewer than 5% of cells were removed, leaving 38,502 peaks for training. scBasset was trained for 45 epochs, at which point the validation loss and AUC had leveled and the correlation between the intercept and the library size was 0.99 [5].…”
Section: Methods · mentioning · confidence: 99%
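The quoted peak filter keeps only peaks that are open in at least 5% of cells. A minimal sketch of that rule follows, assuming a cells x peaks count matrix stored as nested lists and treating any nonzero count as "accessible"; the function name `filter_peaks` is hypothetical.

```python
def filter_peaks(counts, min_frac=0.05):
    """Return indices of peaks accessible (count > 0) in >= min_frac of cells.

    `counts` is a hypothetical cells x peaks matrix given as a list of rows.
    Peaks open in fewer than min_frac of cells are dropped.
    """
    if not counts:
        return []
    n_cells = len(counts)
    kept = []
    for p in range(len(counts[0])):
        n_open = sum(1 for row in counts if row[p] > 0)
        if n_open / n_cells >= min_frac:
            kept.append(p)
    return kept


if __name__ == "__main__":
    # 20 cells: peak 0 open in 1 cell (exactly 5%), peak 1 never open, peak 2 always open
    counts = [[1 if i == 0 else 0, 0, 1] for i in range(20)]
    print(filter_peaks(counts))
```

With 20 cells, a peak open in a single cell sits exactly at the 5% threshold and is retained, while a peak open in no cell is removed.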
“…This approach can indeed group cells by their cell type identity but introduces bias through a priori motif choice and therefore may miss latent structure in the data. Recently, scBasset [5] used a multi-task neural network approach to learn a sequence model for chromatin accessible peaks that passes through a low-dimensional bottleneck layer, together with cell-specific model vectors that predict whether a peak, given its bottleneck representation, will be accessible in the cell. This approach yields a low dimensional representation of cells via the model vectors and can assign TF accessibility scores to cells by passing a sequence with a planted motif through the model.…”
Section: Main · mentioning · confidence: 99%
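The quotes above describe a bottleneck vector per peak, a model vector per cell, and a per-cell intercept. One natural reading is that each cell's prediction for a peak is a sigmoid of the dot product between the two vectors plus that cell's intercept. The sketch below shows this output layer under that assumption; it omits the CNN that produces the bottleneck, and the names and toy values are illustrative only.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def predict_accessibility(bottleneck, cell_vectors, cell_intercepts):
    """Return a peaks x cells matrix of predicted accessibility probabilities.

    bottleneck:      per-peak vectors of length d (assumed CNN bottleneck output)
    cell_vectors:    per-cell vectors of length d (the cell representation)
    cell_intercepts: per-cell scalar biases
    """
    return [
        [sigmoid(sum(z * w for z, w in zip(z_peak, w_cell)) + b_cell)
         for w_cell, b_cell in zip(cell_vectors, cell_intercepts)]
        for z_peak in bottleneck
    ]


if __name__ == "__main__":
    peaks = [[1.0, 0.0], [0.0, 1.0]]      # toy 2-d bottleneck for two peaks
    cells = [[2.0, -1.0], [-2.0, 1.0]]    # toy model vectors for two cells
    print(predict_accessibility(peaks, cells, [0.0, 0.0]))
```

In this reading, the per-cell vectors serve as the low-dimensional cell representation. A per-cell intercept can absorb overall sequencing depth, which is consistent with the reported 0.99 correlation between intercept and library size. Scoring a TF as the quote describes would mean comparing these outputs for a background sequence with and without the motif inserted. That step needs the trained CNN and is not shown.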