2023
DOI: 10.1101/2023.08.01.551452
Preprint

Fast clustering and cell-type annotation of scATAC data using pre-trained embeddings

Nathan J. LeRoy,
Jason P. Smith,
Guangtao Zheng
et al.

Abstract: Motivation: Data from the single-cell assay for transposase-accessible chromatin using sequencing (scATAC-seq) is now widely available. One major computational challenge is dealing with high dimensionality and inherent sparsity, which is typically addressed by producing lower-dimensional representations of single cells for downstream clustering tasks. Current approaches produce such individual cell embeddings directly through a one-step learning process. Here, we propose an alternative approach by building embed…

Cited by 4 publications (2 citation statements)
References 35 publications
“…Our approach leverages recent advances in neural embedding methods and the growing corpus of epigenome data to tie natural language to genomic intervals. Neural embedding approaches show great promise for a variety of biological applications [22, 23, 24, 25, 26, 27, 28]. In particular, the StarSpace neural embedding approach has been recently used to learn representations of cancer mutational signatures and has shown to be resource-efficient, flexible, and scalable [29].…”
Section: Introduction (mentioning)
confidence: 99%
“…We previously demonstrated the value of region set embeddings. We reason that the underlying region embeddings could be useful independently; for example, we can infer the function of an unknown region based on its closeness to other known regions in the embedding space, or use them for fast annotation and clustering on single-cell data (17). To develop such methods and concepts further, we first sought a way to evaluate region embeddings independently and objectively.…”
Section: Introduction (mentioning)
confidence: 99%