2023
DOI: 10.1101/2023.03.24.534055
Preprint
xTrimoGene: An Efficient and Scalable Representation Learner for Single-Cell RNA-Seq Data

Abstract: Advances in high-throughput sequencing technology have led to significant progress in measuring gene expression at the single-cell level. The amount of publicly available single-cell RNA-seq (scRNA-seq) data already surpasses 50 million records for human, with each record measuring 20,000 genes. This highlights the need for unsupervised representation learning to fully ingest these data, yet classical transformer architectures are prohibitive to train on such data in terms of both computation and memory. To addr…

Cited by 12 publications (9 citation statements)
References 43 publications
“…First, we do not exhaustively test all available gene transcriptional profiling-based models. Because some models are not open-source [14] [36], directly converting certain foundation models into gene expression signatures is challenging [12] [11]. Taking various factors into consideration, we select scGPT [8], but this does not imply that other large-scale models perform worse.…”
Section: Discussion
confidence: 99%
“…We developed xTrimoGene, a scalable transformer-based model with both algorithmically efficient and engineering acceleration strategies 14 . It included an embedding module and an asymmetric encoder-decoder structure.…”
Section: The scFoundation Pre-training Framework
confidence: 99%
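The statement above describes an asymmetric encoder-decoder: a heavy encoder that attends only to the small subset of unmasked gene tokens, and a lighter decoder that reconstructs expression values for every gene. A minimal NumPy sketch of that idea follows; all shapes, layer sizes, the 90% mask ratio, and the shared mask token are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes, d_model, d_dec = 1000, 64, 32

expr = rng.random(n_genes)          # toy expression vector for one cell
mask = rng.random(n_genes) < 0.9    # mask ~90% of genes (assumed ratio)
visible = np.where(~mask)[0]        # encoder only sees the visible subset

gene_emb = rng.normal(size=(n_genes, d_model))            # per-gene embeddings
W_enc = rng.normal(size=(d_model, d_model)) / np.sqrt(d_model)
W_dec = rng.normal(size=(d_model, d_dec)) / np.sqrt(d_model)
w_out = rng.normal(size=d_dec) / np.sqrt(d_dec)

# Asymmetry: the expensive encoder processes only the visible tokens,
# so its cost scales with len(visible), not n_genes.
enc_in = gene_emb[visible] * expr[visible, None]
enc_out = np.tanh(enc_in @ W_enc)

# Lightweight decoder: scatter encoded tokens back into a full-length
# sequence, fill masked positions with a shared mask token, then predict
# an expression value for every gene.
mask_token = rng.normal(size=d_model)
full = np.tile(mask_token, (n_genes, 1))
full[visible] = enc_out
pred = np.tanh(full @ W_dec) @ w_out   # one predicted value per gene
```

Training would compare `pred` at masked positions against the true expression values; the sketch only shows the shape of the computation, not the learning loop.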
“…pLMs have steadily gained traction across diverse applications for protein design, including antibody engineering 18 and drug discovery 19,20 . In addition, AI models trained on unlabeled single-cell RNA sequencing (scRNA-seq) data have been published and used for cell annotation purposes [21][22][23][24][25][26][27] . Thus, masking models have proven to substantially outperform previous conventional methods in effectiveness and show great potential in biomedical applications.…”
Section: Introduction
confidence: 99%