2022 · Preprint
DOI: 10.1101/2022.01.31.478596

Generative pretraining from large-scale transcriptomes: Implications for single-cell deciphering and clinical translation

Abstract: Exponential accumulation of single-cell transcriptomes poses a great challenge for efficient assimilation. Here, we present an approach, entitled tGPT, for integrating 22.3 million single-cell transcriptomes by modeling gene expression rankings as a generative pretraining task. tGPT is conceptually simple in that it autoregressively models the rank of a gene in the context of its preceding neighbors. We demonstrated the high performance of tGPT on a range of fundamental single-cell analysis tasks and novel…
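To make the pretraining objective concrete, here is a minimal sketch (not the authors' implementation) of rank-based tokenization with autoregressive next-token training: a cell's expression profile is reduced to its top-ranked gene identities, and a decoder-only transformer predicts each gene token from its higher-ranked predecessors. The vocabulary size, sequence length, model width, and the generic GPT-style decoder are all illustrative assumptions.

```python
# Minimal sketch of rank-based generative pretraining for single-cell
# transcriptomes, in the spirit of tGPT. All hyperparameters (vocab size,
# sequence length, model width) are illustrative assumptions, not the
# published configuration.
import torch
import torch.nn as nn

N_GENES = 20000   # assumed gene vocabulary size
SEQ_LEN = 64      # assumed number of top-ranked genes kept per cell
D_MODEL = 128     # assumed model width

def expression_to_rank_tokens(expr: torch.Tensor, seq_len: int = SEQ_LEN) -> torch.Tensor:
    """Order gene indices by descending expression and keep the top seq_len.

    The expression values themselves are discarded; only the ranking of
    gene identities is retained.
    """
    order = torch.argsort(expr, dim=-1, descending=True)
    return order[..., :seq_len]

class RankGPT(nn.Module):
    """Decoder-only transformer that autoregressively predicts the next
    gene token from its preceding, higher-ranked neighbors."""

    def __init__(self) -> None:
        super().__init__()
        self.tok = nn.Embedding(N_GENES, D_MODEL)
        self.pos = nn.Embedding(SEQ_LEN, D_MODEL)
        layer = nn.TransformerEncoderLayer(D_MODEL, nhead=4, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(D_MODEL, N_GENES)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        seq = tokens.size(1)
        x = self.tok(tokens) + self.pos(torch.arange(seq, device=tokens.device))
        # Causal mask so each position attends only to earlier ranks.
        causal = torch.triu(torch.full((seq, seq), float("-inf")), diagonal=1)
        return self.head(self.blocks(x, mask=causal))

# Toy training step: predict token t+1 from tokens 0..t.
expr = torch.rand(8, N_GENES)                  # fake expression matrix (8 cells)
tokens = expression_to_rank_tokens(expr)
logits = RankGPT()(tokens[:, :-1])
loss = nn.functional.cross_entropy(
    logits.reshape(-1, N_GENES), tokens[:, 1:].reshape(-1)
)
loss.backward()
```

Note that the expression values are discarded at tokenization time; this is exactly the limitation raised in the citation statements below.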

Cited by 13 publications (20 citation statements). References 49 publications.

Citation statements
“…Despite these results, there have been few attempts to adopt the transformer architecture into single-cell biology and applications thereof. Shen et al (2022) use the transformer decoder setup to learn the gene name sequence of highly expressed genes, without considering the actual sequenced expression abundance. This leads to loss of major biological signal, as the expression values are informative of cell state and gene-gene relationships.…”
Section: Related Work (mentioning; confidence: 99%)
“…tGPT [34], GeneCompass [35], SCimilarity [36], UCE [37] and CellPLM [38]. Details of these models can be found in Appendices D and E. Furthermore, no studies to date have comprehensively evaluated the utility of these models and provided guidance for model training.…”
Section: Introduction (mentioning; confidence: 99%)
“…There is a limited number of robust pre-trained models (known as single-cell LLMs) capable of managing multiple tasks in single-cell research. Some single-cell LLMs focus on cell-type annotation or gene function prediction, including scBERT [28], tGPT [29], CellLM [30], and Geneformer [31], while others aim to create a foundation model in this area that can handle multiple tasks, including scGPT [32], scFoundation [33] and SCimilarity [34]. Details of these models can be found in Appendices D and E. Furthermore, no studies to date have comprehensively evaluated the utility of these models and provided guidance for model training.…”
Section: Introduction (mentioning; confidence: 99%)
“…These initial explorations have revealed a specific key challenge: the inability to transfer the insights gained by these models across different applications due to their distinct architectures. To overcome this, a wave of research initiatives [11,12,13,14,15] is underway, aiming to develop a foundational model that first deciphers latent information from unlabeled scRNA-seq data and then adapts this knowledge to various tasks. scBERT [11], introduced by Yang and team in 2022, represents genes as tokens and employs an advanced transformer mechanism [16] to analyze over 16,000 gene tokens per cell.…”
Section: Introduction (mentioning; confidence: 99%)
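For contrast with the rank-only tokenization sketched earlier, the sketch below shows how a gene-token model in the spirit of scBERT can retain expression magnitude by summing a gene-identity embedding with an embedding of the gene's binned expression value before the transformer layers. The tiny gene panel, bin count, and vanilla softmax attention here are simplifying assumptions; scBERT itself uses Performer attention to scale to the roughly 16,000 gene tokens per cell mentioned in the quote above.

```python
# Sketch of expression-aware gene tokens in the spirit of scBERT: each
# gene-identity embedding is summed with an embedding of its binned
# expression value, so expression magnitude (not just ranking) reaches
# the transformer. Panel size, bin count, and vanilla attention are
# simplifying assumptions.
import torch
import torch.nn as nn

N_GENES, N_BINS, D_MODEL = 512, 7, 64  # toy sizes, not scBERT's

class GeneTokenEncoder(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.gene_emb = nn.Embedding(N_GENES, D_MODEL)  # gene identity
        self.expr_emb = nn.Embedding(N_BINS, D_MODEL)   # discretized expression
        layer = nn.TransformerEncoderLayer(D_MODEL, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, expr: torch.Tensor) -> torch.Tensor:
        # One token per gene in the panel; bin the normalized expression
        # and add its embedding to the gene-identity embedding.
        bins = torch.clamp((expr * N_BINS).long(), max=N_BINS - 1)
        gene_ids = torch.arange(N_GENES, device=expr.device).expand_as(bins)
        x = self.gene_emb(gene_ids) + self.expr_emb(bins)
        return self.encoder(x)  # (batch, N_GENES, D_MODEL)

cells = torch.rand(2, N_GENES)      # fake normalized expression in [0, 1)
reps = GeneTokenEncoder()(cells)    # per-gene contextual embeddings
```

The design difference between the two sketches mirrors the critique in the Related Work quote: a rank-only tokenizer compresses each cell to an ordering of gene names, whereas an expression-binned tokenizer keeps (a quantized form of) the abundance signal.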