2023
DOI: 10.1101/2023.04.30.538439
Preprint

scGPT: Towards Building a Foundation Model for Single-Cell Multi-omics Using Generative AI

Abstract: Generative pre-trained models have achieved remarkable success in various domains such as natural language processing and computer vision. Specifically, the combination of large-scale diverse datasets and pre-trained transformers has emerged as a promising approach for developing foundation models. While texts are made up of words, cells can be characterized by genes. This analogy inspires us to explore the potential of foundation models for cell and gene biology. By leveraging the exponentially growing single…


Cited by 111 publications (100 citation statements)
References 64 publications
“…AvgBIO is the arithmetic mean of three individual metrics: ASW, Normalized Mutual Information (NMI), and Adjusted Rand Index (ARI), as defined in [6]. NMI and ARI are calculated based on Louvain clusters generated directly from the embedding space [22, 6]. AvgBIO is normalized to a 0-1 scale, with higher values indicating better alignment between clusters and ground truth labels.…”
Section: Methods
confidence: 99%
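The AvgBIO metric quoted above is an arithmetic mean of ASW, NMI, and ARI. Below is a minimal pure-Python sketch of such a computation, assuming the arithmetic-mean normalization for NMI and the common (ASW + 1) / 2 rescaling to [0, 1] used in single-cell benchmarking; all function and variable names are illustrative, not taken from the cited paper's code.

```python
import math
from collections import Counter

def _entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def _mutual_info(a, b):
    n = len(a)
    ca, cb = Counter(a), Counter(b)
    return sum((nxy / n) * math.log(n * nxy / (ca[x] * cb[y]))
               for (x, y), nxy in Counter(zip(a, b)).items())

def nmi(a, b):
    # Arithmetic-mean normalization: MI / ((H(a) + H(b)) / 2).
    denom = (_entropy(a) + _entropy(b)) / 2
    return _mutual_info(a, b) / denom if denom > 0 else 1.0

def _comb2(k):
    return k * (k - 1) // 2

def ari(a, b):
    # Adjusted Rand Index from pairwise co-assignment counts.
    sum_ij = sum(_comb2(v) for v in Counter(zip(a, b)).values())
    sum_a = sum(_comb2(v) for v in Counter(a).values())
    sum_b = sum(_comb2(v) for v in Counter(b).values())
    expected = sum_a * sum_b / _comb2(len(a))
    max_index = (sum_a + sum_b) / 2
    return (sum_ij - expected) / (max_index - expected)

def asw(points, labels):
    # Mean silhouette width with Euclidean distance.
    scores = []
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        a_i = sum(own) / len(own)
        b_i = min(
            sum(math.dist(p, q) for j, q in enumerate(points)
                if labels[j] == lab)
            / sum(1 for j in range(len(points)) if labels[j] == lab)
            for lab in set(labels) if lab != labels[i]
        )
        scores.append((b_i - a_i) / max(a_i, b_i))
    return sum(scores) / len(scores)

def avg_bio(points, cluster_labels, true_labels):
    # ASW lies in [-1, 1]; rescale to [0, 1] before averaging so all
    # three components share the 0-1 scale.
    asw01 = (asw(points, true_labels) + 1) / 2
    return (asw01 + nmi(cluster_labels, true_labels)
            + ari(cluster_labels, true_labels)) / 3
```

NMI and ARI depend only on the cluster assignments, while ASW also uses the embedding coordinates, which is why the sketch takes `points` separately.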
“…To evaluate the performance of scGPT in its pretraining objective, we used the mean squared error (MSE), as used by the authors for the model’s loss [6]. To evaluate Geneformer’s performance in its pretraining objective, we measured the Pearson’s correlation between the true and predicted ranked lists.…”
Section: Methods
confidence: 99%
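The excerpt above evaluates pretraining objectives with two standard metrics: MSE between predicted and true expression values, and Pearson's correlation between true and predicted ranked gene lists. A minimal pure-Python sketch of both, with an illustrative rank helper (names are mine, not from the cited work):

```python
import math

def mse(pred, true):
    # Mean squared error between predicted and observed values.
    return sum((p - t) ** 2 for p, t in zip(pred, true)) / len(pred)

def pearson(x, y):
    # Pearson correlation coefficient; applied here to rank vectors.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    # Convert raw values to ranks (1 = smallest); ties broken by
    # original position, for simplicity.
    order = sorted(range(len(values)), key=values.__getitem__)
    r = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        r[idx] = rank
    return r
```

Pearson's correlation computed on rank vectors, as described for the Geneformer evaluation, coincides with Spearman's rank correlation when there are no ties.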
“…Similar transfer learning approaches have been used to successfully predict the effect of novel drug-induced perturbations in cancer cell lines from previously learned latent embeddings. 173,174 Recently, several models trained on a large corpus of scRNA-seq data (eg, single-cell Generative Pre-Trained Transformer [scGPT], 175 single-cell bidirectional encoder representations from transformers [scBERT], 176 Geneformer 177 ), have been put forward as generalist base models to be fine-tuned by users with smaller, more targeted datasets. The goal of such resources is to make transfer learning both more robust and easily accessible to the research community.…”
Section: Unsupervised ML Approaches
confidence: 99%
“…To keep up with the pace at which AI is advancing, rather than wait for large uniform datasets to be created, researchers should focus on developing novel, composite methods for large heterogeneous datasets, integrated from different sources, such as those recently developed in the single-cell genomics field. 175-177,37,38 Focusing on developing methods that do not rely on high-dimensional uniform data will ensure experimental research into neurodegenerative disease advances alongside AI.…”
Section: Future Applications
confidence: 99%