Preprint (2023)
DOI: 10.1101/2023.09.08.555192

Evaluating the Utilities of Foundation Models in Single-cell Data Analysis

Tianyu Liu, Kexing Li, Yuge Wang, et al.

Abstract: Large Language Models (LLMs) have made significant strides in both industrial and scientific domains. In this paper, we evaluate the performance of LLMs in single-cell sequencing data analysis through comprehensive experiments across eight downstream tasks pertinent to single-cell data. By comparing seven single-cell LLMs with task-specific methods, we found that single-cell LLMs may not consistently outperform task-specific methods across all tasks. However, the emergent abilities and the successful app…

Cited by 13 publications (2 citation statements) · References: 149 publications
“…JOINTLY performs on par with state-of-the-art batch integration tools, such as scVI [3] and Harmony [2], in clustering tasks and has a similar trade-off between biological heterogeneity and batch mixing as scVI. In line with a recent benchmark [47], we found that JOINTLY and several task-specific models outperformed scGPT, a foundational single-cell RNA-sequencing model. As a future perspective, we envision that the performance of JOINTLY can be even further improved by initialising the algorithm using cell type labels.…”
Section: Discussion · Citation type: supporting · Confidence: 85%
“…The diversity and complexity of these tasks are useful to thoroughly probe the model's performance and to evaluate the robustness of the learned representation and the model's ability to generalize to complex predictive tasks. Current results are promising but not entirely replicated in independent benchmarks [45–50]. Notably, to date, none of these models account for spatial relationships of cells during training, with the exception of CellPLM [40], which, however, is trained on a limited dataset of 9 million dissociated and 2 million spatial transcriptomics cells [40] and not fine-tuned on spatial tasks beyond gene imputation. We propose Nicheformer, a novel spatial omics foundation model to understand tissue dependencies.…”
Citation type: mentioning · Confidence: 99%