Large Language Models (LLMs) have made significant strides in both industrial and scientific domains. In this paper, we evaluate the performance of LLMs for single-cell sequencing data analysis through comprehensive experiments across eight downstream tasks pertinent to single-cell data. By comparing seven different single-cell LLMs with task-specific methods, we found that single-cell LLMs may not consistently outperform task-specific methods across all tasks. However, the emergent abilities of LLMs and their successful applications in cross-species and cross-modality transfer learning are promising. In addition, we present a systematic evaluation of the effects of hyper-parameters, initial settings, and training stability for single-cell LLMs based on our proposed scEval framework, and provide guidelines for pre-training and fine-tuning. Our work summarizes the current state of single-cell LLMs, and points out their constraints and avenues for future development.