Robust characterization of cellular phenotypes from single-cell gene expression data is of paramount importance in understanding complex biological systems and diseases. Single-cell RNA-seq (scRNA-seq) datasets are inherently noisy due to small amounts of starting RNA. Over the last few years, several methods have been developed to make single-cell analysis fast and efficient. Most of these methods are based on statistical and machine learning principles. In the current work, we describe SCellBOW, which encodes single-cell expression vectors as documents, thereby enabling the application of powerful language models. Beyond the identification of robust cell type clusters, our algorithm provides a latent representation of single cells in a manner that captures the ‘semantics’ associated with cellular phenotypes. These representations, also known as embeddings, allow algebraic operations such as ‘+’ and ‘-’. We use this hitherto unexplored utility to stratify cancer clones in terms of their aggressiveness and contribution to disease prognosis. Further, the application of SCellBOW to an scRNA-seq dataset comprising human splenocytes and matched peripheral blood mononuclear cells (∼5000 cells) identifies unknown cell states that bear significance in advancing our understanding of spleen biology.
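
The two ideas named above, treating an expression vector as a document and doing arithmetic on embeddings, can be illustrated with a toy sketch. This is not the SCellBOW implementation. The function names, the rule of repeating each gene name once per count, and the three-dimensional vectors are all hypothetical choices made only for illustration, and no language model is trained here.

```python
from typing import Dict, List


def cell_to_document(expression: Dict[str, int]) -> List[str]:
    """Turn one cell's gene counts into a list of word-like tokens.

    Assumption for illustration: each gene name is repeated once per
    count, so highly expressed genes appear more often in the 'document'.
    """
    tokens: List[str] = []
    for gene, count in expression.items():
        tokens.extend([gene] * count)
    return tokens


def add(u: List[float], v: List[float]) -> List[float]:
    """Element-wise '+' on two embeddings of equal length."""
    return [a + b for a, b in zip(u, v)]


def subtract(u: List[float], v: List[float]) -> List[float]:
    """Element-wise '-' on two embeddings of equal length."""
    return [a - b for a, b in zip(u, v)]


# A toy cell with integer counts for three genes (made-up values).
cell = {"CD3E": 2, "MS4A1": 0, "LYZ": 1}
doc = cell_to_document(cell)

# Made-up embeddings standing in for vectors a language model would learn.
tumor = [0.5, 1.0, -0.25]
clone = [0.25, 0.5, 0.25]
remainder = subtract(tumor, clone)  # tumor with one clone's contribution removed
restored = add(remainder, clone)    # adding the clone back recovers the tumor
```

In a real pipeline the token lists would be passed to a document-embedding model, and the resulting vectors would replace the made-up ones used here.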