Highlights d Automatically assign cell types based on an expression matrix and a marker file d Astir is robust to imbalances in cell type compositions and cell segmentations d Output probabilities are interpretable and correlate with image staining quality
The creation of scalable single-cell and highly-multiplexed imaging technologies that profile the protein expression and phosphorylation status of heterogeneous cellular populations has led to multiple insights into disease processes including cancer initiation and progression. A major analytical challenge in interpreting the resulting data is the assignment of cells to a priori known cell types in a robust and interpretable manner. Existing approaches typically solve this by clustering cells followed by manual annotation of individual clusters or by strategies that gate protein expression at predefined thresholds. However, these often require several subjective analysis choices such as selecting the number of clusters and do not automatically assign cell types in line with prior biological knowledge. They further lack the ability to explicitly assign cells to an unknown or uncharacterized type, which exist in most highly multiplexed imaging experiments due to the limited number of markers quantified. To address these issues we present Astir, a probabilistic model to assign cells to cell types by integrating prior knowledge of marker proteins. Astir uses deep recognition neural networks for fast Bayesian inference, allowing for cell type annotations at the million-cell scale and in the absence of previously annotated reference data across multiple experimental modalities and antibody panels. We demonstrate that Astir outperforms existing approaches in terms of accuracy and robustness by applying it to over 2.1 million single cells from several suspension and imaging mass cytometry and microscopy datasets in multiple tissue contexts. We further showcase that Astir can be used for the fast analysis of the spatial architecture of the tumour microenvironment, automatically quantifying the immune influx and spatial heterogeneity of patient samples. Astir is freely available as an open source Python package at https://www.github.com/camlab-bioml/astir.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.