Large amounts of single-cell RNA sequencing data produced by various technologies are accumulating rapidly. An efficient cell querying method facilitates integrating existing data and annotating new data. Here we present Cell BLAST, a novel cell querying method based on deep generative modeling, together with a well-curated reference database and a user-friendly Web interface at http://cblast.gao-lab.org, as an accurate and robust solution to large-scale cell querying.

Additionally, we exploit the stability of query-hit distance across multiple models to improve specificity (Methods, Supplementary Figure 4L). An empirical p-value is computed for each query hit as a measure of "confidence", by comparing its posterior distance to the empirical null distribution obtained from randomly selected pairs of cells in the queried database.
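The empirical p-value described above can be illustrated with a minimal sketch. It is not the Cell BLAST implementation: plain Euclidean distance in a latent space stands in for the posterior distance, the function names `empirical_null` and `empirical_pvalue` are hypothetical, and the add-one correction is a common convention assumed here rather than taken from the text.

```python
import numpy as np


def empirical_null(ref_latent, n_pairs=10000, seed=0):
    """Sorted distances between randomly selected pairs of reference cells.

    Assumption: Euclidean distance is used as a stand-in for the
    posterior distance mentioned in the text.
    """
    rng = np.random.default_rng(seed)
    n_cells = ref_latent.shape[0]
    i = rng.integers(0, n_cells, size=n_pairs)
    j = rng.integers(0, n_cells, size=n_pairs)
    keep = i != j  # drop self-pairs, whose distance is trivially zero
    diff = ref_latent[i[keep]] - ref_latent[j[keep]]
    return np.sort(np.linalg.norm(diff, axis=1))


def empirical_pvalue(hit_distances, null_sorted):
    """Fraction of null distances that are <= each query-hit distance.

    Assumption: an add-one correction keeps p-values strictly positive.
    """
    hit_distances = np.asarray(hit_distances, dtype=float)
    # side="right" counts the null values that are <= each distance
    counts = np.searchsorted(null_sorted, hit_distances, side="right")
    return (counts + 1) / (null_sorted.size + 1)
```

A small p-value means the query and its hit are closer than most random pairs of cells in the reference, so the hit is less likely to be a chance match.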
The high specificity of Cell BLAST is especially important for discovering novel cell types. Two recent studies ("Montoro" 14 and "Plasschaert" 15 ) independently reported a rare tracheal cell type named pulmonary ionocyte. We artificially removed ionocytes from the "Montoro" dataset and used it as the reference to annotate query cells from the "Plasschaert" dataset. In addition to accurately annotating 95.9% of query cells, Cell BLAST correctly rejected 12 out of 19 "Plasschaert" ionocytes (Figure 1E). Moreover, it highlighted the existence of a putative novel cell type as a well-defined cluster with large p-values among all 156 rejected cells, which corresponds to ionocytes (Figure 1F-G, Supplementary Figure 6A; see also Supplementary Figure 5 for a more detailed analysis of the remaining 7 cells). In contrast, scmap-cell 2 rejected only 7 "Plasschaert" ionocytes despite a higher overall rejection number of 401 (i.e., more false negatives; Supplementary Figure 6B-E).
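How p-values can lead to a cell being rejected rather than annotated is sketched below. The decision rule is an assumption made for illustration and is not stated in the text: hits are kept only if their p-value falls below a cutoff, a query with no such hits is rejected, and otherwise the label is chosen by majority vote. The function name `annotate_query`, the cutoff of 0.05, and the majority threshold of 0.5 are all hypothetical.

```python
from collections import Counter


def annotate_query(hit_labels, hit_pvalues, cutoff=0.05, majority=0.5):
    """Return a predicted label, "rejected", or "ambiguous" for one query cell.

    Assumed rule (illustrative only): keep hits with p-value below `cutoff`,
    reject the query if none remain, else take a strict-majority vote.
    """
    significant = [
        label for label, p in zip(hit_labels, hit_pvalues) if p < cutoff
    ]
    if not significant:
        # No hit is closer than expected by chance: a candidate novel cell type
        return "rejected"
    label, count = Counter(significant).most_common(1)[0]
    if count / len(significant) <= majority:
        return "ambiguous"
    return label
```

Under this rule, a query ionocyte whose nearest reference cells all have large p-values would be returned as "rejected", which is the behavior the ionocyte experiment relies on.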
Besides batch effects among multiple reference datasets, bona fide biological similarity can also be confounded by large, undesirable bias between query and reference data. Taking advantage of the dedicated adversarial batch alignment component, we implemented an "online tuning" mode to handle this often-neglected confounding factor. Briefly, the combination of reference and query data is used to fine-tune the existing reference-based model, with the query-reference batch effect added as an additional component to be removed by adversarial batch alignment (Methods). Using this strategy, we successfully transferred cell fate from the above "Tusi" dataset to an independent human hematopoietic progenitor dataset

The authors thank Drs.