We introduce TreeTop, an algorithm for single-cell data analysis that identifies branch points in biological processes with possibly multi-level branching hierarchies and assesses their statistical significance. We demonstrate branch point identification for processes with varying topologies, including T cell maturation, B cell differentiation and hematopoiesis. Our analyses are consistent with recent experimental studies suggesting a shallow hierarchy of differentiation events in hematopoiesis, rather than the classical multi-level hierarchy.
bioRxiv preprint doi: http://dx.doi.org/10.1101/200923; first posted online Oct. 10, 2017. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY-NC-ND 4.0 International license.

Main

Many important biological processes, such as differentiation in developmental and immune biology and clonal evolution in cancer, can be conceived of as bifurcating or multifurcating cellular state trajectories. Hematopoiesis is one such process: hematopoietic stem cells (HSCs) give rise to multiple distinct mature blood cell types through a sequence of lineage commitments. The exact sequence is still debated [1]. One model assumes a hierarchical architecture of multiple fate decisions via distinct oligopotent progenitor cell states [2-4]; the other proposes a flat hierarchy without oligopotent progenitors, in which HSCs differentiate directly into committed lineages [5-7].

High-dimensional single-cell technologies, such as single-cell RNA sequencing [8] and mass cytometry [9], are increasingly used to investigate such opposing models of differentiation, as well as other branching processes. These technologies measure the state of individual cells, i.e. the transcriptional or proteomic abundance profile in the case of single-cell RNA sequencing or mass cytometry, respectively. Biological processes can be conceived of as trajectories through state space: temporal sequences of cellular states that can either be derived from time series or reconstructed from non-time-series single-cell data [10]. We define a branch point as the location in state space where three or more distinct cellular state trajectories meet. Branch points dissect these trajectories into distinct state trajectory branches.

Identifying branch points is challenging because, for each single-cell measurement, both branch membership and ordering within each branch must be learned simultaneously.
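To make the branch point definition concrete, here is a minimal sketch that assumes the trajectory has already been reduced to an undirected graph whose nodes are cell states (for example, cluster centroids) and whose edges link adjacent states. Under that assumption, a branch point is a node where three or more trajectory segments meet, i.e. a node of degree at least 3, and removing the branch points dissects the graph into branches. The function names `branch_points` and `split_into_branches` and the toy node labels are hypothetical illustrations; this is not TreeTop's method, and it sidesteps the hard part noted above, namely learning the graph, branch membership and ordering from raw measurements.

```python
from collections import defaultdict


def branch_points(edges):
    """Return nodes where three or more trajectory segments meet (degree >= 3)."""
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return {node for node, d in degree.items() if d >= 3}


def split_into_branches(edges):
    """Remove branch points; the remaining connected components are the branches."""
    bps = branch_points(edges)
    adjacency = defaultdict(set)
    nodes = set()
    for u, v in edges:
        nodes.update((u, v))
        # Only keep edges that do not touch a branch point.
        if u not in bps and v not in bps:
            adjacency[u].add(v)
            adjacency[v].add(u)
    branches, seen = [], set()
    for start in sorted(nodes - bps):
        if start in seen:
            continue
        # Iterative depth-first search collects one branch.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(adjacency[node] - seen)
        branches.append(component)
    return bps, branches


# Hypothetical toy trajectory: a stem (s0-s1-s2) that splits into arms a and b.
edges = [("s0", "s1"), ("s1", "s2"), ("s2", "a1"), ("a1", "a2"),
         ("s2", "b1"), ("b1", "b2")]
bps, branches = split_into_branches(edges)
print(bps)  # {'s2'}
print(branches)
```

Here the single branch point `s2` dissects the graph into three branches: the stem and the two arms.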
Existing approaches include pseudotime ordering, which learns a latent time variable along a mean trajectory through state space but is limited to non-branching processes [11,12]. SPADE overcomes this limitation by fitting a single minimum spanning tree to non-deterministically clustered data [13]. Monocle [11,14] fits smoothed trees to a low-dimensional representation of single-cell data, where branch points in the tree are assumed to correspond to branch points in the data. Both Monocle and SPADE by definition impose a tree topology, regardless ...
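The core idea attributed to SPADE above, fitting a minimum spanning tree to clustered data, can be sketched as follows. This is a simplified illustration, not SPADE's implementation: it assumes cluster centroids have already been computed (the clustering step is omitted), builds the tree with Prim's algorithm over Euclidean distances, and reports tree nodes of degree three or more as candidate branch points. The helper names and toy coordinates are hypothetical.

```python
import math
from collections import Counter


def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph over `points`.

    Returns (i, j) index pairs; a spanning tree over n points has n - 1 edges.
    """
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best_dist = [math.inf] * n
    best_parent = [-1] * n
    best_dist[0] = 0.0
    edges = []
    for _ in range(n):
        # Add the closest point not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best_dist[i])
        in_tree[u] = True
        if best_parent[u] >= 0:
            edges.append((best_parent[u], u))
        # Update distances from the newly added point to the remaining points.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best_dist[v]:
                    best_dist[v] = d
                    best_parent[v] = u
    return edges


def tree_branch_points(edges):
    """Nodes of degree >= 3 in the tree, i.e. candidate branch points."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return {node for node, d in degree.items() if d >= 3}


# Toy centroids in a 2-D state space: a stem that splits into two arms.
centroids = [(0, 0), (0, 1), (0, 2), (-1, 3), (-2, 4), (1, 3), (2, 4)]
tree = minimum_spanning_tree(centroids)
print(sorted(tuple(sorted(e)) for e in tree))
# [(0, 1), (1, 2), (2, 3), (2, 5), (3, 4), (5, 6)]
print(tree_branch_points(tree))  # {2}

# Four centroids arranged on a cycle still yield a tree with 3 edges.
ring = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(len(minimum_spanning_tree(ring)))  # 3
```

Because a spanning tree over n points always has exactly n - 1 edges, the four corners of a square, which form a cycle in state space, still come out as a tree: the output is a tree whatever the shape of the data.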