We have developed a maximum likelihood framework called CellPhy for inferring phylogenetic trees from single-cell DNA sequencing (scDNA-seq) data that can be applied directly to somatic cells and clones. CellPhy is based on a finite-site Markov nucleotide substitution model with 10 diploid states, akin to those typically used in statistical phylogenetics. It includes a dedicated error function for single cells that explicitly incorporates amplification/sequencing error and allelic dropout (ADO). Moreover, it can explicitly consider the uncertainty of the variant calling process by using genotype likelihoods as input. We implemented CellPhy in a widely used open-source phylogenetic inference package (RAxML-NG) that provides statistical confidence measurements on the estimated tree and scales particularly well on large phylogenies with hundreds or even thousands of cells. To benchmark CellPhy, we carried out 19,400 coalescent simulations of cell samples from exponentially growing tumors for which the true phylogeny was known. We evolved single-cell diploid DNA genotypes along the simulated genealogies under different scenarios, including infinite- and finite-sites nucleotide mutation models, trinucleotide mutational signatures, sequencing and amplification errors, allelic dropout, and doublet cells. Our simulations suggest that CellPhy is robust to amplification/sequencing errors and to ADO, and that it outperforms state-of-the-art methods under realistic scDNA-seq scenarios in terms of both accuracy and speed. In addition, we sequenced 24 single-cell whole genomes from a colorectal cancer and analyzed them together with three published scDNA-seq data sets to illustrate how CellPhy can provide more reliable biological insights than competing methods. CellPhy is freely available at https://github.com/amkozlov/cellphy.
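To make the "10 diploid states" concrete, the sketch below enumerates the unordered diploid genotypes over four nucleotides and applies a deliberately simplified observation model. It is a toy illustration only, not CellPhy's actual error function: the function name `observation_probs`, the parameters `ado_rate` and `error_rate`, and the assumption that an error replaces the genotype uniformly with one of the other nine states are all hypothetical choices made for this example.

```python
from itertools import combinations_with_replacement

NUCLEOTIDES = "ACGT"

# Unordered diploid genotypes: 4 homozygous + 6 heterozygous = 10 states.
GENOTYPES = ["".join(pair) for pair in combinations_with_replacement(NUCLEOTIDES, 2)]


def observation_probs(true_gt, ado_rate, error_rate):
    """Toy P(observed genotype | true genotype); NOT CellPhy's error function.

    Assumptions (hypothetical, for illustration only):
      * allelic dropout turns a heterozygote into either homozygote
        with probability ado_rate / 2 each; homozygotes are unaffected;
      * an amplification/sequencing error then replaces the genotype
        with one of the other nine states, chosen uniformly.
    true_gt must be one of GENOTYPES (alleles in A < C < G < T order).
    """
    a, b = true_gt
    if a == b:
        after_ado = {true_gt: 1.0}
    else:
        after_ado = {
            true_gt: 1.0 - ado_rate,
            a + a: ado_rate / 2,
            b + b: ado_rate / 2,
        }

    probs = {g: 0.0 for g in GENOTYPES}
    n_other = len(GENOTYPES) - 1
    for gt, p in after_ado.items():
        for g in GENOTYPES:
            # Either no error (genotype kept) or a uniform switch to another state.
            probs[g] += p * ((1.0 - error_rate) if g == gt else error_rate / n_other)
    return probs


if __name__ == "__main__":
    print(GENOTYPES)
    print(observation_probs("AC", ado_rate=0.2, error_rate=0.01))
```

In a likelihood framework of this kind, per-cell probabilities such as these would sit at the tips of the tree, so that an observed homozygote can still lend support to a true heterozygote when dropout is common.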
How and when tumoral clones start spreading to surrounding and distant tissues is currently unclear. Here we leveraged a model-based evolutionary framework to investigate the demographic and biogeographic history of a colorectal cancer. Our analyses strongly support an early monoclonal metastatic colonization, followed by a rapid population expansion at both primary and secondary sites. Moreover, we infer a hematogenous metastatic spread under positive selection, plus the return of some tumoral cells from the liver back to the colon lymph nodes. This study illustrates how sophisticated techniques typical of organismal evolution can provide a detailed, quantitative picture of the complex tumoral dynamics over time and space.
Tumor samples most often comprise a mixture of different cell lineages. Multiregional trees built from bulk mutational profiles do not consider this heterogeneity and can potentially lead to erroneous evolutionary inferences, including biased timing of somatic mutations, spurious parallel mutation events, and/or incorrect chronological ordering of metastatic events.