2022
DOI: 10.1186/s13059-021-02583-w

CellPhy: accurate and fast probabilistic inference of single-cell phylogenies from scDNA-seq data

Abstract: We introduce CellPhy, a maximum likelihood framework for inferring phylogenetic trees from somatic single-cell single-nucleotide variants. CellPhy leverages a finite-site Markov genotype model with 16 diploid states and considers amplification error and allelic dropout. We implement CellPhy into RAxML-NG, a widely used phylogenetic inference package that provides statistical confidence measurements and scales well on large datasets with hundreds or thousands of cells. Comprehensive simulations suggest that Cel…

Cited by 45 publications (44 citation statements)
References 101 publications
“…Our finding that copy number profiles broadly support the TE-based phylogeny of S2 sublines suggests that the major Clades A and B we identify are not artifacts of our approach, and complements recent results showing that different types of genetic variation (SNP, TE, and local duplications) generate similar clustering of independently derived Drosophila cell line genomes (Lewerentz et al 2022). Nevertheless, future work using other sources of genetic variation is worthwhile to cross-validate and resolve remaining uncertainties in the TE-based phylogeny of S2 sublines presented here, perhaps using extensions to methods developed for the analysis of single cell phylogenies (Kozlov et al 2022).…”
Section: Discussion
Type: mentioning
confidence: 99%
“…Our relaxation therefore only considers violations which should have an additional signal in the data beyond the infinite sites base model and typical sequencing noise. The relaxation is correspondingly more conservative than transition-based classical phylogenetic models adapted for SCS (Kozlov et al, 2020; Zafar et al, 2017).…”
Section: Discussion
Type: mentioning
confidence: 99%
“…Binarized single-cell data have allowed us to test such assumptions, and find that it may be often violated in real tumour samples (Kuipers et al, 2017b). More complex phylogenetic models mitigating, or entirely avoiding the infinite sites assumption, have also been developed (El-Kebir, 2018; Kozlov et al, 2020; Satas et al, 2020; Zafar et al, 2017, 2019), though there is an apparent trade-off in model complexity between too simple models that cannot capture all relevant aspects of the evolutionary process, and too complex models that are prone to over-fitting or computationally too expensive to be learned efficiently from data. The existing models rely on processed data, where the mutations have already been called.…”
Section: Introduction
Type: mentioning
confidence: 99%
“…Several recent studies have proposed various mathematical methods to infer mutation order (Fig 1C-1E) from data arising from somatic mutations (i.e., [16, 18, 22-25]). Among these, we focus specifically on the methods of Jahn et al [22], Zafar et al [16], and El-Kebir [18], called SCITE, SiFit and SPhyR, respectively, as these methods use single-cell data for inference of the order in which mutations arise along a phylogeny as part of their estimation procedure.…”
Section: PLOS Computational Biology
Type: mentioning
confidence: 99%