Distance Measures for Tumor Evolutionary Trees

Mutation trees are rooted trees of arbitrary node degree in which each node is labeled with a mutation set. These trees, also referred to as clonal trees, are used in computational oncology to represent the mutational history of tumours. Classical tree metrics such as the popular Robinson-Foulds distance are of limited use for comparing mutation trees. One reason is that mutation trees inferred with different methods or for different patients usually contain different sets of mutation labels. Here, we generalize the Robinson-Foulds distance into a set of distance metrics, called Bourque distances, for comparing mutation trees. A connection between the Robinson-Foulds distance and the nearest neighbor interchange distance is also presented.

2012 ACM Subject Classification
Robinson-Foulds distance, Bourque distance
Digital Object Identifier 10.4230/LIPIcs...

... Robinson-Foulds (RF) [3, 35, 36], nearest-neighbor interchange (NNI) [31, 35] and triple(t) distances [7] for phylogenetic trees; gene duplication, gene loss and reconciliation costs [17, 27] for gene and species trees; and tree-edit distances [41, 37, 44] for tree models of RNA secondary structures, etc. [2, 21, 26, 32, 42]

With advances in next-generation sequencing and single-cell sequencing technologies, a huge amount of genomic data is now available for identifying tumour subclones and inferring their evolutionary relationships. The most common representation of these relationships is the mutation tree, also known as the clonal tree, which encodes the (partial) temporal order in which mutations were acquired. Formally, a mutation tree on a finite set of mutations $\Gamma$ is a rooted tree $T$ with $k$ nodes together with a partition of $\Gamma$ into $k$ disjoint non-empty parts $P_i$, such that each $P_i$ is assigned as the label of a distinct node of $T$ [16, 33].
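The formal definition can be made concrete as a small data structure. The Python sketch below is a hypothetical representation: the class name `MutationTree`, the parent-pointer encoding, and the method `is_valid_on` are illustrative assumptions, not taken from the paper. It checks the two conditions of the definition: the nodes form a rooted tree, and the node labels are $k$ non-empty, pairwise disjoint sets whose union is $\Gamma$.

```python
from dataclasses import dataclass


@dataclass
class MutationTree:
    """Hypothetical container: a rooted tree whose nodes carry mutation sets P_i."""
    root: str
    parent: dict[str, str]             # each non-root node -> its parent node
    labels: dict[str, frozenset[str]]  # each node -> its mutation set P_i

    def is_valid_on(self, gamma: set[str]) -> bool:
        """Check that this object is a mutation tree on the mutation set gamma."""
        nodes = set(self.labels)
        # Rooted-tree structure: the root has no parent, every other node has one.
        if self.root not in nodes or self.root in self.parent:
            return False
        if set(self.parent) != nodes - {self.root}:
            return False
        if not set(self.parent.values()) <= nodes:
            return False
        # Following parent pointers from any node must reach the root (no cycles).
        for v in nodes:
            u, seen = v, set()
            while u != self.root:
                if u in seen:
                    return False
                seen.add(u)
                u = self.parent[u]
        # Labels: k non-empty, pairwise disjoint parts whose union is gamma.
        parts = list(self.labels.values())
        if any(not p for p in parts):
            return False
        union = set().union(*parts)
        if sum(len(p) for p in parts) != len(union):
            return False  # some mutation labels more than one node
        return union == set(gamma)


t = MutationTree(
    root="r",
    parent={"a": "r", "b": "r", "c": "a"},
    labels={
        "r": frozenset({"m1"}),
        "a": frozenset({"m2", "m3"}),
        "b": frozenset({"m4"}),
        "c": frozenset({"m5"}),
    },
)
print(t.is_valid_on({"m1", "m2", "m3", "m4", "m5"}))  # True
```

Two trees built this way may be valid on different mutation sets $\Gamma$, which is exactly the situation in which classical tree distances stop applying directly.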
A large number of computational approaches for reconstructing mutation trees from bulk sequencing data [9, 11, 12, 28, 34], single-cell sequencing data [5, 14, 19, 43], or a combination of both [29, 30] have been developed in recent years. Unlike phylogenetic trees, mutation trees inferred with these methods not only differ in their topology but may also be defined on different sets of mutations. The latter occurs when comparing methods that use different data (e.g. single-cell vs. bulk) or divergent criteria for mutation calling. For that reason, classical tree distance measures are not immediately applicable to mutation trees. Instead, novel measures have recently been developed [1, 4, 6, 10, 18, 20], but no standard approach for mutation tree comparison has yet emerged. Moreover, shortcomings of some of these measures, such as the inability to resolve major differences between trees, have recently been demonstrated [6]. Additionally, computing the distance between two mutation trees takes quadratic time for each of these measures.

Here, we generalize the Robinson-Foulds metric, a classic distance measure for unrooted ...
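For reference, the classic Robinson-Foulds distance compares two unrooted trees on the same leaf set by counting the splits (leaf bipartitions obtained by deleting one edge) that occur in exactly one of the two trees. The Python sketch below implements only that classic definition. It is not the Bourque distance, and the edge-list input format and helper names (`from_edges`, `splits`, `rf_distance`) are illustrative assumptions. Note that it rejects trees with different leaf sets, which is the limitation that motivates a generalization for mutation trees. Some authors report half of this count.

```python
def from_edges(edges):
    """Build an adjacency dict (node -> set of neighbours) from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def leaves_of(adj):
    """Leaves are the degree-1 nodes of the unrooted tree."""
    return frozenset(v for v, nbrs in adj.items() if len(nbrs) == 1)


def side(adj, start, blocked):
    """Leaves reachable from `start` without crossing the edge back to `blocked`."""
    stack, seen, found = [start], {start, blocked}, set()
    while stack:
        v = stack.pop()
        if len(adj[v]) == 1:
            found.add(v)
        for w in adj[v]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return frozenset(found)


def splits(adj):
    """Non-trivial splits, each stored as the side NOT containing a reference leaf."""
    leaves = leaves_of(adj)
    ref = min(leaves)  # fixed leaf used to orient every split consistently
    result = set()
    for u in adj:
        for v in adj[u]:
            part = side(adj, v, u)  # leaves on v's side of edge {u, v}
            if ref in part:
                part = leaves - part
            if 2 <= len(part) <= len(leaves) - 2:  # skip trivial (leaf) splits
                result.add(part)
    return result


def rf_distance(edges1, edges2):
    """Classic RF distance: number of splits present in exactly one tree."""
    t1, t2 = from_edges(edges1), from_edges(edges2)
    if leaves_of(t1) != leaves_of(t2):
        raise ValueError("RF distance requires both trees to have the same leaf set")
    return len(splits(t1) ^ splits(t2))


# Two quartet trees: AB|CD versus AC|BD (x, y, p, q are internal nodes).
t1 = [("A", "x"), ("B", "x"), ("x", "y"), ("y", "C"), ("y", "D")]
t2 = [("A", "p"), ("C", "p"), ("p", "q"), ("q", "B"), ("q", "D")]
print(rf_distance(t1, t2))  # 2
```

This sketch recomputes each split by a traversal and therefore takes quadratic time; it favors readability over speed, since a linear-time algorithm due to Day is known for the classic distance.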

Simpler and Faster Development of Tumor Phylogeny Pipelines

Ali, Ciccolella, Lucarella, et al. 2021

Preprint

In recent years there has been an increasing number of single-cell sequencing (SCS) studies, producing a considerable number of new datasets. This has particularly affected the field of cancer analysis, where more and more papers are published using this sequencing technique, which captures more detailed information about the specific genetic mutations in each individually sampled cell. As the amount of information grows, more sophisticated and faster tools are needed to analyze the samples. To this end we developed *plastic*, an easy-to-use and quick-to-adapt pipeline that integrates three different steps: (1) simplifying the input data; (2) inferring tumor phylogenies; and (3) comparing the phylogenies. We have created a pipeline submodule for each of those steps, and developed new in-memory data structures that allow for easy and transparent sharing of information across the tools implementing these steps. While we use existing open source tools for those steps, we have extended the tool used for simplifying the input data by incorporating two machine learning procedures, which greatly reduce the running time without affecting the quality of the downstream analysis. Moreover, the pipeline can produce plots to quickly visualize the results.

Distance Measures for Tumor Evolutionary Trees

Cited by 3 publications

References 37 publications

Multiregion Sequence Analysis to Predict Intratumor Heterogeneity and Clonal Evolution

The Bourque Distances for Mutation Trees of Cancers

Simpler and Faster Development of Tumor Phylogeny Pipelines
