Comparing phylogenetic trees is a fundamental task in computational biology, and many metrics are available for it. In this paper, we focus on one such metric, the ℓ∞-cophenetic metric introduced by Cardona et al. This metric represents a phylogenetic tree with n labeled leaves as a point in ℝ^{n(n+1)/2}, known as the cophenetic vector, and then compares the two resulting points using the ℓ∞ distance. Meanwhile, the interleaving distance is a formal categorical construction generalized from the definition of Chazal et al., originally introduced to compare persistence modules arising in topological data analysis. We show that the ℓ∞-cophenetic metric is an example of an interleaving distance. To do this, we define phylogenetic trees as a category of merge trees with some additional structure, namely labelings on the leaves plus a requirement that morphisms respect these labels. We then use a flow on this category to define an interleaving distance. Finally, we show that, because of this additional structure, the map sending a labeled merge tree to its cophenetic vector is an isometric embedding, which proves that the ℓ∞-cophenetic metric is an interleaving distance.
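The cophenetic vector described above can be illustrated with a minimal Python sketch. It assumes the standard convention that the entry for a pair of leaves is the depth of their lowest common ancestor and that the diagonal entry is the depth of the leaf itself. The tree encoding (a `parent` map plus a `depth` map) and all function names are hypothetical choices made for this illustration, not taken from the paper.

```python
from itertools import combinations_with_replacement


def ancestors(parent, node):
    """Return the path from `node` up to the root, inclusive."""
    path = [node]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path


def cophenetic_vector(parent, depth, leaves):
    """List the depth of the lowest common ancestor for every leaf pair (i <= j).

    For n leaves this yields n(n+1)/2 entries; the pair (i, i) gives the
    depth of leaf i itself.
    """
    vec = []
    for a, b in combinations_with_replacement(leaves, 2):
        anc_a = set(ancestors(parent, a))
        # Walking up from b, the first node that is also above a is the LCA.
        lca = next(v for v in ancestors(parent, b) if v in anc_a)
        vec.append(depth[lca])
    return vec


def linf_cophenetic_distance(tree1, tree2, leaves):
    """Largest coordinate-wise difference between two cophenetic vectors."""
    v1 = cophenetic_vector(*tree1, leaves)
    v2 = cophenetic_vector(*tree2, leaves)
    return max(abs(x - y) for x, y in zip(v1, v2))


# Two toy trees on the same leaf labels (illustrative data only).
leaves = ["a", "b", "c"]
# Tree 1 groups a and b under the internal node u.
t1 = (
    {"r": None, "u": "r", "a": "u", "b": "u", "c": "r"},
    {"r": 0, "u": 1, "a": 2, "b": 2, "c": 2},
)
# Tree 2 groups b and c under the internal node u.
t2 = (
    {"r": None, "u": "r", "b": "u", "c": "u", "a": "r"},
    {"r": 0, "u": 1, "a": 2, "b": 2, "c": 2},
)

print(cophenetic_vector(*t1, leaves))
print(cophenetic_vector(*t2, leaves))
print(linf_cophenetic_distance(t1, t2, leaves))
```

The two toy trees differ only in which pair of leaves shares the internal node, so their vectors differ in exactly the two coordinates for the pairs (a, b) and (b, c).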
The interleaving distance was originally defined in the field of Topological Data Analysis (TDA) by Chazal et al. as a metric on the class of persistence modules parametrized over the real line. Bubenik et al. subsequently extended the definition to categories of functors on a poset, the objects in these categories being regarded as 'generalized persistence modules'. These metrics typically depend on the choice of a lax semigroup of endomorphisms of the poset. The purpose of the present paper is to develop a more general framework for the notion of interleaving distance using the theory of 'actegories'. Specifically, we extend the notion of interleaving distance to arbitrary categories equipped with a flow, i.e. a lax monoidal action by the monoid [0, ∞). In this way, the class of objects in such a category acquires the structure of a Lawvere metric space. Functors that are colax [0, ∞)-equivariant yield maps that are 1-Lipschitz. This leads to concise proofs of various known stability results from TDA, by considering appropriate colax [0, ∞)-equivariant functors. Along the way, we show that several common metrics, including the Hausdorff distance and the L∞-norm, can be realized as interleaving distances in this general perspective.
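As a rough illustration of the Hausdorff example mentioned above, the following Python sketch computes the Hausdorff distance between two finite point sets as the smallest ε for which each set lies inside the ε-thickening of the other. Reading this two-sided containment as an interleaving under a thickening flow is an assumption made here for illustration; the abstract does not spell out the construction. The function names are hypothetical.

```python
import math


def directed_gap(A, B):
    """Smallest eps such that every point of A is within eps of some point of B."""
    return max(min(math.dist(a, b) for b in B) for a in A)


def hausdorff(A, B):
    """Smallest eps for which A and B each lie in the other's eps-thickening."""
    return max(directed_gap(A, B), directed_gap(B, A))


# Illustrative point sets in the plane.
A = [(0, 0), (1, 0)]
B = [(0, 0), (3, 0)]

print(directed_gap(A, B))
print(directed_gap(B, A))
print(hausdorff(A, B))
```

The two directed gaps generally differ, which is why the symmetric distance takes their maximum: both containments must hold at the same ε.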
Inspired by the interval decomposition of persistence modules and the extended Newick format of phylogenetic networks, we show that, inside the larger category of ordered Reeb graphs, every Reeb graph with n leaves and first Betti number s is equal to a coproduct of at most 2^s trees with (n + s) leaves. Reeb graphs are therefore classified up to isomorphism by their tree decomposition. An implication of this result is that the isomorphism problem for Reeb graphs is fixed-parameter tractable when the parameter is the first Betti number. We propose ordered Reeb graphs as a model for time-consistent phylogenetic networks and propose a certain Hausdorff distance as a metric on these structures.
Background: Matrix-assisted laser desorption/ionization mass spectrometry imaging (MALDI MSI) displays significant potential for applications in cancer research, especially in tumor typing and subtyping. Lung cancer is the primary cause of tumor-related deaths, where the most lethal entities are adenocarcinoma (ADC) and squamous cell carcinoma (SqCC). Distinguishing between these two common subtypes is crucial for therapy decisions and successful patient management. Results: We propose a new algebraic topological framework, which obtains intrinsic information from MALDI data and transforms it to reflect topological persistence. Our framework offers two main advantages. Firstly, topological persistence aids in distinguishing the signal from noise. Secondly, it compresses the MALDI data, saving storage space and optimizing computational time for subsequent classification tasks. We present an algorithm that efficiently implements our topological framework, relying on a single tuning parameter. Afterwards, logistic regression and random forest classifiers are employed on the extracted persistence features, thereby accomplishing an automated tumor (sub-)typing process. To demonstrate the competitiveness of our proposed framework, we conduct experiments on a real-world MALDI dataset using cross-validation. Furthermore, we showcase the effectiveness of the single denoising parameter by evaluating its performance on synthetic MALDI images with varying levels of noise. Conclusion: Our empirical experiments demonstrate that the proposed algebraic topological framework successfully captures and leverages the intrinsic spectral information from MALDI data, leading to competitive results in classifying lung cancer subtypes. Moreover, the framework's ability to be fine-tuned for denoising highlights its versatility and potential for enhancing data analysis in MALDI applications.
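To make the idea of persistence-based denoising with a single parameter concrete, here is a minimal Python sketch of generic 0-dimensional persistence for peaks of a one-dimensional signal, followed by a threshold on persistence. This is not the authors' algorithm, which the abstract does not describe; the superlevel-set construction, the function names, and the `threshold` parameter are assumptions chosen for illustration.

```python
def peak_persistence(signal):
    """Return (peak_index, persistence) pairs via superlevel-set persistence.

    Samples are added from highest to lowest. A new sample either starts a
    component or joins its neighbours; when two components merge, the one
    with the lower peak dies at the current level.
    """
    n = len(signal)
    order = sorted(range(n), key=lambda i: -signal[i])
    root = [None] * n  # union-find parent; None means "not added yet"
    peak = {}          # component root -> index of its highest sample

    def find(i):
        while root[i] != i:
            root[i] = root[root[i]]  # path halving
            i = root[i]
        return i

    pairs = []
    for i in order:
        root[i] = i
        peak[i] = i
        for j in (i - 1, i + 1):
            if 0 <= j < n and root[j] is not None:
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                # The component whose peak is lower is absorbed by the other.
                young, old = sorted((ri, rj), key=lambda r: signal[peak[r]])
                persistence = signal[peak[young]] - signal[i]
                if persistence > 0:
                    pairs.append((peak[young], persistence))
                root[young] = old
    # The global maximum never merges into anything; pair it with the minimum.
    pairs.append((order[0], signal[order[0]] - min(signal)))
    return pairs


def denoise_peaks(signal, threshold):
    """Keep only the peaks whose persistence reaches the threshold."""
    return sorted(i for i, p in peak_persistence(signal) if p >= threshold)


# Illustrative toy spectrum (not real MALDI data).
spectrum = [0, 3, 1, 5, 2, 2.5, 0]
print(sorted(peak_persistence(spectrum)))
print(denoise_peaks(spectrum, threshold=1))
```

In this toy spectrum the small bump at index 5 has low persistence and is discarded at threshold 1, while the two prominent peaks survive. Storing only the surviving peaks is also one way a spectrum could be compressed before classification.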