We propose a new space of phylogenetic trees which we call wald space. The motivation is to develop a space suitable for statistical analysis of phylogenies, but with a geometry based on more biologically principled assumptions than existing spaces: in wald space, trees are close if they induce similar distributions on genetic sequence data. As a point set, wald space contains the previously developed Billera–Holmes–Vogtmann (BHV) tree space; it also contains disconnected forests, like the edge-product (EP) space but without certain singularities of the EP space. We investigate two related geometries on wald space. The first is the geometry of the Fisher information metric of character distributions induced by the two-state symmetric Markov substitution process on each tree. Infinitesimally, the metric is proportional to the Kullback–Leibler divergence, or equivalently, as we show, to any f-divergence. The second geometry is obtained analogously but using a related continuous-valued Gaussian process on each tree, and it can be viewed as the trace metric of the affine-invariant metric for covariance matrices. We derive a gradient descent algorithm to project from the ambient space of covariance matrices to wald space. For both geometries we derive computational methods to compute geodesics in polynomial time and show numerically that the two information geometries (discrete and continuous) are very similar. In particular, geodesics are approximated extrinsically. Comparison with the BHV geometry shows that our canonical and biologically motivated space is substantially different.
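The statement that the Fisher information metric is infinitesimally proportional to the Kullback–Leibler divergence and to any f-divergence can be illustrated by the standard second-order expansions below. This is a textbook sketch, not the paper's own derivation. The symbols are assumptions introduced here: p_θ is a smooth parametric family, I(θ) is its Fisher information matrix, and f is a convex function with f(1) = 0 that is twice differentiable at 1.

```latex
% Fisher information matrix of the family p_\theta
I(\theta)_{ij} = \mathbb{E}_{p_\theta}\!\left[ \partial_i \log p_\theta \; \partial_j \log p_\theta \right]

% Second-order expansion of the Kullback--Leibler divergence
D_{\mathrm{KL}}\!\left(p_\theta \,\|\, p_{\theta + d\theta}\right)
  = \tfrac{1}{2}\, d\theta^{\top} I(\theta)\, d\theta + o\!\left(\|d\theta\|^{2}\right)

% Second-order expansion of a general f-divergence
D_{f}\!\left(p_\theta \,\|\, p_{\theta + d\theta}\right)
  = \tfrac{f''(1)}{2}\, d\theta^{\top} I(\theta)\, d\theta + o\!\left(\|d\theta\|^{2}\right)
```

Taking f(x) = x log x gives f''(1) = 1 and recovers the Kullback–Leibler case, so the two expansions differ only by the constant factor f''(1).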
Evolutionary relationships between species are represented by phylogenetic trees, but these relationships are subject to uncertainty due to the random nature of evolution. A geometry for the space of phylogenetic trees is necessary in order to properly quantify this uncertainty during the statistical analysis of collections of possible evolutionary trees inferred from biological data. Recently, the wald space has been introduced: a length space for trees which is a certain subset of the manifold of symmetric positive definite matrices. In this work, the wald space is introduced formally and its topology and structure are studied in detail. In particular, we show that wald space has the topology of a disjoint union of open cubes and that it is contractible, and, by careful characterization of cube boundaries, we demonstrate that wald space is a Whitney stratified space of type (A). Imposing the metric induced by the affine-invariant metric on symmetric positive definite matrices, we prove that wald space is a geodesic Riemann stratified space. A new numerical method is proposed and investigated for construction of geodesics, computation of Fréchet means and calculation of curvature in wald space. This work is intended to serve as a mathematical foundation for further geometric and statistical research on this space.
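The affine-invariant metric on symmetric positive definite matrices mentioned above has well-known closed-form distances and geodesics in the ambient manifold. The following is a minimal sketch of those ambient formulas only, assuming NumPy; the function names `ai_distance` and `ai_geodesic` and the example matrices are hypothetical and do not come from the papers. Geodesics in wald space itself generally differ, because the ambient geodesic between two wald space points need not stay inside wald space.

```python
import numpy as np


def _sym_fun(S, fun):
    """Apply a scalar function to a symmetric matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return (V * fun(w)) @ V.T


def _whiten(P, Q):
    """Return P^{-1/2} Q P^{-1/2}, symmetrized against round-off."""
    P_inv_sqrt = _sym_fun(P, lambda w: w ** -0.5)
    M = P_inv_sqrt @ Q @ P_inv_sqrt
    return (M + M.T) / 2.0


def ai_distance(P, Q):
    """Affine-invariant distance: Frobenius norm of log(P^{-1/2} Q P^{-1/2})."""
    w = np.linalg.eigvalsh(_whiten(P, Q))
    return float(np.sqrt(np.sum(np.log(w) ** 2)))


def ai_geodesic(P, Q, t):
    """Point at time t on the ambient geodesic from P (t = 0) to Q (t = 1)."""
    P_sqrt = _sym_fun(P, np.sqrt)
    return P_sqrt @ _sym_fun(_whiten(P, Q), lambda w: w ** t) @ P_sqrt


# Hypothetical 2x2 correlation-like matrices, used only for illustration.
P = np.array([[1.0, 0.5], [0.5, 1.0]])
Q = np.array([[1.0, 0.2], [0.2, 1.0]])
midpoint = ai_geodesic(P, Q, 0.5)
print(ai_distance(P, Q), ai_distance(P, midpoint))
```

The midpoint lies at half the total distance from P, and the distance is unchanged when both matrices are transformed by the same invertible matrix A as A P Aᵀ and A Q Aᵀ, which is the invariance that gives the metric its name.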
Date of the oral examination: 4 October 2022

A promising construction that covers the example from above and behaves as biological intuition suggests is our recently introduced Wald Space, cf. Garba et al. (2021a). It is based on the characterization of phylogenetic trees as covariance matrices, which is a byproduct of a generalization of the popular biological substitution models that are used to calculate likelihoods for trees given genetic sequence data; those substitution models are the backbone of phylogenetic tree estimation. In other words, the Wald Space is consistent with the tree estimation methods currently in use, up to the generalizations that have been made. More details can be found in Garba et al. (2021a). In this work, we concentrate on the Wald Space purely as a mathematical structure, and thus aim to enable a better understanding of it. To this end, we introduce the mathematical structures required: metric spaces, Riemannian manifolds, Riemann stratified spaces, and, essential for our construction, various geometries on the manifold of strictly positive definite symmetric real matrices. Furthermore, we introduce various ways to represent the phylogenetic trees and forests that we consider. Having finished the introduction of the more general and known concepts, we briefly introduce the BHV Space. Then we define and describe the Wald Space, which is a topological stratified space, and we investigate its topological features. This part is the core of the thesis. Finally, we equip the Wald Space with a geometry that can be chosen to some degree and find that these spaces are then Riemann stratified spaces of type (A). Last but not least, we propose some numerical algorithms to calculate geodesics and distances in the Wald Space equipped with a geometry. Those are not tested in this work, but to some extent in Lueg et al. (2021).