Terraces in phylogenetic tree space are sets of trees that have identical optimality scores for a given data set because of missing data. They were first described for multilocus phylogenetic data sets in the context of maximum parsimony inference and of maximum likelihood inference under certain model assumptions. Here we show how the mathematical properties that lead to terraces extend to gene tree–species tree problems in which the gene trees are incomplete. Inference of species trees, either from sets of gene family trees subject to duplication and loss or from allele trees subject to incomplete lineage sorting, can exhibit terraces in its solution space. First, we show conditions that lead to a new kind of terrace, which stems from subtree operations that appear in reconciliation problems for incomplete trees. Then we characterize the conditions under which terraces of both types can occur when the optimality criterion for tree search is based on duplication, loss, or deep coalescence scores.
Finally, we examine the impact of assumptions about the causes of losses: whether they are due to imperfect sampling or to true evolutionary deletion.

SANDERSON ET AL.

A long-standing and still largely dominant paradigm in phylogenetic tree inference is based on optimizing a score, derived from the data, over candidate tree solutions. In addition to the familiar maximum parsimony, maximum likelihood, and (certain) distance-based methods such as minimum evolution and Fitch-Margoliash (Felsenstein 2004), all commonly used to infer a tree from a sequence alignment, optimization is also employed in a diverse set of methods aimed at other tree inference problems, such as supertree construction, gene tree reconciliation, species tree inference using likelihood or pseudo-likelihood, and network reconstruction. Computational obstacles in optimization include multiple optima and regions where the solution space is flat; both can impede algorithms searching for optima and complicate the circumscription of solutions. One contributor to this problem in phylogenetics is missing data, and a particularly direct example is the phenomenon of "terraces": regions of tree space that have identical optimality scores purely because of certain patterns of missing data among the sampled taxa (Sanderson et al. 2011, 2015).
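As a minimal sketch of how a pattern of missing data alone can create a terrace, the code below assumes that each locus is scored only on the subtree induced by the taxa sampled for it, as holds for maximum parsimony and for maximum likelihood under certain model assumptions. Under that assumption, two trees whose induced subtrees agree for every locus must receive the same total score. Unrooted trees are written as nested tuples and reduced to their nontrivial splits (bipartitions) restricted to each locus's taxon set. The function names `leaf_clades`, `restricted_splits`, and `terrace_signature`, and the five-taxon example, are hypothetical illustrations rather than the authors' implementation.

```python
def leaf_clades(tree, out):
    """Append the leaf set below every node to `out`; return this node's leaf set.

    A tree is a leaf name (str) or a tuple of child subtrees. Each node's leaf
    set C defines the split C | (all taxa - C) of the unrooted tree.
    """
    if isinstance(tree, str):
        leaves = frozenset([tree])
    else:
        leaves = frozenset().union(*(leaf_clades(child, out) for child in tree))
    out.append(leaves)
    return leaves


def restricted_splits(tree, taxa):
    """Nontrivial splits of the subtree induced by `taxa`.

    Each split is stored canonically as the side that excludes min(taxa).
    """
    taxa = frozenset(taxa)
    anchor = min(taxa)
    clades = []
    leaf_clades(tree, clades)
    splits = set()
    for clade in clades:
        side = clade & taxa
        other = taxa - side
        # Keep only splits with at least two taxa on each side (internal edges).
        if len(side) >= 2 and len(other) >= 2:
            splits.add(other if anchor in side else side)
    return frozenset(splits)


def terrace_signature(tree, coverage):
    """One induced-subtree split set per locus.

    Assumption: each locus scores only its induced subtree, so trees with
    equal signatures receive equal scores (they lie on the same terrace).
    """
    return tuple(restricted_splits(tree, taxa) for taxa in coverage)


# Hypothetical coverage: D is sampled only for locus 1, E only for locus 2.
coverage = [{"A", "B", "C", "D"}, {"A", "B", "C", "E"}]

# Unrooted five-taxon trees written as nested tuples (rooting is arbitrary).
trees = {
    "T1": ((("A", "B"), "C"), ("D", "E")),
    "T2": ((("A", "B"), "D"), ("C", "E")),
    "T3": ((("A", "B"), "E"), ("C", "D")),
    "T4": ((("A", "C"), "B"), ("D", "E")),
}

# Group trees by signature; each group is one terrace under this coverage.
groups = {}
for name, tree in trees.items():
    groups.setdefault(terrace_signature(tree, coverage), []).append(name)

print(sorted(sorted(names) for names in groups.values()))
# prints [['T1', 'T2', 'T3'], ['T4']]
```

In this toy example, T1, T2, and T3 induce the same quartet (A,B | C,D) on locus 1 and the same quartet (A,B | C,E) on locus 2, so they fall on one terrace even though they are distinct trees. T4, which groups A with C, induces different subtrees and so can be distinguished by the data.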
The properties of terraces have been elucidated mostly in the context of large multilocus data sets, where the pattern of missing data can be described by the "taxon coverage" of the data, that is, which loci are sampled for which taxa. If a tree is inferred from a concatenated multilocus alignment by maximum parsimony, or by maximum likelihood with certain model assumptions, the pattern of taxon coverage alone can be used to infer the number and sizes of terraces having identical optimality scores. Surveys of empirical studies indicate that terraces can be astronomically large in large trees (Dobrin et al. 2018), and they can...