Modeling discrete phenotypic traits for either ancestral character state reconstruction or morphology-based phylogenetic inference suffers from ambiguities in character coding, homology assessment, dependencies, and the selection of adequate models. These drawbacks occur because trait evolution is driven by two key processes, hierarchical and hidden, which the available phylogenetic methods do not accommodate simultaneously. The hierarchical process refers to the dependencies between anatomical body parts, while the hidden process refers to the evolution of the gene regulatory networks underlying trait development. Herein, I demonstrate that these processes can be efficiently modeled using structured Markov models equipped with hidden states, which resolves the majority of the problems associated with discrete traits. Integrating structured Markov models with anatomy ontologies can adequately incorporate the hierarchical dependencies, while the hidden states accommodate the hidden evolution of gene regulatory networks and substitution rate heterogeneity. I assess the new models using simulations and theoretical synthesis. The new approach solves the long-standing tail color problem (how to code tail color when the tail is absent) and reveals a previously unknown issue, the "two-scientist paradox". The paradox refers to the confounding of a trait's coding with the hidden processes driving the trait's evolution; failing to account for the hidden process may result in bias, which can be avoided by using hidden state models. Together, these results provide a clear guideline for coding traits into characters. This paper gives practical examples of using the new framework for phylogenetic inference and comparative analysis.
KEY WORDS: discrete trait, character, morphology, homology, anatomy ontology, structured Markov models, hidden Markov models, lumpability, gene regulatory networks
Understanding the processes driving trait evolution is crucial for explaining evolutionary radiations (Price et al. 2010; Van Bocxlaer et al. 2010; Tobias et al. 2014), the origin of complexity and novelty (Moczek 2008; Ramirez and Michalik 2014), and for inferring phylogenies. For many of these analyses, we need to (1) discretize the trait (delimit the trait within a phenotype), (2) assess its primary homology (similarity), and finally (3) encode the trait (observations) into a character string or vector (see the definitions in Box 1, section A). This procedure, called character construction (Wiens 2001), is a basic stage of any analysis and has a profound influence on all downstream stages. Despite the plethora of inference frameworks (parsimony, maximum likelihood, or Bayesian; reviewed in O'Meara 2012), the lack of repeatable and agreed-upon approaches for character construction generates considerable ambiguity. As a result, different hypotheses of discretization and different ways of coding the same hypothesis into a character may be proposed for the same trait (Hawkins et al. 1997; Strong and Lipscomb 19...