Model-based approaches to species delimitation are constrained both by computational capacity and by algorithmic assumptions that are frequently violated when applied to biologically complex systems. An alternative approach, demonstrated herein, employs machine learning (ML) methods from which species limits are derived without an explicit definition of an underlying species model. We demonstrate the capacity of these approaches to designate phylogenomically and biologically relevant groups, using North American box turtles (Terrapene spp.) as an example. Several ML-based and traditional species delimitation algorithms were applied to a large SNP dataset derived from ddRAD sequencing. Our results support two major findings. First, more traditional model-based approaches performed poorly, likely reflecting systematic biases inherent in their formulation.
Multispecies coalescent methods consistently over-split Terrapene relative to prior evidence and our own phylogenetic results. Second, results from ML and clustering algorithms consistently recovered clades that were well supported in prior species tree analyses. In summary, we highlight both the strengths and limitations of ML algorithms and, in doing so, explore appropriate approaches to data manipulation and model fit. Our study was conducted within a well-characterized empirical system that allowed a direct contrast between ML and traditional approaches. It underscores the utility of ML methods for species delimitation and serves as a case study from which guidelines implicit to ML methods could be applied to other study systems.