Several computational methods have recently been proposed for delimiting species using multilocus sequence data. Among them, the Bayesian method of Yang and Rannala uses the multispecies coalescent model in the likelihood framework to calculate the posterior probabilities for the different species-delimitation models. It has a sound statistical basis and has been found in simulation studies to have desirable statistical properties, such as low error rates for both undersplitting and oversplitting. However, the method suffers from poor mixing of the reversible-jump Markov chain Monte Carlo (rjMCMC) algorithms. Here, we describe several modifications to the algorithms. We propose a flexible prior that allows the user to specify the probability that each node on the guide tree represents a true speciation event. We also introduce modifications to the rjMCMC algorithms that remove the constraint on the new species divergence time when splitting and alter the gene trees to remove incompatibilities. The new algorithms are found to improve mixing of the Markov chain for both simulated and empirical data sets.
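The Python sketch below illustrates how a node-wise prior of this kind can assign probabilities to the delimitations compatible with a binary guide tree. It rests on an assumption made only for illustration, not necessarily the exact form of the prior proposed here: the root is resolved (treated as a speciation event) with probability p, every other internal node can be resolved with probability p only if its parent is resolved, and collapsing a node collapses every node below it. The function name `delimitations` and the nested-tuple encoding of the guide tree are hypothetical.

```python
from itertools import product


def delimitations(tree, p):
    """Enumerate delimitations of a binary guide tree under an assumed node-wise prior.

    `tree` is a nested tuple whose leaves are population labels (strings);
    each tuple is an internal node. The node `tree` itself is assumed to be
    eligible for resolution (true for the root). Returns a list of
    (resolved_nodes, prior_probability) pairs, where resolved_nodes is a
    frozenset of the internal nodes treated as speciation events.
    """
    if not isinstance(tree, tuple):
        # A tip involves no speciation decision.
        return [(frozenset(), 1.0)]

    # Case 1 (assumption): the node is collapsed with probability 1 - p,
    # which also collapses every node beneath it.
    results = [(frozenset(), 1.0 - p)]

    # Case 2: the node is resolved with probability p; each child subtree
    # then becomes eligible and is resolved or collapsed independently.
    child_options = [delimitations(child, p) for child in tree]
    for combo in product(*child_options):
        resolved = frozenset([tree]).union(*(nodes for nodes, _ in combo))
        prob = p
        for _, child_prob in combo:
            prob *= child_prob
        results.append((resolved, prob))
    return results


guide = (("A", "B"), ("C", "D"))
models = delimitations(guide, p=0.5)
print(len(models))  # 5
for nodes, prior in models:
    # In a binary guide tree, each resolved node adds one species.
    print(len(nodes) + 1, "species, prior", prior)
```

Under this assumed prior with p = 0.5, the one-species model receives probability 0.5 and each of the other four delimitations receives 0.125, whereas a uniform prior over delimitations would give each of the five models 0.2. Changing p shifts prior weight toward fewer (small p) or more (large p) species.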
SPECIES delimitation using genetic data has become a popular objective in recent years (Knowles and Carstens 2007). Several likelihood and Bayesian methods have been developed in the coalescent framework that account for lineage sorting and for conflicts between species trees and gene trees, but they vary in how adequately they treat statistical uncertainty. The methods of O'Meara (2010) and Ence and Carstens (2011) attempt to infer both species trees and delimitations but assume that gene trees are known without error. O'Meara (2010) also uses several heuristics to avoid difficult numerical analyses; the statistical performance of his method therefore cannot be predicted from standard asymptotic theory. Most populations to which species delimitation will be applied will not be highly differentiated, so the gene trees will be very uncertain because the sequences carry few mutations and consequently little information. A recent Bayesian delimitation method (Yang and Rannala 2010) averages over uncertainties in the gene trees and should perform better for such data.
From a statistical perspective, species delimitation can be viewed as a model-choice problem. Each possible delimitation corresponds to a distinct statistical model whose parameters are not strictly comparable across models. This is similar to the problem of phylogenetic inference, in which parameters such as branch lengths have different interpretations in different topologies (see Yang 2006), but the delimitation problem is more complex because the number of parameters (the model dimension) also changes across delimitations (Yang and Rannala 2010).
The Bayesian method of Yang and Rannala (2010) uses reversible-jump Markov chain Monte Carlo (rjMCMC) to calculate the posterior probabilities of species delimitations, allowing for changes of dimension between models. The algorithms involve a split step (which increases the number of species by one) and a join step (which decreases th...