The MODELTEST package, including the source code and some documentation, is available at http://bioag.byu.edu/zoology/crandall_lab/modeltest.html.
The statistical selection of best-fit models of nucleotide substitution is routine in the phylogenetic analysis of DNA sequence alignments. The programs ModelTest [1] and jModelTest [2] are very popular tools to accomplish this task, with thousands of users and citations. The latter uses PhyML [3] to obtain maximum likelihood estimates of model parameters, and implements different statistical criteria to select among 88 models of nucleotide substitution, including hierarchical and dynamical likelihood ratio tests, Akaike's and Bayesian information criteria (AIC and BIC) and a performance-based decision theory method (see ref. 4). jModelTest also provides estimates of model selection uncertainty, parameter importances and model-averaged parameter estimates, including model-averaged phylogenies [4].

However, in recent years the advent of next-generation sequencing (NGS) technologies has changed the field, and most researchers are now moving from phylogenetics to phylogenomics, where large sequence alignments typically include hundreds or thousands of loci. Phylogenetic resources therefore need to be adapted to a High Performance Computing (HPC) paradigm, allowing demanding analyses at the genomic level.

Here we introduce jModelTest 2, which incorporates more models, new heuristics, efficient technical optimizations and multithreaded and MPI-based implementations for statistical model selection. jModelTest 2 includes several important new features (Supplementary Table 1). We have expanded the set of candidate models from 88 to 1624, resulting from the consideration of the 203 different partitions of the 4 × 4 nucleotide substitution rate matrix (R-matrix) combined with rate variation among sites and equal/unequal base frequencies. Because likelihood computations for a large number of models or for large data sets can be extremely time-consuming, we have also implemented two different heuristics for the selection of the best-fit model.
The first is a greedy hill-climbing hierarchical clustering that searches the set of 1624 models while optimizing at most 288 of them (Supplementary Note 1), with almost the same accuracy as an exhaustive search. The second is a heuristic filtering
dposada@uvigo.es
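The model count quoted above can be checked with a short enumeration. The sketch below assumes a time-reversible R-matrix, so that there are six exchangeable rates (AC, AG, AT, CG, CT, GT), and assumes that "rate variation among sites" contributes four options (none, +I, +G, +I+G); neither detail is spelled out in the abstract, but together they reproduce the stated totals. The function and variable names are illustrative only.

```python
def set_partitions(items):
    """Yield every partition of `items` into non-empty, unordered groups."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Place `first` into each existing group in turn...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ...or give it a group of its own.
        yield [[first]] + partition


# Assumption: six exchangeable rates of a time-reversible substitution model.
RATES = ["AC", "AG", "AT", "CG", "CT", "GT"]

# Each partition groups the rates that are constrained to be equal.
schemes = list(set_partitions(RATES))
n_schemes = len(schemes)

# Assumption: 2 base-frequency options x 2 (+I or not) x 2 (+G or not).
n_models = n_schemes * 2 * 2 * 2

print(n_schemes, n_models)
```

Under these assumptions, the partition with a single group corresponds to all rates being equal, and the partition with six singleton groups corresponds to all rates being free.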
jModelTest is a new program for the statistical selection of models of nucleotide substitution based on "Phyml" (Guindon and Gascuel 2003. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst Biol. 52:696-704). It implements five different selection strategies, including "hierarchical and dynamical likelihood ratio tests," the "Akaike information criterion," the "Bayesian information criterion," and a "decision-theoretic performance-based" approach. The program also calculates the relative importance and model-averaged estimates of substitution parameters, including a model-averaged estimate of the phylogeny. jModelTest is written in Java and runs under Mac OS X, Windows, and Unix systems with a Java Runtime Environment installed. The program, including documentation, can be freely downloaded from the software section at http://darwin.uvigo.es.
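As a rough illustration of two of the criteria named above, the following sketch scores candidate models with the standard definitions AIC = -2 ln L + 2k and BIC = -2 ln L + k ln n, where k is the number of free parameters and n is the sample size (commonly taken as the alignment length). The log-likelihoods, parameter counts and alignment length are invented for the example and do not come from jModelTest; this is not the program's own code.

```python
import math


def aic(ln_l, k):
    """Akaike information criterion: -2 ln L + 2k."""
    return -2.0 * ln_l + 2.0 * k


def bic(ln_l, k, n):
    """Bayesian information criterion: -2 ln L + k ln n."""
    return -2.0 * ln_l + k * math.log(n)


# Hypothetical fits: model name -> (maximized log-likelihood, free parameters).
candidates = {
    "JC": (-2510.0, 0),
    "HKY": (-2450.0, 4),
    "GTR+G": (-2447.0, 9),
}
n_sites = 1000  # hypothetical alignment length used as the sample size

aic_scores = {m: aic(ln_l, k) for m, (ln_l, k) in candidates.items()}
bic_scores = {m: bic(ln_l, k, n_sites) for m, (ln_l, k) in candidates.items()}

# Lower scores indicate a better trade-off between fit and complexity.
best_aic = min(aic_scores, key=aic_scores.get)
best_bic = min(bic_scores, key=bic_scores.get)
print(best_aic, best_bic)
```

Both criteria penalize extra parameters, but the BIC penalty grows with the sample size, so it tends to favor simpler models on long alignments.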
Model selection is a topic of special relevance in molecular phylogenetics that affects many, if not all, stages of phylogenetic inference. Here we discuss some fundamental concepts and techniques of model selection in the context of phylogenetics. We start by reviewing different aspects of the selection of substitution models in phylogenetics from a theoretical, philosophical and practical point of view, and summarize this comparison in table format. We argue that the most commonly implemented model selection approach, the hierarchical likelihood ratio test, is not the optimal strategy for model selection in phylogenetics, and that approaches like the Akaike Information Criterion (AIC) and Bayesian methods offer important advantages. In particular, the latter two methods are able to simultaneously compare multiple nested or nonnested models, assess model selection uncertainty, and allow for the estimation of phylogenies and model parameters using all available models (model-averaged inference or multimodel inference). We also describe how the relative importance of the different parameters included in substitution models can be depicted. To illustrate some of these points, we have applied AIC-based model averaging to 37 mitochondrial DNA sequences from the subgenus Ohomopterus (genus Carabus) ground beetles described by Sota and Vogler (2001).
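The multimodel quantities mentioned above can be sketched with the usual Akaike-weight formulas: the weight of model i is exp(-Δ_i / 2) normalized over all candidates, where Δ_i is its AIC difference from the best model; the relative importance of a parameter is the summed weight of the models that contain it; and a model-averaged estimate is the weight-averaged value over those models. The AIC scores and parameter estimates below are invented for illustration and are not taken from the beetle data set, and the helper names are hypothetical.

```python
import math


def akaike_weights(aic_scores):
    """Convert AIC scores into weights that sum to 1 across the candidates."""
    best = min(aic_scores.values())
    rel = {m: math.exp(-0.5 * (s - best)) for m, s in aic_scores.items()}
    total = sum(rel.values())
    return {m: r / total for m, r in rel.items()}


def importance(weights, models_with_param):
    """Summed weight of the models that include a given parameter."""
    return sum(w for m, w in weights.items() if m in models_with_param)


def model_averaged(weights, estimates):
    """Weighted mean of a parameter over the models that include it.

    `estimates` maps model name -> estimate, only for models with the parameter.
    """
    numerator = sum(weights[m] * value for m, value in estimates.items())
    return numerator / importance(weights, estimates.keys())


# Hypothetical AIC scores for three candidate models.
scores = {"HKY": 4908.0, "HKY+G": 4906.0, "GTR+G": 4912.0}
weights = akaike_weights(scores)

# Hypothetical gamma shape estimates from the two models that include +G.
alpha_estimates = {"HKY+G": 0.50, "GTR+G": 0.60}

alpha_importance = importance(weights, alpha_estimates.keys())
alpha_averaged = model_averaged(weights, alpha_estimates)
print(weights, alpha_importance, alpha_averaged)
```

Dividing by the importance restricts the average to models that actually contain the parameter, so the result always lies between the smallest and largest individual estimates.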