Seq-Gen: an application for the Monte Carlo simulation of DNA sequence evolution along phylogenetic trees

Rambaut, Andrew; Grassly, Nicholas C.

doi:10.1093/bioinformatics/13.3.235

Cited by 1,281 publications (1,062 citation statements)

References 26 publications

Supporting

Mentioning

1,061

Contrasting

Unclassified


“…Parametric replicate data sets were generated using Seq-gen (Rambaut and Grassly 1997) and analyzed sequentially by PAUP* to give the log likelihood ratio statistic under various scenarios. Results were tabulated using a C program written by R.O.…”

Section: Methods (mentioning)

confidence: 99%

Measuring Fit of Sequence Data to Phylogenetic Model: Gain of Power Using Marginal Tests

2009


…field of phylogenetics rarely do this. Such analyses discard fundamental aspects of science as prescribed by Karl Popper. Indeed, not without cause, Popper (1978) once argued that evolutionary biology was unscientific as its hypotheses were untestable. Here we trace developments in assessing fit from Penny et al. (1982) to the present. We compare the general log-likelihood ratio statistic (the G or G² statistic) between the evolutionary tree model and the multinomial model with that of marginalized tests applied to an alignment (using placental mammal coding sequence data). It is seen that the most general test does not reject the fit of data to model (p ~ 0.5), but the marginalized tests do. Tests on pair-wise frequency (F) matrices strongly (p < 0.001) reject the most general phylogenetic (GTR) models commonly in use. It is also clear (p < 0.01) that the sequences are not stationary in their nucleotide composition. Deviations from stationarity and homogeneity seem to be unevenly distributed amongst taxa; not necessarily those expected from examining other regions of the genome. By marginalizing the 4^t patterns of the i.i.d. model to observed and expected parsimony counts, that is, from constant sites, to singletons, to parsimony informative characters of a minimum possible length, then the likelihood ratio test regains power, and it too rejects the evolutionary model with p << 0.001. Given such behavior over relatively recent evolutionary time, readers in general should maintain a healthy skepticism of results, as the scale of the systematic errors in published analyses may really be far larger than the analytical methods (e.g., bootstrap) report.

Keywords: Fit of sequence data to evolutionary model, base composition stationarity, placental / eutherian mammals.

Waddell, Ota and Penny


“…Each genealogy represents a chromosomal segment of 5cM. Nucleotide sequences are then generated under Kimura's two-parameter model (Nei 1987) using the program seq-gen (Rambaut and Grassly 1997). Thus, each genealogy gives rise to a cluster of linked SNPs.…”

Section: Simulation 4: Effects Of Ld and Genetic Drift All Simulatio… (mentioning)

confidence: 99%

Estimation of individual admixture: Analytical and study design considerations

Tang, Peng, Wang, et al. 2005

Genetic Epidemiology

631

642


The genome of an admixed individual represents a mixture of alleles from different ancestries. In the United States, the two largest minority groups, African Americans and Hispanics, are both admixed. An understanding of the admixture proportion at an individual level (individual admixture, or IA) is valuable for both population geneticists and epidemiologists who conduct case-control association studies in these groups. Here we present an extension of a previously described frequentist (maximum likelihood or ML) approach to estimate individual admixture that allows for uncertainty in ancestral allele frequencies. We compare this approach both to prior partial likelihood based methods as well as more recently described Bayesian MCMC methods. Our full ML method demonstrates increased robustness when compared to an existing partial ML approach. Simulations also suggest that this frequentist estimator achieves similar efficiency, measured by the mean squared error criterion, as Bayesian methods but requires just a tiny fraction of the computational time to produce point estimates, allowing for extensive analysis (e.g. simulations) not possible by Bayesian methods. Our simulation results demonstrate that inclusion of ancestral populations or their surrogates in the analysis is required by any method of IA estimation to obtain reasonable results.

Keywords: admixture, EM algorithm, maximum likelihood estimate.
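The citation statement for this entry mentions generating nucleotide sequences under Kimura's two-parameter model. As a rough illustration only (this is not Seq-Gen's code, and the function names, the `kappa` parameter and the example values are assumptions made for this sketch), evolving a sequence along a single branch under that model might look like the following, where `d` is the branch length in expected substitutions per site and `kappa` is the ratio of the transition rate to the transversion rate:

```python
import math
import random

# Hypothetical helper names; not part of Seq-Gen or any cited program.
TRANSITION = {"A": "G", "G": "A", "C": "T", "T": "C"}
TRANSVERSIONS = {"A": "CT", "G": "CT", "C": "AG", "T": "AG"}


def k2p_probs(d, kappa):
    """Return (p_same, p_transition, p_each_transversion) for branch length d."""
    e1 = math.exp(-4.0 * d / (kappa + 2.0))
    e2 = math.exp(-2.0 * d * (kappa + 1.0) / (kappa + 2.0))
    p_same = 0.25 + 0.25 * e1 + 0.5 * e2
    p_ts = 0.25 + 0.25 * e1 - 0.5 * e2
    p_tv = 0.25 - 0.25 * e1  # probability of each of the two transversions
    return p_same, p_ts, p_tv


def evolve(seq, d, kappa, rng):
    """Evolve each site of seq independently along one branch."""
    p_same, p_ts, _ = k2p_probs(d, kappa)
    out = []
    for base in seq:
        u = rng.random()
        if u < p_same:
            out.append(base)  # no observable change
        elif u < p_same + p_ts:
            out.append(TRANSITION[base])  # purine<->purine or pyrimidine<->pyrimidine
        else:
            out.append(rng.choice(TRANSVERSIONS[base]))  # one of two transversions
    return "".join(out)


rng = random.Random(1)
ancestor = "ACGTACGTAC"
descendant = evolve(ancestor, d=0.1, kappa=2.0, rng=rng)
```

Applying `evolve` recursively from the root of a genealogy to its tips would give one simulated alignment; a real simulator also handles rate heterogeneity and other options that this sketch leaves out.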


“…Heuristic searches (200 replicates of RAS, holding 10 trees per step and with TBR branch swapping, keeping 10 trees of score ≥ 1) were carried out on the original dataset in PAUP* with and without each of the four genera Austrodanthonia, Joycea, Notodanthonia and Rytidosperma constrained to be monophyletic. Based on the most appropriate model of nucleotide sequence evolution, as estimated in Modeltest 3.7 (Posada and Crandall, 1998) using only DNA characters, 100 new datasets were simulated in SeqGen (Rambaut and Grassly, 1997) and constrained and unconstrained analyses were carried out on each of the new matrices as above. The resulting length differences (constrained_sim - unconstrained_sim) were plotted as a frequency diagram and used as a null distribution of length differences, against which the length difference (constrained_obs - unconstrained_obs) was assessed.…”

Section: Parsimony Analyses and Monophyly Testing (mentioning)

confidence: 99%

A plastid tree can bring order to the chaotic generic taxonomy of Rytidosperma Steud. s.l. (Poaceae)

Humphreys, Pirie, Linder 2010

Molecular Phylogenetics and Evolution


Rytidosperma s.l., wallaby grasses and allies, is in dire need of a single, unanimously accepted generic taxonomy. Motivated by the desire to establish a generic classification that complies with phylogeny, we investigated how much phylogenetic signal is contained within a plastid (cpDNA) tree, given that the nrDNA tree (ITS) was uninformative and that a phylogenetic hypothesis based on a single genome may not be reliable. We find that the plastid tree is significantly different from a morphological cladogram and show that this is the result of homoplasy in the morphological dataset. Treated individually, several morphological characters fit the plastid tree very well. Similarly, we find a good fit of the plastid tree with ecological and distribution characters and with biogeographical patterns in the Southern Hemisphere. We conclude that a significant level of the species phylogeny is resolved by the plastid tree and are confident it can form a sound basis for a reconsideration of generic limits. None of the currently recognised seven genera in the Rytidosperma clade is monophyletic. Therefore, we propose combining the segregate genera in Australasia within a broadly construed Rytidosperma, including all the species from Australia, New Guinea, New Zealand and South America.
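The citation statement for this entry describes comparing an observed tree-length difference with a null distribution built from 100 simulated datasets. The quoted text does not say how that comparison was computed, so the following is only a minimal sketch of one common choice, a one-sided empirical p-value with a +1 correction. The function name and all numbers are hypothetical and are not taken from the study.

```python
def empirical_p_value(observed, simulated):
    """One-sided empirical p-value for an observed statistic.

    Counts how many simulated values are at least as large as the observed
    one. The +1 in numerator and denominator is an assumed convention that
    keeps the estimate away from exactly zero.
    """
    if not simulated:
        raise ValueError("need at least one simulated value")
    exceed = sum(1 for s in simulated if s >= observed)
    return (exceed + 1) / (len(simulated) + 1)


# Hypothetical length differences (constrained minus unconstrained tree
# length) from simulated datasets, and a hypothetical observed difference.
null_differences = [0, 1, 0, 2, 1, 0, 3, 1, 0, 1]
observed_difference = 5
p = empirical_p_value(observed_difference, null_differences)
```

A small p-value here would indicate that the observed length difference is larger than expected if the constrained group were truly monophyletic under the simulation model.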
