2023
DOI: 10.1101/2023.07.11.548509
Preprint

Simulations of sequence evolution: how (un)realistic they are and why

Abstract: Motivation: Simulating sequence evolution plays an important role in the development and evaluation of phylogenetic inference tools. Naturally, the simulated data needs to be as realistic as possible to be indicative of the performance of the developed tools on empirical data. Over the years, numerous phylogenetic sequence simulators, employing various models of evolution, have been published with the goal to simulate such empirical-like data. In this study, we simulated DNA and protein Multiple Sequence Align…

Cited by 6 publications (4 citation statements)
References 53 publications
“…Among the empirical datasets, 575 are DNA datasets and 150 are AA MSAs, whereas all simulated MSAs are DNA datasets, in order to reduce the CO₂ footprint. We sampled the empirical MSAs from the TreeBASE database (Piel et al, 2009) and the simulated MSAs from the datasets used in a recent benchmark study conducted by our group and colleagues (Trost et al, 2024). The selected empirical and simulated MSAs capture the full spectrum of difficulty scores predicted by the Pythia tool (Haag et al, 2022).…”
Section: Results (mentioning)
confidence: 99%
“…the proportion of gaps or alternative measures/predictions of phylogenetic signal). To this end, simulating MSAs with specific attributes could constitute a way forward, albeit simulations still tend to be unrealistic (Trost et al 2023). However, such a more thorough exploration should be carefully considered, as performing a vast amount of tree inferences is computationally expensive.…”
Section: Discussion (mentioning)
confidence: 99%
“…Trost et al [32] demonstrate that machine learning algorithms can easily distinguish between simulated and empirical MSAs with high accuracy and conclude that sequence simulations do not fully capture all characteristics of empirical MSAs. Consequently, we exclusively use empirical MSAs to train EBG.…”
Section: Training Data (mentioning)
confidence: 99%