2020
DOI: 10.1101/2020.06.08.140384
Preprint

ClipKIT: a multiple sequence alignment-trimming algorithm for accurate phylogenomic inference

Abstract: Highly divergent sites in multiple sequence alignments, which stem from erroneous inference of homology and saturation of substitutions, are thought to negatively impact phylogenetic inference. Trimming methods aim to remove these sites before phylogenetic inference, but recent analysis suggests that doing so can worsen inference. We introduce ClipKIT, a trimming method that instead aims to retain phylogenetically-informative sites; phylogenetic inference using ClipKIT-trimmed alignments is accurate, robust, a…

citations
Cited by 25 publications
(12 citation statements)
references
References 22 publications
“…First, the rpS3 sequences were aligned with MAFFT linsi v7.453 (Katoh and Standley, 2013) and for poorly aligning sequences we performed homology searches with BLASTp against NCBI’s nr (Altschul et al, 1990) to confirm their origin (e.g., eukaryotic, misannotations, potential misassemblies) and remove them. The remaining sequences were fused with the Hug dataset, realigned, and trimmed with ClipKIT (mode: kpic-gappy) (Steenwyk et al, 2020). A Maximum-Likelihood phylogeny was reconstructed in IQ-TREE 2 (Minh et al, 2020), under a model selected with ModelFinder (Kalyaanamoorthy et al, 2017), and branch supports calculated with 1000 ultrafast bootstraps (Hoang et al, 2018), 1000 SH-aLRT replicates (Guindon et al, 2010), and aBayes (Anisimova et al, 2011).…”
Section: Methods (citation type: mentioning)
confidence: 99%
“…Specifically, the following parameters were used: --op 1.0 --maxiterate 1000 --retree 1 --genafpair. The resulting alignment was trimmed using ClipKIT, v0.1 (Steenwyk et al, 2020b), with default ‘gappy’ mode. The trimmed alignment was then used to infer the evolutionary history of tef1 sequences using IQ-TREE2 (Minh et al, 2020).…”
Section: Methods (citation type: mentioning)
confidence: 99%
“…Next, nucleotide sequences were threaded onto the protein alignments using function thread_dna in PhyKIT, v0.0.1 (Steenwyk et al, 2020a). The resulting codon-based alignments were then trimmed using ClipKIT, v0.1 (Steenwyk et al, 2020b), using the gappy mode. The resulting aligned and trimmed alignments were then concatenated into a single matrix with 7,133,367 sites using the PhyKIT function create_concat.…”
Section: Maximum Likelihood Molecular Phylogenetics (citation type: mentioning)
confidence: 99%
“…From these phylogenies we isolated the sequences forming well-supported branches corresponding to the ones in (Lombard, 2016), albeit with increased diversity plus some additional clades of potential interest. The individual datasets (archaeal, bacterial, eukaryotic) were then fused into single Alg13/EpsF-like and Alg14/EpsE-like datasets that were aligned as above and trimmed with ClipKIT (kpigappy) (Steenwyk et al, 2020). Then phylogenies were inferred with IQ-TREE 2.0.5 (Minh et al, 2020) under the model selected by ModelFinder (Kalyaanamoorthy et al, 2017), and ultrafast bootstrap (Hoang et al, 2018), aLRT SH-like (Guindon et al, 2010), and approximate Bayesian (Anisimova et al, 2011) branch support tests (-m MFP -bb 1000 -alrt 1000 -abayes).…”
Section: Phylogenetic Analysis (citation type: mentioning)
confidence: 99%
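
Several of the statements above trim alignments in a "gappy" mode. As a rough illustration of what gap-based trimming means, the sketch below drops alignment columns whose share of gap characters reaches a cutoff. It is not ClipKIT's implementation: the function name `trim_gappy`, the 0.9 default cutoff, the use of `-` as the only gap character, and the "greater than or equal to" comparison are all assumptions made for this example.

```python
from typing import Dict


def trim_gappy(alignment: Dict[str, str], max_gap_fraction: float = 0.9) -> Dict[str, str]:
    """Drop columns whose gap fraction is >= max_gap_fraction.

    Hypothetical helper for illustration only; the cutoff and the
    comparison rule are assumptions, not taken from the cited papers.
    """
    if not alignment:
        return {}

    # Every sequence in an alignment must have the same number of columns.
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")

    n_taxa = len(alignment)
    n_sites = lengths.pop()
    seqs = list(alignment.values())

    # Keep a column only if its fraction of '-' characters is below the cutoff.
    keep = [
        i
        for i in range(n_sites)
        if sum(seq[i] == "-" for seq in seqs) / n_taxa < max_gap_fraction
    ]

    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}


if __name__ == "__main__":
    toy = {"taxon_a": "AC-T", "taxon_b": "A--T", "taxon_c": "AG-T"}
    # The third column is all gaps and is removed; the second column
    # (one gap in three sequences) stays under the 0.9 cutoff and is kept.
    print(trim_gappy(toy))
```

Lowering `max_gap_fraction` trims more aggressively; in the toy alignment, a cutoff of 0.3 would also remove the second column.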