2021
DOI: 10.1093/nargab/lqab075

Gotree/Goalign: toolkit and Go API to facilitate the development of phylogenetic workflows

Abstract: Phylogenetics is nowadays at the center of numerous studies in many fields, ranging from comparative genomics to molecular epidemiology. However, phylogenetic analysis workflows are usually complex and difficult to implement, as they are often composed of many small, recurring, but important data manipulation steps. Among these, we can find file reformatting, sequence renaming, tree re-rooting, tree comparison, bootstrap support computation, etc. These are often performed by custom scripts or by several heter…

Citations: cited by 90 publications (58 citation statements)
References: 31 publications
“…The multiple sequence alignment was trimmed using trimAl version 1.3 [ 55 ] to remove any positions where greater than 25% of the sequences had gaps. Identical sequences were then removed from the alignment so that only one copy was kept using the dedup function of Goalign version 0.3.5 [ 67 ]. In total, 182 out of 509 sequences were removed during the deduplication process.…”
Section: Methods (mentioning)
confidence: 99%
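
The deduplication step described in this statement (keep one copy of each identical sequence) can be illustrated with a minimal Python sketch. It is not Goalign's implementation: the function name `dedup_alignment`, the `(name, sequence)` pair representation, and the choice to compare sequences by exact string match and keep the first occurrence are all assumptions made for illustration.

```python
def dedup_alignment(records):
    """Keep the first record for each distinct sequence string.

    `records` is a list of (name, sequence) pairs (hypothetical format).
    Returns (kept_records, removed_names).
    """
    seen = set()
    kept, removed = [], []
    for name, seq in records:
        if seq in seen:
            # An identical sequence was already kept; record this one as removed.
            removed.append(name)
        else:
            seen.add(seq)
            kept.append((name, seq))
    return kept, removed


# Toy alignment: s2 is identical to s1, s3 differs at one site.
records = [("s1", "ACGT-"), ("s2", "ACGT-"), ("s3", "ACGA-")]
kept, removed = dedup_alignment(records)
```

Counting `removed` gives the kind of figure reported in the statement (number of sequences dropped during deduplication).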
“…Phylogenies were produced from the entire set of sequences in each wave, using a custom workflow for building fast SARS-CoV-2 trees available at github.com/MDU-PHL/kovid-trees-nf. Briefly, sequences were cleaned to remove sites with > 5% missing calls and de-duplicated with GOALIGN [20]. An approximate maximum-likelihood tree was built using FastTree [21] and the branch lengths optimised with RAxML-NG [22].…”
Section: Methods (mentioning)
confidence: 99%
“…To generate a global alignment of all sequences, reads and shredded assemblies were mapped to reference genome USA300 FPR3757 (assembly accession: GCF_000013465.1) using Snippy v4.6.0 (https://github.com/tseemann/snippy). The core genome alignment was obtained using Snippy; sites with > 10% gaps were removed using Goalign (102) and constant sites were removed using SNP-sites (103), for a final length of 186,825 bp. A maximum-likelihood phylogenetic tree of 2,590 sequences (those kept in the analysis after excluding genetically unrelated strains, see below) was inferred using IQ-TREE, v2.0.3 (104).…”
Section: Methods (mentioning)
confidence: 99%
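
The site-filtering step quoted above (removing alignment columns whose gap fraction exceeds a threshold) could look roughly like the following Python sketch. It is an illustration only, not Goalign's code: the function name `drop_gappy_columns`, its parameters, and the assumption that only `-` counts as a gap are hypothetical. Columns are removed only when the gap fraction is strictly greater than the threshold, matching the "> 10%" wording.

```python
def drop_gappy_columns(seqs, max_gap_fraction=0.10, gap_chars="-"):
    """Remove columns whose gap fraction is strictly above max_gap_fraction.

    `seqs` is a list of equal-length aligned sequence strings.
    """
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same aligned length")
    n = len(seqs)
    # Indices of columns whose gap count does not exceed the allowed fraction.
    keep = [
        i for i in range(length)
        if sum(s[i] in gap_chars for s in seqs) <= max_gap_fraction * n
    ]
    return ["".join(s[i] for i in keep) for s in seqs]


# Toy call with a 25% threshold so that four sequences suffice:
# column 1 has 1/4 gaps (kept), column 2 has 2/4 gaps (removed).
trimmed = drop_gappy_columns(["AC-T", "A--T", "ACGT", "ACGT"], max_gap_fraction=0.25)
```

The default of 0.10 mirrors the threshold in the quoted statement; other statements on this page use different cut-offs (5%, 25%), so the threshold is left as a parameter.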