2020
DOI: 10.1101/2020.09.14.291484
Preprint

CIAlign - A highly customisable command line tool to clean, interpret and visualise multiple sequence alignments

Abstract: Background: Throughout biology, multiple sequence alignments (MSAs) form the basis of much investigation into biological features and relationships. These alignments are at the heart of many bioinformatics analyses. However, sequences in MSAs are often incomplete or very divergent, which leads to poorly aligned regions or large gaps in alignments. This slows down computation and can impact conclusions without being biologically relevant. Therefore, cleaning the alignment by removing these regions can substantial…

Cited by 16 publications (15 citation statements)
References: 46 publications
“…Protein sequences were aligned using MUSCLE (63), and maximum likelihood trees were built and visualized in Seaview (64) using PhyML (65), with 100 bootstrap replicates for statistical support. Sequence alignments of IFIT3 proteins in different species were visualized using CIAlign (https://pypi.org/project/cialign/) (66).…”
Section: Methods (mentioning)
confidence: 99%
“…E) Predicted amino acid sequences of the uORF from 98 PRRSV genomes representative of NA PRRSV diversity. Visualisation made using CIAlign [69], with each row representing one sequence, each coloured rectangle representing an amino acid, and gaps indicating translation termination due to a stop codon. F and G) Conservation of F) the initiation context and G) the stop codon for the NA PRRSV uORF.…”
Section: Characterisation of the PRRSV Translatome (mentioning)
confidence: 99%
“…For the uORF amino acid conservation plot, the nucleotide sequence of the CDS was extracted from a multiple sequence alignment and frame 0 was translated, with a 28-codon extension in KY348852 omitted from the plot for space considerations. Logo plots and mini-alignment plots were generated using CIAlign [69] and, for the uORF analyses, genome sequences which began partway through the ORF were excluded, as was KT257963 which has a likely sequencing artefact in the 5′ UTR. For analysis of uORF start and stop codon conservation, sequences were filtered to take only those spanning the entire feature of interest without gaps, leaving 564 and 598 sequences, respectively, as input for the logo plots in Figure 6F and G. Synonymous site conservation was analysed, for the representative NA PRRSV sequences or for all EU sequences, using SYNPLOT2 [68] and p values plotted after application of a 25-codon running mean filter.…”
Section: Analysis of Sequence Conservation (mentioning)
confidence: 99%
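
The gap-free filtering step and the running mean filter described in the statement above can be illustrated with a minimal Python sketch. The function names (`spans_feature_without_gaps`, `running_mean`), the 0-based half-open coordinates, the truncated windows at the sequence edges, and the toy alignment are all assumptions made for illustration; none of them is taken from the cited paper, CIAlign, or SYNPLOT2.

```python
def spans_feature_without_gaps(aligned_seq, start, end, gap_chars="-"):
    """Return True if aligned_seq[start:end] is full length and contains no gaps.

    Coordinates are 0-based and half-open (an assumption for this sketch).
    """
    region = aligned_seq[start:end]
    return len(region) == end - start and not any(c in gap_chars for c in region)


def running_mean(values, window):
    """Centred running mean; windows are truncated at the edges (an assumption)."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


# Toy alignment (hypothetical data): the feature of interest is columns 0-2.
alignment = {
    "seq1": "ATGGCCTAA",
    "seq2": "ATG-CCTAA",  # gap lies outside the feature, so the sequence is kept
    "seq3": "---GCCTAA",  # gap inside the feature, so the sequence is dropped
}
kept = [name for name, seq in alignment.items()
        if spans_feature_without_gaps(seq, 0, 3)]

# A 3-value window on toy numbers; the quoted analysis used a 25-codon window.
smoothed = running_mean([1.0, 2.0, 3.0, 4.0, 5.0], window=3)
```

With the toy input, `seq1` and `seq2` pass the filter and `seq3` does not. How a real tool treats the first and last positions of the running mean may differ from the truncation used here.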
“…The uORF CDS nucleotide sequence was extracted from a multiple sequence alignment and frame 0 was translated. Each row represents one sequence, with each coloured rectangle representing an amino acid (logo plots and alignment visualisations made using CIAlign [68]). Gaps indicate translation is predicted to have terminated due to a stop codon.…”
Section: Characterising the PRRSV Translatome (mentioning)
confidence: 99%
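
The frame 0 translation described in the statement above, where a row ends in gaps once a stop codon is reached, might be sketched as follows. This assumes the standard genetic code; the function name `translate_frame0`, the `pad_to` parameter, the use of `X` for unrecognised codons, and the example sequences are hypothetical and not taken from the cited paper or from CIAlign.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_frame0(nt_seq, pad_to=None):
    """Translate from the first nucleotide and stop at the first stop codon.

    If pad_to is given, the result is right-padded with '-' so that rows in a
    plot keep the same width after translation has terminated.
    """
    protein = []
    for i in range(0, len(nt_seq) - 2, 3):
        # Codons containing gaps or ambiguity codes become 'X' (an assumption).
        aa = CODON_TABLE.get(nt_seq[i:i + 3].upper(), "X")
        if aa == "*":
            break
        protein.append(aa)
    translated = "".join(protein)
    if pad_to is not None:
        translated = translated.ljust(pad_to, "-")
    return translated


# Toy sequences (hypothetical): the second hits a stop codon after one codon.
rows = [translate_frame0(s, pad_to=4) for s in ("ATGGCCAAATTT", "ATGTAAGCCAAA")]
```

Here the first row translates in full and the second is padded with gaps after its early stop codon, mirroring the row layout described in the figure legend.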
“…For analyses of NA PRRSV genomes "representative of NA PRRSV diversity", the NA PRRSV sequences were clustered using CD-HIT 158 (version 4.8.1) based on the whole genome and with a nucleotide similarity threshold of 95% (all other parameters set to default), and one representative sequence from each cluster was selected to make a sequence alignment of 137 sequences. Logo plots and mini-alignment plots were generated using CIAlign [68] and, for the uORF analyses, genome sequences which began partway through the ORF were excluded, as was KT257963 which has a likely sequencing artefact in the 5′ UTR. Synonymous site conservation was analysed, for the representative NA PRRSV sequences or for all EU sequences, using SYNPLOT2 [67] and p values plotted after application of a 25-codon running mean filter.…”
Section: Analysis of Sequence Conservation (mentioning)
confidence: 99%