2022
DOI: 10.1093/bib/bbac222

ggmsa: a visual exploration tool for multiple sequence alignment and associated data

Abstract: Identifying the conserved and variable regions in a multiple sequence alignment (MSA) is critical to understanding gene function. MSA visualizations transform sequence features into understandable visual representations. As the sequence–structure–function relationship gains attention in molecular biology, a simple display of nucleotide or protein sequence alignments is no longer sufficient. A more scalable visualization is required to bro…
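To make "conserved and variable regions" concrete, here is a minimal sketch that scores each alignment column by Shannon entropy and labels it. This is not ggmsa's method or API (ggmsa is an R package). Entropy is one common conservation measure, chosen here as an assumption. The function names `column_entropy` and `classify_columns` and the 0.5-bit threshold are hypothetical and used only for illustration.

```python
from collections import Counter
from math import isnan, log2


def column_entropy(column: str, gap: str = "-") -> float:
    """Shannon entropy (bits) of the non-gap characters in one MSA column.

    Returns NaN for an all-gap column, where entropy is undefined.
    """
    residues = [c for c in column.upper() if c != gap]
    if not residues:
        return float("nan")
    n = len(residues)
    counts = Counter(residues)
    # H = sum_i p_i * log2(1 / p_i); 0 bits means every sequence agrees.
    return sum((k / n) * log2(n / k) for k in counts.values())


def classify_columns(alignment: list[str], threshold: float = 0.5) -> list[str]:
    """Label each column 'conserved', 'variable', or 'gap'.

    `threshold` (bits) is an illustrative assumption, not a ggmsa default.
    """
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    labels = []
    for column in zip(*alignment):  # walk the alignment column by column
        h = column_entropy("".join(column))
        if isnan(h):
            labels.append("gap")
        elif h <= threshold:
            labels.append("conserved")
        else:
            labels.append("variable")
    return labels


alignment = ["ATGC-A", "ATGCTA", "ACGATA"]
print(classify_columns(alignment))
# ['conserved', 'variable', 'conserved', 'variable', 'conserved', 'conserved']
```

Gaps are ignored when computing entropy. A column that mixes gaps with one residue type therefore still counts as conserved. A real analysis might instead penalize gap-rich columns.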

Cited by 127 publications (78 citation statements)
References 45 publications
“…Multiple sequence alignment was done by MAFFT (v7.490) [35]. All data management procedures, analyses and plotting [36,37] were performed in the R environment (v4.2.1) [38]…”
Section: Methods (mentioning), confidence: 99%
“…All statistical analysis was performed in R v4.0.3. Phylogenetic trees and tp0470 and arp variants were visualized with the R packages ggtree (Yu et al., 2017), treeio (Wang et al., 2020), and ggplot (Wickham, 2016), and multiple sequence alignments by R package ggmsa (Zhou et al., 2022). Bash, R, and python scripts for all data processing are available at https://github.com/greninger-lab/TP_genome_finishing.…”
Section: Methods (mentioning), confidence: 99%
“…The pipeline consists of several steps: i) BlastP for homology searching (Camacho et al., 2009) with cut-offs set on a protein by protein basis to optimise numbers and diversity of hits: 50% sequence identity to the query cutoff for CplR (WP_011861613.1), 70% identity cutoff for VmlR (WP_003234144.1) and VmlR2 (WP_024026878.1) and 80% for LsaA (WP_002398829.1), ii) retrieval of 300 nt upstream sequences preceding the identified ARE-ABCF genes, iii) uORF annotation and, finally, iv) conservation analysis and multiple sequence alignment with MAFFT v7.490 (Katoh and Standley, 2013). Visualisation was performed with the ggmsa R package (Zhou et al., 2022), msa4u (Egorov and Atkinson, 2022) and Logomaker (Tareen and Kinney, 2020) Python packages. Secondary structures of the cplR 5ʹ leader region were predicted using RNAfold (Gruber et al., 2008) and Mfold (Peltier et al., 2020).…”
Section: Methods (mentioning), confidence: 99%
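Steps i) and ii) of the quoted pipeline are a per-query identity filter followed by strand-aware extraction of upstream sequence. The sketch below illustrates both. The four accessions and cutoffs come from the quoted text. Everything else is an assumption, not the cited authors' code: the `Hit` record, the function names, the inclusive `>=` comparison, and the 0-based half-open coordinates.

```python
from dataclasses import dataclass

# Per-query identity cutoffs (%) as stated in the quoted Methods text.
IDENTITY_CUTOFFS = {
    "WP_011861613.1": 50.0,  # CplR
    "WP_003234144.1": 70.0,  # VmlR
    "WP_024026878.1": 70.0,  # VmlR2
    "WP_002398829.1": 80.0,  # LsaA
}

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


@dataclass
class Hit:
    """Hypothetical, simplified BLASTP hit record."""
    query: str     # query protein accession
    subject: str   # hit identifier
    pident: float  # percent identity to the query


def passes_cutoff(hit: Hit) -> bool:
    # Assumes the cutoff is inclusive (>=); the quoted text does not say.
    return hit.pident >= IDENTITY_CUTOFFS[hit.query]


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def upstream_region(genome: str, start: int, end: int, strand: str,
                    length: int = 300) -> str:
    """Return up to `length` nt immediately 5' of a gene, in gene orientation.

    `start`/`end` are 0-based, half-open forward-strand coordinates of the gene.
    Regions running off a contig end are truncated rather than padded.
    """
    if strand == "+":
        return genome[max(0, start - length):start]
    if strand == "-":
        # For a minus-strand gene, 5' lies to the right on the forward strand.
        return reverse_complement(genome[end:end + length])
    raise ValueError(f"unknown strand: {strand!r}")


hits = [Hit("WP_011861613.1", "hit_a", 55.0), Hit("WP_002398829.1", "hit_b", 75.0)]
print([h.subject for h in hits if passes_cutoff(h)])  # ['hit_a']
print(upstream_region("AAAACCCCGGGATTTT", 4, 8, "-", length=4))  # TCCC
```

The key design point is the minus-strand case. There, "upstream" means the forward-strand bases after the gene end, reverse-complemented so the extracted region reads 5'→3' like the gene itself.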