2023
DOI: 10.1101/2023.01.24.524926
Preprint

Accurate and fast graph-based pangenome annotation and clustering with ggCaller

Abstract: Bacterial genomes differ in both gene content and sequence mutations, which can cause important clinical phenotypic differences such as vaccine escape or antimicrobial resistance. To identify and quantify important variants, all genes within a population must be predicted, functionally annotated and clustered, representing the 'pangenome'. Despite the volume of genome data available, gene prediction and annotation are currently conducted in isolation on individual genomes, which is computationally inefficient …

Cited by 9 publications (16 citation statements)
References 87 publications
“…An ideal dataset would then have harmonized gene annotations across isolates. Recent advances in pangenome-aware gene calling and pangenome estimation might then be more appropriate; the recently described ggCaller [15] algorithm's output is in fact compatible with panfeed.…”
Section: Discussion (mentioning)
confidence: 99%
“…panfeed iterates over the gene presence/absence matrix produced by software such as Roary [20], panaroo [21] or ggCaller [15], specifically the "gene_presence_absence.csv" file. For each gene cluster, the full nucleotide sequence of the gene is retrieved from the input GFF3 file of each of the input samples, optionally including sequences upstream and downstream of the start and stop codon, respectively.…”
Section: Methods, Panfeed Algorithm (mentioning)
confidence: 99%
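
The two steps described in the statement above (walking a presence/absence matrix cluster by cluster, then pulling each gene's sequence with optional flanks) can be illustrated with a minimal Python sketch. This is not panfeed's code. The function names `iter_clusters` and `gene_with_flanks` are hypothetical, the caller is assumed to know which CSV columns are samples, multiple gene IDs in one cell are assumed to be separated by semicolons, and the gene coordinates are assumed to have already been read from a GFF3 file (1-based, inclusive).

```python
import csv

# Translation table used to complement a DNA string.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def iter_clusters(handle, sample_columns):
    """Yield (cluster_name, {sample: [gene_id, ...]}) for each CSV row.

    Assumptions: the cluster name is in a column called "Gene", and
    `sample_columns` lists the per-sample columns (supplied by the caller).
    """
    for row in csv.DictReader(handle):
        members = {}
        for sample in sample_columns:
            cell = (row.get(sample) or "").strip()
            if cell:  # an empty cell means the gene is absent in that sample
                members[sample] = [g for g in cell.split(";") if g]
        yield row["Gene"], members


def gene_with_flanks(contig, start, end, strand, upstream=0, downstream=0):
    """Extract a gene plus optional flanking sequence from a contig.

    `start` and `end` are 1-based inclusive coordinates as in GFF3.
    Flanks are clipped at the contig ends. On the minus strand the
    upstream region lies to the right on the forward strand, so the
    slice is widened accordingly and then reverse-complemented.
    """
    if strand == "+":
        lo = max(0, start - 1 - upstream)
        hi = min(len(contig), end + downstream)
        return contig[lo:hi]
    lo = max(0, start - 1 - downstream)
    hi = min(len(contig), end + upstream)
    return reverse_complement(contig[lo:hi])
```

In use, each gene ID yielded by `iter_clusters` would be looked up in the corresponding sample's GFF3 annotation to obtain the contig, coordinates and strand passed to `gene_with_flanks`; that lookup is omitted here because it depends on how the annotations are stored.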
“…Coloured and compacted generalizations of the de Bruijn graph-based assemblers have been successively used to build graphs from large sequence sets [11,12], with tools existing to build graphs containing thousands of isolates [13]. These graphs have been shown to be useful in problems such as the detection of the core genome [14] or the improvement of gene prediction, annotation and clustering [15]. However, in these methods the graph structure encodes, at the same time, nucleotide-level variation and large-scale structural rearrangements.…”
Section: Introduction (mentioning)
confidence: 99%