2019
DOI: 10.1101/586495
Preprint

Automated reconstruction of all gene histories in large bacterial pangenome datasets and search for co-evolved gene modules with Pantagruel

Abstract: The availability of bacterial pangenome data grows exponentially, requiring efficient new methods of analysis. Currently popular approaches for the fast comparison of genomes have the drawback of not being based on explicit evolutionary models of diversification. Making sense of bacterial genome evolution, notably in the accessory genome, nonetheless requires taking into account the complex processes by which genomes evolve. Here we present the Pantagruel bioinformatic software pipeline, which enables the…

Cited by 15 publications (18 citation statements)
References 29 publications
“…A custom set of 155 genomes was gathered covering 28 different Bradyrhizobium species, as well as selected genomes from the genera Nitrobacter and Rhodopseudomonas as outgroups. The genomes listed in Table S1 (available in the online version of this article) were downloaded from the National Center for Biotechnology Information (NCBI) RefSeq or GenBank databases and used as input for the bioinformatic pipeline Pantagruel [26] to build a phylogenomic database. In short, coding sequences (CDSs) and the corresponding protein sequences were extracted from the RefSeq or GenBank annotation and then clustered into homologous gene families using MMSeqs2 [27].…”
Section: Methods; citation type: mentioning
confidence: 99%
“…Bayesian scenario samples were used to compute co-evolution scores between gene families, using an approach adapted from Lassalle et al. [26] and detailed further in File S1. ML and Bayesian samples of scenarios, summaries of the inferred transfer events and of the co-evolution analysis are available as part of online Supporting Data at the Figshare data repository (http://dx.doi.org/10.6084/m9.figshare.12191103).…”
Section: Methods; citation type: mentioning
confidence: 99%
“…Further pangenome analyses were conducted using the Pantagruel pipeline under the default settings as described previously (Lassalle et al, 2019, Lassalle et al, 2020) and on the program webpage http://github.com/flass/pantagruel/. Because of computationally highly intensive tasks, the dataset analyzed was limited to the Allorhizobium genus and Rhizobium aggregatum complex (total of 28 strains).…”
Section: Methods; citation type: mentioning
confidence: 99%
“…Unless specified otherwise, the following bioinformatic analyses were conducted using the Pantagruel pipeline under the default settings as described previously [23] and on the program webpage http://github.com/flass/pantagruel/. This pipeline is designed for the analysis of bacterial pangenomes, including the inference of a species tree, gene trees, and the detection of horizontal gene transfers (HGT) through species tree/gene tree reconciliations [62].…”
Section: Methods; citation type: mentioning
confidence: 99%