Mining bacterial NGS data vastly expands the complete genomes of temperate phages

Zhang, Xianglilan; WANG, Ruohan; Xie, Xiangcheng; Hu, Yunjia; Wang, Jianping; Sun, Qiang; Feng, Xikang; Lin, Wei; Tong, Shanwei; Yan, Wei; Wen, Huiqi; Wang, Mengyao; Zhai, Shixiang; Sun, Cheng; Wang, Fangyi; Niu, Qi; Kropinski, Andrew M.; Cui, Yu‐Jun; Peng, Shaoliang; Li, Shuai Cheng; Tong, Yigang

doi:10.1093/nargab/lqac057

Cited by 12 publications

(14 citation statements)

References 65 publications

(99 reference statements)

Supporting

Mentioning

Contrasting

Order By: Relevance

“…This study identified a total of 8,777 temperate phages specifically associated with S. enterica from the comprehensive public and compromised temperate phage database generated in our previous study ( Zhang et al, 2022 ). These temperate phages were detected from the bacterial next-generation sequencing data.…”

Section: Methodsmentioning

confidence: 99%

See 1 more Smart Citation

Genetic characteristics and integration specificity of Salmonella enterica temperate phages

Sun,

Zhang

2023

Front. Microbiol.

Self Cite

View full text Add to dashboard Cite

IntroductionTemperate phages can engage in the horizontal transfer of functional genes to their bacterial hosts. Thus, their genetic material becomes an intimate part of bacterial genomes and plays essential roles in bacterial mutation and evolution. Specifically, temperate phages can naturally transmit genes by integrating their genomes into the bacterial host genomes via integrases. Our previous study showed that Salmonella enterica contains the largest number of temperate phages among all publicly available bacterial species. S. enterica is an important pathogen that can cause serious systemic infections and even fatalities.MethodsInitially, we extracted all S. enterica temperate phages from the extensively developed temperate phage database established in our previous study. Subsequently, we conducted an in-depth analysis of the genetic characteristics and integration specificity exhibited by these S. enterica temperate phages.ResultsHere we identified 8,777 S. enterica temperate phages, all of which have integrases in their genomes. We found 491 non-redundant S. enterica temperate phage integrases (integrase entries). S. enterica temperate phage integrases were classified into three types: intA, intS, and phiRv2. Correlation analysis showed that the sequence lengths of S. enterica integrase and core regions of attB and attP were strongly correlated. Further phylogenetic analysis and taxonomic classification indicated that both the S. enterica temperate phage genomes and the integrase gene sequences were of high diversities.DiscussionOur work provides insight into the essential integration specificity and genetic diversity of S. enterica temperate phages. This study paves the way for a better understanding of the interactions between phages and S. enterica. By analyzing a large number of S. enterica temperate phages and their integrases, we provide valuable insights into the genetic diversity and prevalence of these elements. This knowledge has important implications for developing targeted therapeutic interventions, such as phage therapy, to combat S. enterica infections. By harnessing the lytic capabilities of temperate phages, they can be engineered or utilized in phage cocktails to specifically target and eradicate S. enterica strains, offering an alternative or complementary approach to traditional antibiotic treatments. Our study has implications for public health and holds potential significance in combating clinical infections caused by S. enterica.

show abstract

Section: Methodsmentioning

confidence: 99%

“…In a previous study, we observed that the Salmonella enterica genome contains the largest number of temperate phages compared to other sequenced bacterial species ( Zhang et al, 2022 ). Salmonella enterica is a zoonotic pathogen of substantial concern to global human and animal health ( Knodler and Elfenbein, 2019 ).…”

Section: Introductionmentioning

confidence: 99%

Genetic characteristics and integration specificity of Salmonella enterica temperate phages

Sun,

Zhang

2023

Front. Microbiol.

Self Cite

View full text Add to dashboard Cite

show abstract

Section: Resultsmentioning

confidence: 99%

“…In addition, temperate phage and prophage sequences were also screened for ARGs. A total of 66,823 temperate phage sequences were identified using the TemPhD [38] method (fetched from the PhageScope database [39]) and 49,254 prophage sequences extracted from the PhageClouds database [40] (Figure 1C). Interestingly, a maximum fraction of ARG-carrying phages were found in temperate and prophages sequences, 2.15% and 0.66%, in TemPhD and Phagecloud datasets, respectively, and the least occurrence was found in the MGV environmental database.…”

Section: Presence Of Args In Phage Genomesmentioning

confidence: 99%

The low abundance of antimicrobial resistance genes (ARGs) in bacteriophages and their transfer bottlenecks limit the ability of phages to contribute to the spread of ARGs

Kant,

Petersen,

Sicheritz-Ponten

et al. 2024

Preprint

View full text Add to dashboard Cite

The role of bacteriophages in the spread of antimicrobial resistance genes (ARGs) has been debated over the past decade. Several questions regarding the ARG dissemination potential of bacteriophages remain unanswered. For example, what is the frequency of acquisition of ARGs in phages? Are phages selective in acquiring the ARGs compared to other host genes? What is the predominant mechanism of transferring ARGs to phages? To address these questions, we thoroughly analyzed the available phage genomes, viromes, temperate phage, and prophage sequences for the presence of all known ARGs. Out of the 38,861 phage genome sequences we analyzed, only 182 phages contained a total of 314 ARGs. Interestingly, a few of the Streptococcus and Acinetobacter phages were found to carry an ARG cluster with four or more genes. One of the uncharacterized Myoviridae phages was found to carry the entire vancomycin operon. Furthermore, based on the presence of lysogenic marker sequences, the terminal location of ARGs on phage genomes, and complete ARG clusters transferred to phages, we suggest that ARGs are predominantly acquired from hosts by temperate phages via specialized transduction. The close association of most phage ARGs with lysogenic markers and mobile genetic elements (MGEs) also points towards specialized transduction as a potent mechanism of acquisition of ARGs by phages. Our study further suggests that the acquisition of ARGs by phages occurs by chance rather than through a selective process. Taken together, the limited presence of ARGs in phages, alongside various transfer bottlenecks, significantly restricts the role of phages in the dissemination of ARGs.

show abstract

“…Additionally, the categorization of viral taxonomy is still a topic of discussion (Walker et al, 2022). Though there have been recent efforts to expand databases (Camargo et al, 2023; Zhang et al, 2022), the overall understanding of viral diversity is still not complete (Yan et al, 2023). We have assembled a unique phage sequence database using recently published genomic data.…”

Section: Methodsmentioning

confidence: 99%

ProkBERT Family: Genomic Language Models for Microbiome Applications

Ligeti,

Szepesi-Nagy,

Bodnár

et al. 2023

Preprint

View full text Add to dashboard Cite

Machine learning offers transformative capabilities in microbiology and microbiome analysis, deciphering intricate microbial interactions, predicting functionalities, and unveiling novel patterns in vast datasets. This enriches our comprehension of microbial ecosystems and their influence on health and disease. However, the integration of machine learning in these fields contends with issues like the scarcity of labeled datasets, the immense volume and complexity of microbial data, and the subtle interactions within microbial communities. Addressing these challenges, we introduce the ProkBERT model family. Built on transfer learning and self-supervised methodologies, ProkBERT models capitalize on the abundant available data, demonstrating adaptability across diverse scenarios. The models’ learned representations align with established biological understanding, shedding light on phylogenetic relationships. With the novel Local Context-Aware (LCA) tokenization, the ProkBERT family overcomes the context size limitations of traditional transformer models without sacrificing performance or the information rich local context. In bioinformatics tasks like promoter prediction and phage identification, ProkBERT models excel. For promoter predictions, the best performing model achieved an MCC of 0.74 forE. coliand 0.62 in mixed-species contexts. In phage identification, they all consistently outperformed tools like VirSorter2 and DeepVirFinder, registering an MCC of 0.85. Compact yet powerful, the ProkBERT models are efficient, generalizable, and swift. They cater to both supervised and unsupervised tasks, providing an accessible tool for the community. The models are available on GitHub and HuggingFace.

show abstract

Mining bacterial NGS data vastly expands the complete genomes of temperate phages

Cited by 12 publications

References 65 publications

Genetic characteristics and integration specificity of Salmonella enterica temperate phages

Genetic characteristics and integration specificity of Salmonella enterica temperate phages

The low abundance of antimicrobial resistance genes (ARGs) in bacteriophages and their transfer bottlenecks limit the ability of phages to contribute to the spread of ARGs

ProkBERT Family: Genomic Language Models for Microbiome Applications

Contact Info

Product

Resources

About