Accurate identification of bacteriophages from metagenomic data using Transformer

Shang, Jiayu; Tang, Xubo; Guo, Ruocheng; Sun, Yanni

doi:10.1093/bib/bbac258

Cited by 28 publications

(19 citation statements)

References 44 publications

“…Despite these limitations, we hope the developed benchmark may be informative to users and would be further developed to include new computational challenges. It should be noted that the results presented here are limited to those tools that could be installed and run by July 2021, and since then many more tools have been published [we are aware of 3CAC (Pu and Shamir, 2022), DeepMicrobeFinder (Hou et al, 2021), INHERIT (Bai et al, 2022), PHAMB (Johansen et al, 2022), PhaMer (Shang et al, 2022), VirMine 2.0 (Johnson and Putonti, 2022), and virSearcher (Liu Q. et al, 2022)]. Additionally, modular pipelines such as the IMG/VR viral discovery pipeline (Paez-Espino et al, 2017) and computational pipelines combining several tools presented here, were not evaluated in this work but could be assessed using the same benchmark datasets developed here.…”

Section: Conclusion and Recommendations (mentioning)

confidence: 99%

Evaluation of computational phage detection tools for metagenomic datasets

et al. 2023

Introduction: As new computational tools for detecting phage in metagenomes are being rapidly developed, a critical need has emerged to develop systematic benchmarks.

Methods: In this study, we surveyed 19 metagenomic phage detection tools, 9 of which could be installed and run at scale. Those 9 tools were assessed on several benchmark challenges. Fragmented reference genomes are used to assess the effects of fragment length, low viral content, phage taxonomy, robustness to eukaryotic contamination, and computational resource usage. Simulated metagenomes are used to assess the effects of sequencing and assembly quality on the tool performances. Finally, real human gut metagenomes and viromes are used to assess the differences and similarities in the phage communities predicted by the tools.

Results: We find that the various tools yield strikingly different results. Generally, tools that use a homology approach (VirSorter, MARVEL, viralVerify, VIBRANT, and VirSorter2) demonstrate low false positive rates and robustness to eukaryotic contamination. Conversely, tools that use a sequence composition approach (VirFinder, DeepVirFinder, Seeker), and MetaPhinder, have higher sensitivity, including to phages with less representation in reference databases. These differences led to widely differing predicted phage communities in human gut metagenomes, with nearly 80% of contigs being marked as phage by at least one tool and a maximum overlap of 38.8% between any two tools. While the results were more consistent among the tools on viromes, the differences in results were still significant, with a maximum overlap of 60.65%.

Discussion: Importantly, the benchmark datasets developed in this study are publicly available and reusable to enable the future comparability of new tools developed.
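The agreement figures quoted in the abstract above (share of contigs marked as phage by at least one tool, maximum overlap between any two tools) can be illustrated with a small sketch. The abstract does not say how overlap was defined, so the Jaccard index used here is an assumption, and the tool names and contig IDs are hypothetical toy data, not results from the study.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Contigs called by both tools divided by contigs called by either.

    Assumption: the source does not define "overlap"; Jaccard is one
    plausible choice, not necessarily the one used in the study.
    """
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def max_pairwise_overlap(calls: dict):
    """Return (pair_of_tool_names, overlap) for the most similar pair of tools."""
    best_pair, best = None, -1.0
    for t1, t2 in combinations(sorted(calls), 2):
        score = jaccard(calls[t1], calls[t2])
        if score > best:
            best_pair, best = (t1, t2), score
    return best_pair, best


def fraction_called_by_any(calls: dict, all_contigs: set) -> float:
    """Fraction of all contigs marked as phage by at least one tool."""
    called = set().union(*calls.values()) & all_contigs
    return len(called) / len(all_contigs) if all_contigs else 0.0


# Hypothetical toy predictions: tool name -> set of contig IDs called phage.
calls = {
    "tool_a": {"c1", "c2", "c3"},
    "tool_b": {"c2", "c3", "c4"},
    "tool_c": {"c5"},
}
all_contigs = {f"c{i}" for i in range(1, 11)}

pair, overlap = max_pairwise_overlap(calls)
print(pair, overlap)
print(fraction_called_by_any(calls, all_contigs))
```

With real data, each set would hold the contig IDs one tool labeled as phage on the same assembly, so the two numbers are directly comparable across tools.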

“…Thus, a pre-processing step is needed for detecting those contigs from metagenomic data. A number of tools, such as VirFinder (Ren et al, 2020), Seeker (Auslander et al, 2020), and PhaMer (Shang et al, 2022) can be applied in the pre-processing step.…”

Section: Approaches For Phage Taxonomic Classification (mentioning)

confidence: 99%

“…• Simulated metagenomic dataset We used a simulated metagenomic dataset generated by six common bacteria living in human gut (Shang et al, 2022). We first utilized metaSPAdes (Nurk et al, 2017) to assemble the reads into contigs.…”

Section: Dataset (mentioning)

confidence: 99%

“…We first utilized metaSPAdes (Nurk et al, 2017) to assemble the reads into contigs. Then PhaMer (Shang et al, 2022) was applied to identify bacteriophages from metagenomic data, and the labels of the contigs were determined using BLAST (Camacho et al, 2009). Eventually, 37 contigs were used in the experiments.…”

Section: Dataset (mentioning)

confidence: 99%

“…In this experiment, we used the simulated metagenomic dataset provided in PhaMer (Shang et al, 2022). The dataset is a small-scale metagenomic dataset simulated by CAMISIM (Fritz et al, 2019) using the commonly seen bacteria living in the human gut and the phages that infect these bacteria.…”

Section: Experiments : Classification Performance On the Simulated Me... (mentioning)

confidence: 99%

Phage family classification under Caudoviricetes: A review of current tools using the latest ICTV classification framework

et al. 2022

Self Cite

Bacteriophages, which are viruses infecting bacteria, are the most ubiquitous and diverse entities in the biosphere. There is accumulating evidence revealing their important roles in shaping the structure of various microbiomes. Thanks to (viral) metagenomic sequencing, a large number of new bacteriophages have been discovered. However, for lack of a standard and automatic virus classification pipeline, the taxonomic characterization of new viruses seriously lags behind the sequencing efforts. In particular, according to the latest version of ICTV, several large phage families in the previous classification system have been removed. Therefore, a comprehensive review and comparison of taxonomic classification tools under the new standard are needed to establish the state of the art. In this work, we retrained and tested four recently published tools on newly labeled databases. We demonstrated their utility and tested them on multiple datasets, including RefSeq, short contigs, simulated metagenomic datasets, and low-similarity datasets. This study provides a comprehensive review of phage family classification in different scenarios and practical guidance for choosing appropriate taxonomic classification pipelines. To the best of our knowledge, this is the first review conducted under the new ICTV classification framework. The results show that the new family classification framework overall leads to better conserved groups and thus makes family-level classification more feasible.
