2020
DOI: 10.1093/nargab/lqaa044

A network-based integrated framework for predicting virus–prokaryote interactions

Abstract: Metagenomic sequencing has greatly enhanced the discovery of viral genomic sequences; however, it remains challenging to identify the host(s) of these new viruses. We developed VirHostMatcher-Net, a flexible, network-based, Markov random field framework for predicting virus–prokaryote interactions using multiple, integrated features: CRISPR sequences and alignment-free similarity measures ($s_2^*$ and WIsH). Evaluation of this method on a benchmark set of 1462 known virus–prokaryote pairs yielded host predicti…

Cited by 89 publications
(135 citation statements)
References 79 publications
“…In extreme environments such as the Atacama Desert, where only slow microbial replication is supported (25,26), it is plausible that CRISPR-Cas may not yield fast enough immune responses (36) for it to be prevalent. To predict possible host-virus interactions for hosts that lack CRISPR systems, VirHostMatcher (37) identified 3897 putative interactions between 73 MAGs and 132 viral scaffolds using d2* threshold of 0.3 (Figure 3b shows highest confidence interactions with d2* threshold of 0.25 for visualisation, Table S3 contains sample specific overview). Most putative interactions were established with actinobacterial MAGs, which were infected by a largely shared set of multiple viruses, while Thaumarchaeota and Firmicutes only interacted with three and four viruses specific to the hosts' taxonomic group respectively.…”
Section: Oligonucleotide Frequency-based Host-virus Matches Suggest A… (mentioning)
confidence: 99%
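
The thresholding step described in the statement above can be sketched as follows. This is a minimal illustration, not the citing authors' pipeline: it assumes the pairwise d2* dissimilarities have already been computed, and it assumes that pairs at or below the threshold are kept (lower d2* meaning more similar oligonucleotide usage). The function name `filter_interactions` and the toy values are hypothetical.

```python
def filter_interactions(dissimilarities, threshold=0.3):
    """Keep virus-host pairs whose dissimilarity does not exceed the threshold.

    dissimilarities: dict mapping (virus_id, host_id) -> d2* value (assumed precomputed).
    """
    return {pair: d for pair, d in dissimilarities.items() if d <= threshold}


# Toy values, made up for illustration only.
toy = {
    ("virus_1", "MAG_a"): 0.22,
    ("virus_1", "MAG_b"): 0.28,
    ("virus_2", "MAG_a"): 0.41,
}

putative = filter_interactions(toy, threshold=0.3)          # looser cut-off
high_confidence = filter_interactions(toy, threshold=0.25)  # stricter cut-off
```

With the toy values, the looser cut-off keeps two pairs and the stricter one keeps a single pair, which mirrors how the statement uses 0.3 for the full set and 0.25 for the visualised subset.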
“…The noted difference in vHULK's performance (primarily at the species level) between testing and validation could be explained in part by the presence of phages in the NDG dataset whose hosts are not present in vHULK's training datasets. It is known that many machine learning prediction models may struggle when facing instances for which there was no target in training (24,25,40). We have tried to mitigate this problem by providing users with entropy values associated with scores; we have shown that these entropy values can be used as proxies for prediction confidence.…”
Section: Discussion (mentioning)
confidence: 99%
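
The idea of using entropy as a proxy for prediction confidence can be illustrated with a generic Shannon entropy over a vector of class scores. This is a sketch under stated assumptions, not vHULK's implementation: it assumes the scores are non-negative and can be normalised to a probability distribution, and the function name `normalized_entropy` is hypothetical.

```python
import math


def normalized_entropy(scores):
    """Shannon entropy of the normalised scores, scaled to the range [0, 1].

    Values near 0 mean the score mass sits on one class (a confident prediction);
    values near 1 mean the mass is spread evenly (an uncertain prediction).
    """
    total = sum(scores)
    if total <= 0:
        raise ValueError("scores must contain at least one positive value")
    if len(scores) < 2:
        return 0.0
    probs = [s / total for s in scores if s > 0]  # zero scores contribute nothing
    entropy = -sum(p * math.log(p) for p in probs)
    return entropy / math.log(len(scores))


confident = normalized_entropy([0.97, 0.01, 0.01, 0.01])
uncertain = normalized_entropy([0.25, 0.25, 0.25, 0.25])
```

A phage whose true host is absent from the training data would be expected to produce a flatter score vector, and therefore a higher entropy, than one with a well-represented host.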
“…According to its authors, WiSH presents 63% of mean accuracy when predicting host genus among 20 possible genera. A more recent work proposed the use of neural networks based on an ensemble of distance measurements and other features, concepts that were implemented in the tool VirHostMatchernet (VHM-net) (25). VHM-net achieved accuracies that varied from 43% to 59% at the host genus level only.…”
Section: Introduction (mentioning)
confidence: 99%
“…We further evaluated the host prediction accuracy of Phirbo by selecting a top-scored prokaryotic sequence for each phage [14][15][16]18]. Briefly, host prediction accuracy is calculated as the percentage of phages whose predicted hosts have the same taxonomic affiliation as their respective known hosts (if multiple top-scoring hosts are present, the prediction is scored as correct if the true host is among the predicted hosts).…”
Section: Phirbo Preserves Blast Top-ranked Host Predictions (mentioning)
confidence: 99%
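
The accuracy definition in the statement above, including its tie rule, can be written out as a short sketch. It assumes that higher scores indicate better matches; the function name `host_prediction_accuracy`, the data layout, and the toy values are hypothetical and are not taken from Phirbo.

```python
def host_prediction_accuracy(scores, known_taxon, host_taxon):
    """Percentage of phages whose top-scoring host(s) share the known host's taxon.

    scores:      {phage: {host: score}}, higher score assumed better
    known_taxon: {phage: taxon of the known host}
    host_taxon:  {host: taxon}
    """
    correct = 0
    for phage, host_scores in scores.items():
        best = max(host_scores.values())
        # All hosts tied at the top score count as predictions.
        top_hosts = [h for h, s in host_scores.items() if s == best]
        predicted_taxa = {host_taxon[h] for h in top_hosts}
        if known_taxon[phage] in predicted_taxa:
            correct += 1
    return 100.0 * correct / len(scores)


# Toy data, made up for illustration only.
scores = {
    "phage_1": {"host_a": 0.9, "host_b": 0.4},
    "phage_2": {"host_a": 0.7, "host_c": 0.7},  # tie between two hosts
    "phage_3": {"host_b": 0.8, "host_c": 0.1},
}
host_taxon = {"host_a": "Escherichia", "host_b": "Salmonella", "host_c": "Bacillus"}
known_taxon = {"phage_1": "Escherichia", "phage_2": "Bacillus", "phage_3": "Escherichia"}

accuracy = host_prediction_accuracy(scores, known_taxon, host_taxon)
```

In the toy data, phage_2 is scored as correct because its true host genus is among the tied top predictions, while phage_3 is scored as incorrect, so two of the three phages count as correct.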
“…Methods for studying phage-host interactions primarily rely on cultured virus-host systems; however, recent in silico approaches suggest a much broader range of hosts may be susceptible to viral infections [14]. These methods predict prokaryotic hosts based on sequence composition [15,16], direct sequence similarity between phages and hosts [14], analysis of CRISPR spacers or tRNAs [13,17], as well as supervised approaches that integrate several sequence-based methods [18,19].…”
Section: Introduction (mentioning)
confidence: 99%