2017
DOI: 10.1093/bioinformatics/btx383

WIsH: who is the host? Predicting prokaryotic hosts from metagenomic phage contigs

Abstract: Summary: WIsH predicts prokaryotic hosts of phages from their genomic sequences. It achieves 63% mean accuracy when predicting the host genus among 20 genera for 3 kbp-long phage contigs. Compared with the best current tool, WIsH shows much improved accuracy on phage sequences only a few kbp long and runs hundreds of times faster, making it well suited to metagenomics studies. Availability and implementation: OpenMP-parallelized, GPL-licensed C++ code available at . Supplementary information: Supplementary data are available at B…
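The abstract does not spell out the scoring rule. WIsH trains a Markov chain on each candidate host genome and predicts the host under whose model the phage contig has the highest likelihood. The Python below is a minimal sketch of that idea, not the tool's C++ implementation: the function names, the toy model order of 2, and add-one smoothing are illustrative assumptions and are not taken from WIsH.

```python
import math
from collections import defaultdict

ALPHABET = set("ACGT")


def _windows(seq: str, order: int):
    """Yield (context, next_base) pairs, skipping windows with non-ACGT symbols."""
    seq = seq.upper()
    for i in range(order, len(seq)):
        context, base = seq[i - order:i], seq[i]
        if base in ALPHABET and set(context) <= ALPHABET:
            yield context, base


def train_markov(genome: str, order: int) -> dict:
    """Count next-base occurrences for every length-`order` context in a host genome."""
    counts = defaultdict(lambda: defaultdict(int))
    for context, base in _windows(genome, order):
        counts[context][base] += 1
    return {ctx: dict(nxt) for ctx, nxt in counts.items()}


def log_likelihood(model: dict, seq: str, order: int) -> float:
    """Log-likelihood of `seq` under a host model, with add-one (Laplace) smoothing (assumption)."""
    total_ll = 0.0
    for context, base in _windows(seq, order):
        nxt = model.get(context, {})
        total_ll += math.log((nxt.get(base, 0) + 1) / (sum(nxt.values()) + 4))
    return total_ll


def predict_host(phage: str, host_models: dict, order: int) -> str:
    """Return the host whose Markov chain gives the phage contig the highest likelihood."""
    return max(host_models, key=lambda h: log_likelihood(host_models[h], phage, order))


# Toy example with hypothetical host genomes.
hosts = {"host_a": "ATATATATAT" * 50, "host_b": "GCGCGCGCGC" * 50}
models = {name: train_markov(g, order=2) for name, g in hosts.items()}
print(predict_host("ATATATATATAT", models, order=2))  # host_a
```

Scoring one contig against every host model is independent work, so this loop over hosts is naturally parallel. That is consistent with the abstract's mention of OpenMP, though how WIsH actually divides the work is not described here.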

Cited by 228 publications (296 citation statements). References: 13 publications.
“…Until now, there are several works on the identification of host of viruses, such as HostPhinder (6), WIsH (7), which all predict hosts for bacteriophages. However they were not designed for non-phage virus host-prediction, especially the virus causing Zoonotic diseases.…”
Section: Main Text (mentioning; confidence: 99%)
“…Considering difference in input sequence lengths, two BiPathCNNs (BiPathCNN-A and BiPathCNN-B) were built for predicting hosts of viral sequences from 100 bp to 400 bp and 400 bp to 800 bp respectively. The dataset(7) for training and testing includes genomes of all DNA viruses, coding sequences of all RNA viruses and their host information in…”
(mentioning; confidence: 99%)
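The passage above routes viral sequences to one of two networks by input length. The sketch below shows only that dispatch step. The model names come from the quote, but `select_model` is a hypothetical helper. The quoted ranges overlap at 400 bp, so sending exactly 400 bp to BiPathCNN-A is an assumption. The networks themselves are not modeled.

```python
from typing import Optional

# Inclusive length ranges (bp). Giving exactly 400 bp to BiPathCNN-A is an
# assumption: the quoted description lists 400 bp in both ranges.
MODEL_RANGES = [
    ("BiPathCNN-A", 100, 400),
    ("BiPathCNN-B", 401, 800),
]


def select_model(seq: str) -> Optional[str]:
    """Return the name of the network that handles a sequence of this length, or None."""
    n = len(seq)
    for name, lo, hi in MODEL_RANGES:
        if lo <= n <= hi:
            return name
    return None  # outside the 100-800 bp span covered by the quoted models


print(select_model("A" * 250))  # BiPathCNN-A
print(select_model("A" * 600))  # BiPathCNN-B
print(select_model("A" * 50))   # None
```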
“…In the meantime, the development of methods to add information to this genome-centric database is currently at full steam. These include approaches for large scale virus taxonomy classification (Bolduc et al, 2017b;Meier-Kolthoff and Göker, 2017;Nishimura et al, 2017;Aiewsakun and Simmonds, 2018), as well as host linkage for uncultivated viruses either computationally (Edwards et al, 2016;Galiez et al, 2017;Ahlgren et al, 2016) or experimentally (Tadmor et al, 2011;Martínez-García et al, 2014;Deng et al, 2014;Roux et al, 2014;Labonté et al, 2015;Spencer et al, 2016). Beyond virus-specific tools, we anticipate significant improvements in genome annotation capabilities stemming from (i) "multi 'omics approaches" combining transcriptomics, proteomics and metabolomics studies of individual environments (Franzosa et al, 2015) and (ii) improved functional prediction tools leveraging protein structure constraints and large-scale comparative genomics (Alva et al, 2016;Ovchinnikov et al, 2017).…”
Section: Wishful Thinking or Realistic Path Forward? (mentioning; confidence: 99%)
“…Several investigators have utilized the fact that viruses are similar to their hosts compared with other unrelated host species in terms of their genomic signatures (or k-mer usage) 11,24–26. They predicted the host of a virus as the one closest to the viral genome based on some dissimilarity measures using k-mers.…”
Section: Introduction (mentioning; confidence: 99%)
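The nearest-host rule in the passage above can be sketched in a few lines. The code builds a relative k-mer frequency profile for the virus and for each candidate host, then picks the host at the smallest distance. Euclidean and Manhattan distances appear here only as simple example measures. The helper names, the default k = 4, and the toy genomes are assumptions, not the setup of any cited study.

```python
import math
from collections import Counter
from itertools import product


def kmer_profile(seq: str, k: int) -> list:
    """Relative frequencies of all 4**k DNA k-mers; windows with non-ACGT symbols are ignored."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[w] for w in kmers) or 1  # avoid division by zero on empty input
    return [counts[w] / total for w in kmers]


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def manhattan(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))


def nearest_host(virus: str, hosts: dict, k: int = 4, dist=euclidean) -> str:
    """Predict the host whose k-mer profile is closest to the virus profile."""
    v = kmer_profile(virus, k)
    return min(hosts, key=lambda name: dist(v, kmer_profile(hosts[name], k)))


# Toy example with hypothetical host genomes.
hosts = {"host_a": "ATATATATAT" * 50, "host_b": "GCGCGCGCGC" * 50}
print(nearest_host("ATATATAT", hosts, k=2))  # host_a
```

Raw frequency profiles like these are dominated by overall base composition. That limitation is what background-corrected measures address.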
“…The recently developed dissimilarity measure d₂* that subtracts expected k-mer frequency from the observed frequency achieves the highest accuracy among all current genomic signature-based measures, including the commonly used Euclidean and Manhattan distances 25. Similarly, Galiez et al 26 predicted the host of a virus to be the one under whose Markov chain model the viral sequence has the highest likelihood, and the method has good accuracy for short viral fragments. The genomic signature-based measures are often referred as alignment-free sequence comparison measures.…”
Section: Introduction (mentioning; confidence: 99%)
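The d₂* idea in the passage above works as follows. Each sequence's k-mer counts are centered by subtracting the counts expected under a background model, giving X̃_w = X_w − E_X,w. Each centered count is then standardized by the square root of its expected count. d₂* is half of one minus the cosine between the two standardized vectors, so it ranges from 0 (identical profiles) to 1. The sketch below is an illustration under a stated simplification: it uses an i.i.d. (order-0) background estimated from each sequence, whereas the measure also admits higher-order Markov backgrounds. The function names and the handling of words with zero expected count are assumptions.

```python
import math
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def centered_counts(seq: str, k: int):
    """Return (observed - expected, expected) counts for every k-mer over ACGT.

    Expected counts use an i.i.d. (order-0) background, which is an assumption of this sketch:
    E_w = N * prod(p_b for b in w), with N valid k-mer windows and p_b the base frequencies.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    base_counts = Counter(c for c in seq if c in ALPHABET)
    n_bases = sum(base_counts.values())
    if n_bases == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    freq = {b: base_counts[b] / n_bases for b in ALPHABET}
    observed = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    n_windows = sum(observed[w] for w in kmers)
    expected = {w: n_windows * math.prod(freq[b] for b in w) for w in kmers}
    centered = {w: observed[w] - expected[w] for w in kmers}
    return centered, expected


def d2_star(x: str, y: str, k: int = 4) -> float:
    """d2* dissimilarity in [0, 1]; 0 means identical standardized k-mer profiles."""
    cx, ex = centered_counts(x, k)
    cy, ey = centered_counts(y, k)
    dot = norm_x = norm_y = 0.0
    for w in cx:
        if ex[w] > 0 and ey[w] > 0:  # skip words with zero expected count (sketch's choice)
            dot += cx[w] * cy[w] / math.sqrt(ex[w] * ey[w])
            norm_x += cx[w] ** 2 / ex[w]
            norm_y += cy[w] ** 2 / ey[w]
    if norm_x == 0 or norm_y == 0:
        return 0.5  # undefined cosine: treated as uninformative (assumption)
    return 0.5 * (1 - dot / math.sqrt(norm_x * norm_y))


virus = "ACGTTGCAAGCTTACG" * 10  # hypothetical sequences
host = "AATTCCGGATCGATCG" * 10
print(d2_star(virus, host, k=3))
```

For host prediction, d₂* would replace the plain distances in a nearest-host search: the predicted host is the candidate with the smallest d₂* to the virus.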