2021
DOI: 10.1186/s12915-020-00938-6

Prokaryotic virus host predictor: a Gaussian model for host prediction of prokaryotic viruses in metagenomics

Abstract: Background: Viruses are ubiquitous biological entities, estimated to be the largest reservoirs of unexplored genetic diversity on Earth. Full functional characterization and annotation of newly discovered viruses requires tools to enable taxonomic assignment, the range of hosts, and biological properties of the virus. Here we focus on prokaryotic viruses, which include phages and archaeal viruses, and for which identifying the viral host is an essential step in characterizing the virus, as the v…

Cited by 75 publications (86 citation statements)
References 32 publications (54 reference statements)
“…While HostPhinder predicts phage-host interactions by comparing 16-mers composition between query phage and known phage genome sequences [39], most tools instead compare k-mer frequencies between the query phage and a host reference genome database, based on the assumption that virus and host genomes often display similar sequence composition bias and k-mer frequencies patterns (Table 1). Prokaryotic virus host Predictor (PHP) [40] and VirHostMatcher (VHM) [41] respectively compare 4-mer and 6-mer frequency vectors between the query phage and a database of reference host genomes. "Who Is the Host?"…”
Section: II.1 Sequence Composition Features Used For Alignment-free Host Prediction (mentioning)
confidence: 99%
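The k-mer frequency comparison described in the statement above can be illustrated with a minimal sketch. It is not the scoring used by PHP (a Gaussian model) or VirHostMatcher; cosine similarity is swapped in here purely for illustration, and the function names `kmer_frequencies`, `cosine_similarity`, and `rank_hosts` are hypothetical.

```python
from collections import Counter
from itertools import product
import math


def kmer_frequencies(seq, k=4):
    """Return the normalized k-mer frequency vector of seq.

    The vector covers all 4**k k-mers over A/C/G/T in lexicographic order.
    Windows containing other characters (e.g. N) are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_hosts(virus_seq, host_genomes, k=4):
    """Rank candidate hosts by k-mer composition similarity to the virus.

    host_genomes maps a host name to its genome sequence. The similarity
    measure is an assumption made for this sketch, not a published score.
    """
    virus_vec = kmer_frequencies(virus_seq, k)
    scored = [
        (name, cosine_similarity(virus_vec, kmer_frequencies(genome, k)))
        for name, genome in host_genomes.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Calling `rank_hosts(virus_seq, {"host_a": genome_a, "host_b": genome_b}, k=4)` returns the candidate hosts ordered from most to least similar in composition; the top entry would be the predicted host under this simplified scheme.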
“…In contrast, the most recent tools leverage advances in machine learning to identify reliable predictions, including Gaussian models [40], neighborhood regularized logistic matrix factorization [43], and deep convolution neural network [45]. A critical and challenging aspect of these techniques is the establishment of robust and balanced training and test sets which should ideally represent a diverse range of viruses, hosts, and virus-host interactions, to avoid over-estimating the performance of these tools, i.e.…”
Section: II.1 Sequence Composition Features Used For Alignment-free Host Prediction (mentioning)
confidence: 99%
“…Host predictions of eukaryotic viruses have been usually conducted based on viral sequences alone, such as those for influenza viruses and coronaviruses (Xu et al 2017; Tian, 2020); while those of prokaryotic viruses have been usually conducted based on the similarity of sequence features or sequences between viruses and hosts. At present, two kinds of computational methods have been developed to predict prokaryotic virus hosts based on genomic sequences (Edwards et al 2016; Ahlgren et al 2017; Galiez et al 2017; Lu et al 2021). The first kind of methods rely on the sequence similarity search between the query viruses and the candidate host genomes since viruses and their hosts may share the same genes and/or short nucleotide sequences such as the spacer sequences used in CRISPR systems (Edwards et al 2016).…”
Section: Host Prediction Of Viruses (mentioning)
confidence: 99%
“…However, they can be only used for a small proportion of viruses since only some viruses have sequence similarities with their hosts (Edwards et al 2016). Another kind of methods can predict the viral hosts based on the sequence composition similarity between viruses and their hosts, such as the Prokaryotic virus Host Predictor (PHP) (Lu et al 2021), VirHostMatcher (Ahlgren et al 2017) and WIsH (Galiez et al 2017). Although the latter kind of method predicts viral hosts with lower accuracy than the former, they can be used for any prokaryotic viruses.…”
Section: Host Prediction Of Viruses (mentioning)
confidence: 99%
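The first kind of method mentioned in the statements above, a sequence similarity search using CRISPR spacers, can be sketched as follows. This is a deliberately simplified illustration that assumes exact substring matches on either strand; practical pipelines typically use an alignment tool and tolerate mismatches. The names `reverse_complement` and `hosts_with_spacer_hits` are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (unknown bases become N)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement.get(base, "N") for base in reversed(seq.upper()))


def hosts_with_spacer_hits(virus_seq, host_spacers):
    """Count, per host, the CRISPR spacers found exactly in the viral genome.

    host_spacers maps a host name to a list of spacer sequences.
    Both strands of the virus are searched. Hosts with no hit are omitted.
    Exact matching is an assumption made for this sketch.
    """
    forward = virus_seq.upper()
    reverse = reverse_complement(forward)
    hits = {}
    for host, spacers in host_spacers.items():
        matched = sum(
            1 for spacer in spacers
            if spacer.upper() in forward or spacer.upper() in reverse
        )
        if matched:
            hits[host] = matched
    return hits
```

A host whose spacers match the viral genome is a candidate host under this scheme. As the quoted statements note, only some viruses share such sequences with their hosts, so an empty result is common and composition-based methods are used as a fallback.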