2017
DOI: 10.1186/s12859-017-1602-3

A machine learning approach for viral genome classification

Abstract: Background: Advances in cloning and sequencing technology are yielding a massive number of viral genomes. The classification and annotation of these genomes constitute important assets in the discovery of genomic variability, taxonomic characteristics and disease mechanisms. Existing classification methods are often designed for specific well-studied family of viruses. Thus, the viral comparative genomic studies could benefit from more generic, fast and accurate tools for classifying and typing newly sequenced s…

Cited by 59 publications (58 citation statements)
References 54 publications
“…In general, the classifiers performed better at genotyping than at subtyping HCV sequences. Several studies for viral [9], [10] and metagenomic [3], [20], [21] taxonomic classification have reported similar results where the performance is better at high-level classifications. Genomic sequences are more similar at low-level than at high-level clades, which makes more difficult to discriminate between sequences at low-level clades.…”
Section: Overall Remarks
Type: mentioning
confidence: 58%
“…Pre-filtering by host-mapping subtraction could lead to efficient de novo assembly, allowing the rapid and accurate procurement of a complete viral genome sequence. In addition to the accuracy of de novo assembly, the exclusion of human-related sequences can circumvent conflicting ethical issues by avoiding analyzing the personal genetic information of patients [46,47].…”
Section: VirusTAP: Viral Genome-targeted Assembly Pipeline
Type: mentioning
confidence: 99%
“…VIP (https://github.com/keylabivdc/VIP) is a web-based virus discovery and identification tool [46]. With a single click, it will filter out background-related reads, classify reads on basis of nucleotide and remote amino acid homology, and perform phylogenetic analysis to provide evolutionary insights.…”
Section: Virus Identification Pipeline (VIP)
Type: mentioning
confidence: 99%
“…Machine learning has been successfully used in small-scale genomic analysis studies [40,41,42]. In this paper we propose a novel combination of supervised machine learning with feature vectors consisting of the distance between the magnitude spectrum of a sequence's digital signal and the magnitude spectra of all other sequences in the training set.…”
Section: Supervised Machine Learning
Type: mentioning
confidence: 99%
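
The feature construction described in the statement above (each sequence represented by the distances between its magnitude spectrum and those of the training sequences) can be illustrated with a minimal sketch. Several details are assumptions made here for illustration and are not taken from the quoted text: the nucleotide-to-number mapping in `NUCLEOTIDE_VALUES`, zero-padding to a common length, Euclidean distance as the distance measure, and all function names. The citing paper may use different choices for each of these.

```python
import cmath
import math

# Hypothetical nucleotide-to-number mapping; the quoted text does not specify one.
NUCLEOTIDE_VALUES = {"A": 1.0, "C": 2.0, "G": 3.0, "T": 4.0}


def to_signal(seq, length):
    """Map a DNA string to numbers, zero-padded (or truncated) to `length`."""
    values = [NUCLEOTIDE_VALUES.get(base, 0.0) for base in seq.upper()[:length]]
    return values + [0.0] * (length - len(values))


def magnitude_spectrum(signal):
    """Magnitudes of the discrete Fourier transform (naive O(n^2) version)."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal)))
        for k in range(n)
    ]


def euclidean(u, v):
    """Euclidean distance, used here as a stand-in distance measure."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def feature_vectors(sequences):
    """Row i holds the distances from spectrum i to every training spectrum."""
    length = max(len(s) for s in sequences)
    spectra = [magnitude_spectrum(to_signal(s, length)) for s in sequences]
    return [[euclidean(a, b) for b in spectra] for a in spectra]


# Toy training set, invented for illustration only.
training = ["ACGTACGT", "ACGTACGA", "TTTTGGGG"]
features = feature_vectors(training)
```

Each row of `features` could then be passed to any supervised classifier together with the class label of the corresponding sequence. For genome-length inputs, a fast Fourier transform would replace the naive transform used in this sketch.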
“…To address situations where alignment-based methods fail or are insufficient, alignment-free methods have been proposed [10], including approaches based on Chaos Game Representation of DNA sequences [11,12,13], random walk [14], graph theory [15], iterated maps [16], information theory [17], category-position-frequency [18], spaced-words frequencies [19], Markov-model [20], thermal melting profiles [21], word analysis [22], among others. Software implementations of alignment-free methods also exist, among them COMET [23], CASTOR [24], SCUEAL [25], REGA [26], KAMERIS [27], and FFP (Feature Frequency Profile) [28].…”
Section: Introduction
Type: mentioning
confidence: 99%