2021
DOI: 10.1093/bioinformatics/btab845

Virtifier: a deep learning-based identifier for viral sequences from metagenomes

Abstract: Motivation: Viruses, the most abundant biological entities on Earth, are important components of microbial communities, and, as major human pathogens, they are responsible for human mortality and morbidity. The identification of viral sequences from metagenomes is critical for viral analysis. As massive quantities of short sequences are generated by next-generation sequencing (NGS), most methods utilize discrete and sparse one-hot vectors to encode nucleotide sequences, which are usually ineffe…
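To make the abstract's point about sparse one-hot vectors concrete, here is a minimal Python sketch of the generic one-hot scheme it refers to. It is not Virtifier's own encoding. The `ACGT` channel order, the all-zero rows for ambiguous bases such as `N`, and the helper names `one_hot` and `sparsity` are assumptions chosen for illustration.

```python
from typing import List

# Assumed channel order; the source does not specify one.
ALPHABET = "ACGT"
INDEX = {base: i for i, base in enumerate(ALPHABET)}


def one_hot(seq: str) -> List[List[int]]:
    """Encode a nucleotide string as an L x 4 one-hot matrix (list of rows).

    Characters outside ACGT (for example 'N') become all-zero rows; this
    handling of ambiguous bases is an illustrative assumption.
    """
    rows = []
    for base in seq.upper():
        row = [0] * len(ALPHABET)
        idx = INDEX.get(base)
        if idx is not None:
            row[idx] = 1
        rows.append(row)
    return rows


def sparsity(matrix: List[List[int]]) -> float:
    """Return the fraction of entries in the matrix that are zero."""
    total = sum(len(row) for row in matrix)
    zeros = sum(row.count(0) for row in matrix)
    return zeros / total if total else 0.0


if __name__ == "__main__":
    m = one_hot("ACGT")
    print(m)            # [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
    print(sparsity(m))  # 0.75
```

Because each position activates exactly one of four channels, at least three quarters of any encoded matrix is zero regardless of sequence length, and every pair of distinct bases is equally far apart, so the vectors themselves carry no notion of similarity.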

Cited by 31 publications (17 citation statements)
References: 33 publications
“…Genome sequencing has enhanced our understanding of emerging viromes by providing blueprints of the evolutionary and functional diversity of viruses, but sequences always contain dark matter that cannot be identified or matched. As not all viruses can be isolated and cultured in the laboratory, numerous algorithms, databases, and pipelines have been developed to process virome sequencing data and support dark matter exploration (56–58). Using DVF and CheckV software, we performed a preliminary exploration of a large number of unannotated sequences in the study samples, looking for possibly viral sequences in the dark matter.…”
Section: Discussion (mentioning)
confidence: 99%
“…Tools such as the machine learning algorithms RNN-VirSeeker and VirFinder, along with cloud-based platforms like Serratus, have been innovatively designed to pinpoint viral sequences within metagenomic data, markedly enhancing the efficiency and precision of virus detection. [129][130][131] Despite the remarkable advancements in the construction of infectious clones for insect viruses, significant challenges persist. These challenges include the assembly or preservation of large genomic fragments in prokaryotic cells, such as Escherichia coli or Agrobacterium.…”
Section: Discussion (mentioning)
confidence: 99%
“…Furthermore, advancements in bioinformatics have greatly enriched the domain of virus identification. Tools such as the machine learning algorithms RNN‐VirSeeker and VirFinder, along with cloud‐based platforms like Serratus, have been innovatively designed to pinpoint viral sequences within metagenomic data, markedly enhancing the efficiency and precision of virus detection 129–131 …”
Section: Discussion (mentioning)
confidence: 99%
“…This preserves the nucleotide positions of each word in the sequence. Two vector encoding methods, one-hot encoding and label encoding, are also used to represent the sequences numerically [24]. One-hot encoding and label encoding are used because, in contrast to image data, which is represented as a two-dimensional numerical matrix as input to the CNN, text data is represented as a one-dimensional series of consecutive characters.…”
Section: B. Data Preprocessing (mentioning)
confidence: 99%
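The statement above describes turning a sequence into ordered "words" and then encoding them with label encoding or one-hot encoding. The surrounding text is cut off, so this sketch assumes the words are overlapping k-mers taken with stride 1 over an ACGT-only sequence; `kmer_vocab`, `to_words`, `label_encode`, and `one_hot_labels` are hypothetical helpers, not the cited paper's code.

```python
from itertools import product
from typing import Dict, List


def kmer_vocab(k: int, alphabet: str = "ACGT") -> Dict[str, int]:
    """Assign every possible k-mer an integer label in lexicographic order."""
    return {"".join(p): i for i, p in enumerate(product(alphabet, repeat=k))}


def to_words(seq: str, k: int) -> List[str]:
    """Split a sequence into overlapping k-mers (stride 1), keeping their order.

    Assumption: "words" are overlapping k-mers; the source does not say.
    """
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def label_encode(words: List[str], vocab: Dict[str, int]) -> List[int]:
    """Replace each word by its integer label; word i stays at index i.

    Raises KeyError for k-mers outside the vocabulary (e.g. containing 'N').
    """
    return [vocab[w] for w in words]


def one_hot_labels(labels: List[int], size: int) -> List[List[int]]:
    """Expand each integer label into a one-hot row of length `size`."""
    return [[1 if j == label else 0 for j in range(size)] for label in labels]


if __name__ == "__main__":
    k = 2
    vocab = kmer_vocab(k)                # 4**2 = 16 possible 2-mers
    words = to_words("ACGT", k)          # ['AC', 'CG', 'GT']
    labels = label_encode(words, vocab)  # [1, 6, 11]
    rows = one_hot_labels(labels, len(vocab))
    print(words, labels, len(rows), len(rows[0]))  # ['AC', 'CG', 'GT'] [1, 6, 11] 3 16
```

Label encoding keeps one integer per word, while one-hot encoding expands each word into a row of length 4^k. Stacking those rows gives an (L - k + 1) x 4^k matrix whose first axis follows sequence order: the one-dimensional series a 1-D CNN would slide its filters along, in contrast to the two-dimensional grid of an image.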