2018
DOI: 10.1093/nar/gky1127

IMG/VR v.2.0: an integrated data management and analysis system for cultivated and environmental viral genomes

Abstract: The Integrated Microbial Genome/Virus (IMG/VR) system v.2.0 (https://img.jgi.doe.gov/vr/) is the largest publicly available data management and analysis platform dedicated to viral genomics. Since the last report, published in the 2016 NAR Database Issue, the data has tripled in size and currently contains genomes of 8389 cultivated reference viruses, 12 498 previously published curated prophages derived from cultivated microbial isolates, and 735 112 viral genomic fragments computationally predicted from asse…

Citations: cited by 161 publications (200 citation statements)
References: 33 publications
“…We used the host taxonomic information derived from IMG/VR version 1.0 where two computational approaches were used: (1) host assignment based on virus clusters that included isolate virus genomes with known hosts and (2) CRISPR-spacer sequence matches (only tolerating 1 SNP over the whole spacer length as cutoffs). To further complement the host assignment from IMG/VR version 1.0, we used a classification of the viral protein families (used in the virus identification pipeline) to determine the domain (Eukaryotic, Bacterial, or Archaeal) of the host predicted for 85.6% of the viral sequences described in Paez-Espino et al 40 (Supplementary Table 3). Briefly, the viral protein families were benchmarked against the viral RefSeq genomes and the viral genomes with predicted host from the prokaryotic virus of orthologous groups database 50 obtaining a subset of them used as host-type marker genes.…”
Section: Methods (mentioning)
confidence: 99%
“…2 and 3) and these contigs were primarily from vegetated biomes (Extended Data Tables 2 and 3). Also, when analyzing the entire IMG/VR virus database 40 , viral chitosanases were exclusively found in soil or freshwater viruses (66% vs. 33% of the cases, respectively). This finding highlights the potential importance of viral-encoded chitosanase functions in bacteria in ecosystems where chitin is abundant.…”
Section: A New Paradigm: Cross-domain Transfer Of Biogeochemical Func… (mentioning)
confidence: 99%
“…Likewise, we did not find evidence of this gene in Amorphea (which includes Amoebozoa, animals, fungi, and related protists). Although these initial results implied a distribution that might be truly restricted to green plants (Viridiplantae), searches of other major eukaryotic lineages found that plant-like MSH1 homologs carrying the characteristic GIY-YIG and another gammaproteobacterium of uncertain classification, as well as some unclassified viruses curated from environmental and metagenomic datasets (42,43). Phylogenetic analysis confirmed (24,48,49).…”
mentioning
confidence: 99%
“…To further expand our search to include some of the vast amount of biological diversity that is unculturable and only detected in environmental samples, we queried a sample of 2000 metagenome assemblies from the JGI IMG/MER repository (43). We also searched against the IMG/VR database, which houses the largest available collection of viral sequences from both sequenced isolates and environmental samples (42). In cases where MSH1-like sequences were identified on metagenomic scaffolds, we searched other proteins encoded in the flanking sequence against the NCBI nr database to infer possible origins for the scaffold.…”
mentioning
confidence: 99%
“…The largest oral virome study was conducted in 2015, which generated and analyzed more than 100 Gb shotgun sequencing data from 25 samples (20 dental plaque specimens and 5 salivary) to primarily explore the phage-bacteria interaction network (12). More recently, shotgun metagenome sequencing of 3,042 samples from various environments, including the human oral cavity, was conducted in a project aimed at uncovering the Earth’s virome (13), which achieved an almost 3-fold increase in the metagenome samples (14). Re-analysis of the Earth’s virome data revealed signatures of genetic conflict invoked by the coevolution of phages and host oral bacteria enriched in the human oral cavity (15), suggesting it as an attractive system to study coevolution using metagenomic data.…”
Section: Introduction (mentioning)
confidence: 99%