RNA viruses are diverse components of global ecosystems. The metagenomic identification of RNA viruses is currently limited to those with sequence similarity to known viruses, such that highly divergent viruses that comprise the "dark matter" of the virosphere remain challenging to detect. We developed a deep learning algorithm, LucaProt, to search for highly divergent RNA-dependent RNA polymerase (RdRP) sequences in 10,487 global meta-transcriptomes. LucaProt integrates both sequence and structural information to accurately and efficiently detect RdRP sequences. With this approach, we identified 180,571 RNA viral species and 180 superclades (viral phyla/classes). This is the broadest diversity of RNA viruses described to date, including many viruses undetectable using BLAST or HMM approaches. The newly identified RNA viruses were present in diverse ecological niches, including the air, hot springs and hydrothermal vents, and both virus diversity and abundance varied substantially among ecological types. We also identified the longest RNA virus genome (nido-like) observed so far, at 47,250 nucleotides, and expanded the diversity of RNA bacteriophages to more than ten phyla/classes. This study marks the beginning of a new era of virus discovery, with the potential to redefine our understanding of the global virosphere and reshape our understanding of virus evolutionary history.