Natural products (NPs) are small molecules produced by living organisms with potential applications in pharmacology and other industries as many of them are bioactive. This potential raised great interest in NP research around the world and in different application fields, therefore, over the years a multiplication of generalistic and thematic NP databases has been observed. However, there is, at this moment, no online resource regrouping all known NPs in just one place, which would greatly simplify NPs research and allow computational screening and other in silico applications. In this manuscript we present the online version of the COlleCtion of Open Natural prodUcTs (COCONUT): an aggregated dataset of elucidated and predicted NPs collected from open sources and a web interface to browse, search and easily and quickly download NPs. COCONUT web is freely available at https://coconut.naturalproducts.net.
Natural products (NPs) have been the centre of attention of the scientific community in the last decencies and the interest around them continues to grow incessantly. As a consequence, in the last 20 years, there was a rapid multiplication of various databases and collections as generalistic or thematic resources for NP information. In this review, we establish a complete overview of these resources, and the numbers are overwhelming: over 120 different NP databases and collections were published and re-used since 2000. 98 of them are still somehow accessible and only 50 are open access. The latter include not only databases but also big collections of NPs published as supplementary material in scientific publications and collections that were backed up in the ZINC database for commercially-available compounds. Some databases, even published relatively recently are already not accessible anymore, which leads to a dramatic loss of data on NPs. The data sources are presented in this manuscript, together with the comparison of the content of open ones. With this review, we also compiled the open-access natural compounds in one single dataset a COlleCtion of Open NatUral producTs (COCONUT), which is available on Zenodo and contains structures and sparse annotations for over 400,000 non-redundant NPs, which makes it the biggest open collection of NPs available to this date.
Contemporary bioinformatic and chemoinformatic capabilities hold promise to reshape knowledge management, analysis and interpretation of data in natural products research. Currently, reliance on a disparate set of non-standardized, insular, and specialized databases presents a series of challenges for data access, both within the discipline and for integration and interoperability between related fields. The fundamental elements of exchange are referenced structure-organism pairs that establish relationships between distinct molecular structures and the living organisms from which they were identified. Consolidating and sharing such information via an open platform has strong transformative potential for natural products research and beyond. This is the ultimate goal of the newly established LOTUS initiative, which has now completed the first steps toward the harmonization, curation, validation and open dissemination of 750,000+ referenced structure-organism pairs. LOTUS data is hosted on Wikidata and regularly mirrored on https://lotus.naturalproducts.net. Data sharing within the Wikidata framework broadens data access and interoperability, opening new possibilities for community curation and evolving publication models. Furthermore, embedding LOTUS data into the vast Wikidata knowledge graph will facilitate new biological and chemical insights. The LOTUS initiative represents an important advancement in the design and deployment of a comprehensive and collaborative natural products knowledge base.
Background The rise of antibiotic resistance (AR) in clinical settings is of great concern. Therefore, the understanding of AR mechanisms, evolution, and global distribution is a priority for patient survival. Despite all efforts in the elucidation of AR mechanisms in clinical strains, little is known about its prevalence and evolution in environmental microorganisms. We used 293 metagenomic samples from the TARA Oceans project to detect and quantify environmental antibiotic resistance genes (ARGs) using machine learning tools. Results After manual curation of ARGs, their abundance and distribution in the global ocean are presented. Additionally, the potential of horizontal ARG transfer by plasmids and their correlation with environmental and geographical parameters is shown. A total of 99,205 environmental open reading frames (ORFs) were classified as 1 of 560 different ARGs conferring resistance to 26 antibiotic classes. We found 24,567 ORFs in putative plasmid sequences, suggesting the importance of mobile genetic elements in the dynamics of environmental ARG transmission. Moreover, 4,804 contigs with >=2 putative ARGs were found, including 2 plasmid-like contigs with 5 different ARGs, highlighting the potential presence of multi-resistant microorganisms in the natural ocean environment. Finally, we identified ARGs conferring resistance to some of the most relevant clinical antibiotics, revealing the presence of 15 ARGs similar to mobilized colistin resistance genes (mcr) with high abundance on polar biomes. Of these, 5 are assigned to Psychrobacter, a genus including opportunistic human pathogens. Conclusions This study uncovers the diversity and abundance of ARGs in the global ocean metagenome. Our results are available on Zenodo in MySQL database dump format, and all the code used for the analyses, including a Jupyter notebook js avaliable on Github. We also developed a dashboard web application (http://www.resistomedb.com) for data visualization.
The emergence of Next Generation Sequencing generates an incredible amount of sequence and great potential for new enzyme discovery. Despite this huge amount of data and the profusion of bioinformatic methods for function prediction, a large part of known enzyme activities is still lacking an associated protein sequence. These particular activities are called “orphan enzymes”. The present review proposes an update of previous surveys on orphan enzymes by mining the current content of public databases. While the percentage of orphan enzyme activities has decreased from 38% to 22% in ten years, there are still more than 1,000 orphans among the 5,000 entries of the Enzyme Commission (EC) classification. Taking into account all the reactions present in metabolic databases, this proportion dramatically increases to reach nearly 50% of orphans and many of them are not associated to a known pathway. We extended our survey to “local orphan enzymes” that are activities which have no representative sequence in a given clade, but have at least one in organisms belonging to other clades. We observe an important bias in Archaea and find that in general more than 30% of the EC activities have incomplete sequence information in at least one superkingdom. To estimate if candidate proteins for local orphans could be retrieved by homology search, we applied a simple strategy based on the PRIAM software and noticed that candidates may be proposed for an important fraction of local orphan enzymes. Finally, by studying relation between protein domains and catalyzed activities, it appears that newly discovered enzymes are mostly associated with already known enzyme domains. Thus, the exploration of the promiscuity and the multifunctional aspect of known enzyme families may solve part of the orphan enzyme issue. We conclude this review with a presentation of recent initiatives in finding proteins for orphan enzymes and in extending the enzyme world by the discovery of new activities.ReviewersThis article was reviewed by Michael Galperin, Daniel Haft and Daniel Kahn.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.