Despite progress in the development of standards for describing and exchanging scientific information, the lack of easy-to-use standards for mapping between different representations of the same or similar objects in different databases poses a major impediment to data integration and interoperability. Mappings often lack the metadata needed to be correctly interpreted and applied. For example, are two terms equivalent or merely related? Are they narrow or broad matches? Or are they associated in some other way? Such relationships between the mapped terms are often not documented, which leads to incorrect assumptions and makes them hard to use in scenarios that require a high degree of precision (such as diagnostics or risk prediction). Furthermore, the lack of descriptions of how mappings were done makes it hard to combine and reconcile mappings, particularly curated and automated ones. We have developed the Simple Standard for Sharing Ontological Mappings (SSSOM) which addresses these problems by: (i) Introducing a machine-readable and extensible vocabulary to describe metadata that makes imprecision, inaccuracy and incompleteness in mappings explicit. (ii) Defining an easy-to-use simple table-based format that can be integrated into existing data science pipelines without the need to parse or query ontologies, and that integrates seamlessly with Linked Data principles. (iii) Implementing open and community-driven collaborative workflows that are designed to evolve the standard continuously to address changing requirements and mapping practices. (iv) Providing reference tools and software libraries for working with the standard. In this paper, we present the SSSOM standard, describe several use cases in detail and survey some of the existing work on standardizing the exchange of mappings, with the goal of making mappings Findable, Accessible, Interoperable and Reusable (FAIR). The SSSOM specification can be found at http://w3id.org/sssom/spec. 
Database URL: http://w3id.org/sssom/spec
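As a rough illustration of the claim that a table-based mapping format can be used in a data pipeline without parsing ontologies, the sketch below reads a small tab-separated mapping table with the Python standard library and keeps only high-confidence exact matches. The column names (`subject_id`, `predicate_id`, `object_id`, `mapping_justification`, `confidence`) and the `semapv:` justification values are assumptions that should be checked against the specification. The `EX:` and `OTHER:` identifiers are invented for illustration, and the helper names `load_mappings` and `select` are hypothetical.

```python
import csv
import io

# Hypothetical SSSOM-style table. All identifiers are invented for this sketch;
# the column names are assumed and should be verified against the specification.
SAMPLE_TSV = (
    "# comment lines such as this one are skipped by the loader\n"
    "subject_id\tpredicate_id\tobject_id\tmapping_justification\tconfidence\n"
    "EX:0001\tskos:exactMatch\tOTHER:1001\tsemapv:ManualMappingCuration\t0.95\n"
    "EX:0002\tskos:broadMatch\tOTHER:1002\tsemapv:LexicalMatching\t0.70\n"
    "EX:0003\tskos:exactMatch\tOTHER:1003\tsemapv:LexicalMatching\t0.60\n"
)


def load_mappings(text):
    """Parse a tab-separated mapping table into a list of dict rows."""
    # Drop lines starting with '#', assumed here to carry embedded metadata.
    rows = [line for line in text.splitlines() if not line.startswith("#")]
    return list(csv.DictReader(io.StringIO("\n".join(rows)), delimiter="\t"))


def select(mappings, predicate, min_confidence=0.0):
    """Keep mappings with the given predicate and at least min_confidence."""
    return [
        m for m in mappings
        if m["predicate_id"] == predicate
        and float(m["confidence"]) >= min_confidence
    ]


mappings = load_mappings(SAMPLE_TSV)
precise = select(mappings, "skos:exactMatch", min_confidence=0.9)
for m in precise:
    print(m["subject_id"], m["predicate_id"], m["object_id"])
```

Filtering on the predicate and the confidence column is one way a high-precision use case could exclude broad or low-confidence matches; the 0.9 threshold is arbitrary.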
Background: Genomics-driven discoveries of microbial species have provided extraordinary insights into the biodiversity of human microbiota. In addition, a significant portion of genetic variation between microbiota exists at the subspecies, or strain, level. High-resolution genomics to investigate species- and strain-level diversity and mechanistic studies, however, rely on the availability of individual microbes isolated from complex microbial consortia. High-throughput approaches are needed to acquire and identify the significant species- and strain-level diversity present in the oral, skin, and gut microbiome. Here, we describe and validate a streamlined workflow for cultivating dominant bacterial species and strains from the skin, oral, and gut microbiota, informed by metagenomic sequencing, mass spectrometry, and strain profiling. Results: Of the genera detected by either metagenomic sequencing or culturomics, our cultivation pipeline recovered 18.1–44.4%. These represented a high proportion of the community composition reconstructed with metagenomic sequencing, ranging from 66.2% to 95.8% of the relative abundance of the overall community. Fourier-Transform Infrared spectroscopy (FT-IR) was effective in differentiating genetically distinct strains when compared with whole-genome sequencing, but was less effective as a proxy for genetic distance. Conclusions: Use of a streamlined set of conditions selected for cultivation of skin, oral, and gut microbiota facilitates recovery of dominant microbes and their strain variants from a relatively large sample set. FT-IR spectroscopy allows rapid differentiation of strain variants, but these differences are limited in recapitulating genetic distance. Our data highlight the strengths of our cultivation and characterization pipeline: throughput, comparison with high-resolution genomic data, and rapid identification of strain variation.
Motivation: Biomedical identifier resources (such as ontologies, taxonomies, and controlled vocabularies) commonly overlap in scope and contain equivalent entries under different identifiers. Maintaining mappings between these entries is crucial for interoperability and the integration of data and knowledge. However, there are substantial gaps in available mappings, which motivates their semi-automated curation. Results: Biomappings implements a curation workflow for missing mappings which combines automated prediction with human-in-the-loop curation. It supports multiple prediction approaches and provides a web-based user interface for reviewing predicted mappings for correctness, combined with automated consistency checking. Predicted and curated mappings are made available in public, version-controlled resource files on GitHub. Biomappings currently makes available 9,274 curated mappings and 40,691 predicted ones, providing previously missing mappings between widely used identifier resources covering small molecules, cell lines, diseases, and other concepts. We demonstrate the value of Biomappings in case studies involving predicting and curating missing mappings among cancer cell lines as well as small molecules tested in clinical trials. We also present how previously missing mappings curated using Biomappings were contributed back to multiple widely used community ontologies. Availability: The data and code are available under the CC0 and MIT licenses at https://github.com/biopragmatics/biomappings. Supplementary information: Supplementary data are available at Bioinformatics online.
Genomics-driven discovery of microbial species has provided extraordinary insights into the biodiversity of human microbiota. High-resolution genomics to investigate species- and strain-level diversity and mechanistic studies, however, rely on the availability of individual microbes isolated from complex microbial consortia. Here, we describe and validate a streamlined workflow for cultivating microbes from the skin, oral, and gut microbiota, informed by metagenomic sequencing, mass spectrometry, and strain profiling.