Scientists frequently collect biological and environmental information over many years and store it in database systems to answer their own research questions, without exposing it in repositories that make it easy to find and retrieve. Although the biodiversity informatics community has made significant strides in recent years by creating shared vocabularies such as Darwin Core (DwC, Wieczorek et al. 2012) and publishing mechanisms such as the Integrated Publishing Toolkit (IPT, Robertson et al. 2014), integration is still largely limited to the aggregation of datasets, and full interoperability has not yet been achieved. In this context, the Semantic Web (SW) aims to represent information so that, beyond display for human readers, it can be used autonomously by machines for integration and reuse across applications. From the biodiversity informatics point of view, interoperability and links among data sources would allow the integration of information that is otherwise disconnected, enabling scientists to answer broader questions. These considerations strongly motivate a web application built around semantic interoperability that could answer questions such as the following: (Q1) Is it possible to complement the taxonomic, bibliographic and environmental information of a particular species without relying on specific Application Programming Interfaces (APIs)? (Q2) How can occurrences of species be related to environmental variables within a specific region? (Q3) What are the bibliographic references associated with a given species?
With questions such as these in mind, we present the design of a proof-of-concept application: Linked Open Biodiversity Data (LOBD). LOBD uses Linked Data (LD) (Heath and Bizer 2011) to complement species occurrence information, previously extracted from GBIF and converted to the Resource Description Framework (RDF) (Zárate et al. 2020), with information about the taxa in question from different RDF datasets, such as Wikidata, NCBI Taxonomy, Springer Nature SciGraph and the OpenCitations Corpus. A simplified view of the architecture is shown in Fig. 1. To achieve semantic interoperability, we use the SPARQL query language, which lets us retrieve information without depending on specific APIs. The application consists of three modules: General information, where the Wikidata endpoint is used to retrieve additional information about the selected species, including links to other databases and information about the species extracted from the National Center for Biotechnology Information (NCBI) Taxonomy; Bibliography, where all publications related to the species are retrieved from OpenCitations; and Environment, where users can plot species occurrences on a map and add layers for marine regions as well as environmental layers (e.g., temperature, salinity).
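The General information module retrieves data through SPARQL rather than through a service-specific API. The sketch below is not LOBD's own code (the application is written in R with the SPARQL package); it only illustrates the idea in Python. It builds a query for the public Wikidata Query Service that matches items by taxon name (Wikidata property P225) and optionally returns their NCBI Taxonomy ID (property P685), then flattens a standard SPARQL 1.1 JSON result. The endpoint constant and function names are assumptions chosen for illustration.

```python
import json
from urllib.parse import urlencode

# Public Wikidata Query Service endpoint (assumption: LOBD may be configured differently).
WIKIDATA_ENDPOINT = "https://query.wikidata.org/sparql"


def build_taxon_query(scientific_name: str) -> str:
    """Hypothetical helper: build a SPARQL query that finds Wikidata items whose
    taxon name (P225) equals `scientific_name` and, if present, their NCBI
    Taxonomy ID (P685)."""
    # Escape backslashes and double quotes so the name is a valid SPARQL string literal.
    literal = scientific_name.replace("\\", "\\\\").replace('"', '\\"')
    return f"""PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT ?item ?ncbiId WHERE {{
  ?item wdt:P225 "{literal}" .
  OPTIONAL {{ ?item wdt:P685 ?ncbiId . }}
}}"""


def build_request_url(query: str) -> str:
    """Encode the query as a GET URL that asks for SPARQL JSON results."""
    return WIKIDATA_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


def parse_bindings(payload: str) -> list[dict[str, str]]:
    """Flatten a SPARQL 1.1 JSON results document into {variable: value} dicts.
    Variables left unbound by OPTIONAL simply do not appear in a row."""
    data = json.loads(payload)
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in data["results"]["bindings"]
    ]
```

Sending the request (for example with `urllib.request`, including a descriptive User-Agent header as Wikimedia's User-Agent policy asks) is omitted so the sketch runs offline. Because only the query text is endpoint-specific, the same pattern applies to any other SPARQL endpoint, which is what removes the dependence on dedicated APIs.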
For the development of the application, we use the Shiny framework for R; SPARQL endpoints are accessed through the SPARQL package, marine regions are obtained from marineregions.org, and the environmental layers are extracted from Bio-ORACLE. The data used for this article were collected by the Center for the Study of Marine Systems at the National Patagonian Sci-Tech Centre (CCT CENPAT-CONICET) and are published and available through the GBIF network. Linked Data is a powerful tool for scientists: it enables new approaches to biodiversity informatics that can help address data integration challenges. Users would benefit from complementing the vocabularies currently prevalent for sharing biodiversity data, which are not ontologically defined (such as DwC). Although this application is a proof of concept, it shows that, with little effort, it is possible to achieve greater interoperability between datasets that were not initially represented as LD.
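Question Q2 amounts to attaching environmental values to each occurrence point. As a minimal sketch, assuming an environmental layer (for example, a Bio-ORACLE variable such as temperature) has already been loaded into memory as a regular longitude/latitude grid, each occurrence can be matched to the grid cell that contains it. The `Grid` class and `annotate` function are hypothetical and are not part of LOBD; reading the actual layer files would require a raster library and is outside this sketch. Occurrence records use the Darwin Core term names `decimalLongitude` and `decimalLatitude`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Grid:
    """Hypothetical in-memory stand-in for an environmental layer: a regular
    lon/lat raster stored row by row from north to south."""
    west: float       # longitude of the left edge of column 0
    north: float      # latitude of the top edge of row 0
    cell_size: float  # cell width and height, in degrees
    values: list[list[Optional[float]]]  # None marks no-data cells (e.g. land)

    def value_at(self, lon: float, lat: float) -> Optional[float]:
        """Return the value of the cell containing (lon, lat), or None."""
        col = int((lon - self.west) // self.cell_size)
        row = int((self.north - lat) // self.cell_size)
        if 0 <= row < len(self.values) and 0 <= col < len(self.values[row]):
            return self.values[row][col]
        return None  # point lies outside the layer extent


def annotate(occurrences: list[dict], layers: dict[str, Grid]) -> list[dict]:
    """Copy each occurrence and add one field per named layer with the value
    sampled at the occurrence coordinates."""
    annotated = []
    for occ in occurrences:
        row = dict(occ)
        for name, grid in layers.items():
            row[name] = grid.value_at(occ["decimalLongitude"], occ["decimalLatitude"])
        annotated.append(row)
    return annotated
```

Returning `None` both for points outside the extent and for no-data cells (such as land cells in a marine layer) keeps unmatched occurrences visible in the result instead of silently dropping them.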
Scientific publication services are changing drastically, and researchers demand intelligent search services to discover and relate scientific publications. Publishers need to incorporate semantic information to better organize their digital assets and make publications more discoverable. In this paper, we present ongoing work to publish a subset of the scientific publications of CONICET Digital as Linked Open Data. The objective of this work is to improve the retrieval and reuse of data through Semantic Web technologies and Linked Data in the domain of scientific publications. To achieve these goals, Semantic Web standards and reference RDF schemas have been taken into account (Dublin Core, FOAF, VoID, etc.). The conversion and publication process is guided by the methodological guidelines for publishing government linked data. We also outline how these data can be linked to other datasets on the web of data, such as DBLP, Wikidata and DBpedia. Finally, we show some example queries that answer questions CONICET Digital does not currently allow users to ask.
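To illustrate what publishing a record with the vocabularies mentioned above can look like, the following sketch serializes one publication as N-Triples using Dublin Core terms (`dcterms:title`, `dcterms:creator`) and FOAF (`foaf:Document`, `foaf:Person`, `foaf:name`). The URI scheme under `base`, the function names, and the use of `owl:sameAs` to link to external datasets such as Wikidata or DBLP are assumptions for illustration, not the mapping actually used for CONICET Digital.

```python
from collections.abc import Iterable

# Namespace IRIs of the vocabularies used below.
DCTERMS = "http://purl.org/dc/terms/"
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL_SAMEAS = "http://www.w3.org/2002/07/owl#sameAs"  # assumed linking property


def iri(value: str) -> str:
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(text: str) -> str:
    """Quote a plain string literal, escaping the characters N-Triples reserves."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def publication_triples(base: str, pub_id: str, title: str,
                        authors: list[tuple[str, str]],
                        same_as: Iterable[str] = ()) -> list[str]:
    """Return N-Triples lines for one publication under a hypothetical URI scheme.
    `authors` holds (person_id, name) pairs; `same_as` holds external IRIs."""
    pub = iri(f"{base}publication/{pub_id}")
    lines = [
        f"{pub} {iri(RDF_TYPE)} {iri(FOAF + 'Document')} .",
        f"{pub} {iri(DCTERMS + 'title')} {literal(title)} .",
    ]
    for person_id, name in authors:
        person = iri(f"{base}person/{person_id}")
        lines.append(f"{pub} {iri(DCTERMS + 'creator')} {person} .")
        lines.append(f"{person} {iri(RDF_TYPE)} {iri(FOAF + 'Person')} .")
        lines.append(f"{person} {iri(FOAF + 'name')} {literal(name)} .")
    for external in same_as:
        lines.append(f"{pub} {iri(OWL_SAMEAS)} {iri(external)} .")
    return lines
```

Emitting plain N-Triples strings keeps the sketch dependency-free; a fuller pipeline would more likely build the graph with an RDF library and confirm that each `owl:sameAs` target really denotes the same publication before asserting the link.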