Abstract: Biodiversity research studies all life forms found in nature. Maintaining biological diversity is important because it is essential to life on Earth. The lack of accurate geographic information in species occurrence data, especially from diversity-rich regions such as the Amazon Forest, causes problems in many conservation activities, such as systematic planning for the protection of endangered species. In this paper, we present a gazetteer (a geographical directory that associates place names with geographic coordinates) for biodiversity data. It is available as a Linked Open Data resource through a GeoSPARQL endpoint, and we show how it can be used to improve inaccurate geographic collection data. We compared the efficiency of our Gazetteer with three openly available resources (Geonames, WikiMapia and Wikipedia) and achieved a recall rate 10% higher than these resources. We also used the Gazetteer to correct geographic data in a large sample (327,000 occurrence records) from SpeciesLink and GBIF, two large open-access repositories of biodiversity occurrence data. In this data set, we were able to add geographic coordinates to around 14% of the records that previously lacked them.
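The coordinate-filling step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes an in-memory dictionary in place of the GeoSPARQL endpoint, and the names `GAZETTEER`, `normalize` and `fill_coordinates`, the record fields, and the approximate coordinates are all hypothetical.

```python
import unicodedata

# Hypothetical gazetteer: normalized place name -> (latitude, longitude).
# The coordinates are approximate, illustrative values, not data from the paper.
GAZETTEER = {
    "manaus": (-3.119, -60.0217),
    "reserva ducke": (-2.9633, -59.9225),
}


def normalize(name):
    """Lowercase a place name, strip accents and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return " ".join(without_accents.lower().split())


def fill_coordinates(records, gazetteer=GAZETTEER):
    """Return copies of the records, adding coordinates where they are
    missing and the locality matches a gazetteer entry exactly."""
    filled = []
    for record in records:
        updated = dict(record)  # leave the caller's record untouched
        if updated.get("latitude") is None or updated.get("longitude") is None:
            match = gazetteer.get(normalize(updated.get("locality") or ""))
            if match is not None:
                updated["latitude"], updated["longitude"] = match
        filled.append(updated)
    return filled
```

Records whose locality is not in the gazetteer are returned unchanged, and records that already carry coordinates are never overwritten. A real gazetteer would likely need fuzzy matching and disambiguation of repeated place names, which this sketch leaves out.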
Ontologies offer a novel resource for data integration when the focus is on semantic definitions and the need for interoperability. Results from previous work indicate that ontologies can drive knowledge acquisition processes for the purpose of comprehensive, transportable machine understanding and knowledge management. Applied to the biodiversity domain, ontologies can be a valuable resource for strategic planning and for contributing to the conservation of the Amazon region.
Linked Open Data (LOD) has enabled integrated data sharing across disciplines over the Web. However, for LOD users in areas such as biodiversity, which relies heavily on the Web to disseminate data, transforming the contents of CSV (Comma-Separated Values) data files into RDF (Resource Description Framework) is not trivial. We have developed a new approach to mapping CSV data files to RDF, based on a domain-specific language (DSL) called BioDSL. With it, biodiversity data users can write compact programs that map their data to RDF and link them to the LOD. Biodiversity vocabularies and ontologies, such as Darwin Core and OntoBio, can be used with BioDSL to enrich user data. Existing tools focus exclusively on mapping (CSV to RDF) and offer little or no support for linking data to the LOD, that is, interconnecting user entities with LOD entities. They are also more complex to use than BioDSL.
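To make the CSV-to-RDF mapping step concrete, here is a minimal sketch in plain Python. It is not BioDSL, whose syntax is not shown here, and it covers only the mapping, not the linking to LOD entities. The base URI, the CSV column names, and the helpers `escape_literal` and `csv_to_ntriples` are assumptions made for illustration. The Darwin Core namespace and the terms `scientificName` and `locality` are the real ones.

```python
import csv
import io

DWC = "http://rs.tdwg.org/dwc/terms/"      # Darwin Core terms namespace
BASE = "http://example.org/occurrence/"    # hypothetical base URI for subjects

# Hypothetical mapping from CSV column names to Darwin Core terms.
COLUMN_TO_TERM = {
    "species": "scientificName",
    "place": "locality",
}


def escape_literal(value):
    """Escape characters that may not appear raw in an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def csv_to_ntriples(csv_text, id_column="id"):
    """Map each CSV row to N-Triples lines, one per non-empty mapped column.

    Assumes the identifier column holds values that are safe inside an IRI.
    """
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}{row[id_column]}>"
        for column, term in COLUMN_TO_TERM.items():
            value = (row.get(column) or "").strip()
            if value:
                lines.append(
                    f'{subject} <{DWC}{term}> "{escape_literal(value)}" .'
                )
    return lines
```

Each output line is a subject, predicate and plain string literal. Empty cells produce no triple, so incomplete rows still yield valid output.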