In the French West Indies, more than 20 species of cetaceans have been observed over the last decades. The recognition of this hotspot of marine mammal biodiversity, observed in the French Exclusive Economic Zone of the West Indies, motivated the French government to create in 2010 a marine protected area (MPA) dedicated to the conservation of marine mammals: the Agoa Sanctuary. The threats that cetacean populations face are multiple, but well-documented. Cetacean conservation can only be achieved if relevant and reliable data are available, starting with occurrence data. In the Guadeloupe Archipelago, in addition to some data collected by the Agoa Sanctuary, occurrence data are mainly available through the contributions of citizen science and of local stakeholders (i.e. non-profit organisations (NPOs) and whale watchers). However, no observation network has been coordinated and no standards exist for the collection and management of cetacean presence data. In recent years, several whale watchers and NPOs have regularly collected cetacean observation data around the Guadeloupe Archipelago. Our objective was to gather datasets from three Guadeloupean whale watchers, two NPOs and the Agoa Sanctuary, which agreed to share their data. These heterogeneous data went through a careful process of curation and standardisation in order to create a new extended database, using a newly designed metadata set. This aggregated dataset contains a total of 4,704 records of 21 species collected in the Guadeloupe Archipelago from 2000 to 2019. The database was called Kakila ("who is there?" in Guadeloupean Creole). The Kakila database was developed following the FAIR principles with the ultimate objective of ensuring sustainability. All these data were transferred into the PNDB repository (Pôle National de Données de Biodiversité, Biodiversity French Data Hub, https://www.pndb.fr).
In the Agoa Sanctuary and surrounding waters, marine mammals face increasing anthropogenic pressure from growing human activities. In this context, the Kakila database fulfils the need for an organised system to structure marine mammal occurrences collected by multiple local stakeholders with a common objective: to contribute to the knowledge and conservation of cetaceans living in French Antilles waters. Much-needed data analysis will enable us to identify areas of high cetacean presence, to document the presence of rarer species and to determine areas of possible negative interactions with anthropogenic activities.
Data quality and documentation are at the core of the FAIR (Findable, Accessible, Interoperable, Reusable) principles (Wilkinson et al. 2016). In biodiversity, and more broadly in ecology, solutions complementary to the well-known data-standard orientation (notably through Darwin Core (Wieczorek et al. 2012)) are emerging from the intensive use of the EML (Ecological Metadata Language (Michener et al. 1997)) metadata standard. These notably capitalize on semantic annotation from EML metadata documents that describe data attributes, and on FAIR quality assessment as proposed by the DataOne (Data Observation Network for Earth) network. Here we propose to present this point of view by orchestrating the production of rich EML metadata (with attribute descriptions and links to terms from terminological resources) from raw data files and, from there, the generation of FAIR metrics for direct assessment of FAIRness and the creation of data standards like Darwin Core. Using EML, we can describe each data attribute (e.g., name, type, unit) and associate each attribute with one or several terms coming from terminological resources. Using the Darwin Core vocabulary as a terminological resource, we can thus associate, in the metadata file, the original attribute terms with the corresponding Darwin Core ones. Then, the data and their metadata files can be processed in order to automatically create the necessary files for a Darwin Core Archive. By acting at the metadata level, associated with accessible raw data files, we can associate raw attribute names with standardized ones, and then, potentially, create data standards.
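The attribute-to-term association described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' pipeline: the raw column names (`species`, `obs_date`, `lat`, `lon`, `observer_note`), the `ANNOTATIONS` dictionary standing in for annotations read from an EML document, the helper functions and the sample record are all hypothetical. Only the Darwin Core term names (`scientificName`, `eventDate`, `decimalLatitude`, `decimalLongitude`) come from the Darwin Core vocabulary.

```python
import csv
import io

# Hypothetical attribute-level annotations, standing in for what would be read
# from an EML metadata document: raw column name -> Darwin Core term.
# None means the attribute carries no Darwin Core annotation.
ANNOTATIONS = {
    "species": "scientificName",
    "obs_date": "eventDate",
    "lat": "decimalLatitude",
    "lon": "decimalLongitude",
    "observer_note": None,
}


def to_darwin_core(rows, annotations):
    """Rename annotated columns to Darwin Core terms; drop unannotated ones."""
    mapping = {raw: term for raw, term in annotations.items() if term}
    return [
        {term: row[raw] for raw, term in mapping.items() if raw in row}
        for row in rows
    ]


def write_occurrence_csv(records):
    """Serialise the renamed records as CSV text (one occurrence per line)."""
    if not records:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(records[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Made-up sample record, for illustration only.
raw_rows = [
    {
        "species": "Physeter macrocephalus",
        "obs_date": "2015-03-12",
        "lat": "16.25",
        "lon": "-61.80",
        "observer_note": "group of 3",
    }
]

dwc_rows = to_darwin_core(raw_rows, ANNOTATIONS)
occurrence_csv = write_occurrence_csv(dwc_rows)
print(occurrence_csv)
```

The sketch covers only the renaming step made possible by the metadata-level annotations. A complete Darwin Core Archive also needs a descriptor file and a metadata document alongside the occurrence table, which are not generated here.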
The French national biodiversity data hub (“Pôle National de Données de Biodiversité” - PNDB) is a national e-infrastructure created in 2018 and led by the National Museum of Natural History, contributing to the Open Science policy of the Ministry of Higher Education, Research and Innovation (MESRI). PNDB contributes to building an integrative framework taking into account biodiversity over the long term (from the origins of life to future models), at all biological scales (from the molecule to the socio-ecosystem), and in all its interactions, by providing tools and services for the description, access, validation, analysis and reuse of biodiversity data. Given the diversity and complementarity of the sources of biodiversity research data (information systems, institutional data repositories, research infrastructures such as observatories, experimental devices, natural history collections, etc.), as well as public policy data, the missions of the PNDB are deeply rooted in the FAIR approach (making data Findable, Accessible, Interoperable, Reusable). Thanks to its nomination in 2022 as a thematic reference center of the MESRI, PNDB will contribute to promoting the FAIR approach, will increase the skills of the scientific communities around open science (e.g., through training and good practices), and will stimulate interactions between producers and users of biodiversity data. PNDB has led the French participation in GEO BON (Group on Earth Observations Biodiversity Observation Network) since 2018 and recently began sharing the lead with the coordination of public policy information systems. Thanks to this co-lead, this national BON proposes an innovative coordination of all biodiversity monitoring programs, from expertise to research, around an innovative Essential Biodiversity Variable (EBV) operationalization pilot. This pilot is made of open practical solutions providing a particularly high degree of FAIRness of biodiversity research objects, from data to source code.
PNDB is also a major European point of contact for the DataOne network, which, in combination with the strong link between PNDB and colleagues at the French Global Biodiversity Information Facility (GBIF) node, allows all types of data to be disseminated worldwide.
Integration of biological data across different ecological scales is complex! The biodiversity community (scientists, policy makers, managers, citizens, NGOs) needs to build a framework of harmonized and interoperable data from raw, heterogeneous and scattered datasets. Such a framework will help the observation, measurement and understanding of the spatio-temporal dynamics of biodiversity from local to global scales. One of the most relevant approaches to reach that aim is the concept of Essential Biodiversity Variables
"FAIR (Findable, Accessible, Interoperable, Reusable) principles" (Wilkinson et al. 2016) and "open science" are two complementary movements in biodiversity science. Although we need to transition to making scientific data and associated material more FAIR, this does not necessarily imply open data or open source algorithms. Here, based on the experience of the French Biodiversity Data Hub ("Pôle national de données de Biodiversité" - PNDB), which is an e-infrastructure for and by researchers, we want to showcase how focusing on openness can be a strategy to efficiently reach greater FAIRness. Following DataOne guidance, we can build a complete data/metadata ecosystem allowing us to structure heterogeneous environmental information systems. Using the Galaxy analysis platform and its related initiatives (Galaxy training network, European Erasmus+ Gallantries project, bioconda, bioContainer), we can illustrate how to create transparent, peer-reviewed and accessible tools and workflows, as well as collaborative training material. The Galaxy platform also facilitates the use of high performance computing infrastructure, notably through the European Open Science Cloud marketplace. Finally, through our experiences contributing to open source projects like EML (Ecological Metadata Language (Michener et al. 1997)) Assembly Line, EDI (Environmental Data Initiative), or PAMPA (Indicators of Marine Protected Areas performance for managing coastal ecosystems, resources and their uses), a French platform to help protected area managers standardize and analyse their data, we also show how building open source "doors" to these environments with the R Shiny framework can be beneficial for all.