The increasing availability of digitized biodiversity data worldwide, provided by an increasing number of institutions and researchers, and the growing use of those data for a variety of purposes have raised concerns related to the "fitness for use" of such data and the impact of data quality (DQ) on the outcomes of analyses, reports, and decisions. A consistent approach to assess and manage data quality is currently critical for biodiversity data users. However, achieving this goal has been particularly challenging because of idiosyncrasies inherent in the concept of quality. DQ assessment and management cannot be performed if we have not clearly established the quality needs from a data user’s standpoint. This paper defines a formal conceptual framework that lets the biodiversity informatics community describe the meaning of "fitness for use" from a data user’s perspective in a common and standardized manner. The proposed framework defines nine concepts organized into three classes: DQ Needs, DQ Solutions and DQ Report. The framework is intended to formalize human thinking into well-defined components so that concepts of DQ needs, solutions and reports can be shared and reused in a common way among user communities. With this framework, we establish a common ground for the collaborative development of solutions for DQ assessment and management based on data fitness for use principles. To validate the framework, we present a proof of concept based on a case study at the Museum of Comparative Zoology of Harvard University. In future work, we will use the framework to engage the biodiversity informatics community to formalize and share DQ profiles related to DQ needs across the community.
The quality of biodiversity data publicly accessible via aggregators such as GBIF (Global Biodiversity Information Facility), the ALA (Atlas of Living Australia), iDigBio (Integrated Digitized Biocollections), and OBIS (Ocean Biogeographic Information System) is often questioned, especially by the research community. The Data Quality Interest Group, established by Biodiversity Information Standards (TDWG) and GBIF, has been engaged in four main activities: developing a framework for the assessment and management of data quality using a fitness for use approach; defining a core set of standardised tests and associated assertions based on Darwin Core terms; gathering and classifying user stories to form contextual-themed use cases, such as species distribution modelling, agrobiodiversity, and invasive species; and developing a standardised format for building and managing controlled vocabularies of values. Using the developed framework, data quality profiles have been built from use cases to represent user needs. Quality assertions can then be used to filter data suitable for a purpose. The assertions can also be used to provide feedback to data providers and custodians to assist in improving data quality at the source. In a case study, two different implementations of tests and assertions based around the Darwin Core "Event Date" terms were run against GBIF data to demonstrate that the tests are implementation agnostic, can be run on large aggregated datasets, and can make biodiversity data more fit for typical research uses.
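As a rough illustration of how a test on a Darwin Core "Event Date" term might yield an assertion that is then used to filter records, consider the minimal sketch below. It is not the Interest Group's test specification or either of the implementations mentioned above: the function names, the assertion dictionary layout, and the status labels are hypothetical, and the check is simplified to accept only full `YYYY-MM-DD` dates, although real event dates may also be ranges or partial dates.

```python
import re
from datetime import date

# Simplifying assumption: only full calendar days (YYYY-MM-DD) are accepted.
ISO_DAY = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def assert_event_date(record, today=None):
    """Return a hypothetical assertion dict for the record's eventDate."""
    today = today or date.today()
    raw = (record.get("eventDate") or "").strip()

    def result(status, comment):
        # Hypothetical assertion layout, chosen for illustration only.
        return {"term": "eventDate", "status": status, "comment": comment}

    if not raw:
        return result("PREREQUISITES_NOT_MET", "eventDate is empty")

    match = ISO_DAY.fullmatch(raw)
    if match is None:
        return result("NOT_COMPLIANT", "eventDate is not in YYYY-MM-DD form")

    try:
        parsed = date(*(int(part) for part in match.groups()))
    except ValueError:
        return result("NOT_COMPLIANT", "eventDate is not a real calendar date")

    if parsed > today:
        return result("NOT_COMPLIANT", "eventDate lies in the future")

    return result("COMPLIANT", "eventDate is a valid past or present date")


def filter_fit_for_use(records, today=None):
    """Keep only records whose eventDate assertion is COMPLIANT."""
    return [
        record
        for record in records
        if assert_event_date(record, today)["status"] == "COMPLIANT"
    ]


if __name__ == "__main__":
    sample = [
        {"eventDate": "2015-06-14"},
        {"eventDate": "2015-02-30"},
        {"eventDate": ""},
    ]
    for record in sample:
        print(record, assert_event_date(record, today=date(2025, 1, 1)))
```

The same assertion dictionaries could be returned to a data provider as feedback instead of being used as a filter, which is the second use of assertions described above.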
Background: Animal pollination is an important ecosystem function and service, ensuring both the integrity of natural systems and human well-being. Although many knowledge shortfalls remain, some high-quality data sets on biological interactions are now available. The development and adoption of standards for biodiversity data and metadata has promoted great advances in biological data sharing and aggregation, supporting large-scale studies and science-based public policies. However, these standards are currently not suitable to fully support interaction data sharing. Results: Here we present a vocabulary of terms and a data model for sharing plant–pollinator interactions data based on the Darwin Core standard. The vocabulary introduces 48 new terms targeting several aspects of plant–pollinator interactions and can be used to capture information from different approaches and scales. Additionally, we provide solutions for data serialization using RDF, XML, and DwC-Archives and recommendations of existing controlled vocabularies for some of the terms. Our contribution supports open access to standardized data on plant–pollinator interactions. Conclusions: The adoption of the vocabulary would facilitate data sharing to support studies ranging from the spatial and temporal distribution of interactions to the taxonomic, phenological, functional, and phylogenetic aspects of plant–pollinator interactions. We expect to fill data and knowledge gaps, thus further enabling scientific research on the ecology and evolution of plant–pollinator communities, biodiversity conservation, ecosystem services, and the development of public policies. The proposed data model is flexible and can be adapted for sharing other types of interactions data by developing discipline-specific vocabularies of terms.