The study of biodiversity spans many disciplines and includes data pertaining to species distributions and abundances, genetic sequences, trait measurements, and ecological niches, complemented by information on collection and measurement protocols. A review of the current landscape of metadata standards and ontologies in biodiversity science suggests that existing standards such as the Darwin Core terminology are inadequate for describing biodiversity data in a semantically meaningful and computationally useful way. Existing ontologies, such as the Gene Ontology and others in the Open Biological and Biomedical Ontologies (OBO) Foundry library, provide a semantic structure but lack many of the necessary terms to describe biodiversity data in all its dimensions. In this paper, we describe the motivation for and ongoing development of a new Biological Collections Ontology, the Environment Ontology, and the Population and Community Ontology. These ontologies share the aim of improving data aggregation and integration across the biodiversity domain and can be used to describe physical samples and sampling processes (for example, collection, extraction, and preservation techniques), as well as biodiversity observations that involve no physical sampling. Together they encompass studies of: 1) individual organisms, including voucher specimens from ecological studies and museum specimens, 2) bulk or environmental samples (e.g., gut contents, soil, water) that include DNA, other molecules, and potentially many organisms, especially microbes, and 3) survey-based ecological observations. We discuss how these ontologies can be applied to biodiversity use cases that span genetic, organismal, and ecosystem levels of organization. We argue that if adopted as a standard and rigorously applied and enriched by the biodiversity community, these ontologies would significantly reduce barriers to data discovery, integration, and exchange among biodiversity resources and researchers.
Legacy data from natural history collections contain invaluable and irreplaceable information about biodiversity in the recent past, providing a baseline for detecting change and forecasting the future of biodiversity on a human-dominated planet. However, these data are often not available in formats that facilitate use and synthesis. New approaches are needed to enhance the rates of digitization and data quality improvement. Notes from Nature provides one such novel approach by asking citizen scientists to help with transcription tasks. The initial web-based prototype of Notes from Nature is soon widely available and was developed collaboratively by biodiversity scientists, natural history collections staff, and experts in citizen science project development, programming and visualization. This project brings together digital images representing different types of biodiversity records including ledgers , herbarium sheets and pinned insects from multiple projects and natural history collections. Experts in developing web-based citizen science applications then designed and built a platform for transcribing textual data and metadata from these images. The end product is a fully open source web transcription tool built using the latest web technologies. The platform keeps volunteers engaged by initially explaining the scientific importance of the work via a short orientation, and then providing transcription “missions” of well defined scope, along with dynamic feedback, interactivity and rewards. Transcribed records, along with record-level and process metadata, are provided back to the institutions. While the tool is being developed with new users in mind, it can serve a broad range of needs from novice to trained museum specialist. Notes from Nature has the potential to speed the rate of biodiversity data being made available to a broad community of users.
In this commentary, the authors, an international group data curation researchers and educators, reflect on some of the challenges and opportunities for data curation in the wake of the COVID‐19 pandemic. We focus on some topics of particular interest to the information science community: data infrastructures for scholarly communication and research, the politicization of data curation and visualization for public‐facing “dashboards,” and human subjects research and policies. We conclude with some areas of opportunity and need, including broader and richer data curation education in the information schools, the establishment of better data management policy implementations by research funders, the award of formal academic credit for data curation activities and data sharing, and engagement in cooperative action around data ethics and security.
Site-Based Data Curation (SBDC) is an approach to managing research data that prioritizes sharing and reuse of data collected at scientifically significant sites. The SBDC framework is based on geobiology research at natural hot spring sites in Yellowstone National Park as an exemplar case of high value field data in contemporary, cross-disciplinary earth systems science. Through stakeholder analysis and investigation of data artifacts, we determined that meaningful and valid reuse of digital hot spring data requires systematic documentation of sampling processes and particular contextual information about the site of data collection. We propose a Minimum Information Framework for recording the necessary metadata on sampling locations, with anchor measurements and description of the hot spring vent distinct from the outflow system, and multi-scale field photography to capture vital information about hot spring structures. The SBDC framework can serve as a global model for the collection and description of hot spring systems field data that can be readily adapted for application to the curation of data from other kinds scientifically significant sites.
The use of small Unmanned Aircraft Systems (sUAS) as platforms for data capture has rapidly increased in recent years. However, while there has been significant investment in improving the aircraft, sensors, operations, and legislation infrastructure for such, little attention has been paid to supporting the management of the complex data capture pipeline sUAS involve. This paper reports on a four-year, community-based investigation into the tools, data practices, and challenges that currently exist for particularly researchers using sUAS as data capture platforms. The key results of this effort are: (1) sUAS captured data—as a set that is rapidly growing to include data in a wide range of Physical and Environmental Sciences, Engineering Disciplines, and many civil and commercial use cases—is characterized as both sharing many traits with traditional remote sensing data and also as exhibiting—as common across the spectrum of disciplines and use cases—novel characteristics that require novel data support infrastructure; and (2), given this characterization of sUAS data and its potential value in the identified wide variety of use case, we outline eight challenges that need to be addressed in order for the full value of sUAS captured data to be realized. We conclude that there would be significant value gained and costs saved across both commercial and academic sectors if the global sUAS user and data management communities were to address these challenges in the immediate to near future, so as to extract the maximal value of sUAS captured data for the lowest long-term effort and monetary cost.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.