Persistent identifiers (PIDs) that identify digital representations of physical specimens in natural science collections (i.e., digital specimens) unambiguously and uniquely on the Internet are one of the mechanisms for digitally transforming collections-based science. Digital Specimen PIDs help build and maintain long-term community trust in the accuracy and authenticity of the scientific data to be managed and presented by the Distributed System of Scientific Collections (DiSSCo), a research infrastructure planned in Europe to commence implementation in 2024. Such PIDs not only remain valid over the very long timescales common in the heritage sector but can also transcend changes in the underlying technologies used to implement them. They are part of the mechanism for widening access to natural science collections. DiSSCo technical experts previously selected the Handle System to meet core PID requirements. Using a two-step approach, this options appraisal captures, characterises and analyses alternative Handle-based PID schemes and their possible operational modes of use. In the first step, the options were weighted and ranked; this was followed by a structured qualitative assessment of social and technical compliance across several assessment dimensions: scalability, community trust, persistence, governance, appropriateness of the scheme and suitability for future global adoption. The results are discussed in relation to branding, community perceptions and the global context to determine a preferred PID scheme for DiSSCo that also has potential for adoption and acceptance globally. DiSSCo will adopt a ‘driven-by DOI’ PID scheme customised with natural sciences community characteristics.
Establishing a new Registration Agency in collaboration with the International DOI Foundation is a practical way forward to support the FAIR (findable, accessible, interoperable, reusable) data architecture of the DiSSCo research infrastructure. This approach is compatible with the policies of the European Open Science Cloud (EOSC) and is aligned with existing practices across the global community of natural science collections.
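A Handle is written as a prefix (the naming authority) and a suffix (the local name) separated by the first "/", and a DOI is simply a Handle whose prefix begins with "10.". The minimal Python sketch below, offered as an illustration under those two rules only and not as part of the DiSSCo scheme, splits such identifiers and builds a URL for the public proxies at hdl.handle.net and doi.org. The function and class names and the identifiers in the usage lines are hypothetical.

```python
from dataclasses import dataclass

# Public HTTP proxies for Handle System resolution (DOIs resolve via doi.org).
HANDLE_PROXY = "https://hdl.handle.net/"
DOI_PROXY = "https://doi.org/"


@dataclass(frozen=True)
class ParsedHandle:
    prefix: str  # naming authority, e.g. "10.xxxx" for a DOI
    suffix: str  # local name, unique within its prefix; may itself contain "/"

    @property
    def is_doi(self) -> bool:
        # DOIs are Handles whose prefix starts with the "10." directory indicator.
        return self.prefix.startswith("10.")

    def resolver_url(self) -> str:
        base = DOI_PROXY if self.is_doi else HANDLE_PROXY
        return f"{base}{self.prefix}/{self.suffix}"


def parse_handle(identifier: str) -> ParsedHandle:
    """Split 'prefix/suffix' on the first slash, tolerating common URI forms."""
    text = identifier.strip()
    for scheme in ("doi:", "hdl:", DOI_PROXY, HANDLE_PROXY):
        if text.lower().startswith(scheme):
            text = text[len(scheme):]
            break
    prefix, sep, suffix = text.partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"not a Handle of the form prefix/suffix: {identifier!r}")
    return ParsedHandle(prefix, suffix)


# Usage with hypothetical identifiers:
print(parse_handle("doi:10.1234/ds-abc/001").resolver_url())
print(parse_handle("https://hdl.handle.net/20.500.12345/XYZ-001").resolver_url())
```

Splitting on the first slash only matters because local names are allowed to contain further slashes.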
DiSSCo (Distributed System of Scientific Collections) is a research infrastructure (RI) under development, which will provide services for the global research community to support and enhance physical and digital access to the natural history collections in Europe. These services include training, support, documentation and e-services. This talk will focus on the e-services and will give an overview of their current status, roadmap and first results as an introduction to the next talks in the session, which focus on some of the services in more detail and on the standards work undertaken in Biodiversity Information Standards (TDWG) to enable them. The RI community will provide the envisioned e-services, which will use the novel FAIR Digital Object (FDO) infrastructure serving digital specimens from the European collections. The infrastructure will provide integrated data analysis, enhanced interpretation, annotation and access services for community curation and visualisation. The FDO infrastructure enables specimen data to be (re-)connected with genomic, geographical, morphological, taxonomic and environmental information through the digital specimen, making them Digital Extended Specimens. A large number of user stories have been collected through the DiSSCo-linked projects ICEDIG, SYNTHESYS+ and DiSSCo Prepare to guide which e-services to build and what functionality to provide. These user stories are publicly available in a GitHub repository. The e-services are developed based on the user stories and the prioritisation provided by collection providers and the scientific community. A variety of mechanisms are used to collect input: surveys, workshops, roundtables and work package meetings, as well as feedback from users who have already been using beta versions of some of the services. DiSSCo aims to become operational in 2026, but several of the services are already being piloted or implemented.
Experimental services and demonstrators are publicly available through DiSSCo Labs for testing and feedback. By connecting the specimen data with derived and related information in a FAIR way (Findable, Accessible, Interoperable and Reusable), the e-services will accelerate biodiversity discovery and support novel research questions. The FDO infrastructure has a data model that also integrates the PROV Ontology (PROV-O), which allows the e-services to capture activities and improve the visibility of researcher contributions. This vision of FAIR, high-quality data is essential for community curation of the specimen data and for making better use of the limited number of experts available. To provide the DiSSCo e-services in a FAIR way, the data derived from the natural history collections in Europe need to be integrated as one virtual collection. The data have to be findable and accessible as soon as they are created, for services like a Specimen Data Refinery that operate before publication in a facility like GBIF (Global Biodiversity Information Facility). This requires new standards for describing collections and specimen data. The standards being created to fill these gaps are TDWG CD (Collection Descriptions) and TDWG MIDS (Minimum Information about a Digital Specimen). The DiSSCo e-services vision brings the data, standards and processes together to serve the user community.
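To make the PROV-O integration more concrete, the sketch below records who produced an annotation on a digital specimen, and when, using only standard PROV-O terms (`prov:Entity`, `prov:Activity`, `prov:Agent`, `prov:wasGeneratedBy`, `prov:wasAttributedTo`, `prov:used`, `prov:wasAssociatedWith`, `prov:startedAtTime`, `prov:endedAtTime`) serialised as JSON-LD. The DiSSCo FDO data model itself is not reproduced here: the function name, the record layout and all example identifiers are hypothetical.

```python
import json
from datetime import datetime, timezone

PROV_CONTEXT = {"prov": "http://www.w3.org/ns/prov#"}


def annotation_provenance(annotation_id: str, specimen_pid: str, agent_id: str,
                          started: datetime, ended: datetime) -> dict:
    """Hypothetical helper: link an annotation, the activity that made it, and its agent."""
    activity_id = f"{annotation_id}#activity"
    return {
        "@context": PROV_CONTEXT,
        "@graph": [
            # The annotation is the generated entity, credited to the agent.
            {"@id": annotation_id, "@type": "prov:Entity",
             "prov:wasGeneratedBy": {"@id": activity_id},
             "prov:wasAttributedTo": {"@id": agent_id}},
            # The activity used the digital specimen. Timestamps are plain ISO 8601
            # strings here; a full JSON-LD context would type them as xsd:dateTime.
            {"@id": activity_id, "@type": "prov:Activity",
             "prov:used": {"@id": specimen_pid},
             "prov:wasAssociatedWith": {"@id": agent_id},
             "prov:startedAtTime": started.isoformat(),
             "prov:endedAtTime": ended.isoformat()},
            {"@id": agent_id, "@type": "prov:Agent"},
        ],
    }


# Usage with hypothetical identifiers:
record = annotation_provenance(
    "https://example.org/annotation/1",
    "https://hdl.handle.net/20.500.12345/XYZ-001",
    "https://example.org/agent/curator-1",
    datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc),
    datetime(2024, 1, 1, 9, 5, tzinfo=timezone.utc),
)
print(json.dumps(record, indent=2))
```

Attributing the annotation to an agent is what lets a service surface individual researcher contributions.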
Persistent Identifier (PID) systems are the foundation for achieving the FAIR Guiding Principles (“findable, accessible, interoperable and reusable”). As FAIR data and the connection of different data classes (i.e., specimens, genomics, observations, taxonomy and publications) are essential aspects of the BiCIKL project, we need a PID system, at least at the European level, to create and maintain identifiers for the digital representations of specimens and samples, called Digital Specimens (DS) (Hardisty et al. 2022). The PID system provides the mechanism to ensure that identifiers are globally unique, persistent and resolvable. This system should also manage associated metadata, facilitate provenance, enable discovery, manage the states and life cycle of each PID, link to other derived data and digital content, and allow content providers to enforce metadata constraints. This design document was created to guide the implementation and operation phases of the PID system. It is based on an earlier milestone (MS28) that was used for discussion and evaluation with potential end users.
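Two of the listed requirements, managing PID states and letting content providers enforce metadata constraints, can be combined in a small state machine. The Python sketch below assumes three hypothetical states (draft, registered, tombstoned) and a hypothetical set of required metadata fields. The design document's actual states and constraints may differ.

```python
from dataclasses import dataclass, field
from enum import Enum


class PidState(Enum):
    # Hypothetical life-cycle states, for illustration only.
    DRAFT = "draft"
    REGISTERED = "registered"
    TOMBSTONED = "tombstoned"


# Allowed transitions: a registered PID is never deleted, only tombstoned,
# so that it keeps resolving (e.g. to a tombstone page) after withdrawal.
TRANSITIONS = {
    PidState.DRAFT: {PidState.REGISTERED},
    PidState.REGISTERED: {PidState.TOMBSTONED},
    PidState.TOMBSTONED: set(),
}

# Hypothetical provider-defined constraint: fields required before registration.
REQUIRED_FIELDS = {"specimenName", "institutionId", "physicalSpecimenId"}


@dataclass
class PidRecord:
    pid: str
    metadata: dict
    state: PidState = PidState.DRAFT
    history: list = field(default_factory=list)  # (old_state, new_state) pairs

    def transition(self, new_state: PidState) -> None:
        """Move to new_state if the transition is allowed and constraints hold."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"{self.state.value} -> {new_state.value} not allowed")
        if new_state is PidState.REGISTERED:
            missing = REQUIRED_FIELDS - set(self.metadata)
            if missing:
                raise ValueError(f"missing required metadata: {sorted(missing)}")
        self.history.append((self.state, new_state))
        self.state = new_state
```

Keeping the transition history on the record gives a simple audit trail of each PID's life cycle.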
International mass digitization efforts through infrastructures like the European Distributed System of Scientific Collections (DiSSCo), the US resource for Digitization of Biodiversity Collections (iDigBio), the National Specimen Information Infrastructure (NSII) of China, and Australia’s digitization of National Research Collections (NRCA Digital) make geo- and biodiversity specimen data freely, fully and directly accessible. In a complementary effort, overarching infrastructure initiatives such as the European Open Science Cloud (EOSC) were established to enable the mutual integration, interoperability and reusability of multidisciplinary data streams, including biodiversity, Earth system and life sciences data (De Smedt et al. 2020). Natural Science Collections (NSC) are of particular importance for such multidisciplinary and internationally linked infrastructures, since they provide hard scientific evidence by allowing derived data (e.g., images, sequences, measurements) to be traced directly to physical specimens and material samples in NSC. To open up the large amounts of trait and habitat data and to link these data to digital resources like sequence databases (e.g., ENA), taxonomic infrastructures (e.g., GBIF) or environmental repositories (e.g., PANGAEA), specimen data must be properly annotated with rich (meta)data early in the digitization process, alongside bridging technologies that facilitate the reuse of these data. This was addressed in recent studies (Younis et al. 2018, Younis et al. 2020), in which we employed computational image processing and artificial intelligence technologies (Deep Learning) to classify and extract features like organs and morphological traits from digitized collection data (with a focus on herbarium sheets).
However, such applications of artificial intelligence, whether (sub-symbolic) machine learning or (symbolic) ontology-based annotation, are rarely integrated into the workflows of NSC management systems, which are the essential repositories for the aforementioned integration of data streams. This motivated the development of a Deep Learning-based trait extraction and coherent Digital Specimen (DS) annotation service providing “Machine learning as a Service” (MLaaS), with a special focus on interoperability with the core services of DiSSCo, notably the DS Repository (nsidr.org) and the Specimen Data Refinery (Walton et al. 2020), as well as reusability within the data fabric of EOSC. Taking up the use case of detecting and classifying regions of interest (ROIs) on herbarium scans, we demonstrate an MLaaS prototype for DiSSCo that uses the digital object framework Cordra to manage DS and to instantly annotate digital objects with extracted trait features (and ROIs) based on the DS specification openDS (Islam et al. 2020). Source code is available at: https://github.com/jgrieb/plant-detection-service
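The step from ROI detection to specimen annotation can be pictured as a small conversion: keep detections above a confidence threshold and express each bounding box relative to the image size, so the annotation stays valid if the scan is served at another resolution. The Python sketch below is a hypothetical illustration of that step, not the prototype's code; the `Detection` fields, the 0.5 threshold and the annotation layout are assumptions and do not reproduce the openDS schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """Hypothetical detector output for one region of interest."""
    label: str    # e.g. "leaf", "flower"
    score: float  # detector confidence in [0, 1]
    x_min: int    # pixel coordinates on the scanned herbarium sheet
    y_min: int
    x_max: int
    y_max: int


def to_annotations(specimen_pid: str, detections: list[Detection],
                   width: int, height: int, min_score: float = 0.5) -> list[dict]:
    """Keep confident detections and attach them to a digital specimen."""
    annotations = []
    for d in detections:
        if d.score < min_score:
            continue  # drop low-confidence regions
        annotations.append({
            "target": specimen_pid,
            "label": d.label,
            "score": round(d.score, 3),
            # Box normalised to [0, 1] relative to image width and height.
            "box": [d.x_min / width, d.y_min / height,
                    d.x_max / width, d.y_max / height],
        })
    return annotations


# Usage with hypothetical values:
dets = [Detection("leaf", 0.92, 100, 200, 300, 600),
        Detection("flower", 0.30, 0, 0, 10, 10)]
anns = to_annotations("20.500.12345/XYZ-001", dets, width=1000, height=2000)
print(anns)
```

In a service such as the one described, records like these would then be written back to the digital object store; that storage call is deliberately left out here.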