This protocol describes how to determine, for a body of research articles, whether the underlying datasets have been openly shared. Statements on shared data are detected within articles using the ODDPub text-mining algorithm and are then further processed using an openness extraction form implemented in Numbat. This extraction form was developed to guide and document the manual validation of automatically detected Open Data statements. For a single article, several datasets may be checked, one per dataset location. The extraction form consists of checks of data availability and reusability, loosely inspired by the FAIR principles. The resulting table gives an overview of, among other things, dataset location, applied license, and data formats. Data sharing in supplements, data reuse, and restricted data sharing are also documented as alternatives to open data.
Monitoring the sharing of research data through repositories is of increasing interest to institutions and funders, as well as from a meta-research perspective. Automated screening tools exist, but they are based on narrow and/or vague definitions of open data. Where manual validation has been performed, it was based on a small sample of articles. At our biomedical research institution, we developed detailed criteria for such a screening, as well as a workflow that combines an automated and a manual step and considers both fully open and restricted-access data. We use the results for an internal incentivization scheme and for monitoring in a dashboard. Here, we describe in detail our screening procedure and its validation, based on automated screening of 10960 biomedical research articles, of which 1381 articles with potential data sharing were subsequently screened manually. The screening results were highly reliable, as indicated by inter-rater reliability values of >0.8 in two different validation samples. We also report the results of the screening, both for our institution and for an independent sample from a meta-research study. In the largest of the three samples, the 2021 institutional sample, underlying data had been openly shared for 7.9% of research articles. For an additional 1.1% of articles, restricted-access data had been shared, resulting in 8.4% of articles overall having open and/or restricted-access data. We then discuss the extraction workflow with regard to its applicability in different contexts, its limitations, possible variations, and future developments. In summary, we present a comprehensive, validated, semi-automated workflow for the detection of shared research data underlying biomedical article publications.
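The abstract above reports inter-rater reliability values of >0.8 but does not name the statistic used. As an illustration only, the following minimal sketch assumes Cohen's kappa, a common agreement measure for two raters assigning categorical labels. The function name `cohens_kappa`, the label names, and the ten example ratings are hypothetical and are not taken from the study.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each rater's
    marginal label frequencies.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")

    n = len(rater_a)
    # Share of items on which both raters chose the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of the two raters' marginal proportions per label.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical label; agreement is trivially perfect.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical ratings for ten articles (illustrative labels, not study data).
rater_1 = ["open"] * 4 + ["none"] * 5 + ["restricted"]
rater_2 = ["open"] * 3 + ["none"] * 6 + ["restricted"]

kappa = cohens_kappa(rater_1, rater_2)
print(f"Cohen's kappa: {kappa:.3f}")
```

In this made-up example the two raters disagree on one of ten articles. Kappa corrects the raw agreement of 0.9 for the agreement that would be expected by chance, which is why it comes out lower than 0.9.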
This protocol was created as part of the project MWP5 Digital Information Management in the Information Science Master's program at the Humboldt University of Berlin. It aims to help its intended users, decision-making stakeholders, find the dataset examined for cluster modeling (https://doi.org/10.6084/m9.figshare.12743639.v1). In total, five ways of locating the dataset are described in text and illustrated visually. In all cases except the DOI system, the search results can be narrowed by applying multiple filters, so that the desired dataset is found faster and more precisely.