2023
DOI: 10.1038/s41597-023-01968-9
Developing a standardized but extendable framework to increase the findability of infectious disease datasets

Abstract: Biomedical datasets are increasing in size, stored in many repositories, and face challenges in FAIRness (findability, accessibility, interoperability, reusability). As a Consortium of infectious disease researchers from 15 Centers, we aim to adopt open science practices to promote transparency, encourage reproducibility, and accelerate research advances through data reuse. To improve FAIRness of our datasets and computational tools, we evaluated metadata standards across established biomedical data repositori…

Cited by 9 publications (4 citation statements)
References 45 publications
“…To define which properties need to be collected about a dataset, a schema defines the set of field names within the data and what they represent (i.e., description (dataset description), creator (author(s) who generated and/or processed the data), measurementTechnique (experimental technique(s) used to collect the data), etc. 1). More detailed schemas also define the allowable values (controlled vocabularies, or ontologies, which are formal representations of allowed values and their relationship to each other) and constraints such as type or expected number for each property.…”
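To make the quoted idea concrete, here is a minimal hypothetical sketch in Python (not taken from the cited paper): a toy schema fragment using the schema.org-style property names mentioned above, plus a small validator showing how a schema constrains field names, value types, and allowed values from a controlled vocabulary. The vocabulary and the example record are invented for illustration.

```python
# Minimal, hypothetical sketch: a toy schema fragment and validator.
# Property names follow the schema.org-style fields named in the quote
# (description, creator, measurementTechnique); the controlled vocabulary
# and the example record are illustrative only.

SCHEMA = {
    # property name: (expected Python type, controlled vocabulary or None)
    "description": (str, None),
    "creator": (list, None),  # one or more author objects expected
    "measurementTechnique": (str, {"RNA-Seq", "mass spectrometry", "flow cytometry"}),
}

def validate(record):
    """Return a list of problems; an empty list means the record conforms to the toy schema."""
    problems = []
    for prop, (expected_type, vocabulary) in SCHEMA.items():
        if prop not in record:
            problems.append(f"missing required property: {prop}")
            continue
        value = record[prop]
        if not isinstance(value, expected_type):
            problems.append(f"{prop}: expected {expected_type.__name__}, got {type(value).__name__}")
        elif vocabulary is not None and value not in vocabulary:
            problems.append(f"{prop}: '{value}' is not an allowed value")
    return problems

# Illustrative dataset record in JSON-LD style.
record = {
    "@type": "Dataset",
    "description": "Bulk RNA-Seq of PBMCs from influenza-vaccinated donors",
    "creator": [{"@type": "Person", "name": "A. Researcher"}],
    "measurementTechnique": "RNA-Seq",
}

print(validate(record))  # -> [] (the record satisfies the toy schema)
```

A production schema, such as the one described in the cited paper, would additionally constrain cardinality (the "expected number" mentioned in the quote) and draw allowed values from ontologies rather than a hard-coded set.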
“…Often, data standards, tools, software, platforms, and resources are developed as pilot projects or as side effects of hypothesis-driven scientific grants. For example, the NIAID Systems Biology Data Dissemination Working Group developed and implemented an infectious disease-specific Dataset and ComputationalTool schema, increasing the FAIRness of nearly 400 datasets and computational tools using it 1. The schema is straightforward, yet has potential to exponentially enhance biological and biomedical dataset accessibility and reuse via increased exposure through dataset aggregation projects like Google Dataset Search.…”
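The "increased exposure through dataset aggregation projects like Google Dataset Search" mentioned in this quote relies on dataset landing pages carrying schema.org Dataset markup that crawlers can index. The following is a hedged sketch of generating such markup; the record values are invented for illustration and do not come from the cited paper.

```python
# Minimal sketch (illustrative, not from the cited paper): serialize a dataset
# record as schema.org JSON-LD so that aggregators such as Google Dataset Search
# can discover it once the markup is embedded in the dataset's landing page.
import json

record = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "Example influenza challenge cohort",  # hypothetical dataset
    "description": "Transcriptomic profiles collected before and after challenge",
    "creator": [{"@type": "Person", "name": "A. Researcher"}],
    "measurementTechnique": "RNA-Seq",
}

# Embed this block in the landing page's HTML <head> to expose the metadata to crawlers.
markup = '<script type="application/ld+json">\n{}\n</script>'.format(
    json.dumps(record, indent=2)
)
print(markup)
```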
“…However, we identified significant gaps between current documentation practice and M1 DFM's requirements, suggesting that software schemas being promulgated for use in FAIR representations include M1-FAIR properties and funders incentivize software developers to use them. A practical constraint identified by Tsueng et al. [9] is that contributors to repositories only represent a small number of requested properties, influencing their decision to limit the number of required elements in their ComputationalTool schema to six, and vindicating ours.…”
Section: Discussion
“…Similarly, the researchers aim to investigate and tackle the challenges associated with ensuring the FAIRness (findability, accessibility, interoperability, reusability) of expanding biomedical data sets housed in diverse repositories. Their objective is to improve transparency, reproducibility, and the progress of research by promoting open science practices and the reuse of data [10]. The primary objective of constructing and sharing open data sets, metadata, related data set publications, and results is to encourage open data set benchmarking, replication, validation of research approaches, applied data analysis practices, detection of experimental errors, and exploration of novel hypotheses [11], [12], [13].…”
Section: Introduction