2020
DOI: 10.1038/s41597-020-00780-z

Obstacles to the reuse of study metadata in ClinicalTrials.gov

Abstract: Metadata that are structured using principled schemas and that use terms from ontologies are essential to making biomedical data findable and reusable for downstream analyses. The largest source of metadata that describes the experimental protocol, funding, and scientific leadership of clinical studies is ClinicalTrials.gov. We evaluated whether values in 302,091 trial records adhere to expected data types and use terms from biomedical ontologies, whether records contain fields required by government regulatio…

Cited by 30 publications (32 citation statements)
References 61 publications
“…For instance, the guarantee of persistence is a problematic point for all FAIR principles; its use adds greater control and reusability over (meta)data. Interestingly, Genbank had lower results on these modules, which substantiates previous findings in the literature [5,12].…”
Section: Fairness Experiments (supporting)
confidence: 90%
“…However, they represent a small part of a group of more than 1,364 databases linked to the life sciences area [10]. Many of these archives are highly consolidated and have a long history, but it is also highly recognized in the academy that these same databases present issues of the most varied types [5,11,12,13].…”
Section: Genomic Databases (mentioning)
confidence: 99%
“…Third, previously proposed methods of linking trials to publications (e.g., overall matching of textual similarity [18] or shared authors between trials and publications [14]) have limited predictive performance on their own. Fourth, the textual fields and metadata of trial registries are not well standardized, [17,23,24] which complicates the process of matching specific textual fields of trials to those of publications. Finally, ancillary publications may arise from a trial concerning a wide variety of issues, such as questionnaire development, GWAS studies carried out on trial subjects, reanalysis of data across multiple trials, and so on, which may not share word usage, topics, or investigators with the registered trial entry.…”
Section: Background and Significance (mentioning)
confidence: 99%
“…Additionally, as the metadata that needs to be submitted is not strongly enforced, e.g. by the use of ontologies or MeSH terms, it can only be re-used to a limited extend [4]. Thereby, several works proposed methods for automatically analyzing the data of ClinicalTrials.gov for detecting unusual patterns due to policy changes [5] or errors in metadata [1][4].…”
Section: Introduction (mentioning)
confidence: 99%