Metadata that are structured using principled schemas and that use terms from ontologies are essential to making biomedical data findable and reusable for downstream analyses. The largest source of metadata describing the experimental protocol, funding, and scientific leadership of clinical studies is ClinicalTrials.gov. We evaluated whether values in 302,091 trial records adhere to expected data types and use terms from biomedical ontologies, whether records contain fields required by government regulations, and whether structured elements could replace free-text elements. Contact information, outcome measures, and study design are frequently missing or underspecified. Important fields for search, such as condition and intervention, are not restricted to ontologies, and almost half of the conditions are not denoted by MeSH terms, as recommended. Eligibility criteria are stored as semi-structured free text. Enforcing the presence of all required elements, requiring values for certain fields to be drawn from ontologies, and creating a structured eligibility criteria element would improve the reusability of data from ClinicalTrials.gov in systematic reviews, meta-analyses, and matching of eligible patients to trials.
Objective: ClinicalTrials.gov is a registry of clinical-trial metadata whose use is required by many funding agencies and scientific publishers. Metadata are essential to the reuse of data, but issues such as heterogeneous metadata schemas, inconsistent values, and use of free text instead of controlled terms pervade many metadata repositories. Our objective is to evaluate the quality of metadata about clinical studies in ClinicalTrials.gov and to document strategies to improve metadata accuracy. Methods: Using 302,091 metadata records, we evaluated whether values adhere to type expectations for Boolean, integer, date, age, and value-set fields, and whether records contain fields required by the Food and Drug Administration. We tested whether values for condition and intervention use terms from biomedical ontologies, and whether values for eligibility criteria follow the recommended format. Results: For simple fields, records contain correctly typed values, but there are anomalies in value-set fields. Contact information, outcome measures, and study design are frequently missing or underspecified. Important fields for search, such as condition and intervention, are not restricted to ontology terms, and almost half of the values for condition are not from MeSH, as recommended. Eligibility criteria are stored as unstructured free text. Conclusions: ClinicalTrials.gov's data-entry system enforces a schema with type restrictions, freeing records from common issues in metadata repositories. However, the lack of ontology restrictions or structure for the condition, intervention, and eligibility criteria elements significantly impairs reusability. Searchability of the database depends on infrastructure that maps free-text values to terms from UMLS ontologies.

Metadata are the lifeblood of biomedical data. At the simplest level, metadata are data that describe other data.
In practice, we expect metadata to be structured and standardized, and to be useful in making the underlying data findable and reusable. High-quality metadata enhance scientific reproducibility and transparency, allow researchers to pool studies to increase the statistical power of inferences,[1] and enable the use of "big data" machine learning techniques. International metadata repositories such as the National Center for Biotechnology Information's (NCBI) BioSample and the European Bioinformatics Institute's (EBI) BioSamples repositories encourage data reuse through the availability of comprehensive metadata. They each gather metadata from several different repositories of biological data into a centralized, searchable database. Ideally, they also ensure that metadata follow unified standards and schemas regardless of the author, source, and format of the original data.

Unfortunately, biomedical metadata are plagued by numerous quality issues. Hu et al. examined the quality of the metadata that accompany data records in the Gene Expression Omnibus (GEO) and found that they suffered from type inconsistency (e.g., numerical fields populated with non-numerical values).
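The type checks described in the Methods (Boolean, integer, date, and age fields) can be sketched as simple validators over raw string values. The field names, accepted formats, and example record below are illustrative assumptions for the sketch, not ClinicalTrials.gov's actual schema.

```python
import re
from datetime import datetime

# Each validator returns True when a raw string matches the expected type.
def is_boolean(value: str) -> bool:
    return value.strip().lower() in {"yes", "no", "true", "false"}

def is_integer(value: str) -> bool:
    return bool(re.fullmatch(r"-?\d+", value.strip()))

def is_date(value: str, fmt: str = "%B %d, %Y") -> bool:
    try:
        datetime.strptime(value.strip(), fmt)
        return True
    except ValueError:
        return False

def is_age(value: str) -> bool:
    # e.g. "18 Years", "6 Months" (assumed format for this sketch)
    return bool(re.fullmatch(r"\d+\s+(Years?|Months?|Weeks?|Days?)", value.strip()))

# Hypothetical record with one value per field type.
record = {"has_dmc": "Yes", "enrollment": "250",
          "start_date": "March 1, 2015", "minimum_age": "18 Years"}

checks = {"has_dmc": is_boolean, "enrollment": is_integer,
          "start_date": is_date, "minimum_age": is_age}

# Collect fields whose values fail their type check.
anomalies = [field for field, check in checks.items() if not check(record[field])]
print(anomalies)  # an empty list means every field passes its type check
```

Run over a corpus of records, a per-field tally of such anomalies gives exactly the kind of type-conformance statistics the study reports.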
COVID-19 success stories from countries using contact tracing as an intervention tool for the pandemic have motivated US counties to pilot opt-in contact tracing applications. Contact tracing involves identifying individuals who came into physical contact with infected individuals. Recent studies show that the effectiveness of contact tracing scales with the number of people using the applications. We hypothesize that the effectiveness of contact tracing also depends on the occupation of the user, with large-scale adoption in certain at-risk occupations being particularly valuable for identifying emerging outbreaks. We build on an agent-based epidemiological simulator that resolves spatiotemporal dynamics to model San Francisco, CA, USA. Census, OpenStreetMap, SafeGraph, and Bureau of Labor Statistics data inform the agent dynamics and site characteristics in our simulator. We test different agent occupations that create the contact network, e.g., educators, office workers, restaurant workers, and grocery workers. We use Bayesian optimization to determine transmission rates in San Francisco, which we validate against transmission-rate studies recently conducted for COVID-19 in restaurants, homes, and grocery stores. Our sensitivity analysis of different sites shows that the practices affecting the transmission rate at schools have the greatest impact on the infection rate in San Francisco. The addition of occupation dynamics to our simulator increases the spreading rate of the virus, because each occupation has a different impact on the contact network of a city. We quantify the positive benefits of contact tracing adopted by at-risk occupation workers on the community and distinguish the specific benefits for at-risk occupation workers themselves. We classify the degree to which an occupation is at risk by quantifying the impact that (a) the number of unique contacts and (b) the total number of contacts an individual has on any given workday have on the virus's spreading rate.
We also attempt to constrain if, when, and for how long certain sites should be shut down once exposed to positive cases. Through our research, we identify the occupations, such as educators, that are at greatest risk. We use common geophysical data analysis techniques to bring a different set of insights into COVID-19 and policy research.
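The two risk measures above, unique contacts and total contacts per workday, can be sketched from a contact log. The log format, occupation labels, and values below are invented for illustration and do not come from the paper's simulator.

```python
from collections import defaultdict

# Hypothetical one-workday contact log: (person, occupation, contact_id).
contact_log = [
    ("p1", "educator", "c1"), ("p1", "educator", "c2"),
    ("p1", "educator", "c1"),  # repeated contact with the same individual
    ("p2", "office", "c3"),
    ("p3", "grocery", "c4"), ("p3", "grocery", "c5"), ("p3", "grocery", "c6"),
]

unique = defaultdict(set)   # distinct individuals met per worker
total = defaultdict(int)    # all contact events per worker
for person, occupation, contact in contact_log:
    unique[(person, occupation)].add(contact)
    total[(person, occupation)] += 1

def per_occupation(metric):
    """Average the per-worker counts within each occupation."""
    by_occ = defaultdict(list)
    for (person, occ), v in metric.items():
        by_occ[occ].append(v if isinstance(v, int) else len(v))
    return {occ: sum(vs) / len(vs) for occ, vs in by_occ.items()}

print(per_occupation(unique))  # {'educator': 2.0, 'office': 1.0, 'grocery': 3.0}
print(per_occupation(total))   # {'educator': 3.0, 'office': 1.0, 'grocery': 3.0}
```

Feeding such per-occupation averages into the simulator's transmission model is one way the spreading-rate impact of each occupation could then be quantified.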