Implementation
of the Clinical Data Interchange Standards Consortium (CDISC)
Standard for Exchange of Nonclinical Data (SEND) by the United States
Food and Drug Administration Center for Drug Evaluation and Research
(US FDA CDER) has created large quantities of SEND data sets and a
tremendous opportunity to apply large-scale data analytic approaches.
To fully realize this opportunity, differences in SEND implementation
that impair the ability to conduct cross-study analysis must be addressed.
In this manuscript, a prototypical question regarding historical control
data (see Table of Contents graphic) was used to identify areas for
SEND harmonization and to develop algorithmic strategies for nonclinical
cross-study analysis within a variety of databases. FDA CDER’s
repository of >1800 sponsor-submitted studies in SEND format was
queried using the statistical programming language R to gain insight
into how the CDISC SEND Implementation Guides are being applied across
the industry. For each component needed to answer the question (defined
as a “query block”), the frequency of data population was determined;
it ranged from 6 to 99%. For fields that were <90% populated
and/or lacked Controlled Terminology, data extraction methods
such as data transformation and script development were evaluated.
Script-based data extraction was successful for fields such as phase of study,
negative controls, and histopathology. Calculations
to assess accuracy of data extraction indicated a high confidence
in most query block searches. Some fields such as vehicle name, animal
supplier name, and test facility name are not amenable to accurate
data extraction through script development alone and require additional
harmonization to confidently extract data. Harmonization proposals
are discussed in this manuscript. Implementation of these proposals
will allow stakeholders to capitalize on the opportunity presented
by SEND data sets to increase the efficiency and productivity of nonclinical
drug development, allowing the most promising drug candidates to proceed
through development.
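
The frequency-of-population calculation described above can be sketched roughly as follows. This is a minimal illustration only, not the authors' method: the study was carried out in R, whereas this sketch uses Python, and the function names, the field codes chosen (`SPECIES`, `TRTV`), and the toy records are all assumptions made for the example. It counts a field as populated when it holds a non-blank value, then flags fields that fall below the 90% threshold mentioned in the text.

```python
from typing import Dict, List, Optional

# Hypothetical representation: one dict per study, mapping a field code to its value.
Study = Dict[str, Optional[str]]


def is_populated(value: Optional[str]) -> bool:
    """Treat missing, None, and blank strings as unpopulated (an assumption)."""
    return value is not None and value.strip() != ""


def population_frequency(studies: List[Study], fields: List[str]) -> Dict[str, float]:
    """Return, for each field, the percentage of studies in which it is populated."""
    total = len(studies)
    if total == 0:
        return {field: 0.0 for field in fields}
    return {
        field: 100.0 * sum(is_populated(s.get(field)) for s in studies) / total
        for field in fields
    }


# Toy data invented for illustration; not drawn from any real submission.
studies = [
    {"SPECIES": "RAT", "TRTV": "Saline"},
    {"SPECIES": "DOG", "TRTV": ""},
    {"SPECIES": "RAT"},
    {"SPECIES": "MONKEY", "TRTV": "0.5% methylcellulose"},
]

freq = population_frequency(studies, ["SPECIES", "TRTV"])

# Fields below the 90% threshold would be candidates for further extraction work.
needs_extraction = [field for field, pct in freq.items() if pct < 90.0]

print(freq)
print(needs_extraction)
```

A population percentage only shows whether a field is filled in, not whether its values are consistent across sponsors, which is why free-text fields without Controlled Terminology may still need harmonization even when well populated.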