Background: The increasing number and complexity of clinical trials make it challenging to detect and identify clinical quality issues in a timely manner. Despite extensive sponsor audit programs and monitoring activities, issues related to data integrity, safety, sponsor oversight, and patient consent remain recurring audit and inspection findings. Recent developments in data management and IT systems allow statistical modeling to provide insights to clinical Quality Assurance (QA) professionals and to help mitigate some of the key clinical quality issues more holistically and efficiently.
Methods: We used findings from a curated data set of Roche/Genentech operational and quality assurance study data covering 8 years (2011-2018), grouped them into 5 clinical impact factor categories, and modeled the risk for each category with a logistic regression using hand-crafted features.
Results: We trained 5 interpretable, cross-validated models with several distinguished risk factors, many of which confirmed field observations of our quality professionals. Despite a low signal-to-noise ratio in our data set, the models reliably predicted a decrease in risk of 12-44%, with only 2-8 coefficients each.
Conclusion: We propose a modeling strategy that can provide insights to clinical QA professionals and help them mitigate key clinical quality issues (e.g., safety, consent, data integrity) in a more sustained, data-driven way, turning the traditionally reactive approach into proactive monitoring and alerting. We also call for cross-sponsor collaboration and data sharing to improve and further validate the use of statistical models in clinical QA.
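To make the modeling approach concrete, below is a minimal sketch of an interpretable, cross-validated logistic regression for a single clinical impact factor category. This is not the authors' code: the feature names, the synthetic data, and the L1-penalised variant used to keep the coefficient count small are all illustrative assumptions.

```python
# Minimal sketch (not the authors' code): fitting an interpretable, cross-validated
# logistic regression for one clinical impact factor category. The features and the
# synthetic data below are placeholders, not the Roche/Genentech data set.
import numpy as np
from sklearn.linear_model import LogisticRegressionCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_studies = 500

# Hypothetical hand-crafted study-level features (placeholders).
X = np.column_stack([
    rng.poisson(5, n_studies),      # e.g., number of protocol amendments
    rng.uniform(0, 1, n_studies),   # e.g., fraction of sites in newly opened regions
    rng.normal(50, 10, n_studies),  # e.g., enrolled subjects per site
])
# Hypothetical binary target: 1 = audit/inspection finding in this category.
logits = 0.4 * (X[:, 0] - 5) + 2.0 * (X[:, 1] - 0.5)
y = rng.binomial(1, 1 / (1 + np.exp(-logits)))

# An L1 penalty keeps only a few non-zero coefficients, matching the paper's
# emphasis on small, interpretable models (2-8 coefficients each).
model = make_pipeline(
    StandardScaler(),
    LogisticRegressionCV(Cs=10, cv=5, penalty="l1", solver="liblinear",
                         scoring="roc_auc"),
)
auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
print(f"cross-validated ROC AUC: {auc.mean():.2f} +/- {auc.std():.2f}")

model.fit(X, y)
coefs = model.named_steps["logisticregressioncv"].coef_.ravel()
for name, c in zip(["amendments", "new_region_fraction", "subjects_per_site"], coefs):
    print(f"{name:>22s}: {c:+.2f}")
```

The printed coefficients are what a QA professional would inspect: their signs and magnitudes indicate which (hypothetical) risk factors drive the predicted probability of a finding.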
Dear Editor,
In a previous project [1], we developed a predictive model that enabled Roche/Genentech quality leads to oversee adverse event (AE) reporting. External clinical trial datasets such as Project Data Sphere (PDS) [2] allowed us to further test our machine learning-based approach, alleviate concerns of overfitting, and demonstrate the reproducibility of our research. Our primary objective was to further validate our model for the detection of AE under-reporting using PDS data. Our secondary objective was to build an oncology-specific model using a combined dataset of Roche and PDS data. The scope remained the prediction of AEs, not adverse drug reactions, that occur in clinical trials. Good clinical practice requires all AEs (regardless of the causal relationship between drug intake and the events) to be reported in a timely manner [3].
The curation process of downloadable PDS studies (as of November 2019) left five studies that fulfilled our data requirements, as sponsors are not required to share the full datasets. They were large phase III trials and included 742 investigator sites, 2363 subjects, and 51,847 visits. Hence, we could use PDS data to achieve our objectives. The oncology-specific model was built using the methodology described in our previous manuscript [1]. We used a combined dataset of 53 completed oncology studies (Roche + PDS). Our final model used 38 features built from patient and study attributes.
To test whether our model can be applied to non-Roche studies, we compared the quality of the predictions using a scatter plot (Fig. 1a) and found that, within a range of 0-150 on both axes (> 94% of all data points, our region of interest [ROI]), the predictions matched the observed values for both datasets equally well. To quantify the goodness of fit, we used scale-independent performance metrics, which are adequate for comparing the goodness of fit of different datasets used by the same model [4]: the symmetric mean absolute percentage error (SMAPE) [5] and the symmetric mean absolute Poisson significance level (SMASL). The latter is calculated by subtracting 0.5 from each Poisson significance level measurement, converting it to its absolute value, and taking the mean. SMASL puts equal weight on over- and under-predicting and ranges from 0 to 0.5 (i.e., the smaller the value, the better the fit). Considering SMAPE, average predictions for the PDS study sites were slightly better than for the Roche study sites, whereas the reverse was true for SMASL (Fig. 1b). We concluded that the goodness of fit for both datasets using our model was very similar within the ROI.
For the secondary objective, we tested how well the oncology model (using Roche and PDS data and the same algorithm [1]) would detect simulated test cases on data not used for model training. For relevant simulation scenarios of 25%, 50%, and 75% under-reporting at the site level, our model scored an area under the curve (AUC) of the receiver operating characteristic curve of 0.60, 0.77, and 0.90, respectively. These AUC values were on...
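As a reading aid, the sketch below implements the two scale-independent metrics described in the letter. The SMAPE formula is the standard symmetric form; for SMASL we assume that the "Poisson significance level" of a site is the Poisson CDF of the observed AE count at the predicted mean. That assumption is our interpretation of the letter, not a definition taken from the source, and the site-level counts are placeholders.

```python
# Minimal sketch of the scale-independent metrics SMAPE and SMASL.
# Assumption: the per-site "Poisson significance level" is the Poisson CDF of the
# observed AE count at the model-predicted mean (our reading, not the source's code).
import numpy as np
from scipy.stats import poisson

def smape(observed, predicted):
    """Symmetric mean absolute percentage error, in percent."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    denom = (np.abs(observed) + np.abs(predicted)) / 2.0
    return 100.0 * np.mean(np.abs(predicted - observed) / denom)

def smasl(observed, predicted):
    """Symmetric mean absolute Poisson significance level (0 = best, 0.5 = worst)."""
    sig = poisson.cdf(observed, mu=predicted)  # assumed per-site significance level
    return np.mean(np.abs(sig - 0.5))

# Illustrative site-level AE counts (placeholders, not Roche or PDS data).
observed = np.array([12, 0, 7, 30, 4])
predicted = np.array([10.5, 1.2, 6.8, 28.0, 5.5])
print(f"SMAPE: {smape(observed, predicted):.1f}%")
print(f"SMASL: {smasl(observed, predicted):.3f}")
```

Both metrics are unitless, so they can be compared across the Roche and PDS subsets even though the underlying AE counts differ in scale, which is why the letter uses them for the cross-dataset comparison.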
In the majority of cancers, pathogenic variants are only found at the level of the tumor; however, an unusual number of cancers and/or diagnoses at an early age in a single family may suggest a genetic predisposition. Predisposition plays a major role in about 5 to 10% of adult cancers and in certain childhood tumors. As access to genomic testing for cancer patients continues to expand, the identification of Potential Germline Pathogenic Variants (PGPV) through tumor-DNA sequencing is also increasing. Statistical methods have been developed to infer the presence of a PGPV without the need for a matched normal sample. These methods are mainly used for exploratory research, for example in real-world Clinico-Genomic Databases/platforms (CGDB). These databases are being developed to support many applications such as targeted drug development, clinical trial optimization, and post-marketing studies. To ensure the integrity of data used for research, a quality management system should be established, and quality oversight activities should be conducted to assess and mitigate clinical quality risks (for patient safety and data integrity). In contrast to well-defined "good practice" quality guideline (GxP) areas such as good clinical practice, there are no comprehensive instructions on how to assess the clinical quality of statistically derived variables from sequencing data such as PGPV. In this report, we aim to share our strategy and propose a possible set of tactics to assess PGPV quality and to ensure data integrity in exploratory research.
Clinical drug development is a complex and extensive process that involves multiple stakeholders alongside patients, requires large capital expenditures, and takes nearly a decade on average to complete. To ensure the correct conduct of this process, rigorous quality activities must be performed to assess and ensure compliance with Good Clinical and Pharmacovigilance Practices (GxP). For about 25 years, most of these activities have taken the form of audits, which involve a high volume of manual work and resources and are reactive by nature. Given the limitations of this approach, and the intent to leverage new technologies in the data analytics field, a more holistic, proactive, and data-driven approach was needed. For this to happen, quality assurance expertise needed to be complemented by data literacy skills. To achieve this, the Data Analytics University (DAU) was created: an in-house training program composed of two pathways that provides a framework for clinical quality staff to develop their data analytics capabilities. The first pathway covers the basics of statistics, probability, and data-related terminology, while the second delves deeper into the topics covered in the first, followed by hands-on activities to put the knowledge to the test. After successful completion of 15 DAU sessions, over 310 trained staff were able to apply their data analytics learning and address potential issues that might arise with a given dataset. In the near future, the DAU will be made available externally as an e-learning training program.