Data Quality in the Human and Environmental Health Sciences: Using Statistical Confidence Scoring to Improve QSAR/QSPR Modeling

Steinmetz, Fabian P.; Madden, Judith C.; Cronin, Mark T.D.

doi:10.1021/acs.jcim.5b00294

Cited by 10 publications

(6 citation statements)

References 38 publications


“…The greater the number of replicas, the lower the correlation coefficients became. Since this study was a single trial with simple random noise, the results do not contradict those of the previous works. The duplication of data should be treated more carefully than it is in the simple duplication.…”

Section: Results (supporting)

confidence: 52%


Quantitative Structure‐activity Relationship (QSAR) Models for Docking Score Correction

Fukunishi, Yamasaki, Yasumatsu, et al. 2016

Molecular Informatics


In order to improve docking score correction, we developed several structure‐based quantitative structure activity relationship (QSAR) models by protein‐drug docking simulations and applied these models to public affinity data. The prediction models used descriptor‐based regression, and the compound descriptor was a set of docking scores against multiple (∼600) proteins including nontargets. The binding free energy that corresponded to the docking score was approximated by a weighted average of docking scores for multiple proteins, and we tried linear, weighted linear and polynomial regression models considering the compound similarities. In addition, we tried a combination of these regression models for individual data sets such as IC50, Ki, and %inhibition values. The cross‐validation results showed that the weighted linear model was more accurate than the simple linear regression model. Thus, the QSAR approaches based on the affinity data of public databases should improve docking scores.



“…Cortes‐Ciriano et al. suggested that the use of multiple replica data sets permutated by random noise could improve the QSAR accuracy. In the replica PCR method, the experimental data and docking scores are replicated by the permutation of 5% noise.…”

Section: Methods (mentioning)

confidence: 99%
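
As a rough illustration of the replica idea in the statement above, the sketch below generates noisy copies of a list of values. It assumes multiplicative uniform noise of at most ±5% per value, which is only one possible reading of "permutation of 5% noise"; the cited works may define the noise differently. The function `make_replicas`, its parameters, and the example scores are hypothetical.

```python
import random


def make_replicas(values, n_replicas, noise=0.05, seed=0):
    """Return n_replicas noisy copies of values.

    Assumption: each value is scaled by a factor drawn uniformly
    from [1 - noise, 1 + noise]. This is an illustrative choice,
    not the procedure of the cited papers.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    replicas = []
    for _ in range(n_replicas):
        replicas.append([v * (1.0 + rng.uniform(-noise, noise)) for v in values])
    return replicas


scores = [-7.2, -6.5, -9.1]  # made-up docking scores for illustration
replicas = make_replicas(scores, n_replicas=3)
```

Each replica keeps the original ordering of compounds, so the copies can be appended to a training set alongside the unperturbed data.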

Quantitative Structure‐activity Relationship (QSAR) Models for Docking Score Correction

Fukunishi, Yamasaki, Yasumatsu, et al. 2016

Molecular Informatics


“…[8,9] Other researchers concluded that high-quality data are crucial for adequately predicting quantitative structure-activity relationship (QSAR) models [10,11]. The demand for reliability of predictions emerged already when the first QSAR models and expert systems appeared. Hence, the assessment of prediction reliability was addressed by many researchers.…”

Section: Introduction (mentioning)

confidence: 99%

Building In Silico Models to Trigger Retesting: A Strategy on How to Use Predictive Models to Identify Potentially Incorrect In Vitro Intrinsic Clearance Results

Pitter, Petersson, Zanelli, et al. 2019

Applied In Vitro Toxicology


Introduction: The results from biological assays, such as microsomal intrinsic clearance, are often associated with moderate to high variability. Nevertheless, it is crucial to disciplines, such as drug discovery and toxicological risk assessment, to trust such experimental results. In the following study, a novel approach is suggested, which is based on in silico predictions and confidence scoring triggering experimental retesting. Materials and Methods: After successful validation of in silico models and confidence scoring, experiments with correct predictions (n = 73) and incorrect predictions (n = 65), both with high confidence scores (CS > 0.7), were repeated. Results: While 4.1% of the correct predictions changed their experimental outcome toward a different class, the incorrect predictions led to a class change in 27.7% of the experiments. Discussion: Such an in silico approach has the potential to identify inaccurate/variable results, which may then be subject to retesting. This suggested retesting strategy will improve decision-making and overall data quality if applied for a longer period. This may also then improve in silico models. Conclusions: As in silico models contribute toward improving in vitro data (due to adequate retesting), and higher data quality leads to more accurately predicting in silico models, this concept can be described as a virtuous circle of data quality.
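
The retesting trigger described in this abstract can be sketched as a simple filter: an experiment is flagged when the in silico prediction disagrees with the measured class and the confidence score exceeds the threshold. The threshold of 0.7 is taken from the abstract; the `Result` record, its field names, and the class labels are hypothetical, and the abstract does not specify how the confidence score itself is computed.

```python
from dataclasses import dataclass


@dataclass
class Result:
    compound_id: str
    measured_class: str      # class from the in vitro experiment
    predicted_class: str     # class from the in silico model
    confidence_score: float  # assumed to lie in [0, 1]


def flag_for_retest(results, cs_threshold=0.7):
    """Return results whose high-confidence prediction contradicts the measurement."""
    return [
        r for r in results
        if r.predicted_class != r.measured_class
        and r.confidence_score > cs_threshold
    ]


# Made-up records for illustration only.
results = [
    Result("A", "high", "high", 0.9),  # agreement: not flagged
    Result("B", "low", "high", 0.8),   # confident disagreement: flagged
    Result("C", "low", "high", 0.5),   # low confidence: not flagged
]
flagged = flag_for_retest(results)
```

Here only compound "B" would be sent back for retesting, since it is the only record with a confident prediction that disagrees with the measurement.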


“…This is a conservative approach where the lowest dose associated with a given toxic response is used. It is worth mentioning that the presence of multiple, comparable values for the same chemical can increase confidence in the data and this may be expressed as a confidence score (CS), and the use of such data will consequently improve the robustness of the model [39].…”

Section: Introduction (mentioning)

confidence: 99%

Characterisation of data resources for in silico modelling: benchmark datasets for ADME properties

Przybylak, Madden, Covey-Crump, et al. 2017

Expert Opinion on Drug Metabolism & Toxicology


The cost of in vivo and in vitro screening of ADME properties of compounds has motivated efforts to develop a range of in silico models. At the heart of the development of any computational model are the data; high quality data are essential for developing robust and accurate models. The characteristics of a dataset, such as its availability, size, format and type of chemical identifiers used, influence the modelability of the data. Areas covered: This review explores the usefulness of publicly available ADME datasets for researchers to use in the development of predictive models. More than 140 ADME datasets were collated from publicly available resources and the modelability of 31 selected datasets were assessed using specific criteria derived in this study. Expert opinion: Publicly available datasets differ significantly in information content and presentation. From a modelling perspective, datasets should be of adequate size, available in a user-friendly format with all chemical structures associated with one or more chemical identifiers suitable for automated processing (e.g. CAS number, SMILES string or InChIKey). Recommendations for assessing dataset suitability for modelling and publishing data in an appropriate format are discussed.
