2013
DOI: 10.1002/minf.201300051

Quality Issues with Public Domain Chemogenomics Data

Abstract: The key concept in chemogenomics is the similarity principle that states that similar ligands should bind similar targets. Chemogenomic analysis requires large amounts of data and both powerful computational algorithms and computers. Data used for chemogenomics analysis can either be compiled from open sources, or they can be produced in-house as is often done in the pharmaceutical industry. The chemogenomic modeller often has to resort to mixing activity values from different laboratories and even assay types…

Citations: cited by 19 publications (17 citation statements)
References: 62 publications (65 reference statements)
“…However, compared to the Morgan2 fingerprint, the QAFFP fingerprints were able to retrieve significantly higher number of new scaffolds. These findings are rather encouraging given that (i) the QAFFP fingerprints are much shorter, (ii) the QAFFP fingerprints are defined on a purely data-driven fashion, without selecting the targets following biological reasons, and (iii) the models from which the QAFFP fingerprints are derived are far from perfect as their quality is influenced by, for example, QSAR modeling errors [107,108], experimental errors in publicly available data [109][110][111], data curation errors [69,112] or data imputation noise. On Table 4 The average number of ACSKs per an assay (and its standard error of the mean SEM) in 22 CLASS sets revealed by the Morgan2, rv-QAFFP and b-QAFFP fingerprints Model AD was estimated by an ICP with the confidence level of 90%.…”
Section: Discussion (mentioning)
confidence: 79%
“…If these guidelines would be adopted in all public databases, the quality of datasets for the development and evaluation of scoring functions would increase substantially. Kalliokoski et al recently published a review where the topic of quality in bioactivity databases is discussed in more detail …”
Section: Results (mentioning)
confidence: 99%
“…Kalliokoski et al recently published a review where the topic of quality in bioactivity databases is discussed in more detail. [65] Concerning databases which can be used for the geometrical analysis of protein-ligand interactions, the format of the stored data and the possibility to generate user-specific queries are very important aspects. CREDO stores all interactions in form of an interaction fingerprint.…”
Section: Results (mentioning)
confidence: 99%
“…It is known that some noise and various contradictions are stored in, and migrate from one source of bioactivity data to another, along with correct records (Kramer and Lewis, 2012; Kalliokoski et al, 2013; Tiikkainen et al, 2013; Papadatos et al, 2015). Thus, it is necessary to filter the data before using them in order to eliminate incorrect data and records that are inconsistent with the goal of the virtual screening study (Fourches et al, 2016).…”
Section: Methods (mentioning)
confidence: 99%