2013
DOI: 10.1002/minf.201300076

Using Graph Indices for the Analysis and Comparison of Chemical Datasets

Abstract: In cheminformatics, compounds are represented as points in multidimensional space of chemical descriptors. When all pairs of points found within certain distance threshold in the original high dimensional chemistry space are connected by distance-labeled edges, the resulting data structure can be defined as Dataset Graph (DG). We show that, similarly to the conventional description of organic molecules, many graph indices can be computed for DGs as well. We demonstrate that chemical datasets can be effectively…
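
The construction described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the Euclidean metric, the inclusive threshold (`<=`), the toy descriptor values, and the function names `build_dataset_graph`, `vertex_degrees`, and `edge_density` are all assumptions made for illustration, and the two indices shown are generic graph statistics rather than the specific indices used in the paper.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def build_dataset_graph(points, threshold):
    """Connect every pair of points whose distance is within `threshold`.

    Returns a dict mapping (i, j) index pairs (i < j) to their distance,
    i.e. the distance-labeled edges of the Dataset Graph.
    Assumption: Euclidean distance and an inclusive threshold.
    """
    edges = {}
    for i, j in combinations(range(len(points)), 2):
        d = dist(points[i], points[j])
        if d <= threshold:
            edges[(i, j)] = d
    return edges


def vertex_degrees(n_points, edges):
    """Number of edges incident to each point (compound)."""
    degrees = [0] * n_points
    for i, j in edges:
        degrees[i] += 1
        degrees[j] += 1
    return degrees


def edge_density(n_points, edges):
    """Fraction of all possible point pairs that are connected."""
    possible = n_points * (n_points - 1) // 2
    return len(edges) / possible if possible else 0.0


# Hypothetical two-dimensional descriptor vectors for four compounds.
descriptors = [(0.0, 0.0), (3.0, 4.0), (0.0, 1.0), (10.0, 10.0)]
graph = build_dataset_graph(descriptors, threshold=5.0)
degrees = vertex_degrees(len(descriptors), graph)
density = edge_density(len(descriptors), graph)
```

With these toy values, the first three compounds are mutually connected and the fourth is isolated, so comparing degree distributions or densities across datasets gives a rough sense of how such indices could distinguish them. The pairwise loop is quadratic in the number of compounds; a spatial index would be needed for large datasets.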

Cited by 25 publications (18 citation statements)
References 29 publications
“…Compared to other chemical space representations, networks have the additional advantage that they can be characterized and compared in detail using a variety of statistical approaches from general network science [10,11]. However, only very few network-like representations of chemical space have been reported thus far [12][13][14][15].…”
Section: Introduction (mentioning)
confidence: 99%
“…Often neglected, these curation steps are critical to detect mis‐annotated chemicals, structural errors, activity cliffs, and inter/intra‐lab experimental variability. Critical when using data extracted from the literature, chemical curation helps maximizing the prediction performances of QSPR models. This is particularly true with the presence of structural duplicates (i.e., identical compounds present several times in the same dataset) that is known to lead to over‐optimistic estimations of the predictivity for developed QSAR models.…”
Section: Methods (mentioning)
confidence: 99%
“…Predictive performance of QSAR models highly depends upon different characteristics (e.g., size, chemical diversity, activity distribution or presence of activity cliffs) of various data sets [ 49 51 ]. It may not be always possible to build reliable QSAR models for certain data sets.…”
Section: Automated Model Building (mentioning)
confidence: 99%