1999
DOI: 10.1021/ci9903049

Analysis of a Large Structure/Biological Activity Data Set Using Recursive Partitioning

Abstract: Combinatorial chemistry and high-throughput screening are revolutionizing the process of lead discovery in the pharmaceutical industry. Large numbers of structures and vast quantities of biological assay data are quickly being accumulated, overwhelming traditional structure/activity relationship (SAR) analysis technologies. Recursive partitioning is a method for statistically determining rules that classify objects into similar categories or, in this case, structures into groups of molecules with similar poten…
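The recursive partitioning idea described in the abstract — repeatedly splitting a compound set on whichever descriptor best separates actives from inactives — can be sketched in a few lines of plain Python. The binary descriptor matrix, the Gini split criterion, and the toy data below are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of recursive partitioning over binary substructure
# descriptors. Rows are compounds, columns are presence/absence (1/0)
# of a descriptor; labels are 1 (active) / 0 (inactive).

def gini(labels):
    """Gini impurity of a list of 0/1 activity labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def best_split(rows, labels):
    """Descriptor index whose presence/absence split most reduces
    impurity; returns (index, gain), with index=None if no gain."""
    n = len(labels)
    base = gini(labels)
    best_idx, best_gain = None, 0.0
    for j in range(len(rows[0])):
        left = [labels[i] for i in range(n) if rows[i][j] == 1]
        right = [labels[i] for i in range(n) if rows[i][j] == 0]
        gain = base - (len(left) / n) * gini(left) - (len(right) / n) * gini(right)
        if gain > best_gain:
            best_idx, best_gain = j, gain
    return best_idx, best_gain

def grow(rows, labels, depth=0, max_depth=3):
    """Recursively partition until pure, no useful split, or max depth."""
    j, gain = best_split(rows, labels)
    if j is None or depth >= max_depth:
        # Leaf: predict the majority activity class.
        return {"leaf": sum(labels) >= len(labels) / 2}
    keep = [i for i in range(len(rows)) if rows[i][j] == 1]
    drop = [i for i in range(len(rows)) if rows[i][j] == 0]
    return {
        "split_on": j,
        "present": grow([rows[i] for i in keep], [labels[i] for i in keep],
                        depth + 1, max_depth),
        "absent": grow([rows[i] for i in drop], [labels[i] for i in drop],
                       depth + 1, max_depth),
    }

def predict(tree, row):
    """Walk the tree following the compound's descriptor values."""
    while "leaf" not in tree:
        tree = tree["present"] if row[tree["split_on"]] == 1 else tree["absent"]
    return tree["leaf"]

# Toy data: descriptor 0 perfectly separates actives from inactives.
X = [[1, 0], [1, 1], [0, 0], [0, 1]]
y = [1, 1, 0, 0]
tree = grow(X, y)
```

Each split yields a human-readable rule of the form "descriptor j present → go left", which is what makes the resulting classification rules interpretable as SAR hypotheses.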

Cited by 188 publications (195 citation statements)
References 49 publications
“…Statistical and chemoinformatics approaches for assessing the quality control of HTS data and for mining their chemical and biological information have been developed, for example, by incorporating pattern-detection methods for the identification of pipetting artefacts or for the detection of chemical-class-related effects. The development of chemoinformatics methods and procedures, such as RECURSIVE PARTITIONING, PHYLOGENETIC-LIKE TREE ALGORITHMS or BINARY QUANTITATIVE STRUCTURE-ACTIVITY RELATIONSHIPS (QSARS) [81][82][83][84][85][86][87] , which support the automatic identification of hits that are frequently identified by HTS, false positives and negatives, as well as structure-activity relationship (SAR) information, is essential for generating knowledge from HTS data 83,[88][89][90] . The myriad efforts that surround the design of appropriate analytic tools have to cope with the difficulty of integrating disparate types of information, especially PARSING and assimilating both chemical and genomics information data (tools such as Scitegic Pipeline pilot or Kensington Inforsense provide the required integration concepts (see online links box)).…”
Section: Challenges and Limitationsmentioning
confidence: 99%
“…Decision trees, therefore, represent Boolean functions, even in the case of a larger range of outputs. 34 The version used here is Quinlan's C4.5. 35 A decision tree is an arrangement of tests that prescribes an appropriate test at every step in an analysis.…”
Section: E )mentioning
confidence: 99%
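The point quoted above — that a decision tree over discrete inputs computes a Boolean function — can be illustrated directly. The hand-built tree below is a hypothetical example (not output of C4.5, which Quinlan's paper describes as selecting tests by information gain ratio); enumerating all inputs recovers its truth table.

```python
# A tiny decision tree over two binary inputs, encoding x0 AND x1.
# "test" names the input index examined at that node; "if1"/"if0"
# are the subtrees for the two possible outcomes of the test.
import itertools

tree = {"test": 0,
        "if1": {"test": 1, "if1": {"leaf": True}, "if0": {"leaf": False}},
        "if0": {"leaf": False}}

def evaluate(node, x):
    """Apply the prescribed test at each step until a leaf is reached."""
    while "leaf" not in node:
        node = node["if1"] if x[node["test"]] == 1 else node["if0"]
    return node["leaf"]

# Enumerating every input vector recovers the Boolean function (here AND).
table = {x: evaluate(tree, x) for x in itertools.product([0, 1], repeat=2)}
```

Because every root-to-leaf path is a conjunction of test outcomes, the tree as a whole is a disjunction of such conjunctions — a Boolean function in disjunctive normal form.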
“…2325 Stanley Young and colleagues implemented a recursive partitioning method a few years ago that can be used to derive predictive models to distinguish active from inactive compounds. 20,21 This method, compared to some older implementations, has the advantage of being able to handle a large number of descriptors and very large compound data sets.…”
Section: Introductionmentioning
confidence: 99%