New Developments in QSPR/QSAR Modeling Based on Topological Indices

Lučić, Bono; Trinajstić, Nenad

doi:10.1080/10629369708039124

Cited by 21 publications

(14 citation statements)

References 41 publications


“…Thus, by the use of CROMRsel.f we can select the best model with six descriptors among 10⁹ possible models (it takes about 10 h on Hewlett-Packard 9000/E55 computer, which is configured as a server) or the best model with five out of 104 descriptors (∼10⁸ models, what takes 28 CPU min). Therefore, if we wish to express a certain physical or chemical property, or biological activity of a group of molecules as linear combination of descriptors, the problem we face is the selection of a set of I descriptors (I = 1, ..., N) from the set of N descriptors [11,26] which best approximate a given property or activity. This problem was considered by a number of authors [6] but perhaps most consistently by Randić; however, they gave no instructions (algorithm, computer program) how to solve any problem of real complexity (selection of descriptors in a large descriptor space).…”

Section: Methods (mentioning)

confidence: 99%
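
The descriptor-selection problem described in the statement above can be written compactly. The notation below is an assumption introduced here for illustration (P for the modeled property, d_j for the descriptors, a_j for the regression coefficients, S for the selected subset, R for the correlation coefficient used as the quality criterion); it is not taken from the cited paper.

```latex
\hat{P} = a_0 + \sum_{j \in S} a_j d_j ,
\qquad S \subseteq \{1, \dots, N\}, \quad |S| = I

S^{*} = \operatorname*{arg\,max}_{S \,:\, |S| = I} R(S),
\qquad \text{number of candidate subsets} = \binom{N}{I}
```

Under this reading, an exhaustive search has to evaluate every one of the \binom{N}{I} subsets, which is why the number of possible models grows so quickly with N and I.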

“…Therefore, the quality of the MR method was usually misjudged since the critical opinion was reached by considering models which were not the best possible MR models that could be obtained (except for very small sets of descriptors). We have shown in our previous reports that by selecting the best possible descriptors to be used in MR modeling one obtains better models than those obtained using the usual approximate procedures for choosing descriptors. Additionally, the selection of the best possible descriptors increases the stability of the coefficients in the MR model and thus the accuracy of the model also increases. In addition, all methods (NN, PLS, PCA) other than MR (except pure GA with a MR-like model) are not easy to relate from equation to equation, because relationships between the chemical structure and the activity of molecules are much more complex than in the case of MR (expressed by latent variables which vary from a model to model) …”

Section: Introduction (mentioning)

confidence: 99%


Multivariate Regression Outperforms Several Robust Architectures of Neural Networks in QSAR Modeling

Lučić, Trinajstić (1998)

J. Chem. Inf. Comput. Sci.

116


In the past decade, many authors replaced multivariate regression (MR) by the neural networks (NNs) algorithm because they believed the latter to be superior. To verify this, we have undertaken a comparative investigation of the relationship between biological activities and substituent constants representing physicochemical parameters of the substituent groups of 37 carboquinones and 57 benzodiazepines using MR and NNs. A new method for the selection of descriptors in the best possible MR models is presented. The use of orthogonalization procedure makes the calculation of the statistical parameters (e.g. correlation coefficient, R) for each model much simpler, and the selection of the best MR models is accelerated. Such a procedure is applicable to QSAR modeling even for the selection of the best MR model with six descriptors from a set of 100 descriptors. In case one wants to select, for example, the best 15 out of 100 descriptors, a new procedure is developed for the stepwise selection of descriptors in MR models. Using this procedure, we selected not only one (which was the case in the old stepwise MR procedure) but two, three, or more new descriptors in each subsequent step and added them to descriptors selected up to the previous step. The same data sets were previously investigated by several (mainly robust) NN algorithms which contained a hidden layer (Aoyama, T. et al.
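
The two selection strategies named in this abstract, an exhaustive search for the best model with a given number of descriptors and a stepwise search that adds several descriptors per step, can be sketched as follows. This is a minimal illustration, not the authors' program: it skips the orthogonalization speed-up the abstract mentions, it scores models by R² from plain normal equations, and the function names (`fit_r2`, `best_subset`, `stepwise_i_by_i`) and the toy data are hypothetical.

```python
from itertools import combinations

def fit_r2(columns, y):
    """R^2 of a least-squares fit of y on the given descriptor columns plus an intercept."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Augmented normal equations [X^T X | X^T y].
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        if abs(A[piv][c]) < 1e-12:
            return 0.0  # assumption: score (near-)collinear subsets as useless
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    coef = [A[r][p] / A[r][r] for r in range(p)]
    mean_y = sum(y) / n
    ss_res = sum((y[i] - sum(b * x for b, x in zip(coef, X[i]))) ** 2 for i in range(n))
    ss_tot = sum((v - mean_y) ** 2 for v in y)
    return 1.0 - ss_res / ss_tot

def best_subset(descriptors, y, k):
    """Exhaustive search: the k-descriptor subset with the highest R^2."""
    best = max(combinations(range(len(descriptors)), k),
               key=lambda idx: fit_r2([descriptors[j] for j in idx], y))
    return best, fit_r2([descriptors[j] for j in best], y)

def stepwise_i_by_i(descriptors, y, total, i):
    """Greedy search: each step adds the best group of i new descriptors."""
    selected = []
    while len(selected) < total:
        step = min(i, total - len(selected))
        remaining = [j for j in range(len(descriptors)) if j not in selected]
        group = max(combinations(remaining, step),
                    key=lambda g: fit_r2([descriptors[j] for j in selected + list(g)], y))
        selected += group
    return tuple(selected), fit_r2([descriptors[j] for j in selected], y)

# Toy pool of five descriptors (columns) for eight molecules; y depends on d1 and d3 only.
pool = [
    [1, 2, 3, 4, 5, 6, 7, 8],
    [5, 3, 8, 1, 9, 2, 7, 4],
    [1, 0, 1, 0, 1, 0, 1, 0],
    [3, 1, 4, 1, 5, 9, 2, 6],
    [2, 7, 1, 8, 2, 8, 1, 8],
]
y = [2 * a - 3 * b + 1 for a, b in zip(pool[1], pool[3])]

print(best_subset(pool, y, 2))
print(stepwise_i_by_i(pool, y, total=2, i=1))
```

The exhaustive search visits every subset, so its cost grows with the binomial coefficient of the pool size and the model size; the stepwise variant trades the guarantee of the best possible model for a search that stays feasible when many descriptors are wanted.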


“…A major limitation of the CODESSA modeling subroutine, in common with other approaches for variable selection, is, for example, the impossibility of selecting the best possible two or three descriptors for MR models from a data set containing, for example, 200 descriptors, for which case the numbers of possible two- and three-descriptor models are 4950 and 161 700, respectively. This problem has been solved rigorously in several cases but, until now, only for relatively small data sets. …”

Section: Introduction (mentioning)

confidence: 99%
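
The model counts quoted in these statements are binomial coefficients: the number of ways to choose k descriptors from a pool of n. The short check below uses Python's standard `math.comb`; the helper name `n_models` is hypothetical. Note that 4950 and 161 700 are the two- and three-descriptor counts for a pool of 100 descriptors, whereas a pool of 200 gives 19 900 and 1 313 400.

```python
from math import comb

def n_models(n_descriptors: int, k: int) -> int:
    """Number of distinct k-descriptor models that can be drawn from n_descriptors."""
    return comb(n_descriptors, k)

# Two- and three-descriptor models from pools of 100 and 200 descriptors.
print(n_models(100, 2), n_models(100, 3))  # 4950 161700
print(n_models(200, 2), n_models(200, 3))  # 19900 1313400

# Five descriptors out of 104, quoted above as roughly 10^8 models.
print(n_models(104, 5))

# All models with 1 to 5 descriptors from a pool of 296 descriptors.
print(sum(n_models(296, k) for k in range(1, 6)))
```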

A New Efficient Approach for Variable Selection Based on Multiregression: Prediction of Gas Chromatographic Retention Times and Response Factors

Lučić, Trinajstić, Sild et al. (1999)

J. Chem. Inf. Comput. Sci.

101


The selection of the most relevant variable is a frequent problem in the analysis of chemical data, especially now considering the large amounts of data created by the increased computer power and analytical resolution. A novel procedure for variable selection based on multiregression (MR) analysis is developed and applied to the quantitative structure-property relationship (QSPR) modeling of gas chromatographic retention times t_R and Dietz response factors RF on 152 diverse chemical compounds. Using 296 descriptors generated by the CODESSA program, “absolutely the best” linear MR models containing from 1 to 5 descriptors were first selected (∼2 × 10¹⁰ models were checked), and then “the best” linear stepwise MR models with six and seven descriptors were obtained through “i by i” stepwise selection. In this paper i was varied from 1 to 4, so that in each next step i descriptors were added to the previously selected descriptors. Nonlinear models were developed by the inclusion of cross-products of initial descriptors. We selected as the most important descriptors for t_R the number of C−H and C−X bonds, connectivity indices of order 3, the highest normal mode vibrational frequency, and the rotational entropy of the molecule at 300 K. In the case of RF modeling the most important descriptors are those related to the relative number and weight of effective C atoms, the orbital electronic population, and the bond order and valency of C and H atoms. Comparison with the best six-descriptor models obtained by the normal CODESSA procedure shows that nonlinear seven-descriptor MR models now obtained achieve 30% (0.3520 vs 0.5032) and 12% (0.0472 vs 0.0530) less standard errors of estimate for t_R and RF, respectively. Our novel procedure of selecting a small number of the most important descriptors from a data set allows us to extract a larger amount of useful information than with the procedure implemented in CODESSA.
Thus, our new procedure enables the selection of the best possible MR models from 10¹⁰ possibilities. Through the introduction of cross-product terms, we obtained nonlinear MR models which are superior to the corresponding linear models.
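
The cross-product idea in this abstract can be illustrated with a small sketch: pairwise products of the initial descriptors are appended to the pool, and an ordinary linear MR fit on the enlarged pool is then nonlinear in the original descriptors. The function name `add_cross_products` and the naming scheme are hypothetical, and whether the authors also included squared terms is not stated here, so this sketch assumes products of distinct descriptors only.

```python
from itertools import combinations

def add_cross_products(descriptors, names):
    """Return the pool extended with every pairwise product d_a * d_b (a < b)."""
    new_cols = list(descriptors)
    new_names = list(names)
    for a, b in combinations(range(len(descriptors)), 2):
        # Element-wise product over molecules gives one new descriptor column.
        new_cols.append([x * z for x, z in zip(descriptors[a], descriptors[b])])
        new_names.append(f"{names[a]}*{names[b]}")
    return new_cols, new_names

# Three toy descriptors for two molecules (values are made up for illustration).
cols, labels = add_cross_products([[1, 2], [3, 4], [5, 6]], ["a", "b", "c"])
print(labels)
print(cols)
```

A pool of n descriptors gains n(n − 1)/2 product columns this way, so the selection step that follows has a much larger space to search.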


“…Recent years have seen the publication of a plethora of QSPR methods for the prediction of boiling point, and it is impracticable to cover all of these in a review of this nature. Table 1 lists those from 1996 onwards [28–83]. Notice that many of the studies deal with specific classes of compounds, especially alkanes.…”

Section: Boiling Point (mentioning)

confidence: 99%

Quantitative structure‐property relationships for prediction of boiling point, vapor pressure, and melting point

Dearden (2003)

Environmental Toxicology and Chemistry

125

102


Boiling point, vapor pressure, and melting point are important physicochemical properties in the modeling of the distribution and fate of chemicals in the environment. However, such data often are not available, and therefore must be estimated. Over the years, many attempts have been made to calculate boiling points, vapor pressures, and melting points by using quantitative structure-property relationships, and this review examines and discusses the work published in this area, and concentrates particularly on recent studies. A number of software programs are commercially available for the calculation of boiling point, vapor pressure, and melting point, and these have been tested for their predictive ability with a test set of 100 organic chemicals.
