A method for estimating the configurational (i.e., non-kinetic) part of the entropy of internal motion in complex molecules is introduced that does not assume any particular parametric form for the underlying probability density function. It is based on the nearest-neighbor (NN) distances of the points of a sample of internal molecular coordinates obtained by a computer simulation of a given molecule. As the method does not make any assumptions about the underlying potential energy function, it accounts fully for any anharmonicity of internal molecular motion. It provides an asymptotically unbiased and consistent estimate of the configurational part of the entropy of the internal degrees of freedom of the molecule. The NN method is illustrated by estimating the configurational entropy of internal rotation of capsaicin and two stereoisomers of tartaric acid, and by providing a much closer upper bound on the configurational entropy of internal rotation of a pentapeptide molecule than that obtained by the standard quasi-harmonic method. As a measure of dependence between any two internal molecular coordinates, a general coefficient of association based on the information-theoretic quantity of mutual information is proposed. Using NN estimates of this measure, statistical clustering procedures can be employed to group the coordinates into clusters of manageable dimension, with minimal dependence between coordinates belonging to different clusters.
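As an illustration of the nearest-neighbor idea described above, the following is a minimal sketch of a k-th nearest-neighbor entropy estimator of the Kozachenko-Leonenko type. It is not taken from the paper: the function name `nn_entropy`, the brute-force neighbor search, and the use of plain Euclidean distance (which ignores the periodicity of torsion angles) are all assumptions made for brevity.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def nn_entropy(points, k=1):
    """Estimate differential entropy (in nats) from k-th NN distances.

    `points` is a list of equal-length tuples of floats. Assumes no
    duplicate points (a zero distance would make the logarithm fail)
    and uses Euclidean distance, i.e. no periodic wrapping.
    """
    n = len(points)
    d = len(points[0])
    # Log-volume of the d-dimensional unit ball: pi^(d/2) / Gamma(d/2 + 1).
    log_unit_ball = (d / 2) * math.log(math.pi) - math.lgamma(d / 2 + 1)
    # Harmonic-number correction L_{k-1} = sum_{j=1}^{k-1} 1/j (zero for k = 1).
    harmonic = sum(1.0 / j for j in range(1, k))
    log_dist_sum = 0.0
    for i, p in enumerate(points):
        # Brute-force search: O(n^2) overall, fine for small samples only.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        log_dist_sum += math.log(dists[k - 1])
    return d * log_dist_sum / n + math.log(n) + log_unit_ball - harmonic + EULER_GAMMA


if __name__ == "__main__":
    random.seed(0)
    sample = [(random.gauss(0.0, 1.0),) for _ in range(1000)]
    exact = 0.5 * math.log(2 * math.pi * math.e)  # entropy of a standard normal
    print("estimate:", nn_entropy(sample), "exact:", exact)
```

For realistic sample sizes a spatial index such as a k-d tree would replace the brute-force search, and a distance that respects angular periodicity would be needed for torsional coordinates.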
Up to 60 million people working indoors experience symptoms such as eye, nose and throat irritation, headache, and fatigue. Investigations into these complaints have ascribed the effects to volatile organic compounds (VOCs) emitted from building materials, cleaning formulations, or other consumer products. New compounds can result when the VOCs react with hydroxyl or nitrate radicals or ozone present in indoor environments. Several oxygenated organic compounds, such as glyoxal, methylglyoxal, glycolaldehyde, and diacetyl, have been identified as possible reaction products of indoor environment chemistry. Although research has previously identified diacetyl and glyoxal as sensitizers, additional experiments were conducted in these studies to further classify their sensitization potential. The sensitization potential of these four compounds was assessed using quantitative structure-activity relationship (QSAR) programs. Derek for Windows and the National Institute for Occupational Safety and Health logistic regression model predicted all compounds to be sensitizers, while TOPKAT 6.2 predicted all compounds except methylglyoxal to be sensitizers. All compounds were tested in a combined irritancy and local lymph node assay (LLNA). All compounds except for glyoxal were found to be irritants and all tested positive in the LLNA with EC3 values ranging from 0.42 to 1.9%. Methylglyoxal significantly increased both the B220(+) and IgE(+)B220(+) cell populations in the draining lymph nodes and total serum IgE levels. The four compounds generated by indoor air chemistry were predicted by QSAR and animal modeling to be sensitizers, with the potential for methylglyoxal to induce IgE. The identification of these compounds as sensitizers may help to explain some of the health effects associated with indoor air complaints.
For biomolecules in solution, changes in configurational entropy are thought to contribute substantially to the free energies of processes like binding and conformational change. In principle, the configurational entropy can be strongly affected by pairwise and higher-order correlations among conformational degrees of freedom. However, the literature offers mixed perspectives regarding the contributions that changes in correlations make to changes in configurational entropy for such processes. Here we take advantage of powerful techniques for simulation and entropy analysis to carry out rigorous in silico studies of correlation in binding and conformational changes. In particular, we apply information-theoretic expansions of the configurational entropy to well-sampled molecular dynamics simulations of a model host–guest system and the protein bovine pancreatic trypsin inhibitor. The results bear on the interpretation of NMR data, as they indicate that changes in correlation are important determinants of entropy changes for biologically relevant processes and that changes in correlation may either balance or reinforce changes in first-order entropy. The results also highlight the importance of main-chain torsions as contributors to changes in protein configurational entropy. As simulation techniques grow in power, the mathematical techniques used here will offer new opportunities to answer challenging questions about complex molecular systems.
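To make the role of correlation concrete, an information-theoretic expansion of the configurational entropy can be written, truncated at second order, as follows. This is a sketch in notation chosen here rather than taken from the paper: $S_i$ is the marginal entropy of coordinate $x_i$, $S_{ij}$ the joint entropy of the pair $(x_i, x_j)$, and $I_{ij}$ their mutual information.

```latex
\begin{align}
  S &\approx \sum_{i} S_i \;-\; \sum_{i<j} I_{ij}, \\
  I_{ij} &= S_i + S_j - S_{ij} \;\ge\; 0 .
\end{align}
```

Because each $I_{ij}$ is non-negative, pairwise correlation can only lower the second-order estimate relative to the first-order sum. A change in correlation during binding or conformational change therefore shifts the entropy change in a direction that can either offset or add to the first-order term.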
The random forest and classification tree modeling methods are used to build predictive models of the skin sensitization activity of a chemical. A new two-stage backward elimination algorithm for descriptor selection in the random forest method is introduced. The predictive performance of the random forest model was maximized by tuning voting thresholds to reflect the unbalanced size of classification groups in available data. Our results show that the random forest with the proposed backward elimination procedure outperforms a single classification tree and the standard random forest method in predicting Local Lymph Node Assay-based skin sensitization activity. The proximity measure obtained from the random forest is a natural similarity measure that can be used for clustering of chemicals. Based on this measure, the clustering analysis partitioned the chemicals into several groups sharing similar molecular patterns. The improved random forest method demonstrates the potential for future QSAR studies based on a large number of descriptors or when the number of available data points is limited.
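The idea of tuning a voting threshold for unbalanced classes can be sketched as below. This is an illustrative assumption, not the authors' procedure: the abstract does not state which performance criterion was used, so balanced accuracy is chosen here, and the names `balanced_accuracy` and `tune_vote_threshold` are hypothetical. The input is the fraction of trees voting for the positive class for each chemical, ideally taken from out-of-bag or held-out predictions.

```python
def balanced_accuracy(labels, predictions):
    """Mean of sensitivity and specificity for 0/1 labels.

    Assumes both classes are present in `labels`.
    """
    true_pos = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    true_neg = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 0)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    return 0.5 * (true_pos / n_pos + true_neg / n_neg)


def tune_vote_threshold(vote_fractions, labels, candidates=None):
    """Pick the vote-fraction cutoff that maximizes balanced accuracy.

    A chemical is predicted positive when its fraction of positive
    votes is at least the threshold. Ties keep the lowest threshold.
    """
    if candidates is None:
        candidates = [i / 100 for i in range(1, 100)]
    best_threshold, best_score = None, -1.0
    for threshold in candidates:
        predictions = [1 if v >= threshold else 0 for v in vote_fractions]
        score = balanced_accuracy(labels, predictions)
        if score > best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score


if __name__ == "__main__":
    # Toy data: the minority (positive) class never reaches a 0.5 majority.
    votes = [0.10, 0.20, 0.30, 0.35, 0.40, 0.45]
    labels = [0, 0, 0, 0, 1, 1]
    print(tune_vote_threshold(votes, labels))
```

With the default majority cutoff of 0.5, every chemical in the toy data would be predicted negative; lowering the threshold recovers the minority class, which is the motivation for tuning it when class sizes are unbalanced.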