One‐ to Four‐Dimensional Kernels for Virtual Screening and the Prediction of Physical, Chemical, and Biological Properties.

Azencott, Chloé-Agathe; Ksikes, Alexandre; Swamidass, S. Joshua; Chen, Jonathan; Ralaivola, Liva; Baldi, Pierre

doi:10.1002/chin.200733219

Cited by 3 publications

(5 citation statements)

References 17 publications


“…As in most chemoinformatics applications, such as the search of large databases of small molecules [27-29] or the prediction of their physical, chemical, and biological properties [9,10,19,20,27], all the vHTS methods we implement depend on a quantitative notion of chemical similarity to define the local geometry of chemical space. The underlying intuition, explicitly articulated as the Similar Property Principle [30], is that the more structurally similar two molecules are, the more likely they are to have similar properties.…”

Section: Methods (mentioning)

confidence: 99%

“…In both labeling schemes, the bonds are simply labeled according to their type (single, double, triple, or aromatic). In terms of graphical substructures, we consider both paths [19,20] of depth d up to 2, 5, or 8 bonds, or circular substructures [34] of depth d up to 2 or 3 bonds. Thus the fingerprint components index all the labeled paths, or all the labeled trees, up to a certain depth.…”

Section: Methods (mentioning)

confidence: 99%
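
The path-based fingerprint described in the statement above can be illustrated with a minimal sketch. It is not the cited authors' implementation: the graph representation (a list of atom labels plus a dictionary of bonds), the function name `labeled_paths`, the bond symbols, and the restriction to simple paths (no atom visited twice) are all assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple


def labeled_paths(
    atoms: List[str],
    bonds: Dict[Tuple[int, int], str],
    max_depth: int,
) -> Set[str]:
    """Enumerate labeled simple paths of up to max_depth bonds.

    atoms: atom labels, indexed by position (hypothetical representation).
    bonds: maps an atom-index pair to a bond-type symbol, e.g. "-" or "=".
    Each path is reported once, regardless of traversal direction.
    """
    # Build an undirected adjacency list from the bond dictionary.
    adjacency: Dict[int, List[Tuple[int, str]]] = {i: [] for i in range(len(atoms))}
    for (i, j), bond_type in bonds.items():
        adjacency[i].append((j, bond_type))
        adjacency[j].append((i, bond_type))

    found: Set[str] = set()

    def walk(node: int, visited: frozenset, tokens: List[str], depth: int) -> None:
        # Canonicalize: a path and its reverse map to the same string.
        forward = "".join(tokens)
        backward = "".join(reversed(tokens))
        found.add(min(forward, backward))
        if depth == max_depth:
            return
        for neighbor, bond_type in adjacency[node]:
            if neighbor not in visited:
                walk(
                    neighbor,
                    visited | {neighbor},
                    tokens + [bond_type, atoms[neighbor]],
                    depth + 1,
                )

    for start in range(len(atoms)):
        walk(start, frozenset({start}), [atoms[start]], 0)
    return found


# Toy example: heavy atoms of ethanol (C-C-O), single bonds written as "-".
ethanol_atoms = ["C", "C", "O"]
ethanol_bonds = {(0, 1): "-", (1, 2): "-"}
print(sorted(labeled_paths(ethanol_atoms, ethanol_bonds, max_depth=2)))
```

Each distinct path string would index one fingerprint component; how components are hashed or counted is left open here, since the statement does not specify it.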

“…MAX-SIM is a particularly simple algorithm, a useful baseline for comparison. In contrast, SVMs are a highly sophisticated class of methods which have been successfully applied to other chemical classification problems [19,20] and are expected to yield high performances on vHTS datasets. Comparisons against the kNN method are not included because the kNN does not rank its hits and, in prior studies, has been consistently outperformed by SVMs on chemical data.…”

mentioning

confidence: 99%
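
The statement above names MAX-SIM without defining it. The sketch below assumes the common reading that each candidate is scored by its highest similarity to any known active, and it assumes Tanimoto similarity over sets of fingerprint features as the similarity measure. Both choices, and the names `tanimoto`, `max_sim_scores`, and `rank_by_max_sim`, are illustrative assumptions rather than details taken from the cited paper.

```python
from typing import List, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto (Jaccard) similarity between two feature sets."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def max_sim_scores(candidates: List[Set[int]], actives: List[Set[int]]) -> List[float]:
    """Score each candidate by its maximum similarity to any known active.

    Assumes `actives` is non-empty.
    """
    return [max(tanimoto(c, a) for a in actives) for c in candidates]


def rank_by_max_sim(candidates: List[Set[int]], actives: List[Set[int]]) -> List[int]:
    """Return candidate indices sorted from highest to lowest score."""
    scores = max_sim_scores(candidates, actives)
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)


# Toy fingerprints: each set holds the indices of the features that are present.
known_actives = [{1, 2, 3}, {4, 5}]
untested = [{1, 2}, {4, 5}, {9}]
print(max_sim_scores(untested, known_actives))
print(rank_by_max_sim(untested, known_actives))
```

Because the score is a single number per candidate, the method yields a ranked list directly, which is the property the statement contrasts with kNN.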


Influence Relevance Voting: An Accurate And Interpretable Virtual High Throughput Screening Method

Swamidass, Azencott, Lin, et al. 2009. J. Chem. Inf. Model.

Self Cite


Given activity training data from High-Throughput Screening (HTS) experiments, virtual High-Throughput Screening (vHTS) methods aim to predict in silico the activity of untested chemicals. We present a novel method, the Influence Relevance Voter (IRV), specifically tailored for the vHTS task. The IRV is a low-parameter neural network which refines a k-nearest neighbor classifier by non-linearly combining the influences of a chemical's neighbors in the training set. Influences are decomposed, also non-linearly, into a relevance component and a vote component.

The IRV is benchmarked using the data and rules of two large, open competitions, and its performance is compared to that of other participating methods, as well as of an in-house Support Vector Machine (SVM) method. On these benchmark datasets, IRV achieves state-of-the-art results, comparable to the SVM in one case, and significantly better than the SVM in the other, retrieving three times as many actives in the top 1% of its prediction-sorted list.

The IRV presents several other important advantages over SVMs and other methods: (1) the output predictions have a probabilistic semantic; (2) the underlying inferences are interpretable; (3) the training time is very short, on the order of minutes even for very large data sets; (4) the risk of overfitting is minimal, due to the small number of free parameters; and (5) additional information can easily be incorporated into the IRV architecture. Combined with its performance, these qualities make the IRV particularly well suited for vHTS.

Virtual High-Throughput Screening (vHTS) is the cost-effective, in silico complement of experimental HTS. A vHTS algorithm uses data from HTS experiments to predict the activity of new sets of compounds in silico.

Although vHTS is sometimes cast as a classification task, it is more appropriately described as a ranking task, where the goal is to rank additional compounds such that active compounds are as close to the top of the prediction-sorted list as possible. The experiments required to verify a hit are expensive, so it is critical that true actives be recognized as early as possible. Accurately ordering actives by their degree of activity, however, is not critical. The vHTS task, therefore, differs from the 'ranking' task of the machine learning literature, in that the goal is not to precisely order the chemicals in relation to each other, but rather to globally rank as many actives as possible above the bulk of the inactives. Furthermore, proper vHTS training data for the ranking task is often unavailable.

An important algorithm proposed for vHTS is the k-Nearest Neighbor (kNN) classifier, a nonparametric method which has been shown effective in a number of other problems [1,2]. In the kNN approach, each new data point is classified by integrating information from its neighborhood in the training set in a very simple way. Specifically, a new data point is assigne...


“…A universal way of tackling the problem of molecular flexibility was suggested in paper [61] for kernel-based methods. It consists in averaging kernels over all conformations for each molecule.…”

Section: Taking Into Account Molecular Flexibility (mentioning)

confidence: 99%
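
The conformation-averaging idea in the statement above can be sketched as follows. This is a minimal illustration, not the method of the cited paper: it assumes each conformation is already summarized as a fixed-length descriptor vector, uses a Gaussian (RBF) base kernel as a stand-in for whatever 3D kernel is actually applied, and averages uniformly over all conformation pairs. The names `rbf_kernel` and `conformer_averaged_kernel` are hypothetical.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def rbf_kernel(x: Vector, y: Vector, gamma: float = 1.0) -> float:
    """Gaussian base kernel between two conformation descriptors (assumed choice)."""
    squared_distance = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-gamma * squared_distance)


def conformer_averaged_kernel(
    conformers_a: Sequence[Vector],
    conformers_b: Sequence[Vector],
    base_kernel: Callable[[Vector, Vector], float] = rbf_kernel,
) -> float:
    """Average the base kernel over every pair of conformations.

    K(A, B) = (1 / (|A| * |B|)) * sum over a in A, b in B of k(a, b)
    Both conformation lists must be non-empty.
    """
    total = sum(base_kernel(a, b) for a in conformers_a for b in conformers_b)
    return total / (len(conformers_a) * len(conformers_b))


# Toy example: molecule A has two conformations, molecule B has one.
molecule_a = [[0.0], [1.0]]
molecule_b = [[0.0]]
print(conformer_averaged_kernel(molecule_a, molecule_b))
```

An average of positive semidefinite kernel values over pairs of set elements is itself a valid kernel between the sets, so the result can be passed to any kernel method that accepts a precomputed Gram matrix.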

Continuous Molecular Fields Approach Applied to Structure-Activity Modeling

Baskin, Zhokhova, 2014. Challenges and Advances in Computational Chemistry and Physics


The Method of Continuous Molecular Fields is a universal approach to predicting various properties of chemical compounds, in which molecules are represented by means of continuous fields (such as electrostatic, steric, and electron density functions). The essence of the approach is the statistical analysis of functional molecular data through the joint application of kernel machine learning methods and special kernels that compare molecules by computing overlap integrals of their molecular fields. This approach is an alternative to traditional methods of building 3D "structure-activity" and "structure-property" models based on fixed sets of molecular descriptors. The methodology of the approach is described in this chapter, followed by its application to building regression 3D-QSAR models and conducting virtual screening based on one-class classification models. The main directions for further development of this approach are outlined at the end of the chapter.


“…Typically ML algorithms are divided into several classes: 1) supervised learning (generates a function that maps input data into desired outputs); 2) unsupervised learning (models a set of inputs, where no prior classification is given); 3) semi-supervised learning (generates an appropriate function or classifier); 4) reinforcement learning (learns how to act given an observation of the world, where every action has some impact on the environment, which is fed back to the algorithm); 5) transduction (predicts new outputs based on training inputs, outputs and test inputs); and 6) learning to learn (learns its own inductive bias based on previous experience) [1]. Different algorithms of ML have been applied successfully to solve real-life problems, for example in the context of bioinformatics [2-6] or chemoinformatics problems [7-12].…”

Section: Introduction (mentioning)

confidence: 99%

Mean-Field Theory of Meta-learning

Plewczyński, 2012. Encyclopedia of the Sciences of Learning


Abstract: We discuss here the mean-field theory for a cellular automata model of meta-learning. Meta-learning is the process of combining the outcomes of individual learning procedures in order to reach a final decision with higher accuracy than any single learning method. Our method is constructed from an ensemble of interacting learning agents that acquire and process incoming information using various types, or different versions, of machine learning algorithms. The abstract learning space, where all agents are located, is constructed here using a fully connected model that couples all agents with random strength values. The cellular automata network simulates the higher-level integration of information acquired from the independent learning trials. The final classification of incoming input data is therefore defined as the stationary state of the meta-learning system under a simple majority rule, yet minority clusters that share the opposite classification outcome can be observed in the system. Therefore, the probability of selecting the proper class for a given input can be estimated even without prior knowledge of its affiliation. Fuzzy logic can be easily introduced into the system, even if the learning agents are built from simple binary classification machine learning algorithms, by calculating the percentage of agreeing agents.
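
The last two sentences of the abstract, a majority decision plus a fuzzy score from the percentage of agreeing agents, can be illustrated with a short sketch. It covers only that final aggregation step and leaves out the interacting cellular automata dynamics entirely. The function name `majority_vote`, the 0/1 vote encoding, and the tie-breaking rule are assumptions made for illustration.

```python
from typing import Sequence, Tuple


def majority_vote(votes: Sequence[int]) -> Tuple[int, float]:
    """Combine binary votes (0 or 1) from independent learning agents.

    Returns the majority label and the fraction of agents voting 1, which
    can be read as a fuzzy membership score for class 1.
    Ties are resolved in favor of class 1 (an arbitrary choice made here).
    """
    if not votes:
        raise ValueError("at least one vote is required")
    fraction_positive = sum(votes) / len(votes)
    label = 1 if fraction_positive >= 0.5 else 0
    return label, fraction_positive


# Four hypothetical binary classifiers vote on one input.
print(majority_vote([1, 1, 0, 1]))
```

Reporting the fraction alongside the label keeps the minority opinion visible, so it is not collapsed into a single hard decision.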

