2018
DOI: 10.1109/tnnls.2017.2727545

Fast Kronecker Product Kernel Methods via Generalized Vec Trick

Abstract: Kronecker product kernel provides the standard approach in the kernel methods literature for learning from graph data, where edges are labeled and both start and end vertices have their own feature representations. The methods allow generalization to such new edges, whose start and end vertices do not appear in the training data, a setting known as zero-shot or zero-data learning. Such a setting occurs in numerous applications, including drug-target interaction prediction, collaborative filtering, and information retrieval. […]
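To make the setting concrete: the Kronecker product kernel scores a pair of edges by multiplying a start-vertex kernel with an end-vertex kernel, so the full edge-kernel matrix over all start/end combinations is the Kronecker product of the two vertex kernel matrices. A minimal numpy sketch (illustrative only, not the paper's code; the matrices below are toy examples):

```python
import numpy as np

# Vertex kernels for 3 start vertices (e.g., drugs) and 2 end vertices
# (e.g., targets). In practice these come from domain-specific kernels.
K_start = np.array([[1.0, 0.5, 0.2],
                    [0.5, 1.0, 0.3],
                    [0.2, 0.3, 1.0]])
K_end = np.array([[1.0, 0.4],
                  [0.4, 1.0]])

# Kronecker product kernel: the kernel between edges (i, j) and (k, l)
# is K_start[i, k] * K_end[j, l].
K_edges = np.kron(K_start, K_end)   # shape (6, 6), one row/column per edge
```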

Cited by 24 publications (9 citation statements)
References: 56 publications
“…We constructed the protein kinase kernel using normalized Smith–Waterman alignment scores between full amino acid sequences, and four Tanimoto compound kernels based on the following fingerprints implemented in the rcdk R package [37]: (i) the 881-bit fingerprint defined by PubChem (pubchem), (ii) a path-based 1024-bit fingerprint (standard), (iii) a 1024-bit fingerprint based on the shortest paths between atoms, taking into account ring systems and charges (shortestpath), and (iv) an extended-connectivity 1024-bit fingerprint with a maximum diameter set to 6 (ECFP6; circular). We used CGKronRLS as the learning algorithm (implementation available at https://github.com/aatapa/RLScore) [38]. We conducted a nested cross-validation in order to evaluate the generalization performance of CGKronRLS with each pair of kinase and compound kernels, as well as to tune the regularization hyperparameter of the model.…”
Section: Methods (confidence: 99%)
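The quoted pipeline builds its compound kernels with the rcdk R package; as a language-neutral illustration, the Tanimoto kernel itself reduces to a few matrix operations on binary fingerprints. A minimal numpy sketch (the function name and data are illustrative, not from the cited work):

```python
import numpy as np

def tanimoto_kernel(F):
    """Tanimoto kernel for binary fingerprints (one row per compound).

    For binary vectors x, y: T(x, y) = <x, y> / (<x, x> + <y, y> - <x, y>).
    """
    inner = F @ F.T                   # shared on-bits for every pair
    bits = np.diag(inner)             # on-bit counts per compound
    return inner / (bits[:, None] + bits[None, :] - inner)

# Hypothetical 1024-bit fingerprints for five compounds
F = (np.random.rand(5, 1024) > 0.9).astype(float)
K_compound = tanimoto_kernel(F)
```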
“…To enable a fine-grained discrimination of binding affinities between similar targets (e.g., kinase family members), the team Q.E.D explicitly introduced similarity matrices of compounds and targets as input features into their regression model. The regression model was implemented as an ensemble version (uniformly averaged predictor) of 440 CGKronRLS regressors (CGKronRLS v0.81) [38, 40], but with different choices of regularization strengths [0.1, 0.5, 1.0, 1.5, 2.0], training epochs [400, 410, …, 500], and similarity matrices: the protein similarity matrix was derived based on the normalized striped Smith–Waterman alignment scores [41] between full protein sequences (https://github.com/mengyao/Complete-Striped-Smith-Waterman-Library). Eight different alternatives of compound similarity matrices were computed using both Tanimoto and Dice similarity metrics for different variants of 1024-bit Morgan fingerprints [42] (‘radius’ [2, 3] and ‘useChirality’ [True, False]; implementation available at https://github.com/rdkit/rdkit).…”
Section: Methods (confidence: 99%)
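For reference, computing one of the Morgan-fingerprint similarity variants described above with RDKit might look like the following sketch (the SMILES strings are placeholders; the quoted work sweeps the radius, chirality, and Tanimoto/Dice choices to obtain its eight variants):

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

smiles = ["CCO", "CC(=O)O", "c1ccccc1O"]   # placeholder compounds
mols = [Chem.MolFromSmiles(s) for s in smiles]

# One variant: 1024-bit Morgan fingerprints, radius 2, no chirality
fps = [AllChem.GetMorganFingerprintAsBitVect(m, 2, nBits=1024,
                                             useChirality=False)
       for m in mols]

tanimoto = DataStructs.TanimotoSimilarity(fps[0], fps[1])
dice = DataStructs.DiceSimilarity(fps[0], fps[1])
```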
“…A conventional Stochastic Gradient Descent (SGD) [58] can result in slow convergence. Therefore, we use an alternative approach that leverages the specific structure of our embedding φ, as was previously done in Airola and Pahikkala [59]. Specifically, we exploit: (1) the tensor product nature of φ and (2) the fact that the sizes n_M and n_P of the input databases are much smaller than the number n_Z of interactions.…”
Section: Methods (confidence: 99%)
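The structure being exploited is the classical vec-trick identity behind the paper's title: (A ⊗ B) vec(X) = vec(A X B^T) (with row-major vec), which avoids ever materializing the Kronecker product; the paper generalizes this to the case where only a subset of the edges is labeled. A minimal numpy sketch of the basic identity:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))    # e.g., molecule-side factor
B = rng.standard_normal((5, 6))    # e.g., protein-side factor
X = rng.standard_normal((4, 6))

naive = np.kron(A, B) @ X.ravel()  # materializes a (15 x 24) matrix
fast = (A @ X @ B.T).ravel()       # two small matrix products instead

assert np.allclose(naive, fast)
```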
“…Gaussian processes have been used in many applications for temporal and spatial prediction, such as environmental surveillance [19], reconstruction of sea surface temperatures [20], drug–target interaction prediction [21], global land-surface precipitation prediction [22], and wind power forecasting [23], as well as spatiotemporal modeling [24, 25]. There is also a significant number of studies on Gaussian processes with application to epidemiology [26–29].…”
Section: Methods (confidence: 99%)
“…For large data sets, Gaussian processes might become computationally intensive. Several decomposition algorithms have been previously proposed to make the inference faster, such as Nyström approximation [11], approximation using Hadamard and diagonal matrices [30], or Kronecker methods [21, 31–36].…”
Section: Methods (confidence: 99%)
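As a sketch of why Kronecker structure helps (a generic illustration, not any one cited method): when the covariance over a complete grid factorizes as K = K1 ⊗ K2, the GP linear solve (K + σ²I)⁻¹y can be done with per-factor eigendecompositions, costing O(n1³ + n2³) instead of O((n1·n2)³) for the dense solve:

```python
import numpy as np

def kron_gp_solve(K1, K2, y, noise):
    """Solve (K1 kron K2 + noise * I) alpha = y without forming the Kronecker product."""
    w1, Q1 = np.linalg.eigh(K1)
    w2, Q2 = np.linalg.eigh(K2)
    Y = y.reshape(K1.shape[0], K2.shape[0])
    # Rotate into the joint eigenbasis: (Q1 kron Q2)^T y = vec(Q1^T Y Q2)
    Yt = Q1.T @ Y @ Q2
    # Eigenvalues of K1 kron K2 are all pairwise products w1[i] * w2[j]
    At = Yt / (np.outer(w1, w2) + noise)
    return (Q1 @ At @ Q2.T).ravel()

# Check against the dense solve on a tiny hypothetical grid
rng = np.random.default_rng(1)
A1 = rng.standard_normal((4, 4)); K1 = A1 @ A1.T
A2 = rng.standard_normal((3, 3)); K2 = A2 @ A2.T
y = rng.standard_normal(12)
dense = np.linalg.solve(np.kron(K1, K2) + 0.1 * np.eye(12), y)
assert np.allclose(kron_gp_solve(K1, K2, y, 0.1), dense)
```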