BackgroundIn recent years, research in artificial neural networks has resurged, now under the deep-learning umbrella, and grown extremely popular. Recently reported success of DL techniques in crowd-sourced QSAR and predictive toxicology competitions has showcased these methods as powerful tools in drug-discovery and toxicology research. The aim of this work was dual, first large number of hyper-parameter configurations were explored to investigate how they affect the performance of DNNs and could act as starting points when tuning DNNs and second their performance was compared to popular methods widely employed in the field of cheminformatics namely Naïve Bayes, k-nearest neighbor, random forest and support vector machines. Moreover, robustness of machine learning methods to different levels of artificially introduced noise was assessed. The open-source Caffe deep-learning framework and modern NVidia GPU units were utilized to carry out this study, allowing large number of DNN configurations to be explored.ResultsWe show that feed-forward deep neural networks are capable of achieving strong classification performance and outperform shallow methods across diverse activity classes when optimized. Hyper-parameters that were found to play critical role are the activation function, dropout regularization, number hidden layers and number of neurons. When compared to the rest methods, tuned DNNs were found to statistically outperform, with p value <0.01 based on Wilcoxon statistical test. DNN achieved on average MCC units of 0.149 higher than NB, 0.092 than kNN, 0.052 than SVM with linear kernel, 0.021 than RF and finally 0.009 higher than SVM with radial basis function kernel. When exploring robustness to noise, non-linear methods were found to perform well when dealing with low levels of noise, lower than or equal to 20%, however when dealing with higher levels of noise, higher than 30%, the Naïve Bayes method was found to perform well and even outperform at the highest level of noise 50% more sophisticated methods across several datasets.Electronic supplementary materialThe online version of this article (doi:10.1186/s13321-017-0226-y) contains supplementary material, which is available to authorized users.
2019 novel coronavirus (2019-nCoV) is widespread in China and other countries. The target of 2019-nCoV and severe acute respiratory syndrome coronavirus (SARS-CoV) is angiotensin-converting enzyme 2 (ACE2) positive cells. ACE2 is present in the salivary gland duct epithelium, and thus it could be the target of 2019-nCoV and SARS-CoV. SARS-CoV-related animal model experiments show that it can infect the epithelial cells on the salivary gland duct in Chinese rhesus macaques by targeting ACE2. Clinical studies confirmed that 2019-nCoV and SARS-CoV could be detected in saliva of human patients. We hypothesize that the infection of 2019-nCoV and SARS-CoV will lead to inflammatory pathological lesions in patients' target organs, and possibly inflammatory lesions in salivary glands. 2019-nCoV may cause acute sialoadenitis in the acute phase of infection. After the acute phase, chronic sialoadenitis may be caused by fibrosis repairment. Although there was no direct evidence to prove this, the available indirect evidence indicates a high probability of our hypothesis.
We introduce a novel method of indexing graph databases in order to facilitate subgraph isomorphism and similarity queries. The index is comprised of two major data structures. The primary structure is a directed acyclic graph which contains a node for each of the unique, induced subgraphs of the database graphs. The secondary structure is a hash table which crossindexes each subgraph for fast isomorphic lookup. In order to create a hash key independent of isomorphism, we utilize a code-based canonical representation of adjacency matrices, which we have further refined to improve computation speed. We validate the concept by demonstrating its effectiveness in answering queries for two practical datasets. Our experiments show that for subgraph isomorphism queries, our method outperforms existing methods by more than an order of magnitude.
Finding recurring residue packing patterns, or spatial motifs, that characterize protein structural families is an important problem in bioinformatics. To this end, we apply a novel frequent subgraph mining algorithm to three graph representations of protein threedimensional (3D) structure. In each protein graph, a vertex represents an amino acid. Vertex-residues are connected by edges using three approaches: first, based on simple distance threshold between contact residues; second using the Delaunay tessellation from computational geometry, and third using the recently developed almostDelaunay tessellation approach.Applying this approach to a set of graphs representing a protein family from the Structural Classification of Proteins (SCOP) database, we typically identify several hundred common subgraphs equivalent to common packing motifs found in the majority of proteins in the family. We also use the counts of motifs extracted from proteins in two different SCOP families as input variables in a binary classification experiment using Support Vector Machines. The resulting models are capable of predicting the protein family association with the accuracy exceeding 90 percent. Our results indicate that graphs based on both almost-Delaunay and Delaunay tessellations are more sparse than contact distance graph; yet the former afford similar accuracy of classification as the latter. The protein graph mining and classification approaches developed in this paper can be used for rapid and automated annotation of protein structures determined in structural genomics projects.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.