With the rapidly increasing availability of High-Throughput Screening (HTS) data in the public domain, such as the PubChem database, methods for ligand-based computer-aided drug discovery (LB-CADD) have the potential to accelerate and reduce the cost of probe development and drug discovery efforts in academia. We assemble nine data sets from realistic HTS campaigns representing major families of drug target proteins for benchmarking LB-CADD methods. Each data set is in the public domain through PubChem and was carefully collated through confirmation screens that validate active compounds. These data sets provide the foundation for benchmarking a new cheminformatics framework, BCL::ChemInfo, which is freely available for non-commercial use. Quantitative structure activity relationship (QSAR) models are built using Artificial Neural Networks (ANNs), Support Vector Machines (SVMs), Decision Trees (DTs), and Kohonen networks (KNs). Problem-specific descriptor optimization protocols are assessed, including Sequential Feature Forward Selection (SFFS) and various information content measures. Measures of predictive power and confidence are evaluated through cross-validation, and a consensus prediction scheme is tested that combines orthogonal machine learning algorithms into a single predictor. Enrichments ranging from 15 to 101 for a true positive rate (TPR) cutoff of 25% are observed.
Prediction of transmembrane spans and secondary structure from the protein sequence is generally the first step in the structural characterization of (membrane) proteins. The propensity of a stretch of amino acids to form secondary structure and its propensity to be placed in the membrane are correlated. Nevertheless, current methods predict either secondary structure or individual transmembrane states. We introduce a method that simultaneously predicts the secondary structure and transmembrane spans from the protein sequence. This approach not only eliminates the need to create a consensus prediction from possibly contradicting outputs of several predictors but also has the potential to predict conformational switches, i.e., sequence regions that have a high probability of changing, for example, from a coil conformation in solution to an α-helical transmembrane state. An artificial neural network was trained on databases of 177 membrane proteins and 6048 soluble proteins. The output is a 3 × 3 dimensional probability matrix for each residue in the sequence that combines three secondary structure types (helix, strand, coil) and three environment types (membrane core, interface, solution). The prediction accuracies are 70.3% for nine possible states, 73.2% for three-state secondary structure prediction, and 94.8% for three-state transmembrane span prediction. These accuracies are comparable to state-of-the-art predictors of secondary structure (e.g., Psipred) or transmembrane placement (e.g., OCTOPUS). The method is available as a web server and for download at www.meilerlab.org.
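The 3 × 3 per-residue output described in the abstract above can be read in three ways: as nine joint states, as three secondary structure states, or as three environment states. The following is a minimal sketch of that collapse, assuming rows index secondary structure and columns index environment, and assuming the nine entries can be normalized to sum to 1. The function names and the example numbers are hypothetical and are not taken from the published method.

```python
# Hypothetical illustration; not the published predictor's code or output.
SS_TYPES = ("helix", "strand", "coil")
ENV_TYPES = ("membrane core", "interface", "solution")


def marginalize(joint):
    """Collapse a 3x3 joint matrix (rows: secondary structure,
    columns: environment) into two three-state distributions."""
    if len(joint) != 3 or any(len(row) != 3 for row in joint):
        raise ValueError("expected a 3 x 3 matrix")
    total = sum(sum(row) for row in joint)
    if total <= 0:
        raise ValueError("matrix entries must sum to a positive value")
    # Normalize so the nine joint states form a probability distribution.
    norm = [[value / total for value in row] for row in joint]
    ss = [sum(row) for row in norm]                              # sum over environments
    env = [sum(norm[i][j] for i in range(3)) for j in range(3)]  # sum over SS types
    return ss, env


def most_likely(joint):
    """Return the top joint state and the top state of each marginal."""
    ss, env = marginalize(joint)
    cells = [(i, j) for i in range(3) for j in range(3)]
    best_i, best_j = max(cells, key=lambda ij: joint[ij[0]][ij[1]])
    return {
        "joint": (SS_TYPES[best_i], ENV_TYPES[best_j]),
        "secondary_structure": SS_TYPES[ss.index(max(ss))],
        "environment": ENV_TYPES[env.index(max(env))],
    }


# Made-up probabilities for a single residue (assumption, for illustration only).
example = [
    [0.50, 0.10, 0.05],  # helix
    [0.02, 0.01, 0.02],  # strand
    [0.05, 0.05, 0.20],  # coil
]
result = most_likely(example)
print(result)
```

Under these assumptions, a residue with comparable probability mass in the coil/solution and helix/membrane core cells would be a candidate for the kind of conformational switch the abstract mentions; the sketch does not attempt to define a threshold for that.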
Nitrospirae spp. distantly related to thermophilic, sulfate-reducing Thermodesulfovibrio species are regularly observed in environmental surveys of anoxic marine and freshwater habitats. Here we present a metaproteogenomic analysis of Nitrospirae bacterium Nbg-4 as a representative of this clade. Its genome was assembled from replicated metagenomes of rice paddy soil that was used to grow rice in the presence and absence of gypsum (CaSO4·2H2O). Nbg-4 encoded the full pathway of dissimilatory sulfate reduction and showed expression of this pathway in gypsum-amended anoxic bulk soil as revealed by parallel metaproteomics. In addition, Nbg-4 encoded the full pathway of dissimilatory nitrate reduction to ammonia (DNRA), with expression of its first step being detected in bulk soil without gypsum amendment. The relative abundances of Nbg-4 were similar under both treatments, indicating that Nbg-4 maintained stable populations while shifting its energy metabolism. Whether Nbg-4 is a strict sulfate reducer or can couple sulfur oxidation to DNRA by operating the pathway of dissimilatory sulfate reduction in reverse could not be resolved. Further genome reconstruction revealed the potential to utilize butyrate, formate, H2, or acetate as an electron donor; the Wood-Ljungdahl pathway was expressed under both treatments. Comparison to publicly available Nitrospirae genome bins revealed the pathway for dissimilatory sulfate reduction also in related Nitrospirae recovered from groundwater. Subsequent phylogenomics showed that such microorganisms form a novel genus within the Nitrospirae, with Nbg-4 as a representative species. Based on the widespread occurrence of this novel genus, we propose for Nbg-4 the name “Candidatus Sulfobium mesophilum,” gen. nov., sp. nov. IMPORTANCE: Rice paddies are indispensable for the food supply but are a major source of the greenhouse gas methane. If it were not counterbalanced by cryptic sulfur cycling, methane emission from rice paddy fields would be even higher. However, the microorganisms involved in this sulfur cycling are little understood. By using an environmental systems biology approach with Italian rice paddy soil, we could retrieve the population genome of a novel member of the phylum Nitrospirae. This microorganism encoded the full pathway of dissimilatory sulfate reduction and expressed it in anoxic paddy soil under sulfate-enriched conditions. Phylogenomics and comparison to the results of environmental surveys showed that such microorganisms are actually widespread in freshwater and marine environments. At the same time, they represent an undiscovered genus within the little-explored phylum Nitrospirae. Our results will be important for the design of enrichment strategies and postgenomic studies to further understanding of the contribution of these novel Nitrospirae spp. to the global sulfur cycle.
Selective potentiators of glutamate response at metabotropic glutamate receptor subtype 5 (mGluR5) have exciting potential for the development of novel treatment strategies for schizophrenia. A total of 1,382 compounds with positive allosteric modulation (PAM) of the mGluR5 glutamate response were identified through high-throughput screening (HTS) of a diverse library of 144,475 substances utilizing a functional assay measuring receptor-induced intracellular release of calcium. Primary hits were tested for concentration-dependent activity, and potency data (EC50 values) were used for training artificial neural network (ANN) quantitative structure−activity relationship (QSAR) models that predict biological potency from the chemical structure. While all models were trained to predict EC50, the quality of the models was assessed by using both continuous measures and binary classification. Numerical descriptors of chemical structure were used as input for the machine learning procedure and optimized in an iterative protocol. The ANN models achieved theoretical enrichment ratios of up to 38 for an independent data set not used in training the model. A database of ∼450,000 commercially available drug-like compounds was targeted in a virtual screen. A set of 824 compounds was obtained for testing based on the highest predicted potency values. Biological testing found 28.2% (232/824) of these compounds with various activities at mGluR5 including 177 pure potentiators and 55 partial agonists. These results represent an enrichment factor of 23 for pure potentiation of the mGluR5 glutamate response and 30 for overall mGluR5 modulation activity when compared with those of the original mGluR5 experimental screening data (0.94% hit rate). The active compounds identified contained 72% close derivatives of previously identified PAMs as well as 28% nontrivial derivatives of known active compounds.
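The enrichment factors quoted in the abstract above follow from dividing the hit rate of the virtual screen by the hit rate of the original experimental screen. A minimal sketch of that arithmetic, using only the counts and the 0.94% baseline stated in the abstract (the function name `enrichment_factor` is hypothetical):

```python
def enrichment_factor(hits, tested, baseline_rate):
    """Ratio of the observed hit rate to a baseline hit rate."""
    if tested <= 0 or baseline_rate <= 0:
        raise ValueError("tested and baseline_rate must be positive")
    return (hits / tested) / baseline_rate


BASELINE_RATE = 0.0094    # 0.94% hit rate of the original HTS, as stated
TESTED = 824              # compounds selected by the virtual screen
PURE_POTENTIATORS = 177
PARTIAL_AGONISTS = 55
ALL_ACTIVES = PURE_POTENTIATORS + PARTIAL_AGONISTS  # 232

hit_rate = ALL_ACTIVES / TESTED
overall = enrichment_factor(ALL_ACTIVES, TESTED, BASELINE_RATE)
pure = enrichment_factor(PURE_POTENTIATORS, TESTED, BASELINE_RATE)

print(f"hit rate: {hit_rate:.1%}")           # 28.2%
print(f"overall enrichment: {overall:.1f}")  # about 30
print(f"pure PAM enrichment: {pure:.1f}")    # about 23
```

The rounded results agree with the reported enrichment factors of 30 for overall mGluR5 modulation and 23 for pure potentiation.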
From ANNs to NAMs! Data from an experimental metabotropic glutamate receptor 5 (mGlu5) high‐throughput screen (HTS) were employed to train artificial neural networks (ANNs) based on 345 confirmed negative allosteric modulators (NAMs) and 155 774 inactive compounds. This effort identified two potent mGlu5 NAMs with a unique chemotype. Optimization afforded a tool compound (shown), active in mouse models of anxiety and addiction.