Activity prediction plays an essential role in drug discovery by directing the search for drug candidates in the relevant chemical space. Despite being applied successfully to image recognition and semantic similarity, the Siamese neural network has rarely been explored in drug discovery, where modelling faces challenges such as insufficient data and class imbalance. Here, we present a Siamese recurrent neural network model (SiameseCHEM) based on a bidirectional long short-term memory architecture with a self-attention mechanism, which can automatically learn discriminative features from the SMILES representations of small molecules. Subsequently, it is used to categorize the bioactivity of small molecules via N-shot learning. Trained on random SMILES strings, it proves robust across five different datasets for the task of binary or categorical classification of bioactivity. Benchmarked against two baseline machine learning models that use the chemistry-rich ECFP fingerprints as input, the deep learning model outperforms them on three datasets and achieves comparable performance on the other two. The failure of both baseline methods on SMILES strings suggests that the deep learning model may learn task-specific chemistry features encoded in SMILES strings.
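The N-shot classification step mentioned above can be illustrated with a minimal sketch. It is not the SiameseCHEM implementation: the trained Siamese network is replaced by a toy character-bigram similarity, and the function names, the support-set layout, and the example SMILES strings are all assumptions made for illustration.

```python
def bigram_similarity(a, b):
    """Toy stand-in for a learned Siamese similarity (assumption):
    Jaccard overlap of the character bigrams of two SMILES strings."""
    grams_a = {a[i:i + 2] for i in range(len(a) - 1)}
    grams_b = {b[i:i + 2] for i in range(len(b) - 1)}
    union = grams_a | grams_b
    return len(grams_a & grams_b) / len(union) if union else 0.0


def n_shot_classify(query, support, similarity=bigram_similarity):
    """Assign `query` to the class whose N support examples are,
    on average, most similar to it.

    `support` maps a class label to a list of N example SMILES strings
    (hypothetical layout, not taken from the source).
    """
    scores = {
        label: sum(similarity(query, s) for s in examples) / len(examples)
        for label, examples in support.items()
        if examples  # skip classes without any support example
    }
    return max(scores, key=scores.get)


# Two-shot example with made-up labels, for illustration only.
support_set = {
    "active": ["CCO", "CCCO"],
    "inactive": ["c1ccccc1", "c1ccccc1C"],
}
prediction = n_shot_classify("CCCCO", support_set)
```

In a real setting the `similarity` argument would be the distance or similarity score produced by the trained twin network, which is why it is passed in as a parameter here.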
Proteins tend to bury hydrophobic residues inside their core during the folding process to provide stability to the protein structure and to prevent aggregation. Nevertheless, proteins do expose some ‘sticky’ hydrophobic residues to the solvent. These residues can play an important functional role, for example in protein-protein and membrane interactions. Here, we investigate how hydrophobic protein surfaces are by providing three measures for surface hydrophobicity: the total hydrophobic surface area (THSA), the relative hydrophobic surface area (RHSA), and, using our MolPatch method, the largest hydrophobic patch (LHP). Secondly, we analyse how difficult it is to predict these measures from sequence: by adapting solvent accessibility predictions from NetSurfP2.0, we obtain well-performing prediction methods for the THSA and RHSA, while predicting the LHP is more difficult. Finally, we analyse the implications of exposed hydrophobic surfaces: we show that hydrophobic proteins typically have low expression, suggesting that cells avoid an overabundance of sticky proteins. Availability: https://github.com/ibivu/hydrophobic_patches
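As a rough illustration of the first two measures, the sketch below computes THSA and RHSA from per-residue solvent-accessible surface areas. It rests on several assumptions not stated in the abstract: the set of residues counted as hydrophobic, the definition of RHSA as THSA divided by the total accessible surface area, and the function name are all hypothetical. The LHP is not covered, since it requires the structure-based MolPatch method.

```python
# Residues treated as hydrophobic. This set is an assumption for
# illustration; the abstract does not specify which residues are used.
HYDROPHOBIC = set("AVLIMFWC")


def surface_hydrophobicity(sequence, asa):
    """Return (THSA, RHSA) for one protein.

    sequence: one-letter amino acid string.
    asa: per-residue absolute solvent-accessible surface areas
         (e.g. in square angstroms), one value per residue.
    """
    if len(sequence) != len(asa):
        raise ValueError("sequence and asa must have the same length")
    total_area = sum(asa)
    # THSA: accessible area contributed by hydrophobic residues only.
    thsa = sum(area for res, area in zip(sequence, asa) if res in HYDROPHOBIC)
    # RHSA: hydrophobic fraction of the total accessible area (assumed definition).
    rhsa = thsa / total_area if total_area > 0 else 0.0
    return thsa, rhsa


# Made-up accessibility values for a four-residue toy sequence.
thsa, rhsa = surface_hydrophobicity("ALKD", [10.0, 30.0, 40.0, 20.0])
```

The same function applies whether the accessibility values come from a solved structure or from a sequence-based predictor, as long as they are absolute areas rather than relative accessibilities.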
Proteomics studies have shown differential expression of numerous proteins in dementias but have rarely led to novel biomarker tests for clinical use. The Marie Curie MIRIADE project is designed to experimentally evaluate development strategies to accelerate the validation and ultimate implementation of novel biomarkers in clinical practice, using proteomics-based biomarker development for the main dementias as experimental case studies. We address several knowledge gaps that have been identified in the field. First, there is the technology-translation gap between the different technologies used for the discovery (e.g., mass spectrometry) and the large-scale validation (e.g., immunoassays) of biomarkers. In addition, there is a limited understanding of the conformational states of biomarker proteins in different matrices, which affect the selection of reagents for assay development. In this review, we aim to understand the decisions taken in the initial steps of biomarker development through an interim narrative update of the work of each early-stage researcher (ESR) subproject. The results describe the decision process used to shortlist biomarkers from proteomics data in order to develop immunoassays or mass spectrometry assays for Alzheimer's disease, Lewy body dementia, and frontotemporal dementia. In addition, we explain the approach taken to prepare the market implementation of novel biomarkers and assays. Moreover, we describe the development of computational protein state and interaction prediction models to support biomarker development, such as the prediction of epitopes. Lastly, we reflect upon the activities involved in the biomarker development process to deduce a best-practice roadmap for biomarker development.
Numerous ligand-based drug discovery projects are based on structure-activity relationship (SAR) analysis, such as Free-Wilson (FW) or matched molecular pair (MMP) analysis. Intrinsically, these techniques assume linearity and additivity of substituent contributions. They are challenged by nonadditivity (NA) in protein–ligand binding, where the change of two functional groups in one molecule results in much higher or lower activity than expected from the respective single changes. Identifying nonlinear cases and possible underlying explanations is crucial for a drug design project since it might influence which lead to follow. By systematically analyzing all AstraZeneca (AZ) in-house compound data and publicly available ChEMBL25 bioactivity data, we show significant NA events in almost every second assay among the in-house data and in every third assay among the public data sets. Furthermore, 9.4% of all compounds in the AZ database and 5.1% of those from public sources display significant additivity shifts, indicating important SAR features or fundamental measurement errors. Using NA data in combination with machine learning showed that nonadditive data is challenging to predict, and even the addition of nonadditive data to the training set did not increase predictivity. Overall, NA analysis should be applied on a regular basis in many areas of computational chemistry and can further improve rational drug design.
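The additivity assumption described above can be made concrete with a double-transformation cycle of four compounds: a reference, two single-change analogues, and the compound carrying both changes. The sketch below computes the nonadditivity of such a cycle from activities on a log scale (such as pIC50). The function names, argument names, and example values are hypothetical, and the significance threshold is left to the caller because the abstract does not state one.

```python
def nonadditivity(p_ref, p_a, p_b, p_ab):
    """Nonadditivity of a double-transformation cycle.

    p_ref: activity of the reference compound (log scale, e.g. pIC50)
    p_a:   activity after changing only group A
    p_b:   activity after changing only group B
    p_ab:  activity after changing both groups

    Under perfect additivity, the effect of change A is the same
    whether or not B is present, so the result is 0.
    """
    effect_of_a_with_b = p_ab - p_b
    effect_of_a_alone = p_a - p_ref
    return effect_of_a_with_b - effect_of_a_alone


def is_nonadditive(na_value, threshold):
    """Flag a cycle whose absolute nonadditivity exceeds a caller-chosen
    threshold (e.g. derived from the assay's experimental uncertainty)."""
    return abs(na_value) > threshold


# Made-up activities: the single changes add 1.0 and 0.5 log units,
# but the double change adds 3.0 instead of the expected 1.5.
na = nonadditivity(p_ref=6.0, p_a=7.0, p_b=6.5, p_ab=9.0)
```

A positive value means the two changes together are more beneficial than the single changes predict, and a negative value means they are less beneficial.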