A main challenge in drug discovery is finding molecules with a desirable balance of multiple properties. Here, we focus on the task of molecular optimization, where the goal is to optimize a given starting molecule towards desirable properties. This task can be framed as a machine translation problem, as in natural language processing: using the SMILES representation, a starting molecule is translated into a molecule with optimized properties. Typically, chemists rely on their intuition to suggest chemical transformations for the starting molecule being optimized. A widely used strategy is the concept of matched molecular pairs (MMPs), in which two molecules differ by a single transformation. We seek to capture the chemist’s intuition from matched molecular pairs using machine translation models. Specifically, a sequence-to-sequence model with an attention mechanism and the Transformer model are employed to generate molecules with desirable properties. As a proof of concept, three ADMET properties that are important for a drug, namely logD, solubility, and clearance, are optimized simultaneously. Since desirable properties often vary from project to project, the user-specified desirable property changes are incorporated into the input as an additional condition, together with the starting molecule being optimized. The models can thus be guided to generate molecules that satisfy the desired properties. Additionally, we compare the two SMILES-based machine translation models with HierG2G, a graph-to-graph translation model that has shown state-of-the-art performance in molecular optimization. Our results show that the Transformer generates more molecules with desirable properties by making small modifications to the given starting molecules, which can be intuitive to chemists. Using an ensemble of models yields a further enrichment of diverse molecules.
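One way to picture "property changes incorporated into the input as an additional condition" is to prepend condition tokens to the tokenized SMILES of the starting molecule, so the translation model reads both as one source sequence. The sketch below assumes that design. The SMILES tokenizer regex is a commonly used one for SMILES-based sequence models; the function names, the `LogD_[low,high)` token format, the 0.2 bin width, and the solubility/clearance token strings are all hypothetical illustrations, not the authors' exact vocabulary.

```python
import math
import re

# Regex SMILES tokenizer: bracket atoms, two-letter halogens (Br, Cl),
# ring-closure digits (including %NN), bonds, and branches are single tokens.
SMILES_TOKEN_PATTERN = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize_smiles(smiles: str) -> list[str]:
    tokens = SMILES_TOKEN_PATTERN.findall(smiles)
    # Refuse to silently drop characters the regex does not cover.
    if "".join(tokens) != smiles:
        raise ValueError(f"could not fully tokenize SMILES: {smiles!r}")
    return tokens


def logd_change_token(delta: float, width: float = 0.2) -> str:
    # Hypothetical binning: map a desired logD change to a fixed-width
    # interval label [lower, lower + width).
    lower = math.floor(delta / width) * width
    return f"LogD_[{lower:.1f},{lower + width:.1f})"


def build_source_tokens(condition_tokens: list[str], smiles: str) -> list[str]:
    # Condition tokens first, then the starting molecule's SMILES tokens.
    return list(condition_tokens) + tokenize_smiles(smiles)
```

For example, `build_source_tokens([logd_change_token(0.5), "Solubility_low->high"], "CC(=O)Nc1ccc(O)cc1")` yields the two condition tokens followed by the 18 SMILES tokens of the starting molecule. Keeping the conditions as ordinary vocabulary tokens means the same encoder architecture can be reused unchanged, whichever properties a project targets.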
On the basis of 1,2,3-triols 1a–d, 1,2,3,4-tetraols 2a–h, and 1,2,3,4,5-pentaols 3a–p, we report NMR databases for contiguous polyols with four types of profile descriptors: ¹³C chemical shifts, ¹H chemical shifts, ¹H(OH) chemical shifts, and vicinal spin-coupling constants. To systematically assess the relative value of these databases, a case study was conducted on heptaols 4a–d, through which γ- and δ-effects were recognized as refinements to the ¹³C and ¹H chemical shift profiles predicted by applying the concept of self-contained nature. The magnitudes of the γ- and δ-effects depend on the specific stereochemical arrangement of the functional groups both inside and outside a self-contained box, and they are significant only for stereoisomers belonging to a specific subgroup. Outside that subgroup, the γ- and δ-effects can, to a first approximation, be ignored in the stereochemical analysis of unknown compounds; for stereoisomers within the subgroup, the profile predicted to a first approximation must be refined to incorporate the γ- and δ-effects. Heptaols 4a–d were also used to assess ³J(H,H) profiles. Two methods were developed, one using profiles of three contiguous ³J(H,H) constants and the other using profiles of two contiguous ³J(H,H) constants. A stereochemical analysis based on three or two contiguous ³J(H,H) constants is operationally simpler than one based on ¹³C and ¹H chemical shift profiles. Therefore, we recommend using a ³J(H,H) profile as the primary tool to predict the stereochemistry of unknown polyols, and the ¹³C and ¹H chemical shift profiles as secondary tools to confirm the predicted stereochemistry.
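The database approach amounts to comparing an observed profile (for instance, three contiguous ³J(H,H) values measured for an unknown polyol) with the reference profile of each candidate stereoisomer and keeping the closest match. The sketch below assumes a simple scoring rule, the sum of absolute deviations, which is an illustrative choice and not necessarily the criterion used by the authors; the function name `rank_by_profile` is hypothetical. No reference values are included: they would have to be taken from the published database.

```python
from typing import Mapping, Sequence


def rank_by_profile(
    observed: Sequence[float],
    reference: Mapping[str, Sequence[float]],
) -> list[tuple[str, float]]:
    """Rank candidate stereochemical assignments by how closely their
    reference profile matches the observed one.

    The score is the sum of absolute deviations (an assumed metric); a
    smaller score means a closer match.
    """
    ranked: list[tuple[str, float]] = []
    for label, profile in reference.items():
        # Profiles must describe the same contiguous positions to be comparable.
        if len(profile) != len(observed):
            raise ValueError(
                f"profile for {label!r} has {len(profile)} values, "
                f"expected {len(observed)}"
            )
        score = sum(abs(o - r) for o, r in zip(observed, profile))
        ranked.append((label, score))
    # Best match first; ties keep the database order.
    ranked.sort(key=lambda item: item[1])
    return ranked
```

For a three-constant analysis, `observed` holds the three measured ³J(H,H) values and `reference` maps each candidate relative configuration to its database profile. The same function applies unchanged to a two-constant profile, or to a ¹³C or ¹H chemical shift profile used as the secondary check.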
Using the NMR databases in achiral and chiral solvents, the complete stereochemistry of tetrafibricin (1) was elucidated without degradation of the carbon framework.
Introducing trifluoromethyl groups is a common strategy to improve the properties of biologically active compounds. However, N-trifluoromethyl moieties on amines and azoles are very rarely used. To evaluate their suitability for drug design, we synthesized a series of N-trifluoromethyl amines and azoles, determined their stability in aqueous media, and investigated properties relevant to drug design, such as lipophilicity, metabolic stability, and permeability. We show that N-trifluoromethyl amines are prone to hydrolysis, whereas N-trifluoromethyl azoles have excellent aqueous stability. Compared with their N-methyl analogues, N-trifluoromethyl azoles have higher lipophilicity and can show increased metabolic stability and Caco-2 permeability. Furthermore, N-trifluoromethyl azoles can serve as bioisosteres of N-isopropyl and N-tert-butyl azoles. We therefore suggest that N-trifluoromethyl azoles are valuable substructures to consider in medicinal chemistry.
Molecular optimization aims to improve the drug profile of a starting molecule. It is a fundamental problem in drug discovery, but it is challenging because of (i) the need to optimize multiple properties simultaneously and (ii) the large chemical space to be explored. Recently, deep learning methods have been proposed for this task that mimic the chemist’s intuition in the form of matched molecular pairs (MMPs). Although MMP analysis is a widely used strategy among medicinal chemists, it offers limited capability to explore the space of structural modifications and therefore does not cover the complete space of solutions. More general transformations beyond MMPs are often feasible and/or necessary, e.g. simultaneous modifications of the starting molecule at several sites, including the core scaffold. This study aims to provide a general methodology that supports structural modifications beyond MMPs. In particular, the same Transformer architecture is trained on different datasets, each consisting of molecular pairs that reflect a different type of transformation. Beyond MMP transformations, datasets reflecting more general structural changes are constructed from ChEMBL using two approaches: Tanimoto similarity, which allows multiple modifications, and scaffold matching, which allows multiple modifications while keeping the scaffold constant. We investigate how the model's behavior can be altered by tailoring the training dataset while keeping the model architecture fixed. Our results show that models trained on differently prepared datasets transform a given starting molecule in a way that reflects the nature of their training data. These models could complement each other and enable chemists to pursue different options for improving a starting molecule.
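The two pair-construction strategies can be sketched as follows, assuming each molecule's fingerprint is already available as a set of integer feature IDs (for example, the on-bits of a circular fingerprint computed elsewhere) and each molecule's scaffold is already available as a canonical string (for example, a Bemis–Murcko scaffold computed elsewhere). The function names and the 0.5 similarity threshold are illustrative assumptions, not the values or code used in the study.

```python
from collections import defaultdict
from itertools import combinations


def tanimoto(a: frozenset[int], b: frozenset[int]) -> float:
    # Tanimoto (Jaccard) similarity of two set-based fingerprints.
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def similarity_pairs(
    fingerprints: dict[str, frozenset[int]], threshold: float = 0.5
) -> list[tuple[str, str, float]]:
    # Pair every two molecules whose similarity reaches the (assumed) threshold;
    # such pairs may differ by several modifications, not just one.
    pairs = []
    for (id_a, fp_a), (id_b, fp_b) in combinations(fingerprints.items(), 2):
        sim = tanimoto(fp_a, fp_b)
        if sim >= threshold:
            pairs.append((id_a, id_b, sim))
    return pairs


def scaffold_pairs(scaffolds: dict[str, str]) -> list[tuple[str, str]]:
    # Group molecules by scaffold, then pair molecules within each group:
    # substituents may change anywhere, but the scaffold stays constant.
    groups: dict[str, list[str]] = defaultdict(list)
    for mol_id, scaffold in scaffolds.items():
        groups[scaffold].append(mol_id)
    pairs: list[tuple[str, str]] = []
    for members in groups.values():
        pairs.extend(combinations(members, 2))
    return pairs
```

The all-pairs loop in `similarity_pairs` is quadratic in the number of molecules, which is fine for illustration but would need an indexed similarity search at ChEMBL scale. Grouping by scaffold first keeps `scaffold_pairs` cheap, since only molecules sharing a scaffold are ever compared.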