Chemical language models (CLMs) can be employed to design molecules with desired properties. CLMs generate new chemical structures in the form of textual representations, such as the simplified molecular input line entry system (SMILES) strings. However, the quality of these de novo generated molecules is difficult to assess a priori. In this study, we apply the perplexity metric to determine the degree to which the molecules generated by a CLM match the desired design objectives. This model-intrinsic score allows identifying and ranking the most promising molecular designs based on the probabilities learned by the CLM. Using perplexity to compare “greedy” (beam search) with “explorative” (multinomial sampling) methods for SMILES generation, certain advantages of multinomial sampling become apparent. Additionally, perplexity scoring is performed to identify undesired model biases introduced during model training and allows the development of a new ranking system to remove those undesired biases.
Chemical language models (CLMs) can be employed to design molecules with desired properties. CLMs generate new chemical structures in the form of textual representations, such as the simplified molecular input line entry system (SMILES) strings, in a rule-free manner. However, the quality of these de novo generated molecules is difficult to assess a priori. In this study, we apply the perplexity metric to determine the degree to which the molecules generated by a CLM match the desired design objectives. This model-intrinsic score allows identifying and ranking the most promising molecular designs based on the probabilities learned by the CLM. Using perplexity to compare “greedy” (beam search) with “explorative” (multinomial sampling) methods for SMILES generation, certain advantages of multinomial sampling become apparent. Additionally, perplexity scoring is performed to identify undesired model biases introduced during model training and allows the development of a new ranking system to remove those undesired biases.
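As an illustration of the perplexity scoring described above, the following minimal sketch uses the standard definition of perplexity, the exponential of the mean negative log-likelihood per token. The abstract does not state the exact formula or tokenization used in the study, so that choice is an assumption, and the function names and per-token probabilities are hypothetical values chosen only for demonstration.

```python
import math

def perplexity(token_log_probs):
    """Perplexity of one sequence from its per-token natural-log probabilities."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    # Mean negative log-likelihood per token, then exponentiate.
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)

def rank_by_perplexity(scored):
    """Sort (smiles, token_log_probs) pairs from lowest to highest perplexity."""
    pairs = [(smiles, perplexity(log_probs)) for smiles, log_probs in scored]
    return sorted(pairs, key=lambda pair: pair[1])

# Hypothetical per-token probabilities; a real CLM would supply these.
candidates = [
    ("CCO", [math.log(0.5)] * 3),
    ("c1ccccc1", [math.log(0.9)] * 8),
]
ranking = rank_by_perplexity(candidates)
for smiles, score in ranking:
    print(f"{smiles}\t{score:.3f}")
```

Under this definition, a lower perplexity means the model assigned higher average probability to the tokens of the string, so the lowest-scoring candidates would be ranked first.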
Molecular dynamics simulations enable the study of the motion of small and large (bio)molecules and the estimation of their conformational ensembles. The description of the environment (solvent) therefore has a large impact. Implicit solvent representations are efficient but, in many cases, not accurate enough (especially for polar solvents, such as water). The explicit treatment of the solvent molecules is more accurate but also computationally more expensive. Recently, machine learning has been proposed to bridge the gap and simulate, in an implicit manner, explicit solvation effects. However, the current approaches rely on prior knowledge of the entire conformational space, limiting their application in practice. Here, we introduce a graph neural network-based implicit solvent that is capable of describing explicit solvent effects for peptides with different compositions than those contained in the training set.
Gas-phase Förster resonance energy transfer (FRET) combines mass spectrometry and fluorescence spectroscopy for the conformational analysis of mass-selected biomolecular ions. In FRET, fluorophore pairs are typically covalently attached to a biomolecule using short linkers, which affect the mobility of the dye and the relative orientation of the transition dipole moments of the donor and acceptor. Intramolecular interactions may further influence the range of motion. Yet, little is known about this factor, despite the importance of intramolecular interactions in the absence of a solvent. In this study, we applied transition metal ion FRET (tmFRET) to probe the mobility of a single chromophore pair (Rhodamine 110 and Cu²⁺) as a function of linker lengths to assess the relevance of intramolecular interactions. Increasing FRET efficiencies were observed with increasing linker length, ranging from 5% (2 atoms) to 28% (13 atoms). To rationalize this trend, we profiled the conformational landscape of each model system using molecular dynamics (MD) simulations. We captured intramolecular interactions that promote a population shift toward smaller donor−acceptor separation for longer linker lengths and induce a significant increase in the acceptor's transition dipole moment. The presented methodology is a first step toward the explicit consideration of a fluorophore's range of motion in the interpretation of gas-phase FRET experiments.
Gas-phase Förster resonance energy transfer (FRET) combines the advantages of mass spectrometry and fluorescence spectroscopy for the conformational analysis of mass-selected biomolecules. While this implementation of FRET in the gas phase promises detailed insights for fundamental and applied studies, the gas-phase environment also poses great challenges. For FRET, fluorophore pairs are typically covalently attached to strategic binding sites in the backbone of a biomolecule, using short linkers. The linker further increases the mobility of the dye, contributing to rotational averaging of the relative orientation of the transition dipole moments of donor and acceptor. However, little is known about the fluorophore’s degrees of freedom in the gas phase and how they may be influenced by intramolecular interactions. In this study, we test the influence of a fluorophore’s linker length on the measured FRET efficiencies in the gas phase to probe the mobility of the fluorophore. An increased FRET efficiency was observed with increasing linker length, ranging from 5.3% for a linker consisting of 2 atoms to 27.7% for a linker length of 13 atoms. To rationalize this trend, we profiled the conformational landscape of each model system with MD simulations. Employing state-of-the-art enhanced sampling techniques, we captured intramolecular interactions that promote a population shift towards smaller donor-acceptor separation for longer linker lengths and induce a significant increase in the acceptor’s transition dipole moment. The presented methodology is a first step towards the explicit consideration of a fluorophore’s range of motion in the interpretation of gas-phase FRET experiments.
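To make the link between donor-acceptor separation and FRET efficiency concrete, the sketch below applies the textbook Förster relation E = 1 / (1 + (r/R0)^6) to a set of distances and averages the result, as one might do over frames of an MD trajectory. It is not the analysis used in the study: the Förster radius R0 and the distance lists are hypothetical values, and R0 is held fixed, which ignores the changes in dipole orientation and acceptor transition dipole moment that the abstract reports.

```python
def fret_efficiency(r, r0):
    """Textbook Förster efficiency for donor-acceptor distance r and Förster radius r0."""
    if r <= 0 or r0 <= 0:
        raise ValueError("distances must be positive")
    return 1.0 / (1.0 + (r / r0) ** 6)

def mean_efficiency(distances, r0):
    """Average efficiency over a conformational ensemble of distances."""
    if not distances:
        raise ValueError("need at least one distance")
    return sum(fret_efficiency(r, r0) for r in distances) / len(distances)

# Hypothetical Förster radius and ensembles (all in angstroms), for illustration only.
R0 = 15.0
compact_ensemble = [12.0, 13.5, 14.0, 16.0]
extended_ensemble = [20.0, 22.0, 24.0, 25.0]

print(f"compact:  {mean_efficiency(compact_ensemble, R0):.3f}")
print(f"extended: {mean_efficiency(extended_ensemble, R0):.3f}")
```

Because of the sixth-power dependence, an ensemble shifted toward shorter separations yields a markedly higher mean efficiency, which is the qualitative effect a population shift toward smaller donor-acceptor distances would have.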