Are 2D fingerprints still valuable for drug discovery?

Gao, Kaifu; Nguyen, Duc Duy; Sresht, Vishnu; Mathiowetz, Alan M.; Tu, Meihua; Wei, Guowei

doi:10.1039/d0cp00305k

Cited by 101 publications

(108 citation statements)

References 60 publications


“…Only 2D molecular descriptors were utilized to train machine learning models. There are fields that 3D molecular descriptors perform better than 2D ones [20]. Some other applications of machine learning in predictions of organic compounds emission wavelengths were published [21][22].…”

Section: Discussion (mentioning)

confidence: 99%

An Attempt to Boost Molecular Descriptors with Quantum-Derived Features in Prediction of Maximum Emission Wavelengths of Chromophores

Fliszkiewicz

2021

Preprint


The following research assesses the capability of machine learning in predicting the maximum emission wavelength of organic compounds. The predictions are based on structure descriptors and fingerprints widely applied in cheminformatics. In an attempt to further improve accuracy, the developed machine learning models were enriched with quantum mechanics derived features. Multi linear, gradient boosting and random forest regressions were applied. Computers were trained and tested with a database of experimental data of optical properties.

“…The first approach, used as the baseline, employs the Extended Connectivity Fingerprint (ECFP) as molecular representation. These bit vectors are widely used in the prediction of physicochemical properties, biological activity or toxicity of chemical compounds [24]. The model output is a real number, which is the estimated pIC50.…”

Section: The Predictor Model (mentioning)

confidence: 99%

Diversity oriented Deep Reinforcement Learning for targeted molecule generation

et al. 2021


In this work, we explore the potential of deep learning to streamline the process of identifying new potential drugs through the computational generation of molecules with interesting biological properties. Two deep neural networks compose our targeted generation framework: the Generator, which is trained to learn the building rules of valid molecules employing SMILES strings notation, and the Predictor which evaluates the newly generated compounds by predicting their affinity for the desired target. Then, the Generator is optimized through Reinforcement Learning to produce molecules with bespoken properties. The innovation of this approach is the exploratory strategy applied during the reinforcement training process that seeks to add novelty to the generated compounds. This training strategy employs two Generators interchangeably to sample new SMILES: the initially trained model that will remain fixed and a copy of the previous one that will be updated during the training to uncover the most promising molecules. The evolution of the reward assigned by the Predictor determines how often each one is employed to select the next token of the molecule. This strategy establishes a compromise between the need to acquire more information about the chemical space and the need to sample new molecules, with the experience gained so far. To demonstrate the effectiveness of the method, the Generator is trained to design molecules with an optimized coefficient of partition and also high inhibitory power against the Adenosine $A_{2A}$ and $\kappa$ opioid receptors. The results reveal that the model can effectively adjust the newly generated molecules towards the wanted direction. More importantly, it was possible to find promising sets of unique and diverse molecules, which was the main purpose of the newly implemented strategy.


“…Most 2D descriptors are calculated with absolute accuracy while the 3D descriptors carry the errors of the methodological approximations they have been calculated with (Raevsky et al., 2019). Admitting that the 3D descriptors provide more detailed information, such as atomic distances and energy data of the compounds, there is yet no clear evidence about their impacts on the solubility predictions (Balakin et al., 2006; Gao et al., 2020; Yan et al., 2004; Salahinejad et al., 2013). Although a large number of chemical descriptors are available, it is usually preferred to use a modest number of relevant descriptors to avoid redundancy and overfitting issues during the training of ML models (Wang and Hou, 2011).…”

Section: The Relevance Of Chemical Descriptors (mentioning)

confidence: 99%
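
The last statement's point about preferring a modest, non-redundant descriptor set can be illustrated with a small sketch. This is not the method of any of the cited papers. It is one common heuristic (greedy pruning of highly correlated descriptors), written in plain Python. The function names, the 0.95 threshold, and the toy descriptor values are all assumptions made up for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def prune_redundant(descriptors, threshold=0.95):
    """Greedily keep a descriptor only if |r| with every already-kept one is below threshold.

    `descriptors` maps a descriptor name to its values over the same molecules.
    The threshold is an illustrative assumption, not a recommended setting.
    """
    kept = []
    for name, values in descriptors.items():
        if all(abs(pearson(values, descriptors[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical toy table: four molecules, three descriptors (values invented).
toy = {
    "mol_weight": [180.2, 151.2, 206.3, 194.2],
    "heavy_atoms": [13, 11, 15, 14],
    "logp": [1.2, 0.5, 3.5, -0.1],
}
selected = prune_redundant(toy)
print(selected)
```

In this toy table, `heavy_atoms` tracks `mol_weight` almost perfectly and is dropped, while `logp` is kept. The result depends on iteration order, so a real workflow would typically rank descriptors by relevance first. Constant columns are treated as uncorrelated here and would need a separate variance filter.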