Predicting chemical structure using reinforcement learning with a stack-augmented conditional variational autoencoder

Kim, Hwanhee; Ko, Soohyun; Kim, Byung Ju; Ryu, Sangwon; Ahn, Jaegyoon

doi:10.1186/s13321-022-00666-9

Cited by 5 publications

(8 citation statements)

References 47 publications


“…For better generalization of moBRCA-net, we adopted a data augmentation based on the deep generative model to enlarge the training dataset size. Several recent papers have shown that conditional variational autoencoder (CVAE)-based data generation for certain minority classes in the imbalanced dataset improved the classification performance in various domain tasks such as respiratory disease classification [41], temporal pattern prediction based on electronic health records [42], and prediction of chemical structure based on the chemical properties [43]. We constructed a conditional variational autoencoder (CVAE) composed of two-layered encoder and decoder, which estimates the conditional distribution with latent variables and data, and generates samples for specified breast cancer subtype.…”

Section: Results (mentioning)

confidence: 99%

moBRCA-net: a breast cancer subtype classification framework based on multi-omics attention neural networks

Choi

Chae

2023

BMC Bioinformatics


Background: Breast cancer is a highly heterogeneous disease that comprises multiple biological components. Owing to its diversity, patients have different prognostic outcomes; hence, early diagnosis and accurate subtype prediction are critical for treatment. Standardized breast cancer subtyping systems, mainly based on single-omics datasets, have been developed to ensure proper treatment in a systematic manner. Recently, multi-omics data integration has attracted attention because it provides a comprehensive view of patients, but it poses a challenge due to the high dimensionality. In recent years, deep learning-based approaches have been proposed, but they still present several limitations. Results: In this study, we describe moBRCA-net, an interpretable deep learning-based breast cancer subtype classification framework that uses multi-omics datasets. Three omics datasets comprising gene expression, DNA methylation and microRNA expression data were integrated while considering the biological relationships among them, and a self-attention module was applied to each omics dataset to capture the relative importance of each feature. The features were then transformed to new representations considering the respective learned importance, allowing moBRCA-net to predict the subtype. Conclusions: Experimental results confirmed that moBRCA-net has a significantly enhanced performance compared with other methods, and the effectiveness of multi-omics integration and omics-level attention was identified. moBRCA-net is publicly available at https://github.com/cbi-bioinfo/moBRCA-net.


“…Numerous studies have tackled the RL problem by defining an action space, a state, a policy, and an environment. These comprise an action space composed of symbol sets that represent molecular structures, a state space made up of symbol substrings, a policy for predicting the next appropriate symbol (action) to append to the current substring (state) up to a certain length, and an environment that evaluates the completed string, providing rewards based on its properties [10,11,16,18]. Policies employ deep neural network models, such as RNNs, to deal with string-based molecular structures.…”

Section: Reinforcement Learning For Molecular Generation (mentioning)

confidence: 99%
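
The formulation quoted above (symbols as actions, the partial string as state, a policy that picks the next symbol, and an environment that scores the finished string) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the four-symbol vocabulary, the uniform random policy standing in for a trained RNN, and the carbon-fraction score standing in for a real property such as QED. It is not the method of any cited paper.

```python
import random

# Hypothetical toy vocabulary; real systems use SMILES or SELFIES tokens.
VOCAB = ["C", "N", "O", "<eos>"]
MAX_LEN = 8  # assumed cap on the generated string length


def policy(state, rng):
    """Stand-in for a learned policy (e.g. an RNN): choose the next symbol uniformly."""
    return rng.choice(VOCAB)


def reward(string):
    """Placeholder score: fraction of 'C' symbols. Not a real chemical property."""
    return string.count("C") / len(string) if string else 0.0


def rollout(rng):
    """Build one string symbol by symbol, then let the 'environment' score it."""
    state = ""  # state = substring generated so far
    while len(state) < MAX_LEN:
        action = policy(state, rng)  # action = next symbol to append
        if action == "<eos>":
            break
        state += action
    return state, reward(state)  # reward is given only for the completed string
```

Calling `rollout(random.Random(0))` returns one generated string and its score; in an actual RL setup, that score would be used to update the policy's parameters rather than being discarded.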

“…To address these challenges, many studies have used the simplified molecular-input line-entry system (SMILES), self-referencing embedded strings (SELFIES) [7,8], and graph-based representation methods [9] in training deep molecular generative models. Researchers have exploited various deep generative models, including recurrent neural networks (RNNs) [10,11], transformers [12], and graph neural networks (GNNs) [9], to efficiently handle those string-based or graph-based molecular data [13,14]. Furthermore, Bayesian optimization [9,15] and reinforcement learning (RL) techniques [10,16,17] have been exploited for deep molecular generative models to create molecules with desired chemical properties.…”

Section: Introduction (mentioning)

confidence: 99%

“…Many studies have demonstrated the effectiveness of molecular structure generation strategies using RL for optimizing various molecular structure properties [10,11,16,17,18]. The typical RL configuration in AI-based drug design consists of two components: an agent that generates molecular structures and an environment that evaluates the generated molecules.…”

Section: Introduction (mentioning)

confidence: 99%

“…If a generated molecular structure possesses desired target properties, such as a high quantitative estimate of drug-likeness (QED), the agent receives a high reward as feedback from the environment. These property-constrained RL methods are effective in fine-tuning a pre-trained molecular generative model so that it can generate a large number of hit molecules [10,11,16,17,18]. However, owing to the vast size of the chemical space, it is challenging for an agent to perform efficient exploration, resulting in failure to find an optimal policy for the desired generation of molecular structures [14,19,20].…”

Section: Introduction (mentioning)

confidence: 99%


Living Conditions and Support Measures for Immigrant Youths in Yeonsu-gu, Incheon

Park

Oh

Lim

2022

Studies of Koreans Abroad


Optimizing techniques for discovering molecular structures with desired properties is crucial in artificial intelligence (AI)-based drug discovery. Combining deep generative models with reinforcement learning has emerged as an effective strategy for generating molecules with specific properties. Despite its potential, this approach is ineffective in exploring the vast chemical space and optimizing particular chemical properties. To overcome these limitations, we present Mol-AIR, a reinforcement learning-based framework using adaptive intrinsic rewards for effective goal-directed molecular generation. Mol-AIR leverages the strengths of both history-based and learning-based intrinsic rewards by exploiting random distillation network and counting-based strategies. In benchmark tests, Mol-AIR demonstrates superior performance over existing approaches in generating molecules with desired properties without any prior knowledge, including penalized LogP, QED, and celecoxib similarity. We believe that Mol-AIR represents a significant advancement in drug discovery, offering a more efficient path to discovering novel therapeutics.
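
To make the idea of a counting-based intrinsic reward concrete, here is a minimal sketch of one common generic form, a bonus of beta / sqrt(N) where N is how often a molecule string has been generated so far. This is an assumption chosen for illustration and is not claimed to be Mol-AIR's formulation; the class name `CountBonus`, the parameter `beta`, and the helper `total_reward` are hypothetical.

```python
from collections import Counter
import math


class CountBonus:
    """Generic count-based exploration bonus: beta / sqrt(visit count)."""

    def __init__(self, beta=0.1):
        self.beta = beta          # assumed scale of the intrinsic reward
        self.counts = Counter()   # history of how often each string was seen

    def __call__(self, molecule):
        # Record the visit first, so a never-seen molecule gets the full bonus beta.
        self.counts[molecule] += 1
        return self.beta / math.sqrt(self.counts[molecule])


def total_reward(extrinsic, molecule, bonus):
    """Add the intrinsic bonus to the property-based (extrinsic) reward."""
    return extrinsic + bonus(molecule)
```

Because the bonus shrinks each time the same string reappears, repeated outputs earn less than novel ones, which is the exploration pressure that history-based intrinsic rewards are meant to provide.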


Drug Molecule Generation Method Based on Fusion of Protein Sequence Features

Wang,

Zhang,

Liu

et al. 2024

Advanced Intelligent Computing in Bioinformatics
