TransPhos: A Deep-Learning Model for General Phosphorylation Site Prediction Based on Transformer-Encoder Architecture

Gao

Han

et al. 2023

IJMS

Self Cite

Recent years have seen tremendous success in the design of novel drug molecules through deep generative models. Nevertheless, existing methods only generate drug-like molecules, which require additional structural optimization to be developed into actual drugs. In this study, a deep learning method for generating target-specific ligands was proposed. This method is useful when the dataset for target-specific ligands is limited. Deep learning methods can extract and learn features (representations) in a data-driven way with little or no human participation. Generative pretraining (GPT) was used to extract the contextual features of the molecule. Three different protein-encoding methods were used to extract the physicochemical properties and amino acid information of the target protein. Protein-encoding and molecular sequence information are combined to guide molecule generation. Transfer learning was used to fine-tune the pretrained model to generate molecules with better binding ability to the target protein. The model was validated using three different targets. The docking results show that our model is capable of generating new molecules with higher docking scores for the target proteins.

Section: Discussion (mentioning)

confidence: 99%

Section: Introduction (mentioning)

confidence: 99%

PETrans: De Novo Drug Design with Protein-Specific Encoding Based on Transfer Learning

Gao

Han

et al. 2023

IJMS

Self Cite

“…The rapid development of high-throughput single-cell RNA sequencing (scRNA-seq) technologies has facilitated the study of the transcriptomic characterization of cell heterogeneity and dynamics [1, 2, 3, 4]. In recent years, researchers have collected a large amount of single-cell gene expression data from different experiments at different times and on different sequencing platforms [5, 6].…”

Section: Introduction (mentioning)

confidence: 99%

Integrating Multiple Single-Cell RNA Sequencing Datasets Using Adversarial Autoencoders

Zhang

et al. 2023

IJMS

Self Cite

Single-cell RNA sequencing (scRNA-seq) is a proven method for quantifying gene-expression heterogeneity and providing insight into the transcriptome at the single-cell level. When multiple single-cell transcriptome datasets are combined for analysis, the batch effect is usually corrected first. Most state-of-the-art processing methods are unsupervised, i.e., they do not use single-cell cluster labels, even though this information could improve batch correction, especially when multiple cell types are present. To make better use of known labels in complex dataset scenarios, we propose a novel deep learning model named IMAAE (i.e., integrating multiple single-cell datasets via an adversarial autoencoder) to correct batch effects. Experiments across various dataset scenarios show that IMAAE outperforms existing methods in both qualitative and quantitative evaluations. In addition, IMAAE retains both the corrected dimension-reduced data and the corrected gene expression data. These features make it a potential new option for large-scale single-cell gene expression data analysis.

“…MADE (Pang et al, 2022) constructs two different encoders to learn the graph information and sequence information of the drug respectively, and then uses an attention-based feature fusion method that integrates the drug's multi-dimensional features. TransPhos (Wang et al, 2022) proposes a two-stage deep learning approach and constructs three different structures of encoders for feature learning based on the attention mechanism. SDNN-PPI (Li et al, 2022) constructs three different ways of encoding protein sequences, and then uses a self-attention mechanism to further learn semantic relationships in the sequences for Protein-Protein Interaction (PPI).…”

Section: Introduction (mentioning)

confidence: 99%

CAT-CPI: Combining CNN and transformer to learn compound image features for predicting compound-protein interactions

Qian

Wu

Zhang

2022

Front. Mol. Biosci.

Compound-protein interaction (CPI) prediction is a foundational task in drug discovery, a process that is time-consuming and costly. Deep learning methods can greatly improve the effectiveness of CPI prediction and thereby accelerate drug development. A large number of recent research results in computer vision, especially in deep learning, have shown that the position, geometry, spatial structure, and other features of objects in an image can be well characterized. We propose a novel molecular image-based model named CAT-CPI (combining CNN and transformer to predict CPI) for the CPI task. We use a Convolutional Neural Network (CNN) to learn local features of molecular images and then use a transformer encoder to capture the semantic relationships among these features. To extract protein sequence features, we propose a k-gram based method and obtain the semantic relationships of sub-sequences with a transformer encoder. In addition, we build a Feature Relearning (FR) module to learn interaction features of compounds and proteins. We evaluated CAT-CPI on three benchmark datasets (Human, C. elegans, and Davis), and the experimental results demonstrate that CAT-CPI presents competitive performance against state-of-the-art predictors. In addition, we carry out Drug-Drug Interaction (DDI) experiments to verify the strong potential of methods based on molecular images and the FR module.
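
The k-gram step mentioned in the abstract above can be illustrated with a minimal sketch. This is not the CAT-CPI implementation: the abstract does not state the value of k, whether the sub-sequences overlap, or how they are indexed, so the function names `kgrams` and `build_vocab`, the choice k = 3, the overlapping window, and the example sequence are all assumptions made for illustration. The sketch only shows how a protein sequence could be split into sub-sequences and mapped to integer ids of the kind a transformer encoder's embedding layer typically consumes.

```python
from typing import Dict, List


def kgrams(sequence: str, k: int = 3) -> List[str]:
    """Split a protein sequence into overlapping sub-sequences of length k.

    Overlapping windows and k = 3 are assumptions, not details from the paper.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    # A sequence shorter than k yields an empty list.
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def build_vocab(grams: List[str]) -> Dict[str, int]:
    """Assign each distinct k-gram an integer id, reserving 0 for padding."""
    vocab: Dict[str, int] = {}
    for gram in grams:
        if gram not in vocab:
            vocab[gram] = len(vocab) + 1
    return vocab


# Hypothetical example sequence (one-letter amino acid codes).
seq = "MKTAYIAK"
grams = kgrams(seq, k=3)
vocab = build_vocab(grams)
ids = [vocab[g] for g in grams]
print(grams)
print(ids)
```

In a full model, an id sequence like `ids` would be padded to a fixed length and passed through an embedding layer before the transformer encoder; that part is omitted here because the abstract gives no architectural details.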