DaDL-SChlo: protein subchloroplast localization prediction based on generative adversarial networks and pre-trained protein language model

Wang, Xiao; Han, Lijun; Wang, Rong; Chen, Haoran

doi:10.1093/bib/bbad083

Cited by 5 publications

(2 citation statements)

References 32 publications


“…One possible extension is to collect additional TIPs to develop a more comprehensive prediction model. Another extension could be the employment of well-known feature extractors, such as a bidirectional recurrent neural network (RNN) [55] and ProtBERT [56], to effectively capture the key information of TIPs. For the last extension, we can try to incorporate TIPred with recent innovative computational frameworks, such as an iterative feature representation algorithm [57] and deep learning (DL)-based framework [39, 58].…”

Section: Discussion (mentioning)

confidence: 99%

TIPred: a novel stacked ensemble approach for the accelerated discovery of tyrosinase inhibitory peptides

Charoenkwan, Kongsompong, Schaduangrat et al. 2023

BMC Bioinformatics


Background: Tyrosinase is an enzyme involved in melanin production in the skin. Several hyperpigmentation disorders involve the overproduction of melanin and instability of tyrosinase activity resulting in darker, discolored patches on the skin. Therefore, discovering tyrosinase inhibitory peptides (TIPs) is of great significance for basic research and clinical treatments. However, the identification of TIPs using experimental methods is generally cost-ineffective and time-consuming. Results: Herein, a stacked ensemble learning approach, called TIPred, is proposed for the accurate and quick identification of TIPs by using sequence information. TIPred explored a comprehensive set of various baseline models derived from well-known machine learning (ML) algorithms and heterogeneous feature encoding schemes from multiple perspectives, such as chemical structure properties, physicochemical properties, and composition information. Subsequently, 130 baseline models were trained and optimized to create new probabilistic features. Finally, the feature selection approach was utilized to determine the optimal feature vector for developing TIPred. Both tenfold cross-validation and independent test methods were employed to assess the predictive capability of TIPred by using the stacking strategy. Experimental results showed that TIPred significantly outperformed the state-of-the-art method in terms of the independent test, with an accuracy of 0.923, MCC of 0.757 and an AUC of 0.977. Conclusions: The proposed TIPred approach could be a valuable tool for rapidly discovering novel TIPs and effectively identifying potential TIP candidates for follow-up experimental validation. Moreover, an online webserver of TIPred is publicly available at http://pmlabstack.pythonanywhere.com/TIPred.

“…In addition, a long short-term memory network (LSTM) which combines the previous states and current inputs is also commonly used [56,57], with Generative Adversarial Network (GAN) [58] and Synthetic Minority Over-sampling Technique (SMOTE) [59] used for synthesizing minority samples to deal with data imbalance. Developing data augmentation methods by deep learning algorithms has also made protein language model construction possible [60,61]. Through transfer learning [62], pretrained models can be fine-tuned on different downstream tasks, reducing the need for large amounts of labeled data for training.…”

Section: Sequences-based AI Approaches (mentioning)

confidence: 99%

A Review for Artificial Intelligence Based Protein Subcellular Localization

Xiao, Zou, Wang et al. 2024

Biomolecules


Proteins need to be located in appropriate spatiotemporal contexts to carry out their diverse biological functions. Mislocalized proteins may lead to a broad range of diseases, such as cancer and Alzheimer’s disease. Knowing where a target protein resides within a cell will give insights into tailored drug design for a disease. As the gold validation standard, the conventional wet lab uses fluorescent microscopy imaging, immunoelectron microscopy, and fluorescent biomarker tags for protein subcellular location identification. However, the booming era of proteomics and high-throughput sequencing generates tons of newly discovered proteins, making protein subcellular localization by wet-lab experiments a mission impossible. To tackle this concern, in the past decades, artificial intelligence (AI) and machine learning (ML), especially deep learning methods, have made significant progress in this research area. In this article, we review the latest advances in AI-based method development in three typical types of approaches, including sequence-based, knowledge-based, and image-based methods. We also elaborately discuss existing challenges and future directions in AI-based method development in this research field.
