2023
DOI: 10.1093/bib/bbad135

UniDL4BioPep: a universal deep learning architecture for binary classification in peptide bioactivity

Abstract: Identification of potent peptides through model prediction can reduce benchwork in wet experiments. However, the conventional process of model building can be complex and time-consuming due to challenges such as peptide representation, feature selection, model selection and hyperparameter tuning. Recently, advanced pretrained deep learning-based language models (LMs) have been released for protein sequence embedding and applied to structure and function prediction. Based on these developments, we have develop…

citations
Cited by 42 publications
(32 citation statements)
references
References 46 publications
“…Another representative technique is the machine learning (ML) approach where the classification criterion is determined by the ML model by learning from a separate training data set. Besides the selection of modeling method, the most challenging task in the ML approach is the representation or encoding of protein/peptide sequences into a numerical vector/matrix. Various encoding methods, including amino acid descriptors, amino acid composition (AAC), pseudoamino acid composition (PseAAC), dipeptide composition (DPC), amino acid descriptors (AAD), position-specific scoring matrix (PSSM), physicochemical descriptors, biomedical properties, k-mer dictionary-based binary representation, etc., have been widely used in predicting allergenicity and other properties/bioactivities. However, these features may not always accurately represent protein sequences and simple combinations can cause high-dimensional problems as well as the feature redundancy …”
Section: Introduction (mentioning)
confidence: 99%
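To make the encoding step in the statement above concrete, here is a minimal sketch of two of the named schemes, amino acid composition (AAC) and dipeptide composition (DPC), written in plain Python. The function names `aac` and `dpc`, the residue ordering, and the example peptide are illustrative assumptions and are not taken from the cited papers.

```python
from itertools import product

# Assumption: the 20 standard residues in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac(seq):
    """Amino acid composition: fraction of each residue (20 values)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]


def dpc(seq):
    """Dipeptide composition: fraction of each adjacent residue pair (400 values)."""
    seq = seq.upper()
    if len(seq) < 2:
        raise ValueError("sequence must have at least two residues")
    # Overlapping adjacent pairs, e.g. "ACAC" -> ["AC", "CA", "AC"].
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    return [pairs.count(a + b) / len(pairs)
            for a, b in product(AMINO_ACIDS, repeat=2)]


# Hypothetical peptide, used only to show the output shapes.
peptide = "ACAC"
features = aac(peptide) + dpc(peptide)  # simple concatenation of both encodings
```

Concatenating just these two encodings already yields a 420-dimensional vector for a four-residue peptide, most of it zeros, which illustrates the dimensionality and redundancy concern raised in the statement.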
“…Recently, inspired by the extraordinary performance of transformer-based models in natural language processing (NLP) tasks, a series of attempts have been made in pretrained protein language models (pLMs) (e.g., ProtTrans, ESM, UniRep, etc.) for downstream protein sequence tasks. The central concept lies in incorporating the attention mechanism during the self-supervised learning process in these pLMs. This mechanism enables the model to observe the entire sequence simultaneously to learn the representation of each residue as well as their relationship, rather than locally (as in convolutional neural network (CNN)) or sequentially (as in long short-term networks (LSTM)).…”
Section: Introduction (mentioning)
confidence: 99%
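The "observe the entire sequence simultaneously" idea in the statement above can be sketched as scaled dot-product self-attention. This is a toy illustration in plain Python, not the implementation of any of the named pLMs: it assumes queries, keys and values are all the raw input vectors (no learned projections, single head), and the input vectors are made up.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Scaled dot-product self-attention with Q = K = V = x.

    x is a list of L residue vectors of equal dimension d. Returns the
    L x L attention weights and the L x d mixed representations.
    """
    d = len(x[0])
    # Every residue is scored against every other residue in one step.
    scores = [[sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
              for q in x]
    weights = [softmax(row) for row in scores]
    # Each output vector is a weighted average over all positions.
    out = [[sum(w * v[j] for w, v in zip(w_row, x)) for j in range(d)]
           for w_row in weights]
    return weights, out


# Hypothetical 3-residue "sequence" with 2-dimensional embeddings.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, out = self_attention(x)
```

Each row of `weights` is strictly positive and sums to 1, so every residue's new representation draws on all positions at once, in contrast to a fixed local window (CNN) or a step-by-step pass (LSTM).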
“…Given the importance of BPs, there have been several attempts to create in-silico approaches to perform a preliminary assignment of the potential functional properties and facilitate the subsequent discovery and testing process in vivo [19–24]. These methods rely on several databases where peptides from various experiments have been collected and classified according to the BPs functional classes.…”
Section: Introduction (mentioning)
confidence: 99%
“…Tools include similarity-based classification using available sequences from databases [7,25] and prediction of physicochemical properties [26]. There have also been several attempts to use machine learning techniques to aid in the detection of BPs and their functional classification [19–24]. The proposed methods used Logistic Regression [27], Support Vector Machines [28] and Random Forests [29] to predict the BPs functional role.…”
Section: Introduction (mentioning)
confidence: 99%