Machine Learning in Bioinformatics of Protein Sequences 2022
DOI: 10.1142/9789811258589_0002

Application of Sequence Embedding in Protein Sequence-Based Predictions

Cited by 5 publications (6 citation statements); references 61 publications.

“…Zscale, MS-WHIM, Stscale [10, 34]) rather than specific features for each residue in the sequence, as was done for rs3DDPDs given the heavy influence of the environment in the dynamic behavior of single residues. On the other hand, protein embeddings are often the byproduct of a machine or deep learning model using a protein sequence as input [12, 35], unlike the approach followed for ps3DDPDs. Here, instead, a common main framework was kept to increase the interpretability and interoperability of the resulting descriptors.…”
Section: Discussion
confidence: 99%
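The contrast this statement draws, whole-protein descriptor sets such as Zscale or MS-WHIM versus per-residue features as in rs3DDPDs, can be made concrete with a short sketch. The property values below are illustrative placeholders, not the published Zscale or MS-WHIM tables, and the function names are ours rather than from the cited works.

```python
import numpy as np

# Illustrative placeholder property scale (NOT the published Zscale /
# MS-WHIM values): one hydrophobicity-like number per amino acid.
PLACEHOLDER_SCALE = dict(zip("ACDEFGHIKLMNPQRSTVWY",
                             np.linspace(-1.0, 1.0, 20)))

def whole_protein_descriptor(seq: str) -> np.ndarray:
    """One fixed-length vector per protein: mean and std of a property scale."""
    vals = np.array([PLACEHOLDER_SCALE[aa] for aa in seq])
    return np.array([vals.mean(), vals.std()])

def per_residue_features(seq: str) -> np.ndarray:
    """One feature row per residue, in the spirit of rs3DDPD-style descriptors."""
    return np.array([[PLACEHOLDER_SCALE[aa]] for aa in seq])

seq = "MKTAYIAKQR"
print(whole_protein_descriptor(seq).shape)  # (2,)    - length-independent
print(per_residue_features(seq).shape)      # (10, 1) - one row per residue
```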
“…Descriptors derived from the protein sequence include discrete features calculated per residue (one-hot encoding) [10] or protein [11] capturing physicochemical properties or amino acid composition. Additionally, deep learning applications of natural language processing have prompted the generation of protein embeddings from sequences [12]. Structure-based descriptors can be derived from molecular graphs or the protein 3D structure by measuring connectivity, distances, and physicochemical properties among others [8, 9].…”
Section: Introduction
confidence: 99%
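The two sequence-derived encoding families mentioned in this statement, per-residue one-hot matrices versus per-protein amino acid composition vectors, take only a few lines to sketch. The helper names below are ours, not from the cited works.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def one_hot_encode(seq: str) -> np.ndarray:
    """Per-residue encoding: an L x 20 binary matrix, one row per residue."""
    mat = np.zeros((len(seq), len(AMINO_ACIDS)))
    for i, aa in enumerate(seq):
        mat[i, AA_INDEX[aa]] = 1.0
    return mat

def aa_composition(seq: str) -> np.ndarray:
    """Per-protein encoding: a single length-20 vector of residue frequencies."""
    return one_hot_encode(seq).mean(axis=0)

seq = "MKTAYIAKQR"
print(one_hot_encode(seq).shape)  # (10, 20) - grows with sequence length
print(aa_composition(seq).shape)  # (20,)    - fixed length for any protein
```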
“…We generate the sequence “[BOS], M_1, …, M_p, [SEP], P_1, …, P_q, [EOS]” with length p + q + 3 as the ESM model input, and obtain the same-size embedding vectors from the last layer of the ESM models, corresponding to the special tokens and the amino acids in the MHC and the peptide. As a common strategy in NLP sequence classification tasks, we use the embedding of [BOS] as the MHC-peptide sequence-pair embedding vector (Ibtehaz and Kihara, 2023). Finally, passing through a softmax classifier layer, we output the probability of binding and use it to compute the loss and apply back-propagation.…”
Section: Methods
confidence: 99%
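A minimal sketch of the pipeline described in this statement follows. The tiny transformer below is a stand-in, not ESM itself (the real pretrained models from facebookresearch/esm are far larger); only the sequence-pair construction, the [BOS] pooling, and the softmax head mirror the quoted setup, and positional encodings are omitted for brevity.

```python
import torch
import torch.nn as nn

# Vocabulary: 20 amino acids plus the [BOS], [SEP], [EOS] special tokens
# from the quoted input construction.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
VOCAB = {tok: i for i, tok in enumerate(["[BOS]", "[SEP]", "[EOS]", *AMINO_ACIDS])}

def encode_pair(mhc: str, peptide: str) -> torch.Tensor:
    """Builds [BOS], M_1..M_p, [SEP], P_1..P_q, [EOS] -> length p + q + 3."""
    toks = ["[BOS]", *mhc, "[SEP]", *peptide, "[EOS]"]
    return torch.tensor([[VOCAB[t] for t in toks]])  # shape (1, p + q + 3)

class PairClassifier(nn.Module):
    """Stand-in encoder (NOT ESM) plus a [BOS]-pooled softmax head."""
    def __init__(self, dim: int = 64):
        super().__init__()
        self.embed = nn.Embedding(len(VOCAB), dim)
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(dim, 2)  # binder / non-binder logits

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        h = self.encoder(self.embed(token_ids))  # (1, L, dim) per-token embeddings
        bos = h[:, 0]                            # pool the [BOS] position
        return torch.softmax(self.head(bos), dim=-1)

model = PairClassifier()
probs = model(encode_pair("YFAMYGEKVAHTHVDTLYVRYHYYTWAVLAYTWY", "SIINFEKL"))
print(probs)  # e.g. tensor([[0.49, 0.51]]) - P(non-binder), P(binder)
```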
“…The concept of embeddings originally stems from natural language processing (NLP) and language models, but targeted adaptations for amino acid sequence data have already been presented [Ibtehaz and Kihara, 2021]. By associating numerical vectors to tokens such as characters or words, embeddings constitute the first encoding step of a model. The method employed to generate embeddings is crucial as it influences what intrinsic information a model can exploit for classification [Ibtehaz and Kihara, 2021]. We, thus, considered various embeddings in our analysis, while keeping the base architecture equal.…”
Section: Benchmarking of Proteasomal Cleavage Prediction Strategies
confidence: 99%
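The point about varying the embedding while keeping the base architecture equal can be sketched as below; this is an illustrative PyTorch toy under our own assumptions, not the cited benchmark's actual implementation.

```python
import torch
import torch.nn as nn

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def tokenize(seq: str) -> torch.Tensor:
    return torch.tensor([[AMINO_ACIDS.index(aa) for aa in seq]])

# Two interchangeable first encoding steps over the same token ids:
# (1) a frozen one-hot lookup, (2) a trainable learned embedding.
one_hot = nn.Embedding.from_pretrained(torch.eye(20), freeze=True)
learned = nn.Embedding(20, 20)

# A shared downstream architecture, so only the embedding scheme varies.
classifier = nn.Sequential(nn.Linear(20, 16), nn.ReLU(), nn.Linear(16, 2))

tokens = tokenize("SIINFEKL")
for name, embed in [("one-hot", one_hot), ("learned", learned)]:
    pooled = embed(tokens).mean(dim=1)  # average the per-residue vectors
    logits = classifier(pooled)         # identical head for both embeddings
    print(name, logits.shape)           # (1, 2) in both cases
```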