2020
DOI: 10.1186/s12859-020-03546-x

Amino acid encoding for deep learning applications

Abstract: Background The number of applications of deep learning algorithms in bioinformatics is increasing, as they usually achieve superior performance over classical approaches, especially when larger training datasets are available. In deep learning applications, discrete data, e.g. words or n-grams in language, or amino acids or nucleotides in bioinformatics, are generally represented as continuous vectors through an embedding matrix. Recently, learning this embedding matrix directly from the data as part of the c…
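
As a hedged illustration of the embedding lookup the abstract describes (not code from the paper): each amino acid is mapped to an integer index, and that index selects one row of a matrix whose entries would normally be learned during training. The alphabet, the dimension D = 8, and the example sequence below are arbitrary choices for the sketch.

```python
import numpy as np

# The 20 standard amino acids; each gets a unique integer index.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
aa_to_index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

D = 8  # embedding dimension (arbitrary for this sketch)
rng = np.random.default_rng(0)
# In a real model these weights are learned from data; here they are random.
embedding_matrix = rng.normal(size=(len(AMINO_ACIDS), D))

sequence = "MKTAYIAK"
indices = [aa_to_index[aa] for aa in sequence]
vectors = embedding_matrix[indices]  # one D-dimensional vector per residue
print(vectors.shape)                 # (8, 8)
```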

Citation Types: 2 supporting, 66 mentioning, 0 contrasting
Cited by 85 publications (68 citation statements)
References 21 publications
“…The word embedding dimension D can be lower than the alphabet size, and thus, lower than the one-hot encoding dimension. For DeepNOG, D = 10 gave best results on validation data, which reflects the findings of a recent survey on amino acid encoding schemes ( ElAbd et al , 2020 ). Consequently, each amino acid is represented as a ten-dimensional vector.…”
Section: Methods (supporting)
confidence: 83%
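
A minimal PyTorch sketch of the setup this excerpt describes: an embedding with D = 10 over the 20-letter amino acid alphabet, so each residue maps to a ten-dimensional vector instead of a 20-dimensional one-hot vector. The layer is generic PyTorch, not DeepNOG's actual code, and the batch below is synthetic.

```python
import torch
import torch.nn as nn

ALPHABET_SIZE = 20  # one-hot encoding would need 20 dimensions
D = 10              # embedding dimension reported to work best for DeepNOG

embed = nn.Embedding(ALPHABET_SIZE, D)

# A synthetic batch of one integer-encoded sequence of length 50.
residues = torch.randint(0, ALPHABET_SIZE, (1, 50))
print(embed(residues).shape)  # torch.Size([1, 50, 10]): 10-dim vector per residue
```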
“…Among them, one-hot encoding transforms a character into a binary-bit vector [42], [43]. The one-hot encoding scheme is popular because deep learning models require grid-like numeric input.…”
Section: Data Formats and Encoding Schemes (mentioning)
confidence: 99%
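
A small sketch of the one-hot scheme this excerpt describes, assuming the 20 standard amino acids: each character becomes a binary vector with a single 1 at its alphabet position, and a sequence becomes a grid-like binary matrix.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
aa_to_index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def one_hot_encode(sequence: str) -> np.ndarray:
    """Return an (L, 20) binary matrix: one row per residue,
    one column per amino acid, exactly one 1 per row."""
    encoding = np.zeros((len(sequence), len(AMINO_ACIDS)), dtype=np.int8)
    for pos, aa in enumerate(sequence):
        encoding[pos, aa_to_index[aa]] = 1
    return encoding

print(one_hot_encode("GAVL").shape)  # (4, 20): grid-like numeric input
```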
“…[13] These vectors could also be learned jointly with the main task (e.g., RT prediction or MHC-peptide binding prediction), in the same way that the weights of the neural network of the main task are learned. [14] This type of encoding method has been demonstrated to be extremely useful in certain tasks. [12,14–16] Before encoding a sequence as dense numeric vectors, the sequence is typically represented as an integer vector in which each token is represented by a unique integer.…”
Section: Basic Concepts in Deep Learning (mentioning)
confidence: 99%
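
A toy PyTorch sketch of the joint learning this excerpt describes: because the embedding weights are ordinary parameters of the model, the same backward pass that trains the task head also updates the embedding matrix. The model name, dimensions, and regression target below are hypothetical.

```python
import torch
import torch.nn as nn

class PeptideRegressor(nn.Module):
    """Hypothetical toy model: embedding trained jointly with the task head."""
    def __init__(self, vocab_size: int = 20, dim: int = 10):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)  # learned with the main task
        self.head = nn.Linear(dim, 1)               # e.g., an RT-style regression head

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, length) integer-encoded sequences
        x = self.embed(tokens).mean(dim=1)  # average the residue vectors
        return self.head(x).squeeze(-1)

model = PeptideRegressor()
tokens = torch.randint(0, 20, (4, 12))  # 4 synthetic integer-encoded sequences
loss = nn.functional.mse_loss(model(tokens), torch.zeros(4))
loss.backward()  # gradients reach the embedding, so it is learned jointly
print(model.embed.weight.grad is not None)  # True
```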
“…[14] This type of encoding method has been demonstrated to be extremely useful in certain tasks. [12,14–16] Before encoding a sequence as dense numeric vectors, the sequence is typically represented as an integer vector in which each token is represented by a unique integer. The final method is to design handcrafted features and then take these features as input for modeling.…”
Section: Basic Concepts in Deep Learning (mentioning)
confidence: 99%
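
To make the last two steps concrete, here is a sketch (with hypothetical helper names, not code from the cited works) of the integer encoding the excerpt mentions, plus one classic handcrafted feature, amino acid composition.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
aa_to_index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def to_integers(sequence: str) -> list:
    """Integer encoding: each token (residue) gets a unique integer id."""
    return [aa_to_index[aa] for aa in sequence]

def composition(sequence: str) -> np.ndarray:
    """Handcrafted feature: the fraction of each amino acid in the sequence."""
    counts = np.zeros(len(AMINO_ACIDS))
    for aa in sequence:
        counts[aa_to_index[aa]] += 1
    return counts / len(sequence)

print(to_integers("GAVL"))   # [5, 0, 17, 9]
print(composition("GAVL"))   # 20 fractions summing to 1.0
```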