2011 IEEE International Conference on Bioinformatics and Biomedicine Workshops (BIBMW) 2011
DOI: 10.1109/bibmw.2011.6112394

Amino acid encoding schemes for machine learning methods

Abstract: In this paper, we investigate the efficiency of a number of commonly used amino acid encodings by using artificial neural networks and substitution scoring matrices. An important step in many machine learning techniques applied in computational biology is encoding the symbolic data of protein sequences reasonably efficiently into numeric vector representations. This encoding can be achieved by considering either the amino acid physicochemical properties or a generic numerical encoding. In order to be effective in …

Cited by 15 publications (8 citation statements)
References 16 publications
“…The same input data were represented differently in order to select among different encoding methods and, therefore, each encoded input had a variable impact on machine learning measures. In computational biology, encoding of amino acids can be achieved by considering amino acids' physicochemical properties, for instance, using the BLOSUM substitution matrix, or by a generic character-wise encoding like one-hot or integer encoding used also in other ML domains (Zamani and Kremer, 2011).…”
Section: Encoding; citation type: mentioning
confidence: 99%
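The two generic character-wise encodings named in the statement above, integer and one-hot, can be sketched as follows. This is a minimal illustration and not the procedure used in the cited paper: the alphabet ordering and the function names `integer_encode` and `one_hot_encode` are assumptions chosen for the example.

```python
# Illustrative sketch only. The residue ordering and the function names
# are assumptions, not taken from the paper.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, one-letter codes
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def integer_encode(sequence):
    """Map each residue to its position in AMINO_ACIDS."""
    # Raises KeyError for symbols outside the 20-letter alphabet (e.g. 'X').
    return [INDEX[aa] for aa in sequence.upper()]


def one_hot_encode(sequence):
    """Map each residue to a 20-dimensional indicator vector."""
    vectors = []
    for idx in integer_encode(sequence):
        row = [0] * len(AMINO_ACIDS)
        row[idx] = 1  # exactly one position is set per residue
        vectors.append(row)
    return vectors


if __name__ == "__main__":
    print(integer_encode("ACD"))
    print(one_hot_encode("AC"))
```

A substitution-matrix encoding such as BLOSUM would, under the same assumptions, replace each indicator row with the residue's row of the matrix; the matrix values are not reproduced here.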
“…After feature engineering, deep learning and machine learning algorithms can be applied to the extracted features to perform protein family classification. Yet, in order to apply these methods, sequences are needed to be converted to numerical representations since there is no such a method to perform artificial intelligence with raw protein sequences [7,8].…”
Section: Introduction; citation type: mentioning
confidence: 99%
“…In the literature, there are limited methods for converting protein sequences into the numbers. In general, BLOSUM62 (BLOcks SUbstitution Matrix), PAM25 (Point Accepted Mutation), hydrophobicity, EIIP (Electron-Ion Interaction Potential) are applied and the performance of family classification is highly depending on the conversion method [7,9]. Recently, deep learning models are actively used in bioinformatics studies and show promising results.…”
Section: Introduction; citation type: mentioning
confidence: 99%
“…In the past, several machine learning approaches have been developed for the classification of protein sequences into functional or structural existing superfamilies [ 16 , 19 22 ]. A superfamily is comprised of a set of proteins that possess sequence or structural homology.…”
Section: Introduction; citation type: mentioning
confidence: 99%