2015
DOI: 10.1111/coin.12069
|View full text |Cite
|
Sign up to set email alerts
|

Computational Technique for an Efficient Classification of Protein Sequences With Distance‐Based Sequence Encoding Algorithm

Abstract: Machine learning is being implemented in bioinformatics and computational biology to solve challenging problems emerged in the analysis and modeling of biological data such as DNA, RNA, and protein. The major problems in classifying protein sequences into existing families/superfamilies are the following: the selection of a suitable sequence encoding method, the extraction of an optimized subset of features that possesses significant discriminatory information, and the adaptation of an appropriate learning alg… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
1
1

Citation Types

0
3
0

Year Published

2016
2016
2023
2023

Publication Types

Select...
2
1

Relationship

0
3

Authors

Journals

citations
Cited by 3 publications
(3 citation statements)
references
References 48 publications
0
3
0
Order By: Relevance
“…The length of the sequences is from few amino acids to thousands of amino acids. There are many methods for protein sequences encoding out there including distance based encoding [4], [16] that captures statistical characteristics of protein sequences. In our work we transformed the labels in to one hot vector representation using LabelBinarizer from sklearn.preprocessing.…”
Section: Methodsmentioning
confidence: 99%
See 1 more Smart Citation
“…The length of the sequences is from few amino acids to thousands of amino acids. There are many methods for protein sequences encoding out there including distance based encoding [4], [16] that captures statistical characteristics of protein sequences. In our work we transformed the labels in to one hot vector representation using LabelBinarizer from sklearn.preprocessing.…”
Section: Methodsmentioning
confidence: 99%
“…However, it is very expensive to characterize functions for biological experiments and also, it is really necessary to find the association between the information of datasets to create and improve medical tools. For classification purpose, several classification techniques were developed [3], [4]. These techniques can be divided in two parts: Sequence alignment and Machine learning algorithms.…”
Section: Introductionmentioning
confidence: 99%
“…There are 20 essential Amino acids (Figure 1) in nature. Each Amino acid has a unique chemical structure that gives it specific properties [3]. The Amino acid signatures represent encoded instances of network transactions using essential Amino acid labels that encode numerical structural properties using a Vigesimal numbering system to map ASCII codes to Amino acids.…”
mentioning
confidence: 99%