Based on the properties of the 64 codons of the genetic code, we propose a simple and intuitive 2D graphical representation of protein sequences. Using this representation, we give a new Euclidean-distance method for computing the distance between sequences in sequence similarity analysis. This approach retains more sequence information. A phylogenetic tree constructed with this method demonstrates the effectiveness of our approach. Finally, we apply this similarity-analysis method to the prediction of protein subcellular localization on two commonly used datasets. The results show that the method is reasonable.
On the basis of information on the evolution of the 20 amino acids and their physicochemical characteristics, we propose a new two-dimensional (2D) graphical representation of protein sequences in this article. With this representation, 2D data encode three-dimensional information: the evolution index of each amino acid, its class according to physicochemical characteristics, and the order in which the amino acids appear in the protein sequence. Then, using the discrete Fourier transform, sequence signals of different lengths are transformed to the frequency domain, where they all have the same length. The resulting method is used to analyze protein sequence similarity and to predict protein structural class. The experiments indicate that our method is effective and useful.
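The frequency-domain step can be illustrated with a minimal sketch. The abstract does not say how the equal-length spectra are obtained or which numeric values encode each residue, so the following are assumptions: the shorter signal is zero-padded to a common length, the encoding `PLACEHOLDER_INDEX` is simply each residue's rank in the alphabet (not the evolution index or class information from the article), and the spectra are compared with a plain Euclidean distance. The function names are hypothetical.

```python
import cmath
import math

# Placeholder encoding (assumption): rank of each residue in this alphabet.
# It stands in for the article's evolution index and class information.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PLACEHOLDER_INDEX = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}


def encode(sequence):
    """Turn a protein sequence into a numeric signal, one value per residue."""
    return [float(PLACEHOLDER_INDEX[aa]) for aa in sequence]


def dft_magnitudes(signal, n_points):
    """Magnitudes of the n_points-point DFT of a zero-padded signal."""
    if len(signal) > n_points:
        raise ValueError("n_points must be at least the signal length")
    padded = list(signal) + [0.0] * (n_points - len(signal))
    magnitudes = []
    for k in range(n_points):
        coefficient = sum(
            x * cmath.exp(-2j * cmath.pi * k * n / n_points)
            for n, x in enumerate(padded)
        )
        magnitudes.append(abs(coefficient))
    return magnitudes


def spectral_distance(seq_a, seq_b):
    """Euclidean distance between equal-length magnitude spectra."""
    # Assumption: the common length is the longer of the two sequences.
    n_points = max(len(seq_a), len(seq_b))
    spec_a = dft_magnitudes(encode(seq_a), n_points)
    spec_b = dft_magnitudes(encode(seq_b), n_points)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(spec_a, spec_b)))
```

Calling `spectral_distance("ACDE", "ACDEFGH")` compares a 4-residue and a 7-residue sequence through two 7-point spectra. The direct O(N²) sum keeps the sketch dependency-free; an FFT routine would be the usual choice for long sequences.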