Background
Protein feature extraction plays an important role in similarity analysis of protein sequences and in the prediction of protein structures, functions and interactions. Feature extraction based on graphical representation is among the most effective and efficient approaches. However, most existing methods are limited by their design.
Results
We introduce DCGR, a novel method for extracting features from protein sequences based on the chaos game representation (CGR). It constructs CGR curves of protein sequences according to the physicochemical properties of amino acids, then converts the CGR curves into multi-dimensional feature vectors using the distributions of points in the CGR images. Tested on five data sets, DCGR was significantly superior to state-of-the-art feature extraction methods.
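As a rough illustration of the general idea (a CGR curve followed by a point-distribution feature vector), the sketch below may help. It is not the DCGR implementation: the four-class amino acid grouping, the vertex assignment, the grid size `k`, and the function names `cgr_points` and `grid_features` are all assumptions made for this example.

```python
# Minimal CGR sketch for protein sequences (illustrative, NOT the DCGR code).
# Assumption: amino acids are grouped into four textbook side-chain classes,
# and each class is pinned to one corner of the unit square.
GROUPS = {
    (0.0, 0.0): "AVLIPFWMG",  # nonpolar
    (0.0, 1.0): "STCYNQ",     # polar, uncharged
    (1.0, 1.0): "KRH",        # positively charged
    (1.0, 0.0): "DE",         # negatively charged
}
VERTEX = {aa: corner for corner, members in GROUPS.items() for aa in members}


def cgr_points(seq):
    """Return the CGR curve: each point is the midpoint between the
    previous point and the corner assigned to the current residue."""
    x, y = 0.5, 0.5  # start at the centre of the square
    points = []
    for aa in seq.upper():
        corner = VERTEX.get(aa)
        if corner is None:  # skip ambiguous or unknown residue codes
            continue
        x, y = (x + corner[0]) / 2, (y + corner[1]) / 2
        points.append((x, y))
    return points


def grid_features(points, k=4):
    """Fraction of CGR points falling in each cell of a k x k grid,
    flattened row by row into a vector of length k*k."""
    counts = [0] * (k * k)
    for x, y in points:
        col = min(int(x * k), k - 1)
        row = min(int(y * k), k - 1)
        counts[row * k + col] += 1
    n = len(points)
    return [c / n for c in counts] if n else [0.0] * (k * k)


features = grid_features(cgr_points("MKVLAADE"), k=4)
```

Sequences of different lengths map to vectors of the same length `k*k`, which is what makes a point-distribution summary convenient as input to similarity measures or classifiers.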
Conclusion
DCGR is effective in practice for extracting features from protein sequences, and is therefore important in similarity analysis of protein sequences, the study of protein-protein interactions and the prediction of protein functions. It is freely available at https://sourceforge.net/projects/transcriptomeassembly/files/Feature%20Extraction.
Electronic supplementary material
The online version of this article (10.1186/s12859-019-2943-x) contains supplementary material, which is available to authorized users.
Background
The mechanism of action of proteases has been widely studied based on substrate specificity. Prior research has focused on the amino acids at a single site, but rarely on combinations of amino acids around the cleavage bond.
Results
We propose a novel block-based approach to reveal the potential combinations of amino acids which may regulate the action of proteases. Using the entropies of eight blocks centered at a cleavage bond, we created a distance matrix for 61 proteases to compare their specificities. After quantitative analysis, we discovered a number of prominent blocks, each of which consists of successive amino acids near a cleavage bond, intuitively characterizing the site cooperation of the substrate sequences.
Conclusion
This approach will help in the discovery of specific substrate sequences which may link proteases to their cleavage substrates as more substrate information becomes available.
Electronic supplementary material
The online version of this article (10.1186/s12859-017-1851-1) contains supplementary material, which is available to authorized users.
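The block-entropy comparison described in the protease abstract above can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not define the eight blocks, so the `(start, end)` block boundaries, the use of Shannon entropy over block strings, the Euclidean distance, and the names `block_entropy`, `entropy_profile` and `distance_matrix` are all hypothetical choices, and the substrate windows are toy data.

```python
import math
from collections import Counter


def block_entropy(windows, start, end):
    """Shannon entropy (bits) of the block windows[i][start:end] across
    all substrate windows of one protease. Low entropy suggests the
    protease prefers a specific residue combination in that block."""
    counts = Counter(w[start:end] for w in windows)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def entropy_profile(windows, blocks):
    """One entropy value per block, giving a fixed-length profile."""
    return [block_entropy(windows, s, e) for s, e in blocks]


def distance_matrix(profiles):
    """Pairwise Euclidean distances between protease entropy profiles
    (the distance measure is an assumption of this sketch)."""
    return {
        (a, b): math.dist(profiles[a], profiles[b])
        for a in profiles
        for b in profiles
    }


# Toy example: windows of four residues spanning the cleavage bond,
# with two hypothetical blocks (positions 0-1 and 2-3).
blocks = [(0, 2), (2, 4)]
profiles = {
    "protease_x": entropy_profile(["AAGG", "AKGG"], blocks),
    "protease_y": entropy_profile(["AAGG", "AAGG"], blocks),
}
distances = distance_matrix(profiles)
```

Here `protease_x` varies in the first block and `protease_y` does not, so their profiles differ only in that block's entropy.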