Multi-family enzymes are of great importance in life processes, disease and other fields. However, in enzyme classification, multi-family enzymes are always removed from datasets because traditional single-label prediction methods cannot handle them. To predict the multiple classes of multi-family enzymes, we adopted two multi-label learning algorithms, RAkEL-RF and MLKNN, and two types of protein descriptors, CTD and PseAAC, to build four predictors: RAkEL-RF-CTD, RAkEL-RF-PseAAC, MLKNN-CTD and MLKNN-PseAAC. With 10-fold cross-validation on the training set, the four predictors achieved overall success rates of 97.99%, 96.07%, 96.01% and 95.31%, respectively. On the independent test set, the corresponding rates were 97.57%, 95.03%, 95.9% and 93.9%. The relatively high accuracy, together with the very small difference between the two sets for each predictor, demonstrates the strong prediction capability and robustness of our predictors. In addition, the RAkEL-RF-CTD predictor correctly classified three of seven pairs of homologous enzymes with different functions and eighteen of twenty-three distantly related enzymes from similar families. These results indicate the broad applicability of our predictors.
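As an illustration of the CTD descriptor named above, the sketch below computes the 21 composition, transition and distribution values for a single physicochemical property. It assumes the commonly used three-class hydrophobicity grouping (polar RKEDQN, neutral GASTPHY, hydrophobic CLVIMFW); the abstract does not say which properties, groupings or position conventions the authors used, and `ctd_features` is a hypothetical helper name.

```python
from collections import Counter

# ASSUMPTION: a common 3-class hydrophobicity grouping; the paper's
# exact property set and class boundaries are not given in the abstract.
HYDROPHOBICITY_GROUPS = {
    **{aa: 1 for aa in "RKEDQN"},   # polar
    **{aa: 2 for aa in "GASTPHY"},  # neutral
    **{aa: 3 for aa in "CLVIMFW"},  # hydrophobic
}


def ctd_features(sequence, groups=HYDROPHOBICITY_GROUPS):
    """Return 21 CTD descriptors (3 C + 3 T + 15 D) for one property (hypothetical helper)."""
    # Encode each residue as its class; non-standard letters (e.g. X) are skipped.
    encoded = [groups[aa] for aa in sequence.upper() if aa in groups]
    n = len(encoded)
    if n < 2:
        raise ValueError("sequence too short after filtering")

    # Composition: fraction of residues falling in each class.
    counts = Counter(encoded)
    composition = [counts[c] / n for c in (1, 2, 3)]

    # Transition: fraction of adjacent residue pairs that switch between two
    # classes (in either direction), for class pairs 1-2, 1-3 and 2-3.
    switches = Counter(
        frozenset(pair) for pair in zip(encoded, encoded[1:]) if pair[0] != pair[1]
    )
    transition = [switches[frozenset(p)] / (n - 1) for p in ((1, 2), (1, 3), (2, 3))]

    # Distribution: position (as % of sequence length) of the first, 25%,
    # 50%, 75% and last occurrence of each class.
    distribution = []
    for c in (1, 2, 3):
        positions = [i + 1 for i, g in enumerate(encoded) if g == c]
        total = len(positions)
        for frac in (0.0, 0.25, 0.5, 0.75, 1.0):
            if total == 0:
                distribution.append(0.0)
                continue
            k = max(1, int(frac * total))  # 1-based index of the occurrence
            distribution.append(100.0 * positions[k - 1] / n)

    return composition + transition + distribution
```

For example, `ctd_features("MKTAYIAKQR")` returns a list of 21 floats that could serve as one row of the input matrix for a multi-label classifier such as RAkEL or ML-kNN. Repeating the computation for several properties and concatenating the results gives a longer descriptor vector.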
Structural domains are the basic units from which proteins are built, and they play important roles in protein evolution and function. However, there is still no precise definition of a domain, and structural domain databases are updated on long cycles. Existing automatic algorithms identify domains slowly, while the number of protein entries with great structural complexity keeps growing. Here, we present a method that identifies structural domains by recognizing compact, modular segments of polypeptide chains, and we compare its results on several data sets to demonstrate its effectiveness. The method combines a support vector machine (SVM) with the K-means algorithm. It is faster and more stable than most current algorithms and achieves better performance. The results also indicate that when a protein is represented by its alpha-carbon (Cα) atoms in 3D space, structural domains can feasibly be identified from their spatial structural properties. We have developed a web server to help identify structural domains (http://vis.sculab.org/~huayongpan/cgi-bin/domainAssignment.cgi).
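The abstract does not detail how the SVM and K-means components are combined. As a hedged illustration of the clustering half only, the sketch below partitions Cα coordinates into k spatially compact groups with plain K-means (Lloyd's algorithm) and a deterministic farthest-point initialisation. `kmeans_ca` is a hypothetical name, k is assumed to be known in advance, and the SVM step is not shown.

```python
import math


def kmeans_ca(coords, k, max_iter=100):
    """Partition C-alpha coordinates into k spatially compact clusters (hypothetical helper).

    Plain Lloyd's K-means with deterministic farthest-point initialisation.
    Returns one cluster label per residue, in chain order.
    """
    # Initialisation: start from the first residue, then repeatedly add the
    # residue farthest from every centroid chosen so far.
    centroids = [tuple(coords[0])]
    while len(centroids) < k:
        farthest = max(coords, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(tuple(farthest))

    labels = None
    for _ in range(max_iter):
        # Assignment step: each residue joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in coords
        ]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        for j in range(k):
            members = [p for p, lab in zip(coords, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels
```

Plain K-means ignores chain connectivity, so the resulting labels need not form contiguous sequence segments; under this sketch's assumptions, a practical domain assigner would still need a further step (for instance, a learned classifier or contiguity post-processing) to turn clusters into domain boundaries.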