Protein-DNA interactions play an important role in biological processes such as DNA replication, repair, and modification. To better understand their functions, one of the most important steps is the identification of DNA-binding proteins. We propose a DNA-binding protein predictor, namely RF-SVM, which combines four types of features: pseudo amino acid composition (PseAAC), amino acid distribution (AAD), adjacent amino acid composition frequency (ACF), and Local-DPP. The Random Forest algorithm is used to select the top 174 features, which are then used to build the predictor with a support vector machine (SVM) on the training dataset UniSwiss-Tr. Finally, RF-SVM is compared with other existing methods on the test dataset UniSwiss-Tst. The experimental results show that RF-SVM achieves an accuracy of 84.25%. We also find that the amino acid physicochemical properties OOBM770101(H), CIDH920104(H), MIYS990104(H), NISK860101(H), VINM940103(H), and SNEP660101(A) contribute to predicting DNA-binding proteins. The main code and datasets are available at https://github.com/NiJianWei996/RF-SVM.
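The "Random Forest selects the top features, then an SVM classifies" step can be sketched with scikit-learn. This is a minimal sketch under stated assumptions, not the authors' code: the 420 composition features (20 residue frequencies plus 400 adjacent-pair frequencies) produced by the hypothetical `composition_features` helper are simple stand-ins for the paper's PseAAC, AAD, ACF and Local-DPP encodings, and the forest size, the scaler and the RBF kernel are our own choices. `SelectFromModel` with `threshold=-np.inf` keeps exactly the `max_features` columns with the highest Random Forest importance, which mirrors the "top 174 features" step.

```python
from itertools import product

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered residue pairs


def composition_features(seq):
    """Hypothetical stand-in encoding: 20 residue frequencies + 400 adjacent-pair frequencies."""
    seq = "".join(c for c in seq.upper() if c in AMINO_ACIDS)  # drop non-standard residues
    aac = [seq.count(a) / max(len(seq), 1) for a in AMINO_ACIDS]
    pair_counts = dict.fromkeys(PAIRS, 0)
    for i in range(len(seq) - 1):
        pair_counts[seq[i:i + 2]] += 1  # count each adjacent (overlapping) pair
    n_pairs = max(len(seq) - 1, 1)
    pair_freq = [pair_counts[p] / n_pairs for p in PAIRS]
    return aac + pair_freq


def build_rf_svm(k=174, random_state=0):
    """Random Forest importance ranking -> keep the top k columns -> RBF-kernel SVM."""
    selector = SelectFromModel(
        RandomForestClassifier(n_estimators=200, random_state=random_state),
        max_features=k,
        threshold=-np.inf,  # disable the importance threshold: rank and keep exactly k columns
    )
    return make_pipeline(selector, StandardScaler(), SVC(kernel="rbf"))
```

For example, `build_rf_svm(k=174).fit(X, y)` on a matrix built row by row with `composition_features` ranks the columns by forest importance and trains the SVM on the retained 174. Keeping the selector inside the pipeline means that, under cross-validation, the features are re-ranked on each training fold only, so no test information leaks into the selection.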
Aim and Objective:
Given the rapidly increasing amount of available molecular biology data, low-complexity computational
methods are needed to infer protein structure, function, and evolution.
Method:
In this work, we propose a novel method, FermatS, which extracts feature information from protein sequences using
global position information derived from the curve and a local position representation derived from normalized moments
of inertia. Furthermore, we use the features generated by FermatS to analyze the similarity/dissimilarity of nine
ND5 proteins and to build a logistic regression model for predicting DNA-binding proteins, evaluated with 5-fold cross-validation.
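The feature extraction and its two downstream uses can be illustrated with a short Python sketch. The geometry here is an assumption, not the published FermatS construction: we assume, as the name suggests, that the curve is Fermat's spiral (r ∝ √θ) and place residue n at θ_n = n·φ with φ the golden angle (about 137.5°) and r_n = √n. For each of the 20 amino acids, the centroid of its points serves as the "global" part and the mean squared distance to that centroid (a moment of inertia with unit masses), divided by the sequence length, serves as the "normalized moment" part. `spiral_features`, `distance_matrix` and `cv_accuracy` are hypothetical names.

```python
import math

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Assumption: Vogel-style discretization of Fermat's spiral,
# theta_n = n * golden_angle and r_n = sqrt(n), so r grows as sqrt(theta).
GOLDEN_ANGLE = math.pi * (3.0 - math.sqrt(5.0))


def spiral_features(seq):
    """Hypothetical FermatS-style encoding: 3 numbers per amino acid (60 in total)."""
    seq = [c for c in seq.upper() if c in AMINO_ACIDS]
    length = max(len(seq), 1)
    points = {a: [] for a in AMINO_ACIDS}
    for n, residue in enumerate(seq, start=1):
        theta, r = n * GOLDEN_ANGLE, math.sqrt(n)
        points[residue].append((r * math.cos(theta), r * math.sin(theta)))
    features = []
    for a in AMINO_ACIDS:
        pts = np.array(points[a])
        if len(pts) == 0:  # amino acid absent from the sequence
            features.extend([0.0, 0.0, 0.0])
            continue
        centroid = pts.mean(axis=0)
        # "Global" part: centroid scaled by the outer radius sqrt(L).
        cx, cy = centroid / math.sqrt(length)
        # "Local" part: mean squared distance to the centroid (unit masses),
        # divided by L = r_max**2 so the value does not grow with sequence length.
        inertia = float(((pts - centroid) ** 2).sum(axis=1).mean()) / length
        features.extend([float(cx), float(cy), inertia])
    return np.array(features)


def distance_matrix(seqs):
    """Pairwise Euclidean distances between feature vectors; smaller means more similar."""
    X = np.array([spiral_features(s) for s in seqs])
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def cv_accuracy(seqs, labels, folds=5, seed=0):
    """Mean accuracy of logistic regression under stratified k-fold cross-validation."""
    X = np.array([spiral_features(s) for s in seqs])
    cv = StratifiedKFold(n_splits=folds, shuffle=True, random_state=seed)
    model = LogisticRegression(max_iter=1000)
    scores = cross_val_score(model, X, labels, cv=cv, scoring="accuracy")
    return float(scores.mean())
```

In this sketch a distance matrix over, for instance, nine ND5 sequences is the kind of input one would cluster for a similarity/dissimilarity analysis, and `cv_accuracy(seqs, labels)` returns the mean accuracy over the five stratified folds.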
Results:
The similarity/dissimilarity analysis of the nine ND5 proteins yields results consistent with evolutionary theory.
Moreover, the method effectively predicts DNA-binding proteins in realistic settings.
Conclusion:
The findings demonstrate that the proposed method is effective for comparing, recognizing, and predicting protein
sequences. The main code and datasets can be downloaded from https://github.com/GaoYa1122/FermatS.