Machine learning-based prediction of protein function plays a key role in bioinformatics and pharmaceutical research, facilitating the swift discovery of new drugs in high-throughput settings. This paper presents an adaptation of Random Forest to structure-based protein function prediction. Our system represents a protein's 3D physicochemical structural information as microenvironment descriptors, whose spatial resolution is much finer than that of sequence-based protein descriptors. We prepare datasets for seven active sites from five protein function classes using multiple public data banks, and we train Random Forest classifiers to identify these seven function models in proteins. The paper presents two experimental studies: 1) a 5-fold stratified cross-validation comparing Random Forest with Naive Bayes and Support Vector Machine, and 2) a systematic comparison of Random Forest's two variable importance measures. The promising results of these studies demonstrate the potential of Random Forest to improve the accuracy of current protein function assays.
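As an illustration of the evaluation protocol named above, the following is a minimal sketch of how stratified 5-fold splits can be formed so that each fold preserves the class proportions of the full dataset. It is not the authors' code: the function names `stratified_k_fold` and `train_test_splits`, the toy labels, and the round-robin assignment are assumptions chosen for illustration.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=5, seed=0):
    """Split sample indices into k folds that preserve class proportions.

    Returns a list of k lists of indices; fold i serves as the test set
    in round i, and the remaining folds form the training set.
    (Hypothetical helper, not taken from the paper.)
    """
    rng = random.Random(seed)
    # Group sample indices by class label.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal each class round-robin so every fold gets a near-equal share.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


def train_test_splits(folds):
    """Yield (train_indices, test_indices) for each cross-validation round."""
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Assumed toy labels: 1 = site present, 0 = site absent (imbalanced classes).
labels = [1] * 10 + [0] * 40
folds = stratified_k_fold(labels, k=5)
```

In each round, a classifier would be trained on the `train` indices and scored on the `test` indices, with the scores averaged over the five rounds. When a class size is not divisible by `k`, this simple round-robin scheme gives the earlier folds one extra sample of that class.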