Abstract-Metal ions are critical to the function, structure and stability of proteins. For this reason, accurate prediction of metal binding sites in proteins is very important. Here, we present a study on predicting metal binding sites for histidines (HIS) and cysteines from protein sequence. Three different methods are applied to this task: Support Vector Machine (SVM), Naive Bayes and Variable-length Markov chain. All of these methods use only sequence information to classify a residue as metal binding or not. Several feature sets are employed to evaluate their impact on prediction results. We predict metal binding sites for these amino acids at 35% precision and 75% recall with Naive Bayes, at 25% precision and 23% recall with Support Vector Machine, and at 0.05% precision and 60% recall with Variable-length Markov chain. We observe significant differences in performance depending on the selected feature set. The results show that Naive Bayes is competitive for metal binding site detection.
I. INTRODUCTION
Proteins play a crucial role in all biological processes. They consist of one or more long chains of amino acid residues. From this perspective, amino acids are important ligands, with nitrogen and oxygen as donor atoms, and are constituents of many biologically important molecules [1].
It is estimated that approximately half of all proteins contain a metal [2]. A significant fraction (about one third) of all known proteins is believed to bind metal ions as cofactors in their native conformation [3]. Proteins require these cofactors to carry out their biological activities. For this reason, identifying a metal ion in a protein and predicting its binding site is very important for understanding the function of proteins. Metal ions in proteins are responsible for multiple tasks: they help stabilize protein structure [4], induce conformational changes [5][6][7], and assist protein functions (e.g. electron transfer, nucleophilic catalysis).
There are many related studies on predicting metal binding sites; machine learning techniques, however, have only recently been applied to predict the metal binding sites of residues. Predicting metal binding sites with non-computational methods has some drawbacks. High-throughput X-ray absorption spectroscopy (HT-XAS) has recently been shown to identify metalloproteins with high reliability [8,9]. However, the specific ligands involved in binding the metal ion(s) cannot be identified by this technique [9]. Motif-based systems using regular expressions have also been developed, but because regular expressions can be quite specific, their results contain many false negatives. To overcome these drawbacks, many computational learning techniques have been developed to predict metal binding sites. An early approach can be found in the work of Nakata et al. (1995), who focused on predicting zinc finger DNA-binding proteins with a neural network.
In this approach, applicable results were generated by a method for c...