Many computational methods have been developed to predict the important sites of a protein. In the era of big data, existing methods need to be improved and refined by integrating sequence data with structural data. In this paper, we pursue two aims: improving sequence-based methods and developing a new method that uses both sequence and structural data. To this end, we developed a modified evolutionary trace method in which we define conservative grades, calculated from a given multiple sequence alignment, and a proximate grade, which uses three-dimensional structures to evaluate predicted active sites from the viewpoint of protein-ion, protein-ligand, protein-nucleic acid, and protein-protein interactions. In other words, the proximate grade can also be used to evaluate an amino acid residue. When we applied our method to translation elongation factor Tu/1A proteins, the conservative grades were evaluated accurately by the proximate grade. Our approach therefore has two advantages. First, various cocrystal structures can be taken into account in the evaluation. Second, by calculating the fitness between a given conservative grade and the proximate grade, the best conservative grade can be selected.
Journal of Data Mining in Genomics & Proteomics
Mapping by a character type
In this section, we define the mathematical formulation of a mapping by similarity of the amino acid symbols on $M_i$. Let […], where $T = 1, 2, \ldots, N$. As shown in Figure 1B, […], where $\tau$ is a threshold of […] and […] is a multiset which is separated at time point […]. For example, […].
Let $A$ denote a field of sets of amino acid symbols and […] denote a field of sets of gaps in $M_1$. For example, $A$ is definable as […] and $G_t$ is definable as […], where $G$ is the number of gaps.
Let $M_h$ be represented by the following four definitions: […], where […] is the number of sets in $A \cup G$; $S_{\max}$, $S_{\min}$, $S(l,l)$ and $S(l,m)$ are the maximum, the minimum, a diagonal element and an off-diagonal element of an amino acid substitution matrix, respectively; and […] is the weight of sequence $l$.
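To illustrate how a sequence-based conservation grade can be computed column by column from a multiple sequence alignment, the following minimal Python sketch uses a weighted Shannon entropy with the gap treated as a 21st symbol. This is an assumption chosen for illustration only: it is not the grade defined in this section, which relies on an amino acid substitution matrix. The function names `column_conservation` and `conservation_profile` and the toy alignment are hypothetical.

```python
import math
from collections import Counter


def column_conservation(column, weights=None):
    """Return a conservation value in [0, 1] for one alignment column.

    Assumption: conservation = 1 - H / log(21), where H is the Shannon
    entropy of the (optionally sequence-weighted) symbol frequencies and
    21 counts the 20 amino acids plus the gap symbol.
    """
    if weights is None:
        weights = [1.0] * len(column)
    totals = Counter()
    for symbol, weight in zip(column, weights):
        totals[symbol] += weight
    total = sum(totals.values())
    entropy = -sum(
        (w / total) * math.log(w / total) for w in totals.values() if w > 0
    )
    return 1.0 - entropy / math.log(21)


def conservation_profile(alignment, weights=None):
    """Apply column_conservation to every column of an aligned sequence list."""
    return [column_conservation(col, weights) for col in zip(*alignment)]


# Toy alignment (hypothetical sequences of equal length).
alignment = ["MKV", "MKI", "MRL"]
profile = conservation_profile(alignment)
```

In this toy case the first column is invariant and receives the highest value, while the third column, where every sequence differs, receives the lowest. Passing per-sequence `weights` down-weights redundant sequences in the same way a sequence weight would.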
Mapping by a coordinate type
Let $\mathbb{R}$ denote the set of real numbers and let there be […]. Let […] denote a set of residues in $M_i$ and […]
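As a rough illustration of a coordinate-based mapping, the sketch below flags residues that lie close to a binding partner (an ion, ligand, nucleic acid, or another protein chain) in a three-dimensional structure. It assumes that proximity is judged by the minimum interatomic distance and a 4.0 Å cutoff; both are assumptions made for this example, not the definition of the proximate grade. The names `min_distance` and `proximity_flags`, the residue labels, and the coordinates are hypothetical.

```python
import math


def min_distance(residue_atoms, partner_atoms):
    """Smallest Euclidean distance between any residue atom and any partner atom."""
    return min(math.dist(a, b) for a in residue_atoms for b in partner_atoms)


def proximity_flags(residues, partner_atoms, cutoff=4.0):
    """Map each residue label to True if it lies within `cutoff` of the partner.

    `residues` maps a label to a list of (x, y, z) atom coordinates.
    The 4.0 cutoff (in angstroms) is an illustrative assumption.
    """
    return {
        label: min_distance(atoms, partner_atoms) <= cutoff
        for label, atoms in residues.items()
    }


# Hypothetical coordinates, in angstroms.
residues = {
    "A:25": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    "A:80": [(10.0, 0.0, 0.0)],
}
partner_atoms = [(3.0, 0.0, 0.0)]
flags = proximity_flags(residues, partner_atoms)
```

Because the function only needs coordinate lists, the same call can be repeated for each cocrystal structure and each type of binding partner.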
Materials and Methods
Data collection
In UniProtKB/Swiss-Prot release 2015_01 [24], 984 entries are annotated as 'Classic translation factor GTPase family. EF-Tu/EF-1A subfamily', do not include 'X' in the sequence, and are not fragments. In the PDB, 68 entries are referenced from these 984 entries and were determined by X-ray crystallography. Of these, 14 entries were excluded because they bind an immunoprotein [25] or form a chimeric protein [26][27][28][29]. Consequently, as shown in Table 1, 54 entries comprising 103 chains were retained.
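The three sequence-level selection criteria can be expressed as a simple filter. The sketch below is a minimal illustration that assumes the entries have already been parsed into records; the `Entry` class, its field names, and the example records are hypothetical and do not correspond to real accessions or to any particular parser.

```python
from dataclasses import dataclass

FAMILY = "Classic translation factor GTPase family. EF-Tu/EF-1A subfamily"


@dataclass
class Entry:
    accession: str    # hypothetical identifier
    family: str       # family annotation text
    sequence: str     # amino acid sequence
    is_fragment: bool


def is_selected(entry):
    """Keep entries with the family annotation, no 'X' residue, and a full-length sequence."""
    return (
        entry.family == FAMILY
        and "X" not in entry.sequence
        and not entry.is_fragment
    )


# Hypothetical records for illustration only.
entries = [
    Entry("E1", FAMILY, "MSKEKFERTK", False),
    Entry("E2", FAMILY, "MSKXKFERTK", False),   # contains 'X'
    Entry("E3", FAMILY, "MSKEKF", True),        # fragment
    Entry("E4", "Another family", "MSKEKFERTK", False),
]
selected = [e.accession for e in entries if is_selected(e)]

# Structure-level bookkeeping from the text: 68 PDB entries minus 14 excluded.
retained_pdb_entries = 68 - 14
```

Only the first record passes all three criteria, and the subtraction reproduces the 54 retained PDB entries stated above.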
Computations of $f_1$ and $f_2$
With $N = 984$ and $K = 103$ in Figure 2, the sequences were aligned by the […]
Correlations between $f_1$ and $f_2$
Let […] denote a subset of non-negative real numbers and a set of […] .... Let $t_j$ denote a...