The main difficulty in developing new protein sequence comparison
techniques is devising a method that can handle large data sets of
sequences of varying lengths quickly and effectively. In this
work, we first obtain two separate numerical representations of protein
sequences, one based on a physical property and one on a chemical property
of amino acids. The lengths of all the sequences under comparison
are made equal by appending the required number of zeros. Then, the fast
Fourier transform is applied to each numerical series to obtain
the corresponding spectrum. Next, the spectrum values are reduced
by the standard inter-coefficient difference method. Finally, the
corresponding normalized values of the reduced spectrum are selected
as the descriptors for protein sequence comparison. From these descriptors,
distance matrices are computed using the Euclidean distance. They
are subsequently used to draw the phylogenetic trees using the UPGMA
algorithm. Phylogenetic trees are first constructed for 9 ND4, 9 ND5,
and 9 ND6 proteins using the polarity value as the chemical property
and the molecular weight as the physical property. The resulting trees
are compared, and polarity is seen to be a better choice than molecular
weight for protein sequence comparison. Next, using the polarity
property, phylogenetic trees are obtained for
12 baculovirus and 24 transferrin proteins. The results are compared
with those obtained earlier on the same sequences by other methods.
Three assessment criteria are considered for comparing the results: quality
based on rationalized perception, quantitative measures based on symmetric
distance, and computational speed. In all cases, the results of the
proposed method are found to be more satisfactory than those of the
earlier methods.
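
The descriptor pipeline summarized above can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: the property table `TOY_PROPERTY` holds made-up placeholder numbers rather than real polarity or molecular weight values, a naive discrete Fourier transform stands in for the fast Fourier transform (it yields the same spectrum, only more slowly), unit-norm scaling is just one possible normalization, and the inter-coefficient difference reduction and the UPGMA tree-building step are omitted. All function names are hypothetical.

```python
import cmath
import math

# Placeholder property table: these numbers are made up for illustration and
# are NOT real polarity values; substitute the scale actually used.
TOY_PROPERTY = {"A": 1.0, "C": 2.0, "D": 3.0, "E": 4.0}


def encode(seq, table):
    """Map each residue to its numerical property value."""
    return [table[res] for res in seq]


def zero_pad(signals):
    """Append zeros so every signal has the length of the longest one."""
    n = max(len(s) for s in signals)
    return [s + [0.0] * (n - len(s)) for s in signals]


def magnitude_spectrum(signal):
    """Naive O(n^2) DFT magnitudes; an FFT gives the same values faster."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(signal)))
        for k in range(n)
    ]


def normalize(values):
    """Scale to unit Euclidean norm (an assumed choice of normalization)."""
    norm = math.sqrt(sum(v * v for v in values))
    return [v / norm for v in values] if norm else list(values)


def describe(seqs, table):
    """Encode, zero-pad, transform, and normalize a list of sequences."""
    padded = zero_pad([encode(s, table) for s in seqs])
    return [normalize(magnitude_spectrum(s)) for s in padded]


def distance_matrix(descriptors):
    """Pairwise Euclidean distances between descriptor vectors."""
    return [[math.dist(a, b) for b in descriptors] for a in descriptors]


# Toy sequences of unequal length, built from the placeholder alphabet.
seqs = ["ACDE", "ACD", "EDCA"]
D = distance_matrix(describe(seqs, TOY_PROPERTY))
```

The matrix `D` is symmetric with a zero diagonal, so it could be passed to any UPGMA (average-linkage) clustering routine to draw a tree.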