This paper presents a comprehensive study of clustering, covering existing methods and the developments made over time. Clustering is defined as a form of unsupervised learning in which objects are grouped on the basis of some similarity inherent among them. There are different methods for clustering objects, such as hierarchical, partitional, grid-based, density-based and model-based. The approaches used in these methods are discussed along with their respective state of the art and applicability. The measures of similarity and the evaluation criteria, which are the central components of clustering, are also presented. Applications of clustering in fields such as image segmentation, object and character recognition, and data mining are highlighted.
In this article, a novel approach for extracting features from protein sequences is proposed. The approach extracts only six features for each protein sequence. These features are computed by globally considering the probabilities of occurrence of the amino acids, which locally belong to the six exchange groups, at different positions within the superfamily. The features are then used as input to a neural network built with the Boolean-Like Training Algorithm (BLTA), which classifies protein sequences obtained from the Protein Information Resource (PIR). To investigate the efficacy of the proposed feature extraction approach, experiments are performed on two superfamilies, Ras and Globin, using tenfold cross-validation. The highest Classification Accuracy achieved, 100.00±00.00, with a Computational Time of 170.49±70.87 s, is remarkably better than the Classification Accuracies and Computational Times achieved by Mansouri, Bandyopadhyay and Wang. The experimental results demonstrate that the proposed approach extracts the most significant features, and fewer of them, for each protein sequence, which yields a considerable improvement in Classification Accuracy and requires less Computational Time than other well-known feature extraction approaches.
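To make the idea of six exchange-group features concrete, here is a minimal sketch in Python. It is a deliberate simplification, not the method of the article: it computes only the fraction of residues in each group for a single sequence, whereas the abstract describes position-dependent probabilities estimated across a whole superfamily. The group memberships below are the six exchange groups commonly used in the protein classification literature and are an assumption here, since the abstract does not list them. The names `EXCHANGE_GROUPS` and `exchange_group_features` are hypothetical.

```python
from collections import Counter

# Assumed exchange groups (not listed in the abstract); one-letter amino acid codes.
EXCHANGE_GROUPS = {
    "e1": "HRK",
    "e2": "DENQ",
    "e3": "C",
    "e4": "STPAG",
    "e5": "MILV",
    "e6": "FYW",
}

# Reverse lookup: residue -> exchange group label.
RESIDUE_TO_GROUP = {
    residue: group
    for group, members in EXCHANGE_GROUPS.items()
    for residue in members
}


def exchange_group_features(sequence):
    """Return six fractions, one per exchange group, for one protein sequence.

    Simplified illustration only: residues outside the 20 standard amino
    acids are ignored, and no positional or superfamily-level statistics
    are used.
    """
    counts = Counter(
        RESIDUE_TO_GROUP[residue]
        for residue in sequence.upper()
        if residue in RESIDUE_TO_GROUP
    )
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(EXCHANGE_GROUPS)
    return [counts[group] / total for group in EXCHANGE_GROUPS]
```

Calling `exchange_group_features` on a sequence string yields a fixed-length vector of six values that could be fed to any classifier, regardless of the sequence length.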