Web pages today carry many forms and types of content: pictures, videos, audio files, and text in different languages. This content can be multilingual, heterogeneous, and unstructured, so mining it should be independent of language and software. Statistical features of the images are extracted from the pixel map of each image. The extracted features are presented to the fuzzy c-means (FCM) clustering algorithm and to the Gath-Geva algorithm, which use Euclidean distance and Gaussian distance, respectively, as the similarity metric. The accuracy of the two algorithms is compared and presented.
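The pipeline described above (statistical features from a pixel map, then fuzzy clustering with Euclidean distance) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say which statistical features are used, so the choice of mean, standard deviation, and skewness is hypothetical, as are the function names `pixel_statistics` and `fuzzy_c_means` and the fuzzifier value `m=2.0`. The Gath-Geva variant, which would replace the Euclidean distance with a Gaussian (covariance-based) one, is not shown.

```python
import numpy as np


def pixel_statistics(image):
    """Return assumed statistical features (mean, std, skewness) of a 2-D pixel map."""
    pixels = np.asarray(image, dtype=float).ravel()
    mean = pixels.mean()
    std = pixels.std()
    # Third standardized moment; a flat image has no defined skewness, so use 0.
    skew = 0.0 if std == 0 else np.mean(((pixels - mean) / std) ** 3)
    return np.array([mean, std, skew])


def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means with Euclidean distance.

    Returns (centers, U), where U[i, j] is the membership of sample i in cluster j.
    """
    rng = np.random.default_rng(seed)
    # Random initial membership matrix; each row sums to 1.
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        # Cluster centers are membership-weighted means of the samples.
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Euclidean distance from every sample to every center.
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)  # avoid division by zero
        # Membership update: u_ij is proportional to d_ij^(-2 / (m - 1)).
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U
```

To use it, stack one feature vector per image into a matrix, for example `X = np.array([pixel_statistics(im) for im in images])`, and call `fuzzy_c_means(X, n_clusters)`. Taking `U.argmax(axis=1)` gives a hard label per image when one is needed for an accuracy comparison.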
Data mining techniques are used to extract useful patterns from large data sets. The k-means algorithm is one of the best-known partitioning clustering algorithms. However, the Euclidean distance it relies on is sensitive to outliers and is suitable only for numeric values. Real-time datasets have mixed attribute values and missing values, and their measurements are not in a standard format. The proposed algorithm extends k-means with a mixed similarity measure for computing the similarity between data objects, so that mixed datasets can be clustered. Missing values are imputed using correlation-based data imputation. In addition, the output of k-means depends on the initial cluster centres, can converge to local optima, and depends on the chosen number of clusters (k). To improve the efficiency of k-means, a clustering algorithm based on Artificial Bee Colony (ABC) optimization is suggested. ABC is successful at exploring the search space but weak at exploiting it, so collaborative search is used to amplify the search quality of the employed bees. The Elbow method is used to determine the number of clusters for a given data set. The proposed algorithm is evaluated on real-time datasets, and the results show that it performs well compared with the comparative algorithms.
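The Elbow method mentioned above can be illustrated with a short sketch. Several pieces are assumptions rather than the authors' method: plain Lloyd's k-means with Euclidean distance stands in for the proposed ABC-based algorithm and its mixed similarity measure, the farthest-first initialization is chosen only to make the example deterministic, and the elbow is located as the point lying farthest below the chord of the normalized inertia curve, which is one common heuristic among several. All function names are hypothetical.

```python
import numpy as np


def farthest_first_centers(X, k):
    """Pick k initial centers: start at X[0], then repeatedly take the farthest point."""
    centers = [X[0]]
    d2 = ((X - centers[0]) ** 2).sum(axis=1)
    for _ in range(1, k):
        centers.append(X[d2.argmax()])
        # Track each point's squared distance to its nearest chosen center.
        d2 = np.minimum(d2, ((X - centers[-1]) ** 2).sum(axis=1))
    return np.array(centers)


def kmeans_inertia(X, k, n_iter=100):
    """Run plain Lloyd's k-means and return the within-cluster sum of squares."""
    centers = farthest_first_centers(X, k)
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Recompute each center; keep the old one if its cluster is empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).sum()


def elbow_k(X, k_max=8):
    """Return (k at the elbow, inertia for each k in 1..k_max)."""
    ks = np.arange(1, k_max + 1)
    inertias = np.array([kmeans_inertia(X, k) for k in ks])
    span = inertias.max() - inertias.min()
    if span == 0:
        return int(ks[0]), inertias
    # Normalize both axes to [0, 1], then score each k by how far its point
    # lies below the chord joining (0, 1) and (1, 0).
    x = (ks - 1) / (k_max - 1)
    y = (inertias - inertias.min()) / span
    score = 1.0 - x - y
    return int(ks[score.argmax()]), inertias
```

Calling `elbow_k(X)` on a numeric data matrix returns the suggested number of clusters together with the inertia curve, which can also be plotted and inspected by eye. For mixed or incomplete data, the distance computation and the imputation step would need to be replaced accordingly.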