Abstract. Clustering, as a formal and systematic subject of study, can be considered the most influential unsupervised learning problem; like every other problem of this kind, it deals with finding structure in a collection of unlabeled data. One of the central issues in this subject is determining the number of clusters. In this chapter, an efficient grouping genetic algorithm is proposed for the case in which the number of clusters is unknown. Clusterings of the same data with different numbers of clusters are carried out concurrently in the chromosomes of the grouping genetic algorithm in order to identify the correct number of clusters. In subsequent iterations of the algorithm, new solutions with a different number of clusters or a different clustering accuracy are produced by efficient crossover and mutation operators, which lead to a significant improvement in clustering. Furthermore, a local search is applied with a specified probability to each chromosome of each new population in order to increase the accuracy of clustering. These special operators make the proposed method suitable for big data analysis. To demonstrate its accuracy and efficiency, the algorithm is tested on various artificial and real data sets and compared with other methods. Most of the data sets consist of overlapping clusters, yet the algorithm detects the proper number of clusters for all data sets with high clustering accuracy. The results show that the algorithm succeeds in finding an appropriate number of clusters and achieves the best clustering quality in comparison with the other methods.
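To make concrete the idea of operators that change the number of clusters, the following is a minimal sketch and not the chapter's actual operators, which the abstract does not specify. It assumes a chromosome stored as a list of cluster labels, one per data point; the function names `num_clusters`, `merge_mutation`, and `split_mutation` are hypothetical and chosen only for illustration.

```python
import random

def num_clusters(assignment):
    """Number of distinct cluster labels used by a chromosome."""
    return len(set(assignment))

def merge_mutation(assignment, rng):
    """Illustrative mutation: merge two randomly chosen clusters into one."""
    labels = sorted(set(assignment))
    if len(labels) < 2:
        return list(assignment)  # nothing to merge
    keep, drop = rng.sample(labels, 2)
    return [keep if a == drop else a for a in assignment]

def split_mutation(assignment, rng):
    """Illustrative mutation: move half of one cluster into a new cluster."""
    labels = sorted(set(assignment))
    # Only clusters with at least two members can be split.
    candidates = [lab for lab in labels if assignment.count(lab) >= 2]
    if not candidates:
        return list(assignment)
    target = rng.choice(candidates)
    new_label = max(labels) + 1
    members = [i for i, a in enumerate(assignment) if a == target]
    moved = set(rng.sample(members, len(members) // 2))
    return [new_label if i in moved else a for i, a in enumerate(assignment)]

rng = random.Random(0)
chromosome = [0, 0, 1, 1, 2, 2]           # six points, three clusters
merged = merge_mutation(chromosome, rng)  # one cluster fewer
split = split_mutation(chromosome, rng)   # one cluster more
```

A population built from such chromosomes can hold candidate solutions with different cluster counts at the same time, so selection can compare them under a single clustering-quality criterion.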
In this paper, an algorithm that detects and rejects outliers in a self-organizing map (SOM) is proposed. SOM is used for data clustering as well as dimensionality reduction, and the results obtained are presented in a special graphical form. To detect outliers in SOM, a genetic algorithm-based travelling salesman approach is applied. After the outliers are detected and removed, the SOM quality has to be estimated. A measure is proposed to evaluate the coincidence of data classes and the clusters obtained in SOM. A larger value of the measure means that the distance between the centers of different classes in SOM is longer and that the clusters corresponding to the data classes are better separated. To illustrate the proposed algorithm, two datasets (one numerical and one textual) are used in this investigation.
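The abstract does not give the formula of the proposed measure. As a rough illustration of the stated idea (larger values when class centers lie farther apart on the map), the sketch below computes the mean pairwise Euclidean distance between class centers on the SOM grid. This is an assumed stand-in, not the paper's measure; `class_centers` and `mean_center_distance` are hypothetical names, and each sample is assumed to be represented by the (row, column) grid position of its winning neuron.

```python
from itertools import combinations
from math import dist

def class_centers(positions, labels):
    """Mean grid position of the winning neurons for each class."""
    sums = {}
    for (row, col), label in zip(positions, labels):
        sum_row, sum_col, count = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (sum_row + row, sum_col + col, count + 1)
    return {label: (sr / n, sc / n) for label, (sr, sc, n) in sums.items()}

def mean_center_distance(positions, labels):
    """Average Euclidean distance between all pairs of class centers."""
    centers = list(class_centers(positions, labels).values())
    pairs = list(combinations(centers, 2))
    if not pairs:
        return 0.0  # fewer than two classes: no separation to measure
    return sum(dist(a, b) for a, b in pairs) / len(pairs)

# Two classes mapped to opposite sides of a small grid.
positions = [(0, 0), (0, 2), (4, 0), (4, 2)]
labels = ["a", "a", "b", "b"]
separation = mean_center_distance(positions, labels)
```

Under this assumed definition, removing outliers that drag a class center toward another class would increase the value, which matches the direction of improvement the abstract describes.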
Human skin detection is one of the most widely applicable techniques in human detection, face detection, and many other detection tasks. These processes can be used in a wide range of fields such as industry, medicine, and security. The objective of this work is to provide an accurate and efficient method to detect human skin in images. The method can detect and classify skin pixels and reduce the size of images. Using the RGB and YCbCr color spaces, the proposed approach localizes a region of interest (ROI) that contains skin pixels. The method consists of three stages. In the first stage, the image is pre-processed (for example, by normalization) and the skin color range is determined from the dataset. In the second stage, the proposed method detects candidate pixels that fall within the skin color range. In the third stage, a classifier removes unwanted pixels and areas to increase the accuracy of the region. The results show 97% sensitivity and 85% specificity for the support vector machine classifier.
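The second stage, selecting candidate pixels by color range, can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the RGB to YCbCr conversion uses the full-range BT.601 (JFIF) coefficients, and the fixed Cb and Cr bounds are commonly cited values, whereas the paper derives its skin range from the dataset. The names `rgb_to_ycbcr`, `is_skin_candidate`, and `skin_roi` are hypothetical.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert 8-bit RGB to full-range YCbCr (BT.601 / JFIF coefficients)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr

# Assumed chrominance bounds for skin; not taken from the paper.
CB_RANGE = (77, 127)
CR_RANGE = (133, 173)

def is_skin_candidate(pixel):
    """True if the pixel's Cb and Cr both fall inside the assumed skin range."""
    _, cb, cr = rgb_to_ycbcr(*pixel)
    return (CB_RANGE[0] <= cb <= CB_RANGE[1]
            and CR_RANGE[0] <= cr <= CR_RANGE[1])

def skin_roi(image):
    """Bounding box (row_min, col_min, row_max, col_max) of candidate pixels.

    `image` is a list of rows, each row a list of (r, g, b) tuples.
    Returns None when no pixel qualifies.
    """
    hits = [(r, c)
            for r, row in enumerate(image)
            for c, pixel in enumerate(row)
            if is_skin_candidate(pixel)]
    if not hits:
        return None
    rows = [r for r, _ in hits]
    cols = [c for _, c in hits]
    return min(rows), min(cols), max(rows), max(cols)
```

Cropping the image to the returned bounding box is one way to obtain the size reduction mentioned above; the third stage would then pass the remaining candidate pixels to a classifier such as a support vector machine.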