Abstract: In many data mining applications, a clustering algorithm is applied to a large amount of uncertain data. In this paper, we adapt an uncertain data clustering algorithm called fast density-based spatial clustering of applications with noise (FDBSCAN) to multicore systems in order to speed up processing. The new algorithm, which we call multicore FDBSCAN (M-FDBSCAN), splits the data domain into c rectangular regions, where c is the number of cores in the system. The FDBSCAN algorithm is then applied to each rectangular region simultaneously. After the clustering operation is completed, semiclusters that arise from the splitting are detected and merged to construct the final clusters. M-FDBSCAN is tested for correctness and performance. The experiments show a significant performance increase with M-FDBSCAN that is not due to multicore usage alone.
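The split, cluster, and merge steps described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It makes several assumptions: plain DBSCAN on exact points stands in for FDBSCAN (which works on uncertain objects), the rectangular regions are equal-width vertical strips, each strip is wider than eps, and two semiclusters are merged whenever they contain points within eps of each other across a strip boundary. All function names are hypothetical. A thread pool is used only to show the structure; in CPython a process pool would be needed for real multicore speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from math import dist


def dbscan(points, eps, min_pts):
    """Plain DBSCAN on exact points (stand-in for FDBSCAN). Label -1 means noise."""
    labels = [None] * len(points)

    def neighbors(i):
        # Brute-force range query; the point itself is included.
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise, may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nj = neighbors(j)
            if len(nj) >= min_pts:  # core point: keep expanding
                queue.extend(nj)
    return labels


def split_into_strips(points, c):
    """Assign point indices to c equal-width vertical strips of the x-range."""
    xs = [p[0] for p in points]
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / c or 1.0  # avoid zero width when all x are equal
    strips = [[] for _ in range(c)]
    for idx, p in enumerate(points):
        strips[min(int((p[0] - lo) / width), c - 1)].append(idx)
    return strips


def partitioned_dbscan(points, eps, min_pts, c):
    strips = split_into_strips(points, c)

    def run(idxs):
        return dbscan([points[i] for i in idxs], eps, min_pts)

    # Cluster every region concurrently, one worker per region.
    with ThreadPoolExecutor(max_workers=c) as pool:
        local = list(pool.map(run, strips))

    # Give each local cluster (semicluster) a globally unique id.
    labels = [-1] * len(points)
    offset = 0
    for idxs, labs in zip(strips, local):
        for i, lab in zip(idxs, labs):
            if lab != -1:
                labels[i] = lab + offset
        offset += max(labs, default=-1) + 1

    # Merge semiclusters across adjacent strips with a union-find structure.
    parent = list(range(offset))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for k in range(c - 1):
        for i in strips[k]:
            for j in strips[k + 1]:
                if (labels[i] != -1 and labels[j] != -1
                        and dist(points[i], points[j]) <= eps):
                    parent[find(labels[i])] = find(labels[j])

    return [find(lab) if lab != -1 else -1 for lab in labels]
```

For example, calling partitioned_dbscan(points, eps=0.6, min_pts=3, c=2) on a row of evenly spaced points that crosses the strip boundary yields one label for the whole row, because the two halves are merged in the final step. Points near a boundary can lose neighbors to the adjacent strip, so this simplified merge does not always reproduce the result of clustering the whole data set at once.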
DNA microarray experiments are frequently used because they have various advantages. However, gene expression data from DNA microarray experiments are noisy and, consequently, computations based on such data may lack accuracy. In this paper, an evolutionary uncertain data-clustering algorithm, E-MFDBSCAN, and a prediction model that uses E-MFDBSCAN for uncertain data are proposed. The proposed methodology can be applied to noisy gene expression data. In this methodology, global patterns of time series data are extracted using our evolutionary clustering approach, and these patterns are used to infer future projections. An autoregressive time series function is constructed from these patterns to predict the similarities among sets of gene expression clusters. The algorithms are tested with two different gene expression time series datasets.
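The autoregressive step mentioned in this abstract can be illustrated in its simplest form. The sketch below is not the paper's model. It assumes a first-order autoregressive function x[t] = a + b * x[t-1] fitted by ordinary least squares to a one-dimensional series, for instance a series of similarity scores between consecutive cluster sets. The function names and the input series are hypothetical.

```python
def fit_ar1(series):
    """Least-squares fit of x[t] = a + b * x[t-1]; returns (a, b).

    Assumes at least three values and a non-constant series.
    """
    prev, curr = series[:-1], series[1:]
    n = len(prev)
    mean_p = sum(prev) / n
    mean_c = sum(curr) / n
    var_p = sum((p - mean_p) ** 2 for p in prev)
    cov = sum((p - mean_p) * (c - mean_c) for p, c in zip(prev, curr))
    b = cov / var_p
    a = mean_c - b * mean_p
    return a, b


def forecast(series, steps):
    """Project the series forward by iterating the fitted AR(1) function."""
    a, b = fit_ar1(series)
    out, x = [], series[-1]
    for _ in range(steps):
        x = a + b * x
        out.append(x)
    return out
```

Calling forecast(series, steps) on the observed scores returns the projected values for the next time points. A higher-order model would regress on several past values instead of one.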
The Internet of Things (IoT) is a world of sensors that detect countless physical events in our environment, transform them into data, and transfer this data to other environments or digital systems. The usage areas of IoT-based technologies are constantly expanding, and technologies are being developed to support the IoT infrastructure. However, to manage the large volume of data generated in the detection layer effectively, the data should be pre-processed in accordance with big-data standards. Effective management of big data requires improving the standard of the data set, and filtering methods are being developed to obtain higher-quality data sets. For instance, data cleaning is a preprocessing method that facilitates data mining operations. In this way, more manageable data is obtained by preventing the formation of interference, and big data can be managed more effectively. In this study, we investigate the efficient operation of the IoT and of the big data that originates from it. Additionally, real-time filtering of anomalous data is performed at the IoT edge with a data set consisting of six different kinds of data produced in real time. Furthermore, the speed and accuracy of several machine learning classifiers are compared: random cut forest (RCF), logistic regression (LR), naive Bayes (NB), and neural network (NN). In terms of accuracy, the RCF and LR classifiers are very close, but in terms of speed, the LR classifier is more successful in IoT systems.