Similarity or distance measures are core components of distance-based clustering algorithms, which group similar data points into the same cluster and place dissimilar or distant data points into different clusters. The performance of similarity measures has mostly been studied in two- or three-dimensional spaces; to the best of our knowledge, no empirical study has examined how similarity measures behave on high-dimensional datasets. To fill this gap, this study proposes a technical framework to analyze, compare and benchmark the influence of different similarity measures on the results of distance-based clustering algorithms. For reproducibility, fifteen publicly available datasets were used, so that future distance measures can be evaluated and compared against the results of the measures discussed in this work. The datasets were classified into low- and high-dimensional categories to study the performance of each measure on each category. This research should help the research community identify suitable distance measures for their datasets and facilitate the comparison and evaluation of newly proposed similarity or distance measures against traditional ones.
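To illustrate why the choice of measure matters for distance-based clustering, the following is a minimal sketch, not the framework from the study above. The three measures (Euclidean, Manhattan, cosine), the helper `nearest_centroid`, and the toy point and centroids are all assumptions chosen for illustration; the abstract does not name the measures or datasets it benchmarks.

```python
import math


def euclidean(a, b):
    # Straight-line (L2) distance between two equal-length vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    # City-block (L1) distance: sum of absolute coordinate differences.
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_distance(a, b):
    # One minus cosine similarity; depends on direction, not magnitude.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def nearest_centroid(point, centroids, distance):
    # Hypothetical helper: index of the centroid closest to `point`
    # under the supplied distance function.
    return min(range(len(centroids)), key=lambda i: distance(point, centroids[i]))


# Toy data, assumed for illustration only.
point = [1.0, 1.0]
centroids = [[2.0, 2.0], [1.0, 0.0]]

for measure in (euclidean, manhattan, cosine_distance):
    print(measure.__name__, nearest_centroid(point, centroids, measure))
```

In this toy setup, Euclidean and Manhattan distance assign the point to the second centroid, while cosine distance assigns it to the first, because the first centroid points in the same direction as the point. The same assignment step can therefore produce different clusters depending on the measure.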
Clustering is an essential data mining tool for analyzing big data. Applying clustering techniques to big data is difficult due to the new challenges that big data raises. Because big data refers to terabytes and petabytes of data, and clustering algorithms come with high computational costs, the question is how to cope with this problem: how to deploy clustering techniques on big data and obtain results in a reasonable time. This study reviews the trend and progress of clustering algorithms in coping with big data challenges, from the earliest proposed algorithms to today's novel solutions. The algorithms and the challenges they target in producing improved clustering algorithms are introduced and analyzed, and the possible future path toward more advanced algorithms is then outlined based on today's available technologies and frameworks.
In this paper, the existence and uniqueness of the interface coupling (IC) of time and spatial (TS) arbitrary-order fractional (AOF) nonlinear hyperbolic scalar conservation laws (NHSCL) are investigated. The arbitrary fractional characteristic method (AFCM) is used to accomplish this task. We apply Jumarie’s modification of the Riemann–Liouville and Liouville–Caputo definitions to extend some formulae to the arbitrary-order fractional calculus. These formulae are then used to prove the main theorem. In this process, we develop an analytic method for finding the solution of IC AOF NHSCL. A feature of this method is that it enables us to verify that the obtained solution satisfies the fractional partial differential equation (FPDE) and that the solution is unique. Furthermore, a few examples in the application section illustrate the implementation of this technique.