Processing applications with a large number of dimensions has been a challenge to the KDD community. Feature selection, an effective dimensionality reduction technique, is an essential pre-processing method to remove noisy features. In the literature there are only a few methods proposed for feature selection for clustering, and almost all of those methods are 'wrapper' techniques that require a clustering algorithm to evaluate candidate feature subsets. The wrapper approach is largely unsuitable in real-world applications due to its heavy reliance on clustering algorithms that require parameters such as the number of clusters, and due to the lack of suitable clustering criteria to evaluate clustering in different subspaces. In this paper we propose a 'filter' method that is independent of any clustering algorithm. The proposed method is based on the observation that data with clusters has a very different point-to-point distance histogram than data without clusters. Using this, we propose an entropy measure that is low if the data has distinct clusters and high otherwise. The entropy measure is suitable for selecting the most important subset of features because it is invariant with the number of dimensions and is affected only by the quality of clustering. Extensive performance evaluation over synthetic, benchmark, and real datasets shows its effectiveness.
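As a rough illustration of this filter idea, the sketch below scores a candidate feature subset by the entropy of its normalized pairwise distances and greedily keeps low-entropy features. The similarity mapping, the greedy forward search, and the names `clustering_entropy` and `select_features` are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def pairwise_distances(X):
    """Normalized Euclidean distances between all pairs of rows of X."""
    diffs = X[:, None, :] - X[None, :, :]
    d = np.sqrt((diffs ** 2).sum(axis=-1))
    return d / (d.max() + 1e-12)

def clustering_entropy(X):
    """Entropy of pairwise similarities: low values suggest distinct
    clusters, high values suggest a uniform, cluster-free spread."""
    d = pairwise_distances(X)[np.triu_indices(len(X), k=1)]
    # Map distances to similarities in (0, 1); alpha is chosen so that a
    # pair at the mean distance gets similarity 0.5 (an assumption here).
    alpha = -np.log(0.5) / (d.mean() + 1e-12)
    s = np.clip(np.exp(-alpha * d), 1e-12, 1 - 1e-12)
    return float(-np.sum(s * np.log2(s) + (1 - s) * np.log2(1 - s)))

def select_features(X, k):
    """Greedy forward selection of the k features with the lowest entropy."""
    selected, remaining = [], list(range(X.shape[1]))
    while len(selected) < k:
        best = min(remaining,
                   key=lambda f: clustering_entropy(X[:, selected + [f]]))
        selected.append(best)
        remaining.remove(best)
    return selected

if __name__ == "__main__":
    # Synthetic check: two clustered columns plus three uniform noise columns.
    rng = np.random.default_rng(0)
    informative = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(6, 1, (50, 2))])
    X = np.hstack([informative, rng.uniform(-5, 5, (100, 3))])
    print(select_features(X, 2))  # the clustered columns 0 and 1 should be picked
```

Because the entropy is evaluated on a fixed set of points, adding or removing features changes only the distance histogram, which is what makes it usable as a dimension-independent selection criterion.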
The last decade has witnessed a tremendous growth of interest in applications that deal with querying and mining of time series data. Numerous representation methods for dimensionality reduction and similarity measures geared towards time series have been introduced. Each individual work introducing a particular method has made specific claims and, aside from the occasional theoretical justifications, provided quantitative experimental observations. However, for the most part, the comparative aspects of these experiments were too narrowly focused on demonstrating the benefits of the proposed methods over some of the previously introduced ones. In order to provide a comprehensive validation, we conducted an extensive set of time series experiments re-implementing 8 different representation methods and 9 similarity measures and their variants, and testing their effectiveness on 38 time series data sets from a wide variety of application domains. In this paper, we give an overview of these different techniques and present our comparative experimental findings regarding their effectiveness. Our experiments have provided both a unified validation of some of the existing achievements, and in some cases, suggested that certain claims in the literature may be unduly optimistic.
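To make the kind of comparison described here concrete, the sketch below implements two commonly compared time series similarity measures, lock-step Euclidean distance and dynamic time warping (DTW), plus a 1-nearest-neighbor accuracy routine of the sort typically used to judge effectiveness on labeled data sets. The helper names and the evaluation wrapper are illustrative, not taken from the paper.

```python
import numpy as np

def euclidean(a, b):
    """Lock-step Euclidean distance between two equal-length series."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.sqrt(np.sum((a - b) ** 2)))

def dtw(a, b):
    """Classic O(n*m) dynamic time warping distance (no warping window)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(np.sqrt(D[n, m]))

def one_nn_accuracy(train_X, train_y, test_X, test_y, dist):
    """Fraction of test series whose nearest training series shares their label."""
    correct = 0
    for x, y in zip(test_X, test_y):
        nearest = int(np.argmin([dist(x, t) for t in train_X]))
        correct += int(train_y[nearest] == y)
    return correct / len(test_y)

if __name__ == "__main__":
    # DTW tolerates the phase shift that inflates the Euclidean distance.
    t = np.linspace(0, 6.28, 100)
    print(euclidean(np.sin(t), np.sin(t + 0.5)), dtw(np.sin(t), np.sin(t + 0.5)))
```

Swapping `dist=euclidean` for `dist=dtw` (or any other measure with the same two-argument signature) in `one_nn_accuracy` is one simple way to run the kind of head-to-head effectiveness comparison the abstract describes.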
Replication of documents on geographically distributed servers can improve both the performance and reliability of a Web service. Server selection algorithms allow Web clients to select a replicated server that is "close" to them and thereby minimize the response time of the Web service. Using traces from client proxy servers, we compare the effectiveness of several "proximity" metrics, including the number of hops between client and server, the ping round-trip time, and the HTTP request latency. Based on this analysis, we design two new algorithms for selecting among replicated servers and compare their performance against existing algorithms. We show that the new server selection algorithms improve performance over existing algorithms by 55% on average. In addition, the new algorithms improve performance over existing non-replicated Web servers by 69% on average.
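As a toy illustration of one proximity metric mentioned here (HTTP request latency), the sketch below times a small request to each candidate replica and picks the fastest one. The replica URLs, function names, and timeout are placeholders, and the paper's actual selection algorithms are more involved than this minimum-latency rule.

```python
import time
import urllib.request

# Hypothetical replica URLs; in practice these would come from a replica
# directory or DNS records rather than being hard-coded.
REPLICAS = [
    "http://replica1.example.com/index.html",
    "http://replica2.example.com/index.html",
    "http://replica3.example.com/index.html",
]

def http_latency(url, timeout=2.0):
    """Seconds taken to fetch the URL, or +inf if the request fails."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return time.monotonic() - start
    except OSError:
        return float("inf")

def select_server(urls):
    """Pick the replica with the lowest measured HTTP request latency."""
    return min(urls, key=http_latency)

if __name__ == "__main__":
    print("Selected replica:", select_server(REPLICAS))
```

Hop counts or ping round-trip times could be substituted for `http_latency` as the selection key; the abstract's point is precisely that these metrics differ in how well they predict the response time actually seen by clients.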