Random projection trees for vector quantization

Dasgupta, Freund

doi:10.1109/allerton.2008.4797555

Cited by 25 publications

(42 citation statements)

References 14 publications

“…It is NP-hard to find optimal clusterings even for two clusters [8], [9]. Therefore, dimensionality reduction methods have been extensively studied in the literature to reduce the number of dimensions.…”

Section: Related Work (mentioning)

confidence: 99%

Feature Construction and Calibration for Clustering Daily Load Curves from Smart-Meter Data

Alotaibi

Jin

Wilcox

et al. 2016

IEEE Trans. Ind. Inf.

120

Abstract: This paper proposes and compares feature construction and calibration methods for clustering daily electricity load curves. Such load curves describe electricity demand over a period of time. A rich body of the literature has studied clustering of load curves, usually using temporal features. This limits the potential to discover new knowledge which may not be best represented as models consisting of all time points on load curves. This paper presents three new methods to construct features: conditional filters on time-resolution based features, calibration and normalization, and using profile errors. These new features extend the potential of clustering load curves. Moreover, smart metering is now generating high-resolution time series, and so the dimensionality reduction offered by these features is welcome. The clustering results using the proposed new features are compared with clusterings obtained from temporal features as well as clusterings with Fourier features, using household electricity consumption time series as test data. The experimental results suggest that the proposed feature construction methods offer new means for gaining insight in energy consumption patterns.

“…The principal function of algorithm involves finding the k-means. First, an initial set of means is defined and then subsequent classification is based on their distances to the centres [6]. Next, the clusters' mean is computed again and then reclassification is done based on the new set of means.…”

Section: K-means Clustering Technique (mentioning)

confidence: 99%
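The iteration described in the statement above (assign each point to its nearest centre, recompute the means, reclassify) can be sketched as follows. This is a minimal illustration, not the citing paper's implementation: the function name `kmeans`, the Euclidean distance, the `max_iters` cap and the handling of empty clusters are all assumptions made for the example.

```python
import math


def kmeans(points, initial_means, max_iters=100):
    """Lloyd-style k-means sketch: assign, recompute means, repeat.

    `points` and `initial_means` are sequences of equal-length numeric tuples.
    Returns (means, labels). Names and defaults are illustrative assumptions.
    """
    means = [tuple(m) for m in initial_means]
    labels = None
    for _ in range(max_iters):
        # Classification step: each point goes to its nearest current mean.
        new_labels = [
            min(range(len(means)), key=lambda j: math.dist(p, means[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the means would not change
        labels = new_labels
        # Update step: recompute each cluster mean from its members.
        for j in range(len(means)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # assumption: an empty cluster keeps its old mean
                means[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return means, labels
```

Like any run of this heuristic, the result depends on the initial means and is only a local optimum of the within-cluster squared distance.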

“…It could be concluded that, it was a group which liked to use credit cards, spent more freely, believed in women power, believed in economics rather than politics and felt quality products could be worth purchasing. Also, they seemed to have taste of modern life style and were fashion oriented. Cluster-4 gave out an analysis that the variables 2, 4, 5, 7, 10 belong to this cluster, had opposite statistical characteristics to the variable 1, 3, 6, 8, 11, 12, 13 and were neutral in comparison to variables 14, 15. It was concluded that, this group was optimistic, free spending and a good target for TV advertising, particularly consumer durables items and entertainment.…”

mentioning

confidence: 99%

Customer Segmentation Using Clustering and Data Mining Techniques

Kashwan

Velu

2013

IJCTE

Abstract: Clustering is a critically important step in the data mining process. It is a multivariate procedure well suited to segmentation applications in market forecasting and planning research. This paper is a comprehensive report on using the k-means clustering technique and the SPSS tool to develop a real-time, online system for a particular supermarket to predict sales in various annual seasonal cycles. The model developed was an intelligent tool which received inputs directly from sales data records and automatically updated segmentation statistics at the end of each day's business. The model was successfully implemented and tested over a period of three months. A total of n = 2138 customers were observed and then divided into k = 4 similar groups. The classification was based on the nearest mean. An ANOVA analysis was also carried out to test the stability of the clusters. The actual day-to-day sales statistics were compared with the statistics predicted by the model. Results were quite encouraging and showed high accuracy.

Index Terms: Cluster analysis, data mining, customer segmentation, ANOVA analysis.

Manuscript received December 25, 2012; revised February 28, 2013. Kishana R. Kashwan is with the Department of Electronics and Communication Engineering-PG, Sona College of Technology (An Autonomous Institution Affiliated to Anna University), TPT Road, Tamil Nadu (e-mail: drkrkashwan@sonatech.ac.in). C. M. Velu is with the Department of CSE, Dattakala Group of Institutions, Swami Chincholi, Daund, Pune-413130, India (e-mail: cmvelu41@gmail.com).

I. INTRODUCTION

Clustering is a statistical technique similar to classification. It sorts raw data into meaningful clusters and groups of relatively homogeneous observations. The objects of a particular cluster have similar characteristics and properties but differ from those of other clusters. The grouping is accomplished by finding similarities among data according to characteristics found in the raw data [1]. The main objective was to find the optimum number of clusters. There are two basic types of clustering methods, hierarchical and non-hierarchical. Clustering is not a one-time task but a continuous, iterative process of knowledge discovery from huge quantities of raw and unorganized data [2]. For a particular classification problem, an appropriate clustering algorithm and parameters must be selected to obtain optimum results [3]. Clustering is a type of explorative data mining used in many application-oriented areas such as machine learning, classification and pattern recognition [4]. In recent times, data mining has been gaining momentum for knowledge-based services such as distributed and grid computing. Cloud computing is yet another example of a frontier research topic in computer science and engineering. For a clustering method, the most important property is that a tuple of a particular cluster is more likely to be similar to the other tuples within the same cluster than to the tuples of other clusters. For classific...

“…Even though the number of clusters is small, the problem of finding an optimal k-means algorithm solution is NP-hard [2,3]. For this reason, a k-means algorithm adapts heuristics and finds local minimum as approximate optimal solutions.…”

Section: Introduction (mentioning)

confidence: 99%

A Fast K-prototypes Algorithm Using Partial Distance Computation

Kim

2017

Symmetry

Abstract: k-means is one of the most popular and widely used clustering algorithms; however, it is limited to numerical data. The k-prototypes algorithm is well known for handling both numerical and categorical data. However, there have been no studies on accelerating it. In this paper, we propose a new, fast k-prototypes algorithm that provides the same answers as the original k-prototypes algorithm. The proposed algorithm avoids distance computations by using partial distance computation. Our k-prototypes algorithm finds the minimum distance without computing the distance over all attributes between an object and a cluster center, which reduces the time complexity. Partial distance computation uses the fact that the maximum difference between two categorical attributes is 1 during distance computations. If data objects have m categorical attributes, the maximum categorical difference between an object and a cluster center is m. Our algorithm first computes distances using the numerical attributes only. If the difference between the smallest and the second-smallest numerical distance is higher than m, we can find the nearest cluster center without computing distances over the categorical attributes. The experimental results show that the computational performance of the proposed k-prototypes algorithm is superior to that of the original k-prototypes algorithm on our dataset.
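The pruning rule in this abstract can be illustrated with a short sketch. It is not the authors' code: the function name `nearest_prototype`, the squared Euclidean distance on numerical attributes, and the unit-weight simple-matching distance on categorical attributes are assumptions chosen so that the categorical part of any distance lies between 0 and m, as the abstract states.

```python
def nearest_prototype(num, cat, centers):
    """Find the nearest cluster center, skipping categorical work when safe.

    `num` / `cat` are the object's numerical and categorical attributes.
    `centers` is a list of (numerical_center, categorical_center) pairs.
    Returns (index_of_nearest_center, categorical_step_skipped).
    All names and the distance definitions are illustrative assumptions.
    """
    m = len(cat)  # the categorical distance to any center is at most m
    # Partial distance: numerical attributes only (squared Euclidean).
    num_d = [
        sum((a - b) ** 2 for a, b in zip(num, c_num)) for c_num, _ in centers
    ]
    order = sorted(range(len(centers)), key=num_d.__getitem__)
    best = order[0]
    # If the runner-up trails by more than m, no categorical outcome can
    # change the winner, so the categorical comparison is skipped.
    if len(order) == 1 or num_d[order[1]] - num_d[best] > m:
        return best, True
    # Otherwise fall back to the full mixed distance for every center.
    full = [
        num_d[j] + sum(x != y for x, y in zip(cat, centers[j][1]))
        for j in range(len(centers))
    ]
    return min(range(len(centers)), key=full.__getitem__), False
```

Under these assumptions, skipping is safe because the winner's full distance is at most its numerical distance plus m, while every other center's full distance is at least its numerical distance, which is already larger.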
