2005
DOI: 10.1109/tpami.2005.95
Automated variable weighting in k-means type clustering

Abstract: This paper proposes a k-means type clustering algorithm that can automatically calculate variable weights. A new step is introduced into the k-means clustering process to iteratively update variable weights based on the current partition of the data, and a formula for weight calculation is proposed. A convergence theorem for the new clustering process is given. The variable weights produced by the algorithm measure the importance of variables in clustering and can be used for variable selection in data mining applications.
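The iterative scheme described in the abstract (assign points, update centers, then update variable weights from the current partition) can be sketched as follows. This is a minimal illustration, not the paper's exact algorithm: it assumes squared Euclidean per-variable distances, a weight exponent beta > 1, and a dispersion-based weight update w_j proportional to 1 / sum_t (D_j / D_t)^(1/(beta-1)), where D_j is the within-cluster dispersion of variable j; all function and parameter names are invented for illustration.

```python
import numpy as np

def weighted_kmeans(X, k, beta=2.0, n_iter=20, init=None, seed=0):
    """Sketch of k-means with automatic variable weighting.

    Assumptions (not taken verbatim from the paper): squared Euclidean
    per-variable distances, beta > 1, and weights updated from the
    per-variable within-cluster dispersions D_j.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    centers = (np.array(init, dtype=float) if init is not None
               else X[rng.choice(n, size=k, replace=False)].astype(float))
    w = np.full(m, 1.0 / m)  # variable weights, kept summing to 1

    for _ in range(n_iter):
        # Step 1: assign each point by the weighted distance sum_j w_j^beta * d_j
        d2 = (X[:, None, :] - centers[None, :, :]) ** 2       # shape (n, k, m)
        labels = np.argmin((d2 * w**beta).sum(axis=2), axis=1)

        # Step 2: move each non-empty cluster's center to its mean
        for l in range(k):
            if np.any(labels == l):
                centers[l] = X[labels == l].mean(axis=0)

        # Step 3: update weights from per-variable dispersions:
        # D_j = total squared gap to the assigned center along variable j
        d2 = (X[:, None, :] - centers[None, :, :]) ** 2
        D = d2[np.arange(n), labels, :].sum(axis=0)
        D = np.maximum(D, 1e-12)                              # guard zero dispersion
        w = 1.0 / ((D[:, None] / D[None, :]) ** (1.0 / (beta - 1))).sum(axis=1)

    return labels, centers, w
```

Variables with small within-cluster dispersion end up with large weights, which is what makes the learned weights usable for variable selection.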

Cited by 703 publications (387 citation statements)
References 17 publications
“…There are mainly two types. The first is text clustering [9]–[14]. Cluster analysis [9] is one of the important methods for realizing text mining.…”
Section: Introduction (mentioning)
confidence: 99%
“…This gives rise to a distance function over weighted features. Another representative method is the automatic feature-weighting technique [12,13]. In k-means or FCM, the feature weight vector indicates the importance of each feature over the whole data set.…”
Section: Introduction (mentioning)
confidence: 99%
“…Feature weighting can be done at the same time as the clustering itself. Feature weighting has received considerable attention in partitional clustering (Amorim and Mirkin, 2012; Amorim and Fenner, 2012; Chan et al., 2004; Huang et al., 2005, 2008; Makarenkov and Legendre, 2001), but not so in hierarchical clustering. Surely, it is possible to apply a feature selection algorithm to a dataset before using the Ward method.…”
Section: Introduction (mentioning)
confidence: 99%
“…We have decided to use the L_p norm because this transforms the weights into feature rescaling factors, in contrast to the work of Chan et al. (2004) and Huang et al. (2005, 2008).…”
Section: Introduction (mentioning)
confidence: 99%
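The rescaling-factor point in the excerpt above can be checked numerically: with an L_p dissimilarity, weighting feature j by w_j is algebraically the same as multiplying that feature's values by w_j before computing an unweighted distance. The values of p, w, x, and z below are made up purely for illustration.

```python
import numpy as np

# With an L_p dissimilarity, w_j**p * |x_j - z_j|**p == |w_j*x_j - w_j*z_j|**p,
# so feature weighting is equivalent to clustering a rescaled data set.
p = 3.0
w = np.array([0.2, 0.5, 0.3])   # hypothetical feature weights
x = np.array([1.0, 4.0, -2.0])  # hypothetical data point
z = np.array([0.5, 1.0, 3.0])   # hypothetical cluster center

weighted = np.sum(w**p * np.abs(x - z)**p)   # weighted L_p distance (to the p-th power)
rescaled = np.sum(np.abs(w*x - w*z)**p)      # unweighted distance on rescaled features
assert np.isclose(weighted, rescaled)
```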
“…This essentially inductive view is that the disbenefits of noise and uncertainty generated by data-led generalisation are outweighed by the greater risk of straitjacketing a classification to realise pre-ordained outcomes. Only recently have studies been conducted into how weighting schemes can be automated through an adaptation of the k-means algorithm (Huang et al., 2005). Although some view PCA as useful for filtering variables that may be redundant or have negative effects on classification outcomes (Debenham et al., 2002), a contrary view is that the technique results in undesirable information loss and produces results that are complex and difficult to interpret (Harris et al., 2005).…”
Section: Building the Bespoke HE Geodemographic Classification (mentioning)
confidence: 99%