The Clustering Validity with Silhouette and Sum of Squared Errors

Thinsungnoen, Tippaya; Kaoungku, Nuntawut; Durongdumronchai, Pongsakorn; Kerdprasop, Kittisak; Kerdprasop, Nittaya

doi:10.12792/iciae2015.012

Cited by 116 publications

(62 citation statements)

References 16 publications

“…Next, the segments were characterized using a two-step clustering approach (Rundle-Thiele, Kubacki, Tkaczynski, & Parkinson, 2015). The higher the value >0, the more robust the cluster configuration (Thinsungnoena, Kaoungkub, Durongdumronchaib, Kerdprasopb, & Kerdprasopb, 2015). Using the Bayesian information criterion (BIC), the best number of clusters was identified.…”

Section: Discussion (mentioning)

confidence: 99%

“…First, the silhouette measure of cohesion and separation, a measure of how close each point in a cluster is to the points in its neighboring clusters (from −1 to +1), was calculated. The higher the value >0, the more robust the cluster configuration (Thinsungnoena, Kaoungkub, Durongdumronchaib, Kerdprasopb, & Kerdprasopb, 2015). Next, a test of significance was performed on each construct to identify the differences (if any) amongst the clusters.…”

Section: Discussion (mentioning)

confidence: 99%
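
For reference, the silhouette value described in the statement above is usually written as follows. This is the standard textbook definition, not notation taken from the citing paper; the symbols a(i) and b(i) are introduced here for illustration.

```latex
s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}, \qquad -1 \le s(i) \le 1
```

Here a(i) is the mean distance from point i to the other members of its own cluster, and b(i) is the smallest mean distance from point i to the members of any other cluster. Values near +1 indicate a point that sits well inside its cluster, while values below 0 suggest it may belong to a neighboring cluster.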

Segmenting urban populations for greater conservation gains: A new approach targeting cobenefits is required

MacDonald, Harbrow, Jack, et al. (2019). Conservat Sci and Prac

Engaging urban residents in greater proconservation behaviors is essential to mitigate the biodiversity crisis. To date, most behavior-change campaigns have been based on a one-size-fits-all "think-care-act" approach resulting in insufficient, sometimes counterproductive, conservation gains. In our study, we assess the "think-care-act" paradigm and also consider a range of cobenefits that may motivate different segments of urban populations to take greater conservation action for reasons other than biodiversity gains. We surveyed a representative sample of Auckland, New Zealand (n = 2,124) and four clusters emerged through clustering analysis. The first segment (Environmentally Active; 32%), exhibited the "think-care-act" paradigm. The second segment (Well Informed; 28%), was highly knowledgeable and concerned about conservation problems but exhibited lower conservation behaviors. The third segment (Active Outdoors; 19%) was actively engaged in outdoor activities, but exhibited low conservation knowledge, concern, and behaviors. The fourth segment (Socially Motivated; 21%), demonstrated high levels of conservation behaviors but lower knowledge and concern about conservation issues. We discuss potential ways to engage with each segment based on cobenefits and the need to move away from the traditional "think-care-act" paradigm and instead work with existing values systems and foster greater conservation behavior based on existing cobenefits.

“…Along with this, describe a new method for the silhouette to minimize the computation time with reducing addition operations amount during distance calculation, which has experimentally proven that about 50% CPU time gained. It is also a measure that helps in concluding clustering legitimacy and selecting the optimal K value to divide a ratio scale data into distinct classes [38]. For true K the preferable number of clusters whose silhouette value predicted large enough.…”

Section: B. The Silhouette Methods Towards K Finding (mentioning)

confidence: 99%

Extractive based Text Summarization Using KMeans and TF-IDF

Khan, Qian, Naeem (2019). IJIEEB

The quantity of information on the internet is massively increasing and gigantic volume of data with numerous compositions accessible openly online become more widespread. It is challenging nowadays for a user to extract the information efficiently and smoothly. As one of the methods to tackle this challenge, text summarization process diminishes the redundant information and retrieves the useful and relevant information from a text document to form a compressed and shorter version which is easy to understand and timesaving while reflecting the main idea of the discussed topic within the document. The approaches of automatic text summarization earn a keen interest within the Text Mining and NLP (Natural Language Processing) communities because it is a laborious job to manually summarize a text document. Mainly there are two types of text summarization, namely extractive based and abstractive based. This paper focuses on the extractive based summarization using K-Means Clustering with TF-IDF (Term Frequency-Inverse Document Frequency) for summarization. The paper also reflects the idea of true K and using that value of K divides the sentences of the input document to present the final summary. Furthermore, we have combined the K-means, TF-IDF with the issue of K value and predict the resulting system summary which shows comparatively best results.

“…We determine the optimal clustering number using the Silhouette Index/Average Silhouette and Gap Statistic methods to remove this uncertainty. The first technique calculates the average silhouette value of the instances for multiple k values [23]. Optimal number of clusters maximizes the average silhouette score [23].…”

Section: Optimal Number Of Clusters (mentioning)

confidence: 99%

“…The first technique calculates the average silhouette value of the instances for multiple k values [23]. Optimal number of clusters maximizes the average silhouette score [23]. We may note that in Scikit-Learn toolkit, the default range of k is 1 to 10.…”

Section: Optimal Number Of Clusters (mentioning)

confidence: 99%
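
The selection rule quoted above (compute the average silhouette for several values of k and keep the k that maximizes it) can be sketched in a few lines of standard-library Python. This is an illustrative sketch only, not the cited authors' code and not the scikit-learn API: the function names `silhouette_values`, `average_silhouette`, and `best_k` are hypothetical, the toy points are made up, and the candidate labelings are written by hand rather than produced by a clustering algorithm. Note that the silhouette is only defined for two or more clusters, so k = 1 cannot be scored this way.

```python
from math import dist  # Euclidean distance, Python 3.8+


def silhouette_values(points, labels):
    """Return the silhouette value s(i) of every point (hypothetical helper)."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    values = []
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            values.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        values.append((b - a) / denom if denom > 0 else 0.0)
    return values


def average_silhouette(points, labels):
    """Mean silhouette over all points."""
    vals = silhouette_values(points, labels)
    return sum(vals) / len(vals)


def best_k(points, labelings):
    """Pick the k whose labeling has the highest average silhouette.

    `labelings` maps each candidate k to a list of cluster labels.
    """
    return max(labelings, key=lambda k: average_silhouette(points, labelings[k]))


# Toy data (assumed for illustration): two well-separated pairs of points.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
labelings = {
    2: [0, 0, 1, 1],  # the natural split
    3: [0, 0, 1, 2],  # over-segmented: the right-hand pair is split
}
chosen = best_k(points, labelings)
```

On this toy data the two-cluster labeling scores higher than the over-segmented three-cluster one, so `chosen` is 2. In practice the labelings would come from running a clustering algorithm such as k-means once per candidate k.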

Detection of Auction Fraud in Commercial Sites

Anowar, Sadaoui (2020). J. theor. appl. electron. commer. res.

Online auctions have become one of the most convenient ways to commit fraud due to a large amount of money being traded every day. Shill bidding is the predominant form of auction fraud, and it is also the most difficult to detect because it so closely resembles normal bidding behavior. Furthermore, shill bidding does not leave behind any apparent evidence, and it is relatively easy to use to cheat innocent buyers. Our goal is to develop a classification model that is capable of efficiently differentiating between legitimate bidders and shill bidders. For our study, we employ an actual training dataset, but the data are unlabeled. First, we properly label the shill bidding samples by combining a robust hierarchical clustering technique and a semi-automated labeling approach. Since shill bidding datasets are imbalanced, we assess advanced over-sampling, under-sampling and hybrid-sampling methods and compare their performances based on several classification algorithms. The optimal shill bidding classifier displays high detection and low misclassification rates of fraudulent activities.
