Balancing effort and benefit of K-means clustering algorithms in Big Data realms

Pérez-Ortega, Joaquín; Almanza-Ortega, Nelva Nely; Romero, David

doi:10.1371/journal.pone.0201874

Cited by 26 publications

(20 citation statements)

References 30 publications


“…K-means clustering is a kind of data clustering techniques to divide cases or variables of a dataset into non-overlapping groups/clusters, based on the characteristics uncovered. The goal is to produce groups of cases/variables with a high degree of similarity within each group and a low degree of similarity between groups [26–29]. In this method, we only used the FBIS score and classified the sample into high burden and low burden group by K-means clustering to get a cutoff point for the FBIS score.…”

Section: Methods (mentioning)

confidence: 99%

Determining a cutoff score for the family burden interview schedule using three statistical methods

Liu

Zhou

et al. 2019

BMC Med Res Methodol


Background While it is widely acknowledged that family burden can be ameliorated with effective psycho-social interventions, how to measure family burden and define a valid cutoff to identify family caregivers in need of such interventions remains a key question. The purpose of the present study was to determine a statistically valid cutoff score for the Family Burden Interview Schedule (FBIS), using the cutoff scores of the Patient Health Questionnaire (PHQ-9) and the Generalized Anxiety Disorder Scale (GAD-7) as the reference. Methods The FBIS, PHQ-9, and GAD-7 were administered to a representative community sample of 327 family caregivers of schizophrenia patients. A FBIS cutoff score was determined using three different statistical methods: tree-based modeling, K-means clustering technique and linear regression. Contingency analysis was conducted to compare the FBIS cutoff with depression and anxiety scale scores. Results Findings proposed a cutoff score of 23 for the FBIS, with sensitivity being 76% for PHQ-9 and 74% for GAD-7, specificity being 68% for PHQ-9 and 67% for GAD-7. Conclusion This cutoff score would enable health care providers to assess family caregivers at risk and provide necessary interventions to improve their quality of life. Electronic supplementary material The online version of this article (10.1186/s12874-019-0734-8) contains supplementary material, which is available to authorized users.
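The citation statement above uses K-means on a single score to separate a high group from a low group and read off a cutoff. A minimal sketch of that idea follows. It assumes two clusters and takes the cutoff as the midpoint between the two centroids; the cited study does not state how its cutoff was derived from the clusters, so that rule, the function name `two_means_cutoff`, and the example scores are all illustrative assumptions, not FBIS data.

```python
def two_means_cutoff(scores, max_iter=100):
    """Split 1-D scores into a low and a high cluster with K-means (k = 2).

    Returns (low_centroid, high_centroid, cutoff). The cutoff is the midpoint
    of the two centroids, which is an assumption made for this sketch.
    """
    values = sorted(scores)
    if len(set(values)) < 2:
        raise ValueError("need at least two distinct values")
    # Start the centroids at the extremes so neither cluster can become empty.
    lo, hi = float(values[0]), float(values[-1])
    for _ in range(max_iter):
        # Assignment step: each score joins the nearer centroid.
        low = [v for v in values if abs(v - lo) <= abs(v - hi)]
        high = [v for v in values if abs(v - lo) > abs(v - hi)]
        # Update step: each centroid moves to the mean of its members.
        new_lo = sum(low) / len(low)
        new_hi = sum(high) / len(high)
        if new_lo == lo and new_hi == hi:
            break  # converged
        lo, hi = new_lo, new_hi
    return lo, hi, (lo + hi) / 2


# Made-up scores, for illustration only.
example_scores = [2, 4, 6, 30, 32, 34]
low_c, high_c, cutoff = two_means_cutoff(example_scores)
```

With these made-up scores the centroids settle at 4.0 and 32.0, so the midpoint cutoff is 18.0.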


“…Median lesion intensities per lesion from the FSPGR, SE, and FLAIR sequences were used, the number of clusters was set to 2; the iterate and classify method was used, and the number of maximum iterations was set to 10. The K-means cluster algorithm creates clusters from the dataset, placing centroids in a way that the data in a given cluster have similar attributes or closeness to the centroid, whilst the distance between clusters (centroids) is maximized (23). In order to quantify how median intensity values of the two lesion type clusters differ from the normal white matter intensity profile, we employed a bootstrap-based approach using a custom-made MATLAB script.…”

Section: Methods (mentioning)

confidence: 99%

Two Classes of T1 Hypointense Lesions in Multiple Sclerosis With Different Clinical Relevance

et al. 2021


Background: Hypointense lesions on T1-weighted images have important clinical relevance in multiple sclerosis patients. Traditionally, spin-echo (SE) sequences are used to assess these lesions (termed black holes), but Fast Spoiled Gradient-Echo (FSPGR) sequences provide an excellent alternative. Objective: To determine whether the contrast difference between T1 hypointense lesions and the surrounding normal white matter is similar on the two sequences, whether different lesion types could be identified, and whether the clinical relevance of these lesions types are different. Methods: Seventy-nine multiple sclerosis patients' lesions were manually segmented, then registered to T1 sequences. Median intensity values of lesions were identified on all sequences, then K-means clustering was applied to assess whether distinct clusters of lesions can be defined based on intensity values on SE, FSPGR, and FLAIR sequences. The standardized intensity of the lesions in each cluster was compared to the intensity of the normal appearing white matter in order to see if lesions stand out from the white matter on a given sequence. Results: 100% of lesions on FSPGR images and 69% on SE sequence in cluster #1 exceeded a standardized lesion distance of Z = 2.3 (p < 0.05). In cluster #2, 78.7% of lesions on FSPGR and only 17.7% of lesions on SE sequence were above this cutoff value, meaning that these lesions were not easily seen on SE images. Lesion count in the second cluster (lesions less identifiable on SE) significantly correlated with the Expanded Disability Status Scale (EDSS) (R: 0.30, p ≤ 0.006) and with disease duration (R: 0.33, p ≤ 0.002). Conclusion: We showed that black holes can be separated into two distinct clusters based on their intensity values on various sequences, only one of which is related to clinical parameters. This emphasizes the joint role of FSPGR and SE sequences in the monitoring of MS patients and provides insight into the role of black holes in MS.
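The setup described in the citation statement for this entry (three intensity features per lesion, two clusters, at most 10 iterations) can be sketched with a plain Lloyd-style K-means. This is not the authors' software or script: the function name `kmeans`, the choice to start the centroids at the first `k` points, and the sample feature vectors are assumptions made only for illustration.

```python
import math


def kmeans(points, k=2, max_iter=10):
    """Lloyd-style K-means with an iteration cap.

    Centroids start at the first k points (an assumption of this sketch).
    Returns (labels, centroids).
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                mean = tuple(sum(axis) / len(members) for axis in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(centroids[j])  # empty cluster: keep as is
        if new_centroids == centroids:
            break  # converged before reaching the cap
        centroids = new_centroids
    return labels, centroids


# Made-up three-feature vectors, standing in for per-lesion median intensities.
lesions = [(1.0, 1.0, 1.0), (9.0, 9.0, 9.0), (1.0, 2.0, 1.0), (9.0, 8.0, 9.0)]
lesion_labels, lesion_centroids = kmeans(lesions, k=2, max_iter=10)
```

On these made-up vectors the labels come out as `[0, 1, 0, 1]`. Real intensity data would normally need several random restarts, since the result depends on the starting centroids.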


“…The Clustering algorithm has the advantage of finding a solution for a large complex vehicle routing and scheduling problem by splitting the problem into sub-problems of smaller clusters to solve, which is relatively easier, and combining the outcomes to form a total solution. It can provide a good balance between effort and quality of solution [101]. The shortcoming is that it can be challenging in splitting the original problem into an appropriate number of clusters to obtain optimality.…”

Section: Solution Approaches (mentioning)

confidence: 99%

Multi-resource scheduling and routing for emergency recovery operations

Bodaghi

Shahparvari

Fadaki

et al. 2020

International Journal of Disaster Risk Reduction


Efficient delivery of multiple resources for emergency recovery during disasters is a matter of life and death. Nevertheless, most studies in this field only handle situations involving single resource. This paper formulates the Multi-Resource Scheduling and Routing Problem (MRSRP) for emergency relief and develops a solution framework to effectively deliver expendable and non-expendable resources in Emergency Recovery Operations. Six methods, namely, Greedy, Augmented Greedy, k-Node Crossover, Scheduling, Monte Carlo, and Clustering, are developed and benchmarked against the exact method (for small instances) and the genetic algorithm (for large instances). Results reveal that all six heuristics are valid and generate near or actual optimal solutions for small instances. With respect to large instances, the developed methods can generate near-optimal solutions within an acceptable computational time frame. The Monte Carlo algorithm, however, emerges as the most effective method. Findings of comprehensive comparative analysis suggest that the proposed MRSRP model and the Monte Carlo method can serve as a useful tool for decision-makers to better deploy resources during emergency recovery operations.
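The decomposition described in the citation statement for this entry (split a large routing problem into clusters, solve each smaller sub-problem, then combine the results) can be sketched as a generic cluster-first, route-second procedure. The paper's own Clustering method is not detailed here, so everything below is an assumption: cluster labels are taken as given, a nearest-neighbour heuristic stands in for the per-cluster solver, and the names `nearest_neighbour_route` and `solve_by_clusters` are hypothetical.

```python
import math


def nearest_neighbour_route(start, stops):
    """Greedy route: repeatedly visit the closest unvisited stop."""
    route, current, remaining = [], start, list(stops)
    while remaining:
        nxt = min(remaining, key=lambda p: math.dist(current, p))
        remaining.remove(nxt)
        route.append(nxt)
        current = nxt
    return route


def solve_by_clusters(depot, stops, labels):
    """Solve one small routing sub-problem per cluster, then combine them."""
    clusters = {}
    for stop, label in zip(stops, labels):
        clusters.setdefault(label, []).append(stop)
    # One route per cluster, each starting from the depot.
    return {
        label: nearest_neighbour_route(depot, members)
        for label, members in sorted(clusters.items())
    }


# Made-up coordinates and cluster labels, for illustration only.
depot = (0.0, 0.0)
stops = [(1.0, 0.0), (10.0, 10.0), (2.0, 0.0), (11.0, 10.0)]
labels = [0, 1, 0, 1]
plan = solve_by_clusters(depot, stops, labels)
```

Each sub-problem only ever sees the stops in its own cluster, which is where the saving in effort comes from. The quality of the combined plan depends on how the stops were split, which is the shortcoming the citation statement points out.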
