K-Means Divide and Conquer Clustering

Khalilian, Madjid; Boroujeni, Farsad Zamani; Mustapha, Norwati; Sulaiman, Md. Nasir

doi:10.1109/iccae.2009.59

Cited by 22 publications

(13 citation statements)

References 6 publications

“…In this step, we apply unsupervised discretization method to find boundary values in eligibility criteria and use them to subdivide the value intervals. K-means clustering, which takes the distribution of attribute values into account, is a popular unsupervised discretization method for quantizing one-dimensional continuous variables into non-uniform value intervals [33, 34]. We first retrieve all the occurrences of the boundary values of each quantitative variable in the eligibility criteria of T2DM studies.…”

Section: Methods (mentioning)

confidence: 99%

Multivariate analysis of the population representativeness of related clinical studies

Ryan, Hoxha, et al. 2016

Journal of Biomedical Informatics

Objective: To develop a multivariate method for quantifying the population representativeness across related clinical studies and a computational method for identifying and characterizing underrepresented subgroups in clinical studies. Methods: We extended a published metric named Generalizability Index for Study Traits (GIST) to include multiple study traits for quantifying the population representativeness of a set of related studies by assuming the independence and equal importance among all study traits. On this basis, we compared the effectiveness of GIST and multivariate GIST (mGIST) qualitatively. We further developed an algorithm called “Multivariate Underrepresented Subgroup Identification” (MAGIC) for constructing optimal combinations of distinct value intervals of multiple traits to define underrepresented subgroups in a set of related studies. Using Type 2 diabetes mellitus (T2DM) as an example, we identified and extracted frequently used quantitative eligibility criteria variables in a set of clinical studies. We profiled the T2DM target population using the National Health and Nutrition Examination Survey (NHANES) data. Results: According to the mGIST scores for four example variables, i.e., age, HbA1c, BMI, and gender, the included observational T2DM studies had superior population representativeness than the interventional T2DM studies. For the interventional T2DM studies, Phase I trials had better population representativeness than Phase III trials. People at least 65 years old with HbA1c value between 5.7% and 7.2% were particularly underrepresented in the included T2DM trials. These results confirmed well-known knowledge and demonstrated the effectiveness of our methods in population representativeness assessment. Conclusions: mGIST is effective at quantifying population representativeness of related clinical studies using multiple numeric study traits. MAGIC identifies underrepresented subgroups in clinical studies. Both data-driven methods can be used to improve the transparency of design bias in participation selection at the research community level.
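
The citation statement for this entry describes K-means as an unsupervised discretization method that turns a one-dimensional continuous variable into non-uniform value intervals. A minimal sketch of that idea follows. It is not the citing paper's implementation: the function names, the evenly spaced initialization, the midpoint rule for boundaries, and the sample ages are all assumptions made for illustration.

```python
from bisect import bisect_right


def kmeans_1d(values, k, max_iter=100):
    """Lloyd's algorithm on scalars; returns the k centroids in ascending order."""
    data = sorted(values)
    n = len(data)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of values")
    # Assumed initialization: evenly spaced order statistics (deterministic).
    centroids = [data[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for x in data:
            # Assign each value to its nearest centroid.
            nearest = min(range(k), key=lambda j: abs(x - centroids[j]))
            clusters[nearest].append(x)
        # Move each centroid to the mean of its cluster; keep it if the cluster is empty.
        updated = [sum(c) / len(c) if c else centroids[j]
                   for j, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return sorted(centroids)


def cut_points(centroids):
    """Interval boundaries: midpoints between adjacent sorted centroids."""
    return [(a + b) / 2 for a, b in zip(centroids, centroids[1:])]


def interval_index(x, cuts):
    """Index of the interval that x falls into, given ascending cut points."""
    return bisect_right(cuts, x)


# Hypothetical ages, not data from any of the cited studies.
ages = [18, 20, 21, 40, 42, 45, 65, 70, 75]
centroids = kmeans_1d(ages, k=3)
cuts = cut_points(centroids)
print(cuts)
print([interval_index(a, cuts) for a in ages])
```

Because the boundaries follow the cluster centroids, dense regions of the variable get narrower intervals than sparse ones, which is what distinguishes this from equal-width binning.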

“…In our technique, we combine each user clicks and report and question contents to determine the similarity. Better outcomes [5].…”

Section: Fig1 Different Clustering (mentioning)

confidence: 99%

Review on Data Analysis Using Data Mining Techniques for Optimized Proteins Localization

Rajput, Shrivastav 2018

International Journal of Advanced Research in Computer and Comm

Cluster analysis may be a descriptive task that seeks to identify consistent cluster of object and it's additionally one in all the most analytical technique in data processing. K-mean is that the preferred partitional bunch technique. During this paper they have a tendency to discuss commonplace k mean formula and analyze the defect of kmean formula. During this paper 3 dissimilar changed k-mean formulas are mentioned that take away the limitation of k-mean formula and improve the speed and potency of k-mean formula. Experiments supported the standard data UCI show that the projected technique can end up a high purity cluster results and eliminate the sensitivity to the initial centers to some extent. E.Coli dataset and Yeast dataset resides issue organism and altogether totally different super molecule assign in their cell. If that protein is wounded, then these cause varied infections that affected anatomy adversely. So, the target of this work is to classify proteins into altogether totally different cellular localization sites supported organic compound sequences of E.Coli bacterium and Yeast. It's found that projected bunch provides correct result as compared to K-Mean and is perfect resolution to localization of proteins. It's additionally called nearest neighbor looking. It merely clusters the datasets into given variety of clusters. Varied efforts are created to improve the presentation of the K-means bunch formula. Throughout this paper they've been briefed among the sort of a review the work distributed by the assorted researchers' victimization K-means bunch. They have mentioned the restrictions and applications of the K-means bunch formula still. Detect our projected formula best resolution.

“…However, the first description of the D&C algorithm appears in John Mauchly's article discussing its application in computer sorting [19]. Nowadays, the D&C approach is applied widely in areas such as Parallel Computing [20], Clustering Computing [21], Granular Computing [22], and Huge Data Mining [23].…”

Section: A Divide-and-conquer (D&C) (mentioning)

confidence: 99%

Software Cost Estimation Framework for Service-Oriented Architecture Systems Using Divide-and-Conquer Approach

Keung 2010

2010 Fifth IEEE International Symposium on Service Oriented System Engineering

Due to the complexity of Service-Oriented Architecture (SOA), cost and effort estimation for SOA-based software development is more difficult than that for traditional software development. Unfortunately, there is a lack of published work about cost and effort estimation for SOA-based software. Existing cost estimation approaches are inadequate to address the complex service-oriented systems. This paper proposes a novel framework based on Divide-and-Conquer (D&C) for cost estimation for building SOA-based software. By dealing with separately development parts, the D&C framework can help organizations simplify and regulate SOA implementation cost estimation. Furthermore, both cost estimation modeling and software sizing work can be satisfied respectively by switching the corresponding metrics within this framework. Given the requirement of developing these metrics, this framework also defines the future research in four different directions according to the separate cost estimation sub-problems.
