2012
DOI: 10.1371/journal.pone.0029578

Unsupervised Analysis of Classical Biomedical Markers: Robustness and Medical Relevance of Patient Clustering Using Bioinformatics Tools

Abstract:
Motivation: It has been proposed that clustering clinical markers, such as blood test results, can be used to stratify patients. However, the robustness of clusters formed with this approach to data pre-processing and clustering algorithm choices has not been evaluated, nor has clustering reproducibility. Here, we made use of the NHANES survey to compare clusters generated with various combinations of pre-processing and clustering algorithms, and tested their reproducibility in two separate samples.
Method: Values …
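Comparing clusterings produced by different pre-processing and algorithm combinations requires some measure of agreement between two partitions of the same patients. The abstract, as truncated here, does not say which measure the authors used. A widely used option is the adjusted Rand index (ARI), which is 1.0 for identical partitions (whatever the cluster IDs) and near 0 for chance-level agreement. The pure-Python sketch below assumes ARI as the measure; the function and the `pipeline_a`/`pipeline_b` labels are hypothetical illustrations, not the authors' code or data.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Chance-corrected agreement between two clusterings of the same patients.

    1.0 means identical partitions (cluster IDs may differ); values near 0 mean
    agreement no better than chance.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same patients")
    n = len(labels_a)
    if n < 2:
        return 1.0  # no pairs to compare
    # Contingency counts: patients in cluster i under A and cluster j under B.
    joint = Counter(zip(labels_a, labels_b))
    pairs_joint = sum(comb(c, 2) for c in joint.values())
    pairs_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    pairs_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = pairs_a * pairs_b / comb(n, 2)
    max_index = (pairs_a + pairs_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both put every patient in one cluster
    return (pairs_joint - expected) / (max_index - expected)


# Hypothetical labels for six patients from two pre-processing/algorithm pipelines.
pipeline_a = [0, 0, 1, 1, 2, 2]
pipeline_b = [2, 2, 0, 0, 1, 1]  # same partition, different cluster IDs
print(adjusted_rand_index(pipeline_a, pipeline_b))  # 1.0
```

ARI only compares two labelings of the same individuals; checking reproducibility across two separate samples would need an extra step (for example, assigning the second sample's patients to clusters learned on the first), which this sketch does not cover.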

Cited by 9 publications (5 citation statements)
References: 18 publications

“…Finally, our results on ethnic and educational disparities in the prevalence of specific clusters were consistent with previous studies that considered risk factors either individually 36,78 or through the lens of optimal cardiometabolic health 23 , but these studies did not examine disparities in a comprehensive set of cardiometabolic and renal phenotypes of risk factors. Our results are not directly comparable with those using electronic health records due to differences in the study population, methods and clinical conditions used in the clustering and because some of these studies aimed at identifying subtypes of specific diseases 45,47,48,50,51,[53][54][55][56][57][58][59][60][61][62][63][64][65][66][67][68] . Among such studies, two studies in different populations identified phenotypes characterized by compromised kidney function and low DBP 50,56 .…”
Section: Discussion (contrasting; confidence: 73%)
“…Second, the type of data (observational/longitudinal) is also critical in cluster analysis to give a chance to observe temporal patterns of disease progression, as cluster analysis does not explain the aetiology of the disease. Third, the number of clusters depends on the specific methodology applied as well as the proportions of populations among clusters that could vary based on the chosen sample size and the presence/absence of scaling the dataset (preprocessing) [ 25 ].…”
Section: Discussion (mentioning; confidence: 99%)
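The statement above notes that cluster results can change depending on whether the dataset is scaled. A toy illustration, using made-up values for two hypothetical markers (one recorded in large units, one in small units), not NHANES data: without scaling, Euclidean distance is dominated by the large-unit marker; after z-scoring each marker, the other marker drives the split. To keep the outcome independent of random initialization, this sketch swaps Lloyd's k-means for a brute-force search over every 2-cluster partition, minimizing the same within-cluster sum of squares. That is feasible only for a handful of patients. All function names are hypothetical, and z-scoring with the population SD is an assumed choice (a constant marker would cause a division by zero).

```python
from itertools import product
from statistics import mean, pstdev


def zscore_columns(rows):
    """Standardize each marker (column) to mean 0 and population SD 1."""
    stats = [(mean(col), pstdev(col)) for col in zip(*rows)]
    return [[(x - m) / s for x, (m, s) in zip(row, stats)] for row in rows]


def within_ss(rows, labels):
    """Total within-cluster sum of squared distances to each cluster centroid."""
    total = 0.0
    for k in set(labels):
        members = [r for r, lab in zip(rows, labels) if lab == k]
        centroid = [mean(col) for col in zip(*members)]
        total += sum(sum((x - c) ** 2 for x, c in zip(r, centroid)) for r in members)
    return total


def best_two_cluster_split(rows):
    """Exhaustive search for the 2-cluster partition with minimal within-cluster SS."""
    best = None
    # Patient 0 is always in cluster 0, so each partition is visited once.
    for rest in product((0, 1), repeat=len(rows) - 1):
        labels = (0,) + rest
        if 1 not in labels:
            continue  # skip the trivial one-cluster "partition"
        score = within_ss(rows, labels)
        if best is None or score < best[0]:
            best = (score, labels)
    return best[1]


# Hypothetical patients: [marker A in large units, marker B in small units].
patients = [[100, 1.0], [102, 2.0], [104, 1.0], [106, 2.0], [108, 1.0], [110, 2.0]]
print(best_two_cluster_split(patients))                  # (0, 0, 0, 1, 1, 1): split on A
print(best_two_cluster_split(zscore_columns(patients)))  # (0, 1, 0, 1, 0, 1): split on B
```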
“…An unsupervised clustering analysis was performed using the “ConsensusClusterPlus” R package. 18 The k-means approach was run 1000 times with a maximum subtype number of k (k = 6). The optimal subtype number was ascertained using the cumulative distribution function curve, consensus matrix, and consistent cluster score.…”
Section: Methods (mentioning; confidence: 99%)
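The quoted methods describe consensus clustering: k-means is re-run many times, and the number of clusters is chosen from the consensus matrix and the cumulative distribution function (CDF) of its entries. The pure-Python sketch below illustrates the idea and is not the ConsensusClusterPlus implementation. Each run subsamples patients, clusters them, and records how often each pair that was sampled together lands in the same cluster. The sketch makes several assumptions. It subsamples 80% of items per run, which as far as we know matches the package's default `pItem = 0.8`. It seeds k-means with one random point followed by farthest-point traversal. It uses toy data. The names `consensus_matrix` and `consensus_cdf` are hypothetical. `n_runs=1000` mirrors the "run 1000 times" in the quote, and the demo uses 100 runs for speed.

```python
import random
from statistics import mean


def sq_dist(p, q):
    """Squared Euclidean distance between two marker vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, rng, n_iter=100):
    """Lloyd's k-means seeded with a random point plus farthest-point traversal."""
    centroids = [list(rng.choice(points))]
    while len(centroids) < k:
        # Next seed: the point farthest from its nearest existing centroid.
        centroids.append(list(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))))
    labels = []
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # An emptied cluster keeps its previous centroid.
            updated.append([mean(col) for col in zip(*members)] if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return labels


def consensus_matrix(points, k, n_runs=1000, p_item=0.8, seed=0):
    """Fraction of runs in which each pair, when both subsampled, shares a cluster."""
    rng = random.Random(seed)
    n = len(points)
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    m = max(k, int(p_item * n))
    for _ in range(n_runs):
        idx = rng.sample(range(n), m)  # subsample patients without replacement
        labels = kmeans([points[i] for i in idx], k, rng)
        for a, i in enumerate(idx):
            for b, j in enumerate(idx):
                sampled[i][j] += 1
                together[i][j] += labels[a] == labels[b]
    return [[together[i][j] / sampled[i][j] if sampled[i][j] else 0.0
             for j in range(n)] for i in range(n)]


def consensus_cdf(matrix, grid):
    """Empirical CDF of the off-diagonal consensus values at each grid point."""
    n = len(matrix)
    values = [matrix[i][j] for i in range(n) for j in range(i + 1, n)]
    return [sum(v <= x for v in values) / len(values) for x in grid]


# Toy data (not from any study): two tight, well-separated groups of patients.
blob_a = [[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5]]
blob_b = [[x + 10, y + 10] for x, y in blob_a]
points = blob_a + blob_b
M = consensus_matrix(points, k=2, n_runs=100)
# Stable clusters give consensus values near 0 or 1, so the CDF is flat in the middle.
print(consensus_cdf(M, [0.1, 0.5, 0.9]))
```

To choose the number of clusters as described in the quote, one would compute the matrix for each k from 2 up to the maximum (6 there) and compare the CDF curves. Curves that stay flat between low and high consensus values indicate more stable clusterings.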