2017
DOI: 10.1371/journal.pone.0188274

Clustering of samples and variables with mixed-type data

Abstract: Analysis of data measured on different scales is a relevant challenge. Biomedical studies often focus on high-throughput datasets of, e.g., quantitative measurements. However, the need for integration of other features possibly measured on different scales, e.g. clinical or cytogenetic factors, becomes increasingly important. The analysis results (e.g. a selection of relevant genes) are then visualized, while adding further information, like clinical factors, on top. However, a more integrative approach is des…

Cited by 54 publications (42 citation statements)
References 40 publications

“…The transformation of this expression pattern also indicates that the progression of the disease affects the function of gene regulation, which leads to some function level changes. The clustering results are visualized using heatmap thermal maps [ 15 ].…”
Section: Methods (mentioning)
confidence: 99%
“…From Tables 10 and 11, it can be seen that the proposed methodology improves the performance of the otherwise independent models and achieves comparable or better performance compared to the models proposed in previous studies. In addition, the CKD data set is composed of mixed variables (numeric and category), so the similarity evaluation methods based on mixed data could be used to calculate the similarity between samples, such as general similarity coefficient [37]. In this study, we used euclidean distance to evaluate the similarity between samples, and KNN could obtain a good result based on euclidean distance with the highest accuracy of 99.25%.…”
Section: Experiments and Evaluations (mentioning)
confidence: 99%
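
The statement above contrasts Euclidean distance with a similarity measure built for mixed data. As a minimal sketch of how a general similarity coefficient of the Gower type can be computed, assuming equal weights, no missing values, and a range-scaled absolute difference for numeric variables: the function names `column_ranges` and `gower_similarity` and the three sample records are hypothetical and are not taken from the cited studies or from any package.

```python
from typing import List, Optional, Sequence, Union

Value = Union[float, str]


def column_ranges(samples: Sequence[Sequence[Value]],
                  is_numeric: Sequence[bool]) -> List[Optional[float]]:
    """Return max - min for each numeric column and None for categorical ones."""
    ranges: List[Optional[float]] = []
    for k, numeric in enumerate(is_numeric):
        if numeric:
            column = [float(row[k]) for row in samples]
            ranges.append(max(column) - min(column))
        else:
            ranges.append(None)
    return ranges


def gower_similarity(a: Sequence[Value], b: Sequence[Value],
                     ranges: Sequence[Optional[float]]) -> float:
    """Average per-variable similarity between two mixed-type records.

    Numeric variable: 1 - |x - y| / range (1.0 if the range is zero).
    Categorical variable: 1.0 if the values match, else 0.0.
    """
    total = 0.0
    for x, y, r in zip(a, b, ranges):
        if r is None:
            total += 1.0 if x == y else 0.0
        elif r > 0:
            total += 1.0 - abs(float(x) - float(y)) / r
        else:
            total += 1.0
    return total / len(ranges)


# Hypothetical records: (age in years, a yes/no categorical factor).
samples = [(50.0, "yes"), (60.0, "yes"), (70.0, "no")]
ranges = column_ranges(samples, is_numeric=[True, False])

# The matching distance is 1 - similarity.
distance_0_1 = 1.0 - gower_similarity(samples[0], samples[1], ranges)
print(distance_0_1)
```

Unlike Euclidean distance, this measure needs no numeric encoding of the categorical column, and every variable contributes on the same 0 to 1 scale.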
“…Thus, of the 318 observed calves, 191 were included in the analysis. To visualize the data, a hierarchical cluster analysis of variables was performed using the package CluMix 57 . Calves were clustered by similarity using Gowers distance and variables clustered using a combination of association measures according to the CluMix-ama approach 58 .…”
Section: Sampling and Microbiological Analysis Of Vtec O157: H7 Faec (mentioning)
confidence: 99%
“…To visualize the data, a hierarchical cluster analysis of variables was performed using the package CluMix 57 . Calves were clustered by similarity using Gowers distance and variables clustered using a combination of association measures according to the CluMix-ama approach 58 . The combined effects of each individual risk factors and their association with colonisation of VTEC O157:H7 were analysed using Elastic net regression in the package glmnet (alpha = 0.5) 59 .…”
Section: Sampling and Microbiological Analysis Of Vtec O157: H7 Faec (mentioning)
confidence: 99%
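
For readers unfamiliar with the setting quoted above, the elastic net penalty as parameterized in glmnet can be written as follows. The symbols λ (overall penalty strength) and β (coefficient vector) are notation introduced here for illustration; the statement itself only reports alpha = 0.5.

```latex
P_{\alpha}(\beta) \;=\; \lambda \left[ \frac{1-\alpha}{2}\,\lVert \beta \rVert_2^2 \;+\; \alpha\,\lVert \beta \rVert_1 \right],
\qquad
P_{0.5}(\beta) \;=\; \lambda \left[ \tfrac{1}{4}\,\lVert \beta \rVert_2^2 \;+\; \tfrac{1}{2}\,\lVert \beta \rVert_1 \right]
```

Setting α = 1 gives the lasso penalty and α = 0 the ridge penalty, so α = 0.5 sits midway between the two.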