2018
DOI: 10.1093/bioinformatics/bty899

Clustermatch: discovering hidden relations in highly diverse kinds of qualitative and quantitative data without standardization

Abstract: Motivation: Heterogeneous and voluminous data sources are common in modern datasets, particularly in systems biology studies. For instance, in multi-holistic approaches in the fruit biology field, data sources can include a mix of measurements such as morpho-agronomic traits, different kinds of molecules (nucleic acids and metabolites) and consumer preferences. These sources not only have different types of data (quantitative and qualitative), but also large amounts of variables with possibly non-linear relati…

Cited by 5 publications (2 citation statements)
References 28 publications
“…Recent implementations of MIC, for example, take several seconds to compute on a single variable pair across a few thousand objects or conditions [26]. We previously developed a clustering method for highly diverse datasets that significantly outperformed approaches based on Pearson, Spearman, DC and MIC in detecting clusters of simulated linear and nonlinear relationships with varying noise levels [29]. Here we introduce the Clustermatch Correlation Coefficient (CCC), an efficient not-only-linear coefficient that works across quantitative and qualitative variables.…”
Section: Introduction (mentioning)
confidence: 99%
“…A prevalent question is how we integrate large extents of variables, both quantitative and qualitative, which may have a priori non-linear relationships between them. Some resources are emerging which provide insights into complex relationships derived from highly heterogeneous datasets (including the capacity to assess not only omics data but also morpho-agronomic traits) such as Cluster-match which has promising features in next generation agronomic data-mining (Pividori et al, 2018).…”
Section: Introducing Physiomics In the Post-genomic Era (mentioning)
confidence: 99%