2020
DOI: 10.1002/bimj.201900366

Clustering with missing and left‐censored data: A simulation study comparing multiple‐imputation‐based procedures

Abstract: Cluster analysis, commonly used to explore large biomedical datasets, can be challenging, notably due to missing data or left‐censored data induced by the sensitivity limits of the biochemical measurement method. Usually, complete‐case analysis, simple imputation, or stochastic simple imputation are applied before clustering. More recently, consensus methods following multiple imputation have been proposed. However, they ignore left‐censoring and do not allow the number of clusters to vary across the partition…
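To make the idea of a consensus after multiple imputation concrete, here is a minimal sketch of one generic pooling scheme, a co-association matrix followed by thresholding. It is not the procedure evaluated in the paper, and it does not handle left-censoring. The function names, the 0.5 threshold, and the toy partitions are all assumptions made for illustration. The inputs are taken to be one partition per imputed dataset, and the partitions may have different numbers of clusters.

```python
def co_association(partitions):
    """Fraction of partitions in which each pair of items shares a cluster."""
    n = len(partitions[0])
    m = len(partitions)
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / m for c in row] for row in counts]


def consensus_partition(partitions, threshold=0.5):
    """Link items co-clustered in more than `threshold` of the partitions,
    then label the connected components of the resulting graph."""
    co = co_association(partitions)
    n = len(co)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:  # depth-first search over linked items
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and co[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Hypothetical partitions of 6 individuals, one per imputed dataset.
partitions = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],  # same structure, different label names
    [0, 0, 1, 2, 2, 2],  # three clusters in this partition
]
consensus = consensus_partition(partitions)
```

Because the co-association matrix only records whether two individuals share a cluster, the pooling is insensitive to label names and to the number of clusters in each input partition.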

Cited by 9 publications (19 citation statements)
References 40 publications
“…(b) T distribution for ρ = 0.6 As expected, for M ≥ 2 the instability is constant whatever the proportion of missing values, the number of individuals or the correlation between variables. Similar results are observed when data are generated under a MAR mechanism or when data are imputed by FCS-RF (see Figures 9,10,11 in Appendix).…”
Section: Influence Of M (supporting)
confidence: 76%
“…Like CSPA, this approach in two steps cannot be considered as based on the median partition problem [20]. Lately, [10] proposed consensus based on the MultiCons algorithm [27]. The algorithm presents many advantages, in particular it allows a visualization of the hidden cluster structure in the data set, but it does not aim at minimizing the median partition problem [p. 16] [27].…”
Section: Partitions Pooling After MI (mentioning)
confidence: 99%
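The median partition problem mentioned in the statement above asks for the partition that minimizes the total distance to a set of given partitions. The sketch below illustrates that objective only. The choice of distance (the number of item pairs on which two partitions disagree), the function names, and the toy partitions are assumptions, and restricting the search to the input partitions is a simple heuristic, not an exact solver and not the method of any cited paper.

```python
from itertools import combinations


def pair_disagreements(a, b):
    """Number of item pairs grouped together in one partition but not the other."""
    return sum(
        (a[i] == a[j]) != (b[i] == b[j])
        for i, j in combinations(range(len(a)), 2)
    )


def median_objective(candidate, partitions):
    """Total distance from a candidate partition to all given partitions."""
    return sum(pair_disagreements(candidate, p) for p in partitions)


def best_of_inputs(partitions):
    """Heuristic: return the input partition with the lowest objective."""
    return min(partitions, key=lambda p: median_objective(p, partitions))


# Hypothetical partitions of 6 individuals, one per imputed dataset.
partitions = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],  # identical grouping under different label names
    [0, 0, 1, 2, 2, 2],
]
best = best_of_inputs(partitions)
```

A pooling method that targets the median partition problem tries to drive `median_objective` down directly, whereas two-step approaches build an intermediate summary first and cluster it afterwards.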
“…Associations between circulating antibody levels and each categorical variable were assessed using the Wilcoxon–Mann–Whitney (2 levels variables) or Kruskal–Wallis (>2 levels variables) tests. Clinical patient profiles were determined using a clustering approach that handles missing data through multiple imputation and consensus clustering [10]. The variables used were circulating antibody levels, age, gender, need for oxygen therapy, pulmonary extension, BNP, D-dimers, lymphocyte count, ferritin, troponin, CRP, and NLR.…”
Section: Methods (mentioning)
confidence: 99%
“…Approaches often make assumptions including few discrete time steps (Young et al 2018; Huopaniemi et al 2014); a single piecewise linear function (Young et al 2018) or Gaussian mixture model (Huopaniemi et al 2014); significantly more samples per object than number of objects (Mattar, Hanson, and Learned-Miller 2012; Gaffney and Smyth 2005); very small windows of potential misalignment (Liu, Tong, and Wheeler 2009; Listgarten et al 2007); or known lag time (Li et al 2011). Methods that directly measure similarity between time-series, e.g., dynamic time warping (Cuturi 2011) or methods that aggregate multiple imputation methods (Faucheux et al 2021) can also be used for clustering time-series data. Our method aims to cluster interval-censored multivariate time series without these constraints.…”
Section: Related Work (mentioning)
confidence: 99%