Characterisation, identification, clustering, and classification of disease

Webster, Anthony J; Gaitskell, Kezia; Turnbull, Iain; Cairns, BJ; Clarke, Robert

doi:10.1038/s41598-021-84860-z

Cited by 25 publications

(39 citation statements)

References 45 publications

“…An alternative clustering-based approach, is to assume that diseases are in one or more clusters with equal associations, and test if the log-likelihood for the model [32] is minimised by one, or more clusters. This objective test can incorporate a prior, and examples suggest it is more lenient than a Q 2 test, with the disease clustering data of Webster et al [1] minimising the log-likelihood when I² 50%. The merits of this approach for applications such as meta-analyses, will need exploring in greater detail elsewhere.…”

Section: Discussion (mentioning)

confidence: 99%

“…The null hypothesis (in a fixed effects model), is that all diseases in a composite endpoint have the same associations with one or more parameters, such as a drug, or a collection of potential risk factors. These might be a subset of associations, with potential confounders adjusted for, and subsequently removed by marginalisation [1]. Consider m composite endpoints (or clusters of diseases), labelled by g. Under the null hypothesis of the same associations for diseases in a composite endpoint, labelled i = 1 to i = n_g,…”

Section: Heterogeneity Of Associations (mentioning)

confidence: 99%

“…For a flat prior, the sum is from i = 1 to i = n_g. The subscripts g allow the discussion to include more than one cluster of diseases, as was considered in Webster et al [1], or a random effects model. Here, unless stated otherwise, we consider a single composite endpoint, and the subscript g could be omitted.…”

Section: Heterogeneity Of Associations (mentioning)

confidence: 99%

“…It is common for epidemiological studies [1] and clinical trials [2–4] to group similar diseases together into a cluster of diseases, providing more total cases, and more statistical power to detect associations. This procedure has been essential since the pioneering epidemiological studies of John Graunt in the 1600s [5], providing sufficient cases to allow meaningful statistical study, while attempting to ensure that clustered diseases have a similar etiology.…”

Section: Introduction (mentioning)

confidence: 99%

“…The ICD system clusters increasingly detailed disease descriptions into larger clusters with similar disease etiology, often allowing larger clusters to be used in epidemiological studies. In practice, diseases are selected by a clinician to help ensure that only diseases with a clearly defined and common etiology are clustered together [1]. With data-driven clustering studies increasingly being used to identify potential composite endpoints [1, 7–11], it seems appropriate to review the assumptions needed to study a cluster of diseases, and the existing arguments for and against doing so in clinical trials.…”

Section: Introduction (mentioning)

confidence: 99%

Statistical tests for heterogeneity of clusters and composite endpoints

Webster

2021

Preprint

Clinical trials and epidemiological cohort studies often group similar diseases together into a composite endpoint, to increase statistical power. A common example is to use a 3-digit code from the International Classification of Diseases (ICD) to represent a collection of several 4-digit coded diseases. More recently, data-driven studies are using associations with risk factors to cluster diseases, leading this article to reconsider the assumptions needed to study a composite endpoint of several potentially distinct diseases. An important assumption is that the (possibly multivariate) associations are the same for all diseases in a composite endpoint (not heterogeneous). Therefore, multivariate measures of heterogeneity from meta-analysis are considered, including multivariate versions of the I² statistic and Cochran's Q statistic. Whereas meta-analysis offers tools to test heterogeneity of clustering studies, clustering models suggest an alternative heterogeneity test, of whether data are better described by one, or more, clusters of elements with the same mean. The assumptions needed to model composite endpoints with a proportional hazards model are also considered. It is found that the model can fail if one or more diseases in the composite endpoint have different associations. Tests of the proportional hazards assumption can help identify when this occurs. It is emphasised that in multi-stage diseases such as cancer, some germline genetic variants can strongly modify the baseline hazard function and cannot be adjusted for, but must instead be used to stratify the data.
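
As a rough illustration of the heterogeneity measures named in the abstract, the sketch below computes the standard univariate Cochran's Q and I² for a set of per-disease association estimates. It is not the multivariate version the article considers, and the function name `cochran_q_i2` and the example numbers are hypothetical, chosen only for illustration.

```python
from typing import Sequence, Tuple


def cochran_q_i2(
    estimates: Sequence[float], std_errors: Sequence[float]
) -> Tuple[float, float, float]:
    """Return (pooled estimate, Cochran's Q, I^2) for univariate estimates.

    Hypothetical helper: the standard fixed-effects, inverse-variance
    formulas from meta-analysis, not the article's multivariate statistics.
    """
    if len(estimates) != len(std_errors) or len(estimates) < 2:
        raise ValueError("need at least two estimates with matching standard errors")

    # Inverse-variance weights: more precise estimates count for more.
    weights = [1.0 / se**2 for se in std_errors]
    total_weight = sum(weights)

    # Fixed-effects pooled estimate (weighted mean of the estimates).
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total_weight

    # Cochran's Q: weighted sum of squared deviations from the pooled estimate.
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, estimates))

    # I^2: share of Q in excess of its degrees of freedom, floored at zero.
    dof = len(estimates) - 1
    i2 = max(0.0, (q - dof) / q) if q > 0 else 0.0
    return pooled, q, i2


# Made-up log hazard ratios for three diseases in one composite endpoint.
log_hr = [0.10, 0.30, 0.50]
se = [0.10, 0.10, 0.10]
pooled, q, i2 = cochran_q_i2(log_hr, se)
print(f"pooled={pooled:.3f}  Q={q:.3f}  I2={100 * i2:.1f}%")
```

With these made-up inputs Q is about 8 on 2 degrees of freedom and I² is about 75%, which would usually be read as substantial heterogeneity, so the three diseases may not share a common association.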
