2019
DOI: 10.7717/peerj.6918

We need to talk about reliability: making better use of test-retest studies for study design and interpretation

Abstract: Neuroimaging, like many other fields of clinical research, is both time-consuming and expensive, and recruitable patients can be scarce. These constraints limit the possibility of large-sample experimental designs and often lead to statistically underpowered studies. This problem is exacerbated by the use of outcome measures whose accuracy is sometimes insufficient to answer the scientific questions posed. Reliability is usually assessed in validation studies using healthy participants; however, thes…

Cited by 173 publications (158 citation statements)
References 52 publications
“…As such, an ICC of 1 indicates perfect measurement reliability, with all observed variability due to true (biological) differences and none to measurement variability (error), while an ICC of 0.5 indicates that true differences and measurement error contribute to the variability in equal measure. Different interpretations of the ICC exist: as proposed by Portney & Watkins [28] and suggested by Matheson [27], we regard an ICC < 0.5 as low, 0.5-0.75 as moderate, 0.75-0.9 as good, and > 0.9 as excellent. An ICC > 0.9 is recommended as the lowest acceptable standard for measurements on which diagnostic decisions are based, and ICC > 0.7 for research purposes, with 0.95 and 0.8, respectively, considered adequate [29].…”
Section: Discussion
confidence: 82%
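The interpretation bands quoted above can be sketched as a small helper. This is an illustrative function written for this summary, not code from the cited papers:

```python
def interpret_icc(icc):
    """Classify an ICC value using the Portney & Watkins bands quoted above:
    < 0.5 low, 0.5-0.75 moderate, 0.75-0.9 good, > 0.9 excellent."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("ICC expected in [0, 1]")
    if icc < 0.5:
        return "low"
    if icc < 0.75:
        return "moderate"
    if icc < 0.9:
        return "good"
    return "excellent"
```

For example, `interpret_icc(0.8)` falls in the "good" band, just below the > 0.9 threshold recommended for diagnostic use.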
“…For statistical analysis, R version 3.4.3 was used with the package relfeas (https://github.com/mathesong/relfeas). [18F]FE-PE2I measurement reproducibility was determined by calculating repeatability (absolute intrasubject variability, AbsVar; and the minimum detectable difference, MDD) and reliability (intraclass correlation coefficient, ICC), following the recommendations of Weir, Baumgartner, and Matheson [25][26][27]. Absolute variability was calculated as: (test - retest)/(mean of test and retest) * 100.…”
Section: Discussion
confidence: 99%
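The repeatability formula quoted above can be expressed directly in code. A minimal sketch (the cited work used the relfeas R package; variable names here are illustrative):

```python
import numpy as np

def absolute_variability(test, retest):
    """Per-subject absolute intrasubject variability (AbsVar, %) as quoted:
    (test - retest) / (mean of test and retest) * 100.
    Note the quoted formula is signed; some reports take the absolute value."""
    test = np.asarray(test, dtype=float)
    retest = np.asarray(retest, dtype=float)
    return (test - retest) / ((test + retest) / 2.0) * 100.0
```

For a subject measured at 10 on test and 8 on retest, this gives 2 / 9 * 100, roughly 22% variability.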
“…Although graph measures, such as global efficiency and Gcc, show fair-to-good test-retest reliability (Welton et al. 2015), a recent meta-analysis showed that edges within a functional connectivity matrix (on the basis of which graph measures are calculated) show poor test-retest reliability (Noble et al. 2019). This low test-retest reliability reduces statistical power and necessitates larger samples to detect the effect size of interest (Matheson 2019; Zuo et al. 2019). It is, however, important to note that test-retest reliability is not the same as validity, and the meta-analysis showed that one of the main factors influencing test-retest reliability was artefact correction (Noble et al. 2019), a necessary preprocessing step to remove motion and other non-neural physiological noise from the data and avoid spurious results (Parkes et al. 2018).…”
Section: Discussion
confidence: 99%
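The power argument in the excerpt above follows from the classical attenuation relation of test theory, in which the observed standardized effect shrinks as d_obs = d_true * sqrt(ICC). A sketch under that standard assumption (illustrative functions, not the cited relfeas code):

```python
import math

def attenuated_d(true_d, icc):
    """Observed standardized effect size after attenuation by measurement
    unreliability (classical test theory): d_obs = d_true * sqrt(ICC)."""
    return true_d * math.sqrt(icc)

def n_inflation(icc):
    """Approximate factor by which per-group sample size must grow to keep
    power constant: required n scales with 1/d^2, so the factor is 1/ICC."""
    return 1.0 / icc
```

With an ICC of 0.5, an effect of d = 0.8 is observed at roughly d = 0.57, and the sample needed to detect it at fixed power roughly doubles.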
“…The intra-class correlation coefficient (ICC) for inter-rater reliability, along with its 95% confidence interval (95% CI), was calculated for all four raters, as well as separately for the two orthopaedic surgeon and two radiologist raters as a subgroup analysis. A two-way random-effects model (single rater) was used for inter-rater reliability, and a two-way mixed-effects model (single rater) was used for test-retest reliability, as recommended in the literature [17][18][19]. The ICC values for PTTG and TTTG were compared.…”
Section: Methods
confidence: 99%
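The two single-rater models named above correspond to ICC(2,1) (two-way random effects, absolute agreement) and ICC(3,1) (two-way mixed effects, consistency), both computable from the two-way ANOVA mean squares. A minimal sketch of those standard formulas (illustrative code; the cited study presumably used standard statistical software):

```python
import numpy as np

def icc_single_rater(ratings):
    """Return (ICC(2,1), ICC(3,1)) for an (n_subjects x k_raters) matrix,
    using the two-way ANOVA mean squares (Shrout & Fleiss conventions)."""
    y = np.asarray(ratings, dtype=float)
    n, k = y.shape
    grand = y.mean()
    row_means = y.mean(axis=1)  # per-subject means
    col_means = y.mean(axis=0)  # per-rater means
    msr = k * np.sum((row_means - grand) ** 2) / (n - 1)   # subjects (rows)
    msc = n * np.sum((col_means - grand) ** 2) / (k - 1)   # raters (columns)
    resid = y - row_means[:, None] - col_means[None, :] + grand
    mse = np.sum(resid ** 2) / ((n - 1) * (k - 1))         # residual error
    # ICC(2,1): two-way random effects, absolute agreement, single rater
    icc2 = (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
    # ICC(3,1): two-way mixed effects, consistency, single rater
    icc3 = (msr - mse) / (msr + (k - 1) * mse)
    return icc2, icc3
```

When raters agree perfectly, both forms equal 1; a constant offset between raters lowers ICC(2,1) (absolute agreement) but leaves ICC(3,1) (consistency) at 1, which is why the model choice matters.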