Clinical trials in acute stroke are large and expensive, even with recent innovations in trial design.1 The modified Rankin scale (mRS) is the most commonly used outcome measure in stroke research.2 The mRS is an ordinal scale of 7 categories ranging from full recovery, through increasing degrees of disability, to death.3 Typically, mRS assessment is based on a clinician's rating of a patient interview, and interobserver variability is common.4 Meta-analysis suggests an overall reliability of κ=0.62 (κw=0.9),4 but reliability may be lower (κ=0.25) in multicenter studies.5 Mandatory training in mRS assessment is used in most trials to mitigate this,6 but the problem persists. The end point misclassification inherent in this interobserver variability may affect trial power7 and treatment effect size.8
Central adjudication of trial end points is routinely used in a variety of settings but has rarely been used in stroke. Group adjudication of mRS has been based on review of written summaries9 or telephone interview.10,11 Advantages of a remote adjudication approach include expert opinion and experience, quality control of mRS interviews, and avoidance of potential bias where local observers are not, or cannot be, blinded to treatment allocation. To date, remote functional assessment has been limited by the difficulty of capturing a high-fidelity recording suitable for off-line review. Furthermore, most trials are international, and culturally sensitive translation may also prove problematic. Our pilot data suggest that a video-based mRS assessment is a valid and reliable solution.12
Background and Purpose: Use of the modified Rankin scale (mRS) in multicenter trials may be limited by interobserver variability. We assessed the effect of this on trial power and developed a novel group adjudication approach.
Methods: We generated power and sample size estimates from simulated trials modeled with varying mRS reliability. We conducted a virtual acute stroke trial across 14 UK sites to develop a group adjudication approach. Traditional mRS interviews, performed at local sites, were digitally recorded and scored by an adjudication committee. We assessed the effect of translation by comparing scores in translated mRS interviews, originally conducted in English and Mandarin. Agreement was measured using κ and weighted κ (κw) statistics and the intraclass correlation coefficient.
Results: Statistical simulations suggest that improving mRS reliability from κ=0.25 to κ=0.5 or 0.7 may allow reductions in sample size of n=386 or 490 in a typical n=2000 study. Our virtual acute stroke trial included 370 participants and 563 mRS video assessments. We adjudicated mRS in 538 of 563 (96%) study visits.
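The agreement statistics named above (κ and weighted κ) can be illustrated with a minimal Python sketch for two raters scoring the same interviews on the 7-grade mRS. The function name `cohen_kappa` and the choice of quadratic disagreement weights are assumptions made for illustration; the text does not state which weighting scheme was used.

```python
def cohen_kappa(rater_a, rater_b, n_grades=7, weighted=False):
    """Cohen's kappa for two raters scoring mRS grades 0..n_grades-1.

    With weighted=True, quadratic disagreement weights are used
    (an assumption; linear weights are another common choice).
    """
    n = len(rater_a)
    # Observed joint proportions of (grade by rater A, grade by rater B).
    obs = [[0.0] * n_grades for _ in range(n_grades)]
    for a, b in zip(rater_a, rater_b):
        obs[a][b] += 1.0 / n
    # Marginal proportions for each rater.
    row = [sum(obs[i]) for i in range(n_grades)]
    col = [sum(obs[i][j] for i in range(n_grades)) for j in range(n_grades)]

    def disagreement(i, j):
        if weighted:
            return ((i - j) / (n_grades - 1)) ** 2
        return 0.0 if i == j else 1.0

    grades = range(n_grades)
    d_obs = sum(disagreement(i, j) * obs[i][j] for i in grades for j in grades)
    d_exp = sum(disagreement(i, j) * row[i] * col[j] for i in grades for j in grades)
    # Kappa is 1 minus the ratio of observed to chance-expected disagreement.
    return 1.0 - d_obs / d_exp


# Hypothetical scores: the raters differ by one grade on the last interview only.
rater_a = [0, 1, 2, 3, 4, 5, 6, 3]
rater_b = [0, 1, 2, 3, 4, 5, 6, 4]
kappa = cohen_kappa(rater_a, rater_b)
kappa_w = cohen_kappa(rater_a, rater_b, weighted=True)
```

Because the single disagreement is between adjacent grades, the weighted statistic penalizes it less than the unweighted one, which is consistent with κw exceeding κ in the reliability figures quoted above.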
Methods
Sample Size Simulations
We performed simulations to demonstrate the effect of increasing mRS reliability and using multiple observers to assign mRS scores. We generated power estimates from simulated mRS studies under various combinations of sample size (N), effect size (δ), reliability (unweighted κ...
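The general idea of estimating power by simulation under imperfect mRS reliability can be sketched in Python as follows. This is a toy illustration, not the simulation model used in the study: the control-arm distribution `CONTROL_PROBS`, the one-grade treatment shift, the rater error model (a random grade recorded with probability `p_err`, standing in for reduced κ), and the dichotomized two-proportion z-test are all assumptions introduced here.

```python
import math
import random

# Assumed control-arm distribution over mRS grades 0..6 (illustrative only).
CONTROL_PROBS = [0.10, 0.15, 0.15, 0.20, 0.20, 0.10, 0.10]


def draw_true_mrs(rng, shift_prob):
    """Draw a true mRS grade; with probability shift_prob improve it by one grade."""
    score = rng.choices(range(7), weights=CONTROL_PROBS)[0]
    if score > 0 and rng.random() < shift_prob:
        score -= 1
    return score


def observe(rng, score, p_err):
    """Assumed rater model: with probability p_err a random grade is recorded."""
    return rng.randrange(7) if rng.random() < p_err else score


def trial_significant(rng, n_per_arm, shift_prob, p_err, z_crit=1.96):
    """Simulate one two-arm trial; z-test on the proportion with mRS 0-2."""
    good = []
    for arm_shift in (0.0, shift_prob):  # control arm, then treatment arm
        scores = (observe(rng, draw_true_mrs(rng, arm_shift), p_err)
                  for _ in range(n_per_arm))
        good.append(sum(s <= 2 for s in scores))
    p_control = good[0] / n_per_arm
    p_treated = good[1] / n_per_arm
    pooled = (good[0] + good[1]) / (2 * n_per_arm)
    se = math.sqrt(2 * pooled * (1 - pooled) / n_per_arm)
    return se > 0 and abs(p_treated - p_control) / se > z_crit


def estimate_power(n_per_arm, shift_prob, p_err, n_sims=500, seed=0):
    """Fraction of simulated trials reaching significance."""
    rng = random.Random(seed)
    hits = sum(trial_significant(rng, n_per_arm, shift_prob, p_err)
               for _ in range(n_sims))
    return hits / n_sims
```

Calling `estimate_power` over a grid of `n_per_arm`, `shift_prob`, and `p_err` values gives a power surface from which a required sample size can be read off for each level of rater error. Mapping `p_err` to a specific κ, and modeling multiple observers per assessment, are not attempted in this sketch.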