Objective
To determine whether improvement of more than 20% in core set parameters should be required before patients are characterized as improved in rheumatoid arthritis (RA) clinical trials.
Methods
Data from 6 RA trials were reanalyzed to evaluate the discriminant validity (ability to differentiate active treatment from control) of 4 proposed definitions of improvement: the current American College of Rheumatology (ACR) definition (a 20% threshold for core set parameters [ACR 20]), a 50% threshold (ACR 50), a 70% threshold (ACR 70), and an ordinal definition in which a patient could be classified in any of 3 categories (unimproved, ACR 20, or ACR 50). For each definition, we characterized each patient in each trial as improved or not and computed a chi‐square value differentiating the active treatment group from the control group, with the corresponding P value.
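As an illustration of the comparison described above, the following minimal sketch computes a Pearson chi‐square value and P value for one trial under one dichotomous definition of improvement. It assumes an uncorrected 2 x 2 test with 1 degree of freedom; the abstract does not state whether a continuity correction was applied or how the 3‐category ordinal definition was tested. The function name and the patient counts are hypothetical and are not data from the trials.

```python
import math


def chi_square_2x2(active_improved, active_not, control_improved, control_not):
    """Pearson chi-square (no continuity correction) for a 2 x 2 table.

    Rows are treatment groups (active, control); columns are the
    outcome (improved, not improved) under one threshold definition.
    Returns (chi_square, p_value) with 1 degree of freedom.
    """
    a, b, c, d = active_improved, active_not, control_improved, control_not
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        raise ValueError("every row and column total must be nonzero")
    chi2 = n * (a * d - b * c) ** 2 / margins
    # For 1 degree of freedom, the upper-tail probability of a
    # chi-square variable x equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts for illustration only (not taken from the trials):
# 30 of 50 active-treatment patients improved, 10 of 50 controls improved.
chi2, p = chi_square_2x2(30, 20, 10, 40)
print(f"chi-square = {chi2:.2f}, P = {p:.5f}")
```

Repeating the calculation with the counts obtained under each threshold gives one chi‐square value per definition per trial; a larger value indicates better separation of active treatment from control.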
Results
With an increase in the threshold for improvement, the percentage of placebo‐treated patients who were classified as experiencing response dropped dramatically in all trials, as did the percentage of patients receiving active therapy (second‐line drug, combination therapy, or tumor necrosis factor p75‐Fc fusion protein). Generally, the drop in active treatment response rates was greater than the drop in placebo response rates, so the difference between the 2 groups was smaller at the higher thresholds. Therefore, chi‐square values fell as the threshold for response was raised. The ordinal definition of improvement yielded chi‐square values similar to those obtained using ACR 20 alone.
Conclusion
Adopting a definition of efficacy in RA trials that requires 50% or 70% improvement in core set parameters would likely compromise statistical power and make it more difficult to distinguish between 2 treatments with different efficacy. ACR 20 should continue to be the primary measure of efficacy in RA trials, with higher thresholds for improvement being determined and reported as secondary efficacy measures.