Background
Medical consumers are increasingly requesting methods to discriminate among the results of different providers. Standards for appropriate modeling, risk adjustment, and evaluation (“scorecarding”) in this setting are not well developed, although such evaluation is being performed by the medical insurance industry and by several states in the United States. Our objectives were to develop and examine clinically meaningful methodology for assessing the operator-specific results for percutaneous coronary revascularization.
Methods and Results
From a multicenter database of patients treated since January 1, 1990, we used training and validation samples (n=4860) to develop several models for risk adjustment and applied them to 38 providers performing 25 to 523 procedures in the database. Models were developed using multivariable logistic regression techniques for combinations of the end points of death, myocardial infarction, bypass surgery, and procedural success. Models were evaluated for predictive accuracy by using receiver operating characteristic (ROC) analysis, for the capacity to discriminate between superior and inferior provider outcomes, and for subjectivity and concordance. Major complications occurred in 3.6% of patients. For each model, we report the area under the ROC curve in the validation sample (area=1.0 with perfect discriminatory accuracy; area=0.5 with no apparent accuracy) and the frequency of identification of operators with outcomes outside the 95% CI for the outcome in question. These values were, respectively, 0.85 and 7.9% for death; 0.77 and 13.2% for death, Q-wave infarction, and bypass surgery; 0.66 and 10.5% for death, all infarction, and bypass surgery; and 0.76 and 23.7% for procedural success. For the models as a group, identification of outliers was inversely related to provider volume (P=.05). Models evaluating non–Q-wave infarction or requiring measurement of percent diameter stenosis were identified as being most susceptible to provider manipulation.
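The two quantities reported for each model, the area under the ROC curve and the flagging of operators whose outcomes fall outside a 95% interval, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, the AUC is computed by the standard pairwise (Mann-Whitney) formulation, and the interval around a provider's expected event count uses a normal approximation, which is an assumption here because the abstract does not state how the 95% CI was derived.

```python
from math import sqrt


def roc_auc(predicted_risks, outcomes):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney).

    Returns the fraction of (event, non-event) pairs in which the event
    case received the higher predicted risk; ties count as one half.
    """
    events = [r for r, y in zip(predicted_risks, outcomes) if y == 1]
    non_events = [r for r, y in zip(predicted_risks, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    wins = sum(
        1.0 if e > n else 0.5 if e == n else 0.0
        for e in events
        for n in non_events
    )
    return wins / (len(events) * len(non_events))


def flag_provider(observed_events, predicted_risks, z=1.96):
    """Classify one provider against a risk-adjusted expectation.

    Assumption (not from the source): the expected count is the sum of
    the model's predicted risks for that provider's patients, and the
    95% interval is a normal approximation around that sum.
    """
    expected = sum(predicted_risks)
    variance = sum(p * (1.0 - p) for p in predicted_risks)
    half_width = z * sqrt(variance)
    if observed_events > expected + half_width:
        return "worse than expected"
    if observed_events < expected - half_width:
        return "better than expected"
    return "within expected range"


if __name__ == "__main__":
    # Hypothetical provider: 100 procedures, each with a 3.6% predicted risk.
    risks = [0.036] * 100
    print(flag_provider(5, risks))
    print(flag_provider(10, risks))
```

In this hypothetical case the expected count is 3.6 events, so 5 observed events stay inside the interval while 10 do not. At low volume the interval is wide relative to the expected count, which is consistent with the difficulty of discriminating among low-volume providers.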
Conclusions
For percutaneous coronary revascularization, modeling to discriminate between provider outcomes is limited by the low incidence of major adverse events, subjectivity or susceptibility to manipulation of more frequently occurring adverse events, the generally modest predictive capacity of the models, and the low volume of individual provider treatments. Modeling will be most useful in the identification of providers with extremely poor outcomes and for discrimination between providers with very large procedural volume. Until improved understanding of the biological and mechanical correlates of major complications allows the development of more predictive models, interpretation of the results of scorecarding, particularly for low-volume providers, should be made with caution.
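The effect of low event incidence and low provider volume on discrimination can be made concrete with a rough worked calculation. It is an illustration only, under assumptions not made in the source: every provider is taken to have the overall 3.6% major complication rate, no risk adjustment is applied, and the normal approximation to the binomial is used even though it is crude at the smallest volume.

```latex
% Standard error of an observed complication rate \hat{p} for a provider with n procedures
\mathrm{SE}(\hat{p}) = \sqrt{\frac{p(1-p)}{n}}, \qquad p = 0.036

% Lowest-volume provider in the database (n = 25)
\mathrm{SE} = \sqrt{\frac{0.036 \times 0.964}{25}} \approx 0.037,
\qquad 1.96 \times \mathrm{SE} \approx 0.073

% Highest-volume provider in the database (n = 523)
\mathrm{SE} = \sqrt{\frac{0.036 \times 0.964}{523}} \approx 0.0081,
\qquad 1.96 \times \mathrm{SE} \approx 0.016
```

Under these assumptions, the approximate 95% half-width is about 7.3 percentage points at 25 procedures and about 1.6 percentage points at 523 procedures, against an underlying rate of 3.6%. At the lowest volume, only a very large departure from the expected rate would be detectable.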