Objectives: The current score for primary graft dysfunction after lung transplantation relies heavily on chest radiographs, and radiologic judgment can make the difference between the lowest (primary graft dysfunction 0) and the highest (primary graft dysfunction 3) grade. This study aimed to evaluate interobserver variability of the scoring of postoperative chest radiographs and its impact on primary graft dysfunction grades in a large single-center cohort.
Methods: We retrospectively analyzed 497 lung transplantations performed between January 2010 and July 2016 at the Medical University of Vienna. Five trained thoracic radiologists were asked to independently examine postoperative chest radiographs performed at 0 to 6 hours, 24 hours, 48 hours, and 72 hours after arrival at the intensive care unit. Interobserver variability was calculated using Fleiss' kappa (κ) statistics.
Results: A total of 1988 chest radiographs were evaluated. Consensus among all 5 radiologists was found in only 826 cases (43.0%). At 0 to 6 hours and 24 hours, only a moderate agreement was found among the 5 radiologists (κ = 0.456 and 0.456, respectively), and agreement was even worse at 48 and 72 hours (κ = 0.405 and κ = 0.409). On the basis of this high interobserver variability, best and worst case scenarios were calculated leading to primary graft dysfunction 3 rates of 8.4% versus 28.4% at 0 to 6 hours, 1.8% versus 4.8% at 24 hours, 2.0% versus 5.3% at 48 hours, and 0.2% versus 3.1% at 72 hours. A high recipient body mass index and size-reduced transplants were found to be factors associated with higher rates of interobserver variability.
Conclusions: The substantial interobserver variability found in this retrospective analysis underlines the difficulty of adequately grading post-transplant organ function. Future revisions of the primary graft dysfunction grading should take this problem into consideration.
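For readers unfamiliar with the agreement statistic named in the Methods, the following is a minimal sketch of how Fleiss' kappa can be computed from rater counts. It is not the study's analysis code. The function name `fleiss_kappa`, the two-category coding (infiltrates present or absent), and the example ratings are all illustrative assumptions and do not come from the study data.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table of rating counts.

    counts[i][j] is the number of raters who assigned subject i
    (here: one chest radiograph) to category j. Every row must sum
    to the same number of raters n (n >= 2).
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if n_subjects == 0 or n_raters < 2:
        raise ValueError("need at least one subject and two raters")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every subject must be rated by the same number of raters")

    n_categories = len(counts[0])

    # Proportion of all ratings that fall into each category.
    p_j = [
        sum(row[j] for row in counts) / (n_subjects * n_raters)
        for j in range(n_categories)
    ]

    # Per-subject agreement: fraction of rater pairs that agree.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]

    p_bar = sum(p_i) / n_subjects      # mean observed agreement
    p_e = sum(p * p for p in p_j)      # agreement expected by chance

    if p_e == 1.0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return (p_bar - p_e) / (1.0 - p_e)


# Hypothetical example: 4 radiographs, 5 raters, 2 categories
# (column 0 = "infiltrates present", column 1 = "no infiltrates").
example_counts = [
    [5, 0],  # all five raters see infiltrates
    [3, 2],  # split reading
    [0, 5],  # all five raters see no infiltrates
    [4, 1],
]
kappa = fleiss_kappa(example_counts)
```

A value of 1 indicates complete agreement and a value near 0 indicates agreement no better than chance, which is the scale on which the reported values of roughly 0.41 to 0.46 are read as moderate.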