Introduction: Little is known about how well practicing physicians interpret pediatric chest radiographs (pCXR). We determined baseline interpretation skill, the number of pCXR cases physicians needed to complete to achieve a performance benchmark, and which diagnoses posed the greatest diagnostic challenge. Methods: Physicians interpreted up to 434 pCXR cases via a web-based platform until they achieved a performance benchmark of 85% accuracy, sensitivity, and specificity. Interpretation difficulty scores for each case were derived by applying one-parameter item response theory to participant data. We compared interpretation difficulty scores across diagnostic categories and described the diagnoses of the most difficult-to-interpret 30% of cases. Results: A total of 240 physicians practicing in one of three geographic areas interpreted cases, yielding 56,833 pCXR case interpretations. The initial diagnostic performance of participants (first 50 cases) showed an accuracy of 68.9%, a sensitivity of 69.4%, and a specificity of 68.4%. The median number of cases completed to achieve the performance benchmark was 102 (interquartile range 69 to 176; range 54 to 431). Among the most difficult-to-interpret 30% of cases, 39.2% were normal pCXR and 32.3% were cases of lobar pneumonia. Cases with a single trauma-related imaging finding and cases with cardiac, hilar, or diaphragmatic pathology were also among the most challenging. Discussion: At baseline, practicing physicians misdiagnosed about one-third of pCXR, and the number of cases completed to achieve the standardized performance benchmark differed by up to eight-fold between participants. We also identified the diagnoses with the greatest potential for educational intervention.
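
To make the benchmark and the difficulty model concrete, the following is a minimal Python sketch, not the study's actual scoring code. It assumes each interpretation reduces to a binary normal/abnormal call (the abstract does not state how correctness was scored), and the function names `diagnostic_metrics`, `meets_benchmark`, and `rasch_probability` are hypothetical. The last function shows only the standard one-parameter (Rasch) response function, in which the probability of a correct interpretation depends on the difference between participant ability and case difficulty. Estimating those parameters from data is a separate step not shown here.

```python
import math


def diagnostic_metrics(truth, calls):
    """Return (accuracy, sensitivity, specificity).

    truth, calls: equal-length sequences of booleans, True = abnormal.
    Assumption: each interpretation is scored as a binary call.
    """
    if len(truth) == 0 or len(truth) != len(calls):
        raise ValueError("truth and calls must be non-empty and equal length")
    pairs = list(zip(truth, calls))
    tp = sum(1 for t, c in pairs if t and c)          # abnormal, called abnormal
    tn = sum(1 for t, c in pairs if not t and not c)  # normal, called normal
    fp = sum(1 for t, c in pairs if not t and c)      # normal, called abnormal
    fn = sum(1 for t, c in pairs if t and not c)      # abnormal, called normal
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return accuracy, sensitivity, specificity


def meets_benchmark(truth, calls, threshold=0.85):
    """True only if accuracy, sensitivity, and specificity all reach threshold."""
    return all(m >= threshold for m in diagnostic_metrics(truth, calls))


def rasch_probability(ability, difficulty):
    """One-parameter IRT: probability of a correct response on the logit scale."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))
```

Under this sketch, a participant with 85% accuracy but 80% specificity would not yet meet the benchmark, because all three measures must reach the threshold. In the response function, a case whose difficulty equals the participant's ability is interpreted correctly with probability 0.5.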