Traumatic skeletal injuries are a leading source of consultation in emergency departments, with an annual incidence reported to be as high as 1.3% in the United States (1) and 0.32% in China (2). Radiography is the first-line imaging modality for the diagnosis of these lesions and the most used imaging modality worldwide (3-5). The reading of trauma radiographs is a demanding task that requires radiologic expertise, and there is a lack of radiologists (6). Consequently, emergency physicians are required to make patient treatment decisions before the availability of a radiologist's report, with a risk of interpretation error (7-9). Missed fractures, a preventable cause of morbidity (10), represent up to 80% of emergency department diagnostic errors (11). In American medical-legal claims, extremity fractures are the second most frequently missed diagnosis leading to a claim, after breast cancer (12). Assisting physicians in detecting and localizing fractures on plain radiographs could therefore reduce error rates.

Computer-aided detection software has been developed for more than 20 years to provide decision support to radiologists, especially for screening breast cancer on mammograms (13) and lung nodules on CT scans (14). However, computer-aided detection systems have a high false-positive rate, which has limited their acceptance (13). Similar technologies have been unsuccessfully investigated for fracture detection, potentially because of

Background: The interpretation of radiographs suffers from an ever-increasing workload in emergency and radiology departments, while missed fractures represent up to 80% of diagnostic errors in the emergency department.

Purpose: To assess the performance of an artificial intelligence (AI) system designed to aid radiologists and emergency physicians in the detection and localization of appendicular skeletal fractures.
Materials and Methods: The AI system was previously trained on 60 170 radiographs obtained in patients with trauma. The radiographs were randomly split into 70% training, 10% validation, and 20% test sets. Between 2016 and 2018, 600 adult patients in whom multiview radiographs had been obtained after a recent trauma, with or without one or more fractures of shoulder, arm, hand, pelvis, leg, and foot, were retrospectively included from 17 French medical centers. Radiographs with quality precluding human interpretation or containing only obvious fractures were excluded. Six radiologists and six emergency physicians were asked to detect and localize fractures with (n = 300) and fractures without (n = 300) the aid of software highlighting boxes around AI-detected fractures. Aided and unaided sensitivity, specificity, and reading times were compared by means of paired Student t tests after averaging of performances of each reader.

Results: A total of 600 patients (mean age ± standard deviation, 57 years ± 22; 358 women) were included. The AI aid improved the sensitivity of physicians by 8.7% (95% CI: 3.1, 14.2; P = .003 for superiority) and the specificity by 4.1% (95% CI: ...
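The reader-level comparison described in Materials and Methods (per-reader averaging followed by a paired Student t test) can be illustrated with a minimal Python sketch. It is not the authors' analysis code: the function names `reader_sensitivity` and `paired_t_test` are hypothetical, and the per-reader values in the demo are invented purely for illustration, not taken from the study. The sketch computes only the t statistic and degrees of freedom; a P value would additionally require the t distribution (for example, from a statistics library).

```python
import math
import statistics


def reader_sensitivity(calls):
    """Sensitivity for one reader.

    calls: list of (has_fracture, reader_flagged) booleans, one per exam.
    """
    positives = sum(1 for truth, _ in calls if truth)
    if positives == 0:
        raise ValueError("sensitivity is undefined without fracture-positive exams")
    true_positives = sum(1 for truth, flagged in calls if truth and flagged)
    return true_positives / positives


def paired_t_test(aided, unaided):
    """Paired Student t test on per-reader averages.

    Returns (mean difference, standard error, t statistic, degrees of freedom).
    """
    if len(aided) != len(unaided) or len(aided) < 2:
        raise ValueError("need two equal-length samples from at least 2 readers")
    diffs = [a - u for a, u in zip(aided, unaided)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD, n - 1 denominator
    if sd_diff == 0:
        raise ValueError("t statistic is undefined when all differences are equal")
    std_err = sd_diff / math.sqrt(n)
    return mean_diff, std_err, mean_diff / std_err, n - 1


if __name__ == "__main__":
    # Invented per-reader sensitivities (one value per reader), for illustration only.
    aided = [0.80, 0.75, 0.90, 0.85]
    unaided = [0.70, 0.70, 0.80, 0.75]
    mean_diff, std_err, t_stat, dof = paired_t_test(aided, unaided)
    print(f"mean difference = {mean_diff:.4f}, t = {t_stat:.2f}, df = {dof}")
```

Pairing by reader means each reader serves as their own control, so between-reader differences in baseline skill do not inflate the variance of the aided-versus-unaided comparison.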