Objective
Convolutional neural networks (CNNs) have revolutionized medical image segmentation in recent years. This scoping review aimed to comprehensively survey the literature describing automated segmentation of the middle ear from computed tomography (CT) scans using CNNs.

Data Sources
A comprehensive literature search, developed jointly with a medical librarian, was performed on Medline, Embase, Scopus, Web of Science, and Cochrane, using Medical Subject Heading terms and keywords. Databases were searched from inception to July 2023. Reference lists of included papers were also screened.

Review Methods
Ten studies were included for analysis, comprising a total of 866 scans used for model training and testing. Thirteen different architectures were described for automated segmentation. The best Dice similarity coefficient (DSC) for the entire ossicular chain was 0.87, achieved with ResNet. The highest DSC for any single structure was 0.93, for the incus, using 3D‐V‐Net. The most difficult structure to segment was the stapes, for which the highest DSC was 0.84, also using 3D‐V‐Net.

Conclusions
Numerous CNN architectures have demonstrated good performance in segmenting the middle ear. To overcome some of the difficulties in segmenting the stapes, we recommend developing an architecture trained on cone beam CT scans, whose improved spatial resolution should help delineate the smallest ossicle.

Implications for Practice
Automated middle ear segmentation has clinical applications in preoperative planning, diagnosis, and simulation.
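
For readers unfamiliar with the Dice similarity coefficient reported above, the following is a minimal sketch of how it is commonly computed for two binary segmentation masks, DSC = 2|A ∩ B| / (|A| + |B|). It is illustrative only and is not taken from any of the reviewed studies. The function name `dice_coefficient`, the toy volume, and the convention of returning 1.0 for two empty masks are all assumptions.

```python
import numpy as np


def dice_coefficient(pred, truth):
    """Dice similarity coefficient between two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    if pred.shape != truth.shape:
        raise ValueError("masks must have the same shape")
    total = pred.sum() + truth.sum()
    if total == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0
    overlap = np.logical_and(pred, truth).sum()
    return float(2.0 * overlap / total)


# Toy 3D example (hypothetical data, not a real CT volume).
pred = np.zeros((4, 4, 4), dtype=bool)
truth = np.zeros((4, 4, 4), dtype=bool)
pred[1:3, 1:3, 1:3] = True   # 8 predicted voxels
truth[1:3, 1:3, 1:4] = True  # 12 reference voxels, 8 of them shared

dsc = dice_coefficient(pred, truth)
print(f"{dsc:.2f}")  # 0.80
```

A DSC of 1.0 indicates perfect overlap and 0.0 indicates none. For a structure occupying very few voxels, an error of a given number of voxels lowers the score more than it would for a larger structure, which is one plausible reason the stapes scores lower than the incus.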