Objective: To develop a convolutional neural network-based computer vision model to recognize and track 2 mastoidectomy surgical instruments, the drill and the suction-irrigator, from intraoperative video recordings of mastoidectomies.

Study Design: Technological development and model validation.

Setting: Academic center.

Methods: Ten 1-minute videos of mastoidectomies performed for cochlear implantation by resident surgeons of varying training levels were collected. For each video, comprising 900 frames, an open-source computer vision annotation tool was used to annotate the drill and suction-irrigator classes with bounding boxes. A mastoidectomy instrument-tracking module, which extracts the center coordinates of the detected bounding boxes, was developed using a feature pyramid network and layered on Detectron, an open-source Faster Region-Based Convolutional Neural Network (Faster R-CNN) framework. Eight videos were used to train the model, and 2 videos were used for testing. Outcome measures included the Intersection over Union (IoU) ratio, accuracy, and average precision.

Results: At an IoU threshold of 0.5, the mean average precision was 99% for the drill and 86% for the suction-irrigator. The model generated maps of drill and suction-irrigator stroke direction and distance across the entirety of each video.

Conclusions: This computer vision model identifies and tracks the drill and suction-irrigator with excellent precision in intraoperative videos of mastoidectomies performed by residents. It can now be employed to retrospectively study objective mastoidectomy measures of expert and resident surgeons, such as drill and suction-irrigator stroke concentration, economy of motion, speed, and coordination, setting the stage for characterizing objective expectations for safe and efficient mastoidectomies.
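
To illustrate the detection setup described in the Methods, the following is a minimal training sketch, not the authors' code, written against the Detectron2 API (the PyTorch successor to Detectron) with a Faster R-CNN plus feature pyramid network configuration; the dataset names, annotation paths, class count, and training schedule are assumptions for illustration only.

```python
import os
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.data.datasets import register_coco_instances
from detectron2.engine import DefaultTrainer

# Hypothetical COCO-format exports of the 8 training and 2 test videos.
register_coco_instances("mastoid_train", {}, "annotations/train.json", "frames/train")
register_coco_instances("mastoid_test", {}, "annotations/test.json", "frames/test")

cfg = get_cfg()
# Faster R-CNN with a ResNet-50 feature pyramid network (FPN) backbone.
cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml")
cfg.DATASETS.TRAIN = ("mastoid_train",)
cfg.DATASETS.TEST = ("mastoid_test",)
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 2  # drill, suction-irrigator
cfg.SOLVER.IMS_PER_BATCH = 2         # illustrative schedule, not the study's
cfg.SOLVER.BASE_LR = 0.00025
cfg.SOLVER.MAX_ITER = 3000

os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()
```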
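
The IoU criterion used as an outcome measure compares a predicted box with its annotated ground-truth box; a minimal sketch of the standard computation is below, assuming a [x1, y1, x2, y2] box format for illustration.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes [x1, y1, x2, y2]."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap rectangle between the two boxes.
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((ax2 - ax1) * (ay2 - ay1)
             + (bx2 - bx1) * (by2 - by1)
             - inter)
    return inter / union if union > 0 else 0.0

# A detection counts as a true positive at the 0.5 threshold reported in the
# Results when iou(predicted_box, annotated_box) >= 0.5.
```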
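
The stroke maps reported in the Results follow from reducing each detected box to its center and accumulating frame-to-frame displacements; the sketch below illustrates that idea only, with a hypothetical `detections` input (one dict per frame mapping class name to [x1, y1, x2, y2]) rather than the authors' module.

```python
import math

def box_center(box):
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

def stroke_track(detections, instrument):
    """Return per-frame centers, stroke vectors, and total path length (pixels)."""
    centers, strokes, path_length = [], [], 0.0
    for frame in detections:
        if instrument not in frame:
            continue  # instrument not visible in this frame
        centers.append(box_center(frame[instrument]))
        if len(centers) > 1:
            (px, py), (cx, cy) = centers[-2], centers[-1]
            dx, dy = cx - px, cy - py
            strokes.append((dx, dy))          # stroke direction per frame pair
            path_length += math.hypot(dx, dy)  # cumulative stroke distance
    return centers, strokes, path_length
```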