Abstract: In the automatic analysis of a tennis game, it is important to detect anomalous match events such as "fault serve" and "ball out", because these events are crucial to understanding the progress of a game. Audio information can be used to detect these events, but on its own it is unreliable, because of the acoustic mismatch between the training and test data and because of interfering noise from spectator applause, players' yells, and so on. We present a framework for detecting these events in which audio and visual information are used both separately and in combination. We accumulate audio evidence for anomalous events, based on audio event classification and pitch estimation, and combine it with video evidence based on scene segmentation (itself driven by audio ball-hit detection) and on estimation of the ball's trajectory. To evaluate the effectiveness and robustness of our approach, we test it on three different tennis matches. The results show that our approach outperforms several audio-based baselines, with a best F-score of 61% on the test data.
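The abstract reports detection quality as an F-score. The sketch below shows how such a score is computed from event-level counts. It assumes the balanced F1 measure (the harmonic mean of precision and recall). The helper name `f_score` and the example counts are hypothetical and are not the paper's data.

```python
def f_score(tp: int, fp: int, fn: int) -> float:
    """Balanced F-score (F1) for event detection.

    tp: detected events that match a ground-truth anomalous event
    fp: detections with no matching ground-truth event
    fn: ground-truth events that were missed
    """
    if tp == 0:
        # No correct detections: precision and recall are both 0 (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical counts: 6 correct detections, 4 false alarms, 4 misses.
    print(round(f_score(6, 4, 4), 2))  # prints 0.6
```

The result is equivalent to 2·TP / (2·TP + FP + FN). Because it penalizes both false alarms and missed events, it suits rare events such as "fault serve" and "ball out", where a detector that never fires would still have high accuracy.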