Commercial automatic milking systems (AMSs) should reliably detect clinical mastitis (CM) and subsequently separate abnormal milk. Therefore, the objectives of this cross-sectional study were (1) to determine the sensitivity (SN) and specificity (SP) of CM detection by the AMSs of the four most common manufacturers on Bavarian dairy farms, and (2) to identify routinely collected cow data (AMS data and monthly test day data of the regional Dairy Herd Improvement Association (DHIA)) that could improve the SN and SP of clinical mastitis detection. Bavarian dairy farms with AMSs from the manufacturers DeLaval, GEA Farm Technologies, Lely, and Lemmer-Fullwood were recruited with the aim of sampling at least 40 cows with clinical mastitis per AMS manufacturer in addition to clinically healthy ones. During a single farm visit, cow-level milking information was first electronically extracted from each AMS, and then all lactating cows were examined in the barn for their udder health status. Clinical mastitis was defined as at least the presence of visibly abnormal milk. In addition, available DHIA test results from the previous six months were collected. None of the manufacturers provided a definition of clinical mastitis (i.e., visually abnormal milk); therefore, the SN and SP of the AMS warning lists for udder health were assessed for each manufacturer individually, based on the clinical evaluation results. Generalized linear mixed models (GLMMs) with herd as a random effect were used to determine the potential influence of routinely recorded parameters on SN and SP. A total of 7411 cows on 114 farms were assessed; of these, 7096 cows could be matched to AMS data and were included in the analysis. The prevalence of clinical mastitis was 3.4% (239 cows).
When considering the 95% confidence interval (95% CI), all but one manufacturer achieved the minimum SN limit of >80%: DeLaval (SN: 61.4% (95% CI: 49.0%–72.8%)), GEA (75.9% (62.4%–86.5%)), Lely (78.2% (67.4%–86.8%)), and Lemmer-Fullwood (67.6% (50.2%–82.0%)). However, none of the evaluated AMSs achieved the minimum SP limit of 99%: DeLaval (SP: 89.3% (95% CI: 87.7%–90.7%)), GEA (79.2% (77.1%–81.2%)), Lely (86.2% (84.6%–87.7%)), and Lemmer-Fullwood (92.2% (90.8%–93.5%)). For the AMSs of all manufacturers, SP was associated with cow classification based on somatic cell count (SCC) measurements from the last two DHIA test results: cows that were above the threshold of 100,000 cells/mL for subclinical mastitis on both test days had a lower chance of being classified as healthy by the AMS than cows that were below the threshold. In conclusion, the detection of clinical mastitis cases was satisfactory across AMS manufacturers. However, the low SP will lead to unnecessarily discarded milk and an increased workload for assessing potentially false-positive mastitis cases. Based on the results of our study, farmers must evaluate all available data (test day data, AMS data, and daily assessment of their cows in the barn) to make decisions about individual cows and ultimately to ensure animal welfare, food quality, and the economic viability of their farm.
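As an illustration of the quantities reported above, the following minimal Python sketch shows how SN, SP, and a 95% CI could be computed from a 2×2 table of AMS alerts against clinical findings. It rests on several assumptions. The abstract does not state which interval method was used, so the Wilson score interval is chosen here only as an example. The counts `tp`, `fn`, `tn`, and `fp` are hypothetical and are not study data. The function names are likewise illustrative. The final check reflects one possible reading of "when considering the 95% CI", namely that the upper CI bound exceeds the 80% limit; it uses only the upper bounds reported above.

```python
from math import sqrt


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (assumed method, not from the source)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


def sensitivity_specificity(tp, fn, tn, fp):
    """SN = TP / (TP + FN); SP = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical counts for one manufacturer (NOT data from the study).
tp, fn, tn, fp = 40, 10, 900, 100
sn, sp = sensitivity_specificity(tp, fn, tn, fp)
sn_ci = wilson_interval(tp, tp + fn)
sp_ci = wilson_interval(tn, tn + fp)

# Prevalence from the figures reported in the abstract: 239 of 7096 cows.
prevalence = 239 / 7096

# Upper 95% CI bounds for SN (in %) as reported in the abstract.
reported_sn_upper = {
    "DeLaval": 72.8,
    "GEA": 86.5,
    "Lely": 86.8,
    "Lemmer-Fullwood": 82.0,
}
# Assumed reading: the limit counts as reached if the upper bound exceeds 80%.
reaching_limit = [m for m, upper in reported_sn_upper.items() if upper > 80.0]

print(f"SN = {sn:.1%}, 95% CI {sn_ci[0]:.1%} to {sn_ci[1]:.1%}")
print(f"SP = {sp:.1%}, 95% CI {sp_ci[0]:.1%} to {sp_ci[1]:.1%}")
print(f"Prevalence = {prevalence:.1%}")
print("Upper SN bound above 80%:", reaching_limit)
```

Under this reading, three of the four manufacturers reach the SN limit, which agrees with the "all but one" statement above. An exact (Clopper-Pearson) interval would give slightly different bounds from the Wilson interval, especially for the small number of CM cases per manufacturer.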