In recent years, many imaging systems have been developed to monitor the physiological and behavioral status of dairy cows. However, most of these systems cannot identify individual cows on their own. To link the collected information to individual animals, they must be paired with radio frequency identification (RFID). RFID has a limited reading range, and matching RFID-identified animals to the targets in an image containing multiple cows is difficult. To address these problems, we developed a method based on cascaded deep learning models to detect and segment the collar ID tag of a cow in an image. First, EfficientDet-D4 was used to detect the region of the image containing the ID tag. YOLACT++ was then used to segment the tag within that region. This enables accurate segmentation of the ID tag even when the collar occupies only a small proportion of the image. In total, 938 images of cows wearing collar ID tags were used to train the two models, and 406 were used to test them. All images were collected at the Coldstream Research Dairy Farm, University of Kentucky, USA, in August 2016. The EfficientDet-D4 model reached an average precision (AP) of 96.5% at an intersection over union (IoU) threshold of 0.5. The YOLACT++ model reached an AP of 100% at an IoU threshold of 0.75. The overall accuracy of the cascaded model was 96.5%, and processing a single image frame took 1.92 s. The proposed cascaded model outperforms common instance segmentation models. It is also robust to changes in brightness, tag deformation, and interference around the tag.
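The detect-then-segment cascade described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation. `detect` and `segment` are hypothetical stand-ins for the trained EfficientDet-D4 and YOLACT++ models. The detector is assumed to return a single tag box or `None`. The `margin` crop-enlargement parameter is illustrative and does not come from the paper. The `iou` helper shows the box-overlap criterion of the kind behind the detection AP at IoU = 0.5.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels; x2 and y2 are exclusive
Mask = List[List[bool]]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def cascade_segment(
    image: Sequence[Sequence[object]],
    detect: Callable[[Sequence[Sequence[object]]], Optional[Box]],  # hypothetical stage-1 model
    segment: Callable[[List[Sequence[object]]], Mask],               # hypothetical stage-2 model
    margin: float = 0.1,  # assumed fractional enlargement of the detected box
) -> Optional[Mask]:
    """Stage 1 finds the tag box on the full frame; stage 2 segments only the crop.

    The crop mask is pasted back into a full-frame mask so the small tag is
    segmented at crop resolution but reported in image coordinates.
    """
    h = len(image)
    w = len(image[0]) if h else 0
    box = detect(image)
    if box is None:  # no tag found: nothing to segment
        return None
    x1, y1, x2, y2 = box
    mx, my = int((x2 - x1) * margin), int((y2 - y1) * margin)
    # Enlarge the box by the margin and clamp it to the image bounds.
    cx1, cy1 = max(0, x1 - mx), max(0, y1 - my)
    cx2, cy2 = min(w, x2 + mx), min(h, y2 + my)
    crop = [row[cx1:cx2] for row in image[cy1:cy2]]
    crop_mask = segment(crop)
    # Paste the crop-level mask into an all-False full-frame mask.
    full: Mask = [[False] * w for _ in range(h)]
    for dy, mrow in enumerate(crop_mask):
        for dx, v in enumerate(mrow):
            full[cy1 + dy][cx1 + dx] = bool(v)
    return full
```

Under this sketch, a predicted tag box would count as a detection hit when `iou(pred_box, gt_box) >= 0.5`. Cropping before segmentation is what lets the second stage work on the tag at a usable scale even when the collar fills only a small fraction of the frame.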