Despite advancements in video-based behaviour analysis and detection models for various species, existing methods remain suboptimal for detecting macaques in complex laboratory environments. To address this gap, we present MacqD, a modified Mask R-CNN model incorporating a SWIN transformer backbone for enhanced attention-based feature extraction. MacqD robustly detects macaques in their home cage under challenging scenarios, including occlusions, glass reflections, and overexposure to light. To evaluate MacqD and compare its performance against pre-existing macaque detection models, we collected and analysed video frames from 20 caged rhesus macaques at Newcastle University, UK. Our results demonstrate MacqD's superiority: it achieved a median F1-score of 99% for frames with a single macaque in the focal cage (surpassing the next-best model by 21%) and 90% for frames with two macaques. Generalisation tests on frames from a different set of macaques from the same animal facility yielded median F1-scores of 95% for frames with a single macaque (surpassing the next-best model by 15%) and 81% for frames with two macaques (surpassing the alternative approach by 39%). Finally, MacqD was applied to videos of paired macaques from another facility and achieved an F1-score of 90%, reflecting its strong generalisation capacity. This study highlights MacqD's effectiveness in accurately detecting macaques across diverse settings.
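
For readers unfamiliar with the metric, the following is a minimal sketch of how a per-frame F1-score and its median across frames could be computed for a detector. It is illustrative only and is not taken from the MacqD evaluation code: the use of bounding boxes rather than masks, the greedy one-to-one matching, the IoU threshold of 0.5, the score of 1.0 for a frame with no macaques and no detections, and the names `iou` and `frame_f1` are all assumptions.

```python
from statistics import median


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def frame_f1(predicted, ground_truth, iou_threshold=0.5):
    """F1-score for one frame using greedy one-to-one box matching.

    The threshold of 0.5 is an assumed, commonly used value; the
    abstract does not state the matching criterion.
    """
    unmatched = list(ground_truth)
    true_pos = 0
    for box in predicted:
        # Pick the still-unmatched ground-truth box with the highest overlap.
        best = max(unmatched, key=lambda g: iou(box, g), default=None)
        if best is not None and iou(box, best) >= iou_threshold:
            unmatched.remove(best)
            true_pos += 1
    false_pos = len(predicted) - true_pos
    false_neg = len(ground_truth) - true_pos
    denom = 2 * true_pos + false_pos + false_neg
    # Assumption: an empty frame with no detections counts as a perfect score.
    return 2 * true_pos / denom if denom else 1.0


if __name__ == "__main__":
    # Hypothetical frames: (predicted boxes, annotated boxes).
    frames = [
        ([(0, 0, 10, 10)], [(0, 0, 10, 10)]),
        ([(0, 0, 10, 10), (50, 50, 60, 60)], [(0, 0, 10, 10)]),
        ([], [(0, 0, 10, 10)]),
    ]
    scores = [frame_f1(pred, truth) for pred, truth in frames]
    print("median F1:", median(scores))
```

Under these assumptions, a spurious detection (for example, a reflection in glass) lowers precision, a missed animal (for example, one that is occluded) lowers recall, and F1 is the harmonic mean of the two; taking the median across frames makes the summary less sensitive to a few badly handled frames.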