Consumer RGB‐D and binocular stereo cameras have been applied to fruit detection and localization. However, few studies have compared the performance of newly released cameras on the same scene in a complex orchard. This study evaluates consumer RGB‐D and binocular stereo cameras, using YOLOv5x, for kiwifruit detection and localization, in order to select the camera best suited to complex orchard environments. First, Azure Kinect, RealSense D435, and ZED 2i cameras were used to capture images of kiwifruit canopies. YOLOv5x was then trained to detect kiwifruits and calyxes in the images, and an overlap‐partitioning detection strategy was applied to kiwifruit and calyx detection. Spatial coordinate transformation was performed by combining each camera's extrinsic parameters with the depth map it generated. Finally, the three‐dimensional coordinates of the calyxes were calculated and compared with ground truth, and the localization accuracy of the calyxes was analyzed. Results show that YOLOv5x obtained a mean average precision of 93.2%, 91.3%, and 95.8% on kiwifruit and calyx detection for the three cameras, respectively. The overlap‐partitioning detection strategy improved calyx detection, significantly increasing average precision by 13.00%, 16.30%, and 7.70%, respectively. The mean absolute deviation of calyx coordinates on the Y‐axis was relatively high for ZED 2i at 8.44 mm, compared with 6.67 mm for Azure Kinect, while RealSense D435 achieved the minimum deviations of 10.42 mm on the X‐axis and 18.33 mm on the Z‐axis. The average spatial localization time for the calyxes in one image was 0.164 s, 0.037 s, and 0.062 s for Azure Kinect, RealSense D435, and ZED 2i, respectively. These results indicate that RealSense D435 outperformed Azure Kinect and ZED 2i in the kiwifruit orchard, which could serve as a valuable reference for selecting a camera with high‐precision localization capability in other orchards.
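
The overlap‐partitioning detection strategy mentioned above can be illustrated with a minimal sketch. It assumes the strategy splits each image into fixed‐size tiles that share a margin with their neighbours, runs the detector on each tile, and maps the resulting boxes back to full‐image coordinates. The abstract does not give the tile size, the overlap width, or the merging rule, so the function names (`tile_origins`, `partition`, `to_full_image`) and all numeric values below are hypothetical and are not the authors' implementation.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels


def tile_origins(length: int, tile: int, overlap: int) -> List[int]:
    """Start offsets of tiles that cover `length` pixels.

    Neighbouring tiles share at least `overlap` pixels; the last tile is
    placed flush with the image edge. Values are illustrative assumptions.
    """
    if tile >= length:
        return [0]
    stride = tile - overlap
    if stride <= 0:
        raise ValueError("overlap must be smaller than the tile size")
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)
    return starts


def partition(width: int, height: int, tile: int, overlap: int) -> List[Rect]:
    """Overlapping tile rectangles covering a width x height image."""
    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in tile_origins(height, tile, overlap)
        for x in tile_origins(width, tile, overlap)
    ]


def to_full_image(box: Rect, tile_rect: Rect) -> Rect:
    """Shift a box detected inside a tile back to full-image coordinates."""
    x0, y0 = tile_rect[0], tile_rect[1]
    return (box[0] + x0, box[1] + y0, box[2] + x0, box[3] + y0)


# Hypothetical 1000 x 600 image split into 600-pixel tiles with 200 pixels of overlap.
tiles = partition(1000, 600, tile=600, overlap=200)
```

Small targets such as calyxes occupy more pixels relative to the detector input when each tile is processed separately, which is one plausible reason such a strategy helps. Objects that fall in the shared margin are detected twice, so a deduplication step such as non‐maximum suppression would still be needed after mapping boxes back; that step is omitted here.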