Many industries now use robots and cameras in tandem to detect specific objects and perform specific tasks. However, misdetection can occur due to inconsistencies in lighting, background, and environment. To address these issues, this study proposes using a dual-arm six-degree-of-freedom (6-DoF) collaborative robot, the ABB YuMi, together with a red, green, blue-depth (RGB-D) camera and YOLOv5 in a pick-and-place application. To prepare the dataset, images are collected and labeled; the YOLOv5 machine learning algorithm is then trained on this dataset, and the resulting weights are used for real-time detection. When RGB images from the camera are passed to YOLOv5, the bottle's x-y position and color are extracted from the depth and color images, and this position is used to control the robot's movement. The experiment has three parts. First, YOLOv5 is tested on images that were and were not included in training. Second, YOLOv5 is tested on real-time camera images. Finally, grasping is evaluated under the assumption that YOLOv5 detects perfectly. The three tests achieved success rates of 95%, 90%, and 90%, respectively.
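The detection-to-position step described above can be sketched in a few lines of Python. This is a minimal illustration only: it assumes a YOLOv5 model loaded from custom-trained weights via torch.hub, a depth image pixel-aligned with the color image, and a standard pinhole back-projection; the weight path, function name, and camera intrinsics (fx, fy, cx, cy) are placeholders, not values from the study.

import torch

# Load custom-trained YOLOv5 weights (the path "bottles.pt" is hypothetical).
model = torch.hub.load("ultralytics/yolov5", "custom", path="bottles.pt")

def bottle_positions(color_image, depth_image, fx, fy, cx, cy):
    """Return (x, y, z, class_name) for each detected bottle.

    color_image: HxWx3 RGB array; depth_image: HxW depth array in meters,
    assumed to be pixel-aligned with the color image.
    """
    results = model(color_image)
    positions = []
    for x1, y1, x2, y2, conf, cls in results.xyxy[0].tolist():
        # Sample depth at the center pixel of the bounding box.
        u, v = int((x1 + x2) / 2), int((y1 + y2) / 2)
        z = float(depth_image[v, u])
        # Back-project the pixel to camera coordinates with the pinhole model.
        x = (u - cx) * z / fx
        y = (v - cy) * z / fy
        positions.append((x, y, z, model.names[int(cls)]))
    return positions

The class name carries the color label (e.g., a class per bottle color), and the resulting camera-frame coordinates would still need a hand-eye transform into the robot's base frame before being sent as a motion target.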