Object localization and recognition are critical problems for indoor manipulation tasks. This paper describes an algorithm, based on computer vision and machine learning, that performs object detection and gripping tasks. Objects are detected using the color camera and depth sensor of a Kinect v1, with a machine learning detector (YOLO) handling the computer vision. The project presents a method that allows the Kinect sensor to determine the 3D location of detected objects. The sensor can be attached to the base of any robotic arm, which gives a more versatile and compact solution for stationary installations with industrial robot arms as well as for mobile robots. The results show an object localization error of 5 mm and a detection confidence of more than 70%. The project has many possible applications: in industry, to sort, load, and unload objects according to their type, size, and shape; in agriculture, to collect or sort different kinds of fruit; and in kitchens and cafes, to sort objects such as cups, bottles, and cans. It can also be added to mobile robots that perform indoor human services or collect trash from different places.
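The 3D localization step summarized above can be illustrated with a minimal sketch, assuming a standard pinhole camera model and a depth image registered to the color image. This is not the paper's implementation. The intrinsics `FX`, `FY`, `CX`, `CY` are placeholder nominal values for a 640x480 image, the helper names (`bbox_center`, `patch_depth_mm`, `pixel_to_camera_xyz`, `locate`) are hypothetical, and invalid depth readings are assumed to be encoded as 0.

```python
from statistics import median

# Assumed placeholder intrinsics for a 640x480 depth image (in pixels).
# A real setup should replace these with calibrated values.
FX, FY = 525.0, 525.0
CX, CY = 319.5, 239.5


def bbox_center(box):
    """Return the integer pixel center (u, v) of a box (x_min, y_min, x_max, y_max)."""
    x_min, y_min, x_max, y_max = box
    return (x_min + x_max) // 2, (y_min + y_max) // 2


def patch_depth_mm(depth, u, v, radius=2):
    """Median of the valid depths (mm) in a small window around pixel (u, v).

    `depth` is indexed as depth[row][column]. Returns None if the window
    contains no valid reading.
    """
    rows, cols = len(depth), len(depth[0])
    values = [
        depth[r][c]
        for r in range(max(0, v - radius), min(rows, v + radius + 1))
        for c in range(max(0, u - radius), min(cols, u + radius + 1))
        if depth[r][c] > 0  # assumption: 0 marks a missing depth reading
    ]
    return median(values) if values else None


def pixel_to_camera_xyz(u, v, z_mm, fx=FX, fy=FY, cx=CX, cy=CY):
    """Back-project pixel (u, v) at depth z_mm to camera-frame (x, y, z) in mm."""
    x = (u - cx) * z_mm / fx
    y = (v - cy) * z_mm / fy
    return x, y, z_mm


def locate(box, depth):
    """Estimate the camera-frame position of the object inside a detection box."""
    u, v = bbox_center(box)
    z = patch_depth_mm(depth, u, v)
    if z is None:
        return None
    return pixel_to_camera_xyz(u, v, z)
```

In this sketch, `box` stands in for a bounding box produced by the detector, and the returned point is expressed in the camera frame. Converting it to the robot arm's base frame would additionally require the fixed transform between the sensor and the arm base, which is not shown here.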