In the context of the industrialization era, robots are gradually replacing workers in some production stages. There is an irreversible trend toward incorporating image processing techniques in the realm of robot control. In recent years, vision-based techniques have achieved significant milestones. However, most of these techniques require complex setups, specialized cameras, and skilled operators for burden computation. This paper presents an efficient vision-based solution for object detection and grasping in indoor environments. The framework of the system, encompassing geometrical constraints, robot control theories, and the hardware platform, is described. The proposed method, covering calibration to visual estimation, is detailed for handling the detection and grasping task. Our approach's efficiency, feasibility, and applicability are evident from the results of both theoretical simulations and experiments.