The performance of fruit and vegetable picking robots depends largely on a well-designed end-effector structure and on highly accurate recognition and localization methods. Research efforts have therefore focused on these two aspects, continually yielding diverse end-effector structures, target recognition methods, and combinations of the two. A good understanding of their working principles, advantages, limitations, and adaptability to their respective fields is helpful for designing picking robots. Therefore, the main characteristics of traditional schemes are first described according to their grasping modes, separation methods, structures, materials, and driving modes. Underactuated manipulators and soft manipulators, which represent the direction of future development, are then summarized systematically in terms of their technical routes, advantages, potential applications, and challenges. Second, selected recognition and localization methods are presented. Specifically, current recognition approaches based on single features, multi-feature fusion, and deep learning are explained with regard to their advantages, limitations, and successful instances. For 3D localization, active vision based on structured light, laser scanning, time of flight, and radar is illustrated through its respective applications. Passive vision, another 3D localization approach that includes monocular, binocular, and multiocular vision, is evaluated in terms of its advantages, limitations, degree of automation, reconstruction quality, and application scenarios.
Finally, from the perspectives of structural development and of recognition and localization methods, it is possible to develop future end-effectors for fruit and vegetable picking robots with superior characteristics, including fewer driving elements, rigid–flexible–bionic coupled soft manipulators, simple control programs, high efficiency, low damage, low cost, high versatility, and high recognition accuracy in all-season picking tasks.