For reaching to and grasping of an object, visual information about the object must be transformed into motor or postural commands for the arm and hand. In this paper, we present a robot model for visually guided reaching and grasping. The model mimics two alternative processing pathways for grasping, which are also likely to coexist in the human brain. The first pathway directly uses the retinal activation to encode the target position. In the second pathway, a saccade controller makes the eyes (cameras) focus on the target, and the gaze direction is used instead as positional input. For both pathways, an arm controller transforms information on the target's position and orientation into an arm posture suitable for grasping. For the training of the saccade controller, we suggest a novel staged learning method which does not require a teacher that provides the necessary motor commands. The arm controller uses unsupervised learning: it is based on a density model of the sensor and the motor data. Using this density, a mapping is achieved by completing a partially given sensorimotor pattern. The controller can cope with the ambiguity in having a set of redundant arm postures for a given target. The combined model of saccade and arm controller was able to fixate and grasp an elongated object with arbitrary orientation and at arbitrary position on a table in 94% of trials.