This paper investigates the motion forms of robots generated by the Q-Learning algorithm during the learning process. We analyzed how a caterpillar robot, which performs looping motions using two actuators, acquires advancing actions, focusing on the learning process itself. By observing a series of these processes, we confirmed that various motion forms appeared and disappeared through their interaction with the learning process, and that the motion gradually approached an optimal form. In most learning algorithms, such motion forms cannot appear during learning because the framework is largely predetermined by the teacher data, and the cost function is not usually treated as a continuous process. These characteristics of reinforcement learning are very interesting from the viewpoint of biological evolution. This paper also describes the effects of the interaction between the robot kinematics and the environment as a direct result of changing the environment. In addition, this study attempted the acquisition of two-dimensional motions with a starfish robot having four actuators. The results demonstrate that the robot can obtain a reasonable motion by skillfully exploiting its structure within its complicated relationship with the environment. Moreover, the investigations performed in this study suggest that reward manipulation may provide new insight into the learning process. Finally, this paper examines the possibility of using reward combinations to generate arbitrary motions.
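As a point of reference, the motion acquisition described above rests on the standard tabular Q-Learning update. The following is a minimal sketch, assuming a discretized joint-angle state space and a reward proportional to forward displacement; the state and action encodings, reward, and placeholder dynamics are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

# Minimal sketch of tabular Q-Learning for a two-actuator robot.
# State/action encodings, dynamics, and reward below are hypothetical
# placeholders for illustration only.

N_STATES = 9      # e.g., 3 discretized angles per actuator -> 3 * 3 states
N_ACTIONS = 4     # e.g., raise/lower each of the two actuators
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

Q = np.zeros((N_STATES, N_ACTIONS))
rng = np.random.default_rng(0)

def step(state, action):
    """Hypothetical environment: returns (next_state, reward).
    In the actual experiments the reward would reflect the robot's
    forward displacement produced by the chosen joint motion."""
    next_state = (state + action) % N_STATES      # placeholder dynamics
    reward = 1.0 if next_state > state else 0.0   # placeholder reward
    return next_state, reward

state = 0
for _ in range(10_000):
    # epsilon-greedy action selection
    if rng.random() < EPSILON:
        action = int(rng.integers(N_ACTIONS))
    else:
        action = int(np.argmax(Q[state]))
    next_state, reward = step(state, action)
    # Q-Learning update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    Q[state, action] += ALPHA * (reward + GAMMA * Q[next_state].max() - Q[state, action])
    state = next_state
```

The reward manipulation discussed in the paper would correspond to changing how `reward` is computed (e.g., combining several reward terms), while the update rule itself stays unchanged.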