This work describes a module implemented for inclusion in a social humanoid robot architecture, in particular a storyteller robot named NarRob. The module gives a humanoid robot the capability of mimicking and acquiring the motions of a human user in real time, which allows the robot to enlarge its dataset of gestures. The module relies on a Kinect-based acquisition setup. The gestures are acquired by observing the typical gestures displayed by humans. The movements are then annotated by several evaluators according to their meaning, and organized by typology in the robot's knowledge base. The annotated gestures are then used to enrich the narration of stories. During narration, the robot semantically analyzes the textual content of the story in order to detect meaningful terms and the emotions that can be expressed in the sentences. This analysis drives the choice of the gesture that accompanies each sentence as the story is read.
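The gesture-selection step described above (detect emotion-bearing terms in a sentence, then pick an annotated gesture of the matching typology) could be sketched as follows. This is a minimal illustrative sketch, not the system's actual implementation: the gesture database, the emotion keyword lists, and all function names are hypothetical, and a keyword lookup stands in for the real semantic analysis.

```python
# Hypothetical sketch of emotion-driven gesture selection during narration.
# GESTURE_DB and EMOTION_KEYWORDS are illustrative stand-ins for the robot's
# annotated gesture knowledge base and its semantic analysis, respectively.

GESTURE_DB = {
    "joy": ["open_arms"],    # gestures annotated with the "joy" typology
    "fear": ["step_back"],   # gestures annotated with the "fear" typology
}

EMOTION_KEYWORDS = {
    "joy": {"happy", "smiled", "laughed"},
    "fear": {"afraid", "scared", "trembled"},
}

def detect_emotion(sentence):
    """Return the first emotion whose keywords appear in the sentence."""
    words = set(sentence.lower().strip(".!?").split())
    for emotion, keywords in EMOTION_KEYWORDS.items():
        if words & keywords:
            return emotion
    return None

def choose_gesture(sentence):
    """Pick a gesture matching the detected emotion, or None."""
    emotion = detect_emotion(sentence)
    if emotion and GESTURE_DB.get(emotion):
        return GESTURE_DB[emotion][0]
    return None

story = ["The princess smiled at the knight.", "Suddenly she was afraid."]
for sentence in story:
    print(sentence, "->", choose_gesture(sentence))
```

In the real system the keyword lookup would be replaced by the semantic analysis of the sentence, and the gesture database would be populated by the Kinect-based acquisition and annotation pipeline.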