To facilitate referential communication between humans and robots and to mediate their differences in representing the shared environment, we are exploring embodied collaborative models for referring expression generation (REG). Instead of generating a single minimal description of a target object, our models generate episodes of expressions based on human feedback during human-robot interaction. We particularly investigate the role of embodiment, such as robot gesture behaviors (e.g., pointing to an object) and the human's gaze feedback (i.e., looking at a particular object), in this collaborative process. This paper examines different strategies for incorporating embodiment and collaboration in REG and discusses their possibilities and challenges in enabling human-robot referential communication.