Emotion expression by social robots is one of the most important topics in recent research on human–robot interaction (HRI). Several studies have investigated factors that improve the quality of robots’ emotion expression. In this study, we examined the effects of a robot’s vertical oscillation and vertical transition on the quality of its emotion expression, where oscillation denotes a periodic up-and-down movement of the robot’s body and transition denotes a one-time upward or downward movement. Short-term and long-term emotion expressions were studied independently for four basic emotions located in the circumplex model of emotions: joy, anger, sadness, and relief. We designed the experiment to have adequate statistical power, with the minimum number of human subjects determined by an a priori power analysis. Subjects evaluated the robot’s emotion expressions by watching videos of the robot with and without vertical movement. The results showed that, for long-term emotions, the speed of vertical oscillation corresponded to the degree of arousal of the expressed emotion in the circumplex model: fast oscillations improved the expression of high-arousal emotions, such as joy and anger, while slow or no oscillations were better suited to low-arousal emotions, such as sadness and relief. For short-term emotions, the direction of the vertical transition corresponded to the degree of valence for most of the expressed emotions, while the speed of vertical oscillation reflected the degree of arousal. These findings can be applied in the development of conversational robots to enhance their emotion expression.