Human-Robot Interaction (HRI) faces challenges in achieving natural, nonverbal interaction. This study contributes a gesture recognition system for HRI that recognizes gestures of the entire human upper body, which has not been addressed in previous research. Preprocessing, consisting of color segmentation, thresholding, and resizing, is applied to improve image quality, reduce noise, and highlight the important features of each image. Hue, saturation, value (HSV) color segmentation is performed using a blue backdrop and additional lighting to mitigate illumination issues. Thresholding then produces a black-and-white image that separates foreground from background, and resizing adjusts each image to the input size expected by the model. The preprocessed images are used as input to a gesture recognition model based on a Convolutional Neural Network (CNN). Five gestures were recorded from five subjects of different genders and body postures, yielding a total of 450 images, split into 380 for training and 70 for testing. Experiments performed in an indoor environment showed that the CNN achieved 92% accuracy in gesture recognition, lower than the AlexNet model but with a faster training time of 9 seconds. This result was obtained by testing the system over various distances; the optimal camera distance for a user interacting with the mobile robot through gestures was 2.5 m. In future research, the proposed method will be improved and implemented for mobile robot motion control.
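The preprocessing pipeline summarized above (HSV color segmentation against a blue backdrop, thresholding to a black-and-white image, and resizing to the CNN's input size) could be sketched roughly as follows in plain NumPy. The hue band for "blue", the saturation cutoff, and the 64×64 target size are illustrative assumptions, not values reported in the study:

```python
import numpy as np

# Assumed parameters (not from the study): hue band for the blue backdrop
# in degrees, and the square input size expected by the CNN.
BLUE_HUE_RANGE = (200.0, 260.0)
TARGET_SIZE = 64

def rgb_to_hue_sat(img):
    """Vectorized RGB -> (hue in degrees, saturation) for a float image in [0, 1]."""
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    mx = img.max(axis=-1)
    mn = img.min(axis=-1)
    diff = mx - mn
    hue = np.zeros_like(mx)
    nz = diff > 0
    rmax = nz & (mx == r)
    gmax = nz & (mx == g) & ~rmax
    bmax = nz & (mx == b) & ~rmax & ~gmax
    hue[rmax] = (60.0 * (g - b)[rmax] / diff[rmax]) % 360.0
    hue[gmax] = 60.0 * (b - r)[gmax] / diff[gmax] + 120.0
    hue[bmax] = 60.0 * (r - g)[bmax] / diff[bmax] + 240.0
    sat = np.where(mx > 0, diff / np.maximum(mx, 1e-12), 0.0)
    return hue, sat

def preprocess(img, size=TARGET_SIZE):
    """Mask out the blue backdrop, threshold to black/white, resize (nearest)."""
    hue, sat = rgb_to_hue_sat(img)
    background = (hue >= BLUE_HUE_RANGE[0]) & (hue <= BLUE_HUE_RANGE[1]) & (sat > 0.3)
    binary = np.where(background, 0, 255).astype(np.uint8)  # foreground -> white
    h, w = binary.shape
    ys = np.arange(size) * h // size  # nearest-neighbor row/column sampling
    xs = np.arange(size) * w // size
    return binary[np.ix_(ys, xs)]
```

In practice this would typically be done with a library such as OpenCV (HSV conversion, `inRange` masking, and resizing); the NumPy version above only illustrates the sequence of steps before the image is fed to the CNN.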