This study describes the optimization of hand gesture recognition on a Raspberry Pi 4. Computing technology has advanced in recent years, and some computers can now handle complex workloads such as real-time object detection; small devices, however, still require optimization to run in real time with acceptable latency and a low cost in accuracy. Low latency is a requirement for most interactive technology, especially when real-time object detection serves as the input to a Self-Service Technology system running on a Raspberry Pi in a store. The research was conducted on a training dataset of 288 pictures covering six hand gestures chosen as command inputs configured in the Self-Service Technology. Five CNN object detection models were used in the experiment: YOLOv3-Tiny-PRN, YOLOv4-Tiny, MobileNetV2-YOLOv3-NANO, YOLO-Fastest-1.1, and YOLO-Fastest-1.1-XL. After optimization, the FPS and inference-time metrics improved: the average FPS increased by 3 FPS and the average inference time decreased by 119.26 ms. However, this improvement comes at the cost of overall accuracy: Precision, Recall, F1-Score, and, for some models, IoU all decreased. Only YOLO-Fastest-1.1-XL showed an improved IoU, by about 0.58%. Further improvements to the CNN architectures and the dataset might raise performance even more without sacrificing too much accuracy, but that is best addressed in future research as a continuation of this topic.
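For context on the reported metrics, the sketch below shows the standard relationship between per-frame inference time and FPS, and the standard Intersection over Union (IoU) computation for axis-aligned boxes. The function names and box format are illustrative assumptions, not taken from the paper's implementation:

```python
def fps_from_inference(inference_ms: float) -> float:
    """Frames per second implied by a per-frame inference time in milliseconds.

    A reduction in average inference time therefore translates directly
    into an FPS gain, which is the improvement the experiment reports.
    """
    return 1000.0 / inference_ms


def iou(box_a: tuple, box_b: tuple) -> float:
    """Intersection over Union of two boxes given as (x1, y1, x2, y2).

    This is the conventional definition used to score detection quality;
    values closer to 1.0 indicate tighter overlap with the ground truth.
    """
    # Coordinates of the intersection rectangle (may be empty).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

For example, a model running at 200 ms per frame yields 5 FPS, so shaving roughly 100 ms off inference time on a small device can multiply its effective frame rate.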