Video‐mediated communication systems are generally used to enable face‐to‐face‐like communication between people in distant locations. In terms of social telepresence, however, these systems remain inferior to actual face‐to‐face communication, which is a long‐standing challenge for online communication systems. Multiple studies have reported that specific components, such as the resolution of the transmitted video and the width of the visual field, each affect social telepresence. However, no method has been established to comprehensively evaluate the sense of social telepresence provided by a complete video communication system, that is, a system that integrates multiple components with various specifications. In this study, we apply regression analysis to subjective evaluation results and establish a new method for predicting the sense of social telepresence from the specifications of some system components. The resulting formulation indicates that eye contact and life‐size scaling make relatively large contributions to enhancing social telepresence. Finally, we develop a prototype system that realizes eye contact and life‐size scaling, and we perform additional subjective evaluations to reconfirm the effectiveness of these components in enhancing the sense of social telepresence in real‐time communication.
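
As a reading aid, the regression-based prediction described above can be sketched in a generic form. The abstract does not state the model's functional form, so the linear form below is an assumption, and the symbols $\hat{S}$, $x_i$, $\beta_i$, and $n$ are illustrative names rather than the authors' notation.

```latex
% Hypothetical linear form; the actual model in the paper may differ.
% \hat{S}  : predicted social telepresence score (from subjective ratings)
% x_i      : coded specification of system component i
%            (e.g., eye contact, life-size scaling, video resolution,
%            width of the visual field)
% \beta_i  : coefficient fitted by regression; \beta_0 is the intercept
\hat{S} = \beta_0 + \sum_{i=1}^{n} \beta_i x_i
```

Under this assumed form, the relative contribution of a component such as eye contact or life‐size scaling would be read from the magnitude of its fitted coefficient, provided the predictors are coded on comparable scales.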