Due to the amount of transmitted data and the security of personal or private information in wireless communication, there are cases where the information for a multimedia service should be directly transferred from the user’s device to the cloud server without the captured original images. This paper proposes a new method to generate 3D (dimensional) keypoints based on a user’s mobile device with a commercial RGB camera in a distributed computing environment such as a cloud server. The images are captured with a moving camera and 2D keypoints are extracted from them. After executing feature extraction between continuous frames, disparities are calculated between frames using the relationships between matched keypoints. The physical distance of the baseline is estimated by using the motion information of the camera, and the actual distance is calculated by using the calculated disparity and the estimated baseline. Finally, 3D keypoints are generated by adding the extracted 2D keypoints to the calculated distance. A keypoint-based scene change method is proposed as well. Due to the existing similarity between continuous frames captured from a camera, not all 3D keypoints are transferred and stored, only the new ones. Compared with the ground truth of the TUM dataset, the average error of the estimated 3D keypoints was measured as 5.98 mm, which shows that the proposed method has relatively good performance considering that it uses a commercial RGB camera on a mobile device. Furthermore, the transferred 3D keypoints were decreased to about 73.6%.