Semantic video scene-understanding applications rely on object-camera motion recognition techniques to represent contextual movement within a scene. Although existing machine learning-based methods perform efficiently, their primary limitation is that they analyze motion patterns only in normal frames, neglecting scene-transition frames. This causes significant false alarms because object-camera motion patterns occurring during scene transitions go undetected. In this paper, we propose a novel method that recognizes the object and camera motion of two consecutive scenes from their transition frames. First, our method detects cut transitions using principal component analysis (PCA) to segment the video into shots; it also uses properties of the structural similarity index measure (SSIM) to eliminate large-text transitions that are often falsely detected as cuts. Second, it selects candidate segments that localize normal and wipe-transition frames using slope-angle characteristics obtained from linear regression. Third, it extracts dense semantic spatial features at multiple scales with a modified DeepLabv3+ network to segment the selected candidate frames into foreground, background, and wipe pixels. Finally, an optical flow-based temporal trajectory tracking model is applied to each segmented pixel class to recognize object motion and camera pan, zoom-in, and zoom-out patterns. We further remove falsely detected non-transition motion frames to improve wipe-transition detection. Experimental results are obtained on the benchmark TRECVID and multimedia datasets. Using pixel-level classification and temporal trajectory analysis, the proposed method achieves average accuracy improvements of 9.28% for object-camera motion recognition, 3.75% for cut transition detection, and 3.01% for wipe transition detection.
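The abstract only names PCA-based cut detection; the sketch below is a minimal illustration of one common realization, not the paper's exact procedure. It projects downsampled grayscale frames onto a PCA subspace and flags a cut wherever the distance between consecutive projections spikes. The frame size, number of components, and `thresh` multiplier are illustrative assumptions.

```python
import cv2
import numpy as np
from sklearn.decomposition import PCA

def detect_cuts(video_path, n_components=10, thresh=3.0):
    """Flag cut transitions where consecutive PCA-projected frames
    are far apart (thresholds are illustrative, not the paper's)."""
    cap = cv2.VideoCapture(video_path)
    frames = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        grey = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        frames.append(cv2.resize(grey, (64, 36)).flatten())
    cap.release()

    X = np.asarray(frames, dtype=np.float32)
    proj = PCA(n_components=n_components).fit_transform(X)
    # Distance between consecutive frame projections; spikes mark cuts.
    d = np.linalg.norm(np.diff(proj, axis=0), axis=1)
    return np.where(d > d.mean() + thresh * d.std())[0] + 1
```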
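The SSIM-based rejection of large-text transitions is likewise only named in the abstract. The hypothetical check below reflects the underlying intuition: overlaid text appearing or disappearing leaves most of the frame intact, so SSIM between the two frames stays comparatively high, whereas a true cut replaces the whole frame and drives SSIM low. `ssim_floor` is an assumed threshold.

```python
import cv2
from skimage.metrics import structural_similarity as ssim

def is_text_overlay_change(prev_grey, curr_grey, ssim_floor=0.6):
    """Return True if a flagged boundary looks like a text-overlay
    change rather than a real cut (ssim_floor is an assumption)."""
    return ssim(prev_grey, curr_grey) >= ssim_floor
```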
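How the slope-angle characteristics are computed is not specified in the abstract; one plausible reading, sketched below, fits a line to a per-frame similarity (or difference) series inside a candidate window and takes the arctangent of the fitted slope. Gradual wipe transitions then show a sustained non-zero angle, while normal segments stay near flat.

```python
import numpy as np

def slope_angle(series):
    """Fit a line to a frame-similarity series over a candidate window
    and return the slope angle in degrees (a hypothetical reading of
    the paper's slope-angle feature)."""
    x = np.arange(len(series))
    slope, _ = np.polyfit(x, series, 1)
    return np.degrees(np.arctan(slope))
```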
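The paper's modified DeepLabv3+ is not reproduced here; as a stand-in, a stock torchvision DeepLabv3 with a three-class head (foreground, background, wipe) illustrates what the per-pixel classification step consumes and produces.

```python
import torch
from torchvision.models.segmentation import deeplabv3_resnet50

# Stand-in for the paper's modified DeepLabv3+: a stock DeepLabv3
# with a 3-class head (foreground, background, wipe).
model = deeplabv3_resnet50(weights=None, num_classes=3).eval()

frame = torch.rand(1, 3, 513, 513)   # one normalized RGB frame
with torch.no_grad():
    logits = model(frame)["out"]     # (1, 3, 513, 513)
labels = logits.argmax(dim=1)        # per-pixel class map
```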
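Finally, the optical-flow trajectory step can be illustrated with a dense Farneback flow field: near-uniform flow suggests a camera pan, while flow diverging from or converging toward the frame centre suggests zoom-in or zoom-out. In the paper this analysis runs per segmented pixel class (object motion comes from the foreground pixels); the sketch below applies it to a whole frame, and all thresholds are assumptions.

```python
import cv2
import numpy as np

def classify_motion(prev_grey, curr_grey, mag_thresh=0.5):
    """Coarse motion label from a dense Farneback flow field
    (thresholds and the whole-frame scope are assumptions)."""
    flow = cv2.calcOpticalFlowFarneback(prev_grey, curr_grey, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = prev_grey.shape
    y, x = np.mgrid[0:h, 0:w].astype(np.float32)
    # Radial unit vectors pointing away from the frame centre.
    rx, ry = x - w / 2, y - h / 2
    norm = np.sqrt(rx**2 + ry**2) + 1e-6
    radial = (flow[..., 0] * rx + flow[..., 1] * ry) / norm
    mag = np.linalg.norm(flow, axis=2)

    if mag.mean() < mag_thresh:
        return "static"
    if radial.mean() > 0.5 * mag.mean():
        return "zoom-in"      # flow diverges from the centre
    if radial.mean() < -0.5 * mag.mean():
        return "zoom-out"     # flow converges toward the centre
    return "pan"              # coherent translational flow
```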