Video summarization is the process of creating a shortened version of a longer video while retaining its essential content and meaning. It automatically identifies the most important parts of a video and selects the frames, shots, or scenes that best represent the original content, discarding redundant or less informative ones through image signal analysis and processing. The process typically involves several stages, including video segmentation, feature extraction, frame selection, classification, and quality assessment, and a variety of algorithms and system models are available for each. Classification architectures such as convolutional neural networks and recurrent neural networks are used to categorize video frames as redundant or non-redundant. This article provides a categorization and analysis of video summarization methodologies, with a focus on methods from the real-time video summarization (RVS) domain. By consolidating key research findings and data for quick reference, the study lays the groundwork for future research and highlights promising research avenues. Video summarization has proven useful in a variety of real-world smart-city contexts, such as detecting anomalies in video surveillance systems. Because the choice of method for such applications is not obvious, research studies can evaluate and compare video summarization algorithms in terms of their effectiveness, efficiency, and suitability, using benchmark datasets and standardized evaluation metrics to provide objective, quantitative comparisons. Based on the findings of these studies, researchers and multimedia system designers can make informed decisions about which algorithmic combination best fits their application.
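To make the frame-selection stage concrete, the redundancy filtering described above can be sketched as a minimal keyframe selector: a frame is kept only if it differs sufficiently from the last kept frame. This is an illustrative baseline, not a method from any particular paper; the function name `select_keyframes` and the `threshold` parameter are assumptions chosen for this sketch, and real systems would use learned features (e.g., CNN embeddings) rather than raw pixel differences.

```python
import numpy as np

def select_keyframes(frames, threshold=10.0):
    """Return indices of non-redundant frames.

    frames: iterable of equal-shape NumPy arrays (grayscale frames).
    threshold: mean absolute pixel difference below which a frame is
    treated as redundant relative to the last kept frame.
    (Both names are illustrative, not from any specific library.)
    """
    keyframes = []
    last = None
    for idx, frame in enumerate(frames):
        if last is None:
            keep = True  # always keep the first frame
        else:
            diff = np.abs(frame.astype(float) - last.astype(float)).mean()
            keep = diff > threshold
        if keep:
            keyframes.append(idx)
            last = frame
    return keyframes

# Synthetic clip: five near-identical frames, then an abrupt scene change.
rng = np.random.default_rng(0)
scene_a = rng.integers(0, 50, size=(32, 32))
scene_b = rng.integers(200, 255, size=(32, 32))
clip = [scene_a + rng.integers(0, 2, size=(32, 32)) for _ in range(5)]
clip += [scene_b + rng.integers(0, 2, size=(32, 32)) for _ in range(5)]
print(select_keyframes(clip))  # keeps frame 0 and the scene-change frame 5
```

A production pipeline would precede this step with shot segmentation and follow it with classification and quality assessment, as outlined above; the thresholded-difference rule stands in for the redundant/non-redundant decision that a trained classifier would make.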