This paper addresses real-time, high-accuracy detection of moving objects in high-resolution video frames. A previously developed moving object detection framework is modified to enable real-time processing of high-resolution images. First, a computationally efficient method detects moving regions on a resized image and maps their coordinates back to the original image. Second, a lightweight backbone deep neural network replaces a more complex one. Third, the focal loss function is used to alleviate the imbalance between positive and negative samples. Extensive experiments indicate that the modified framework achieves a processing rate of 21 frames per second with 86.15% accuracy on the SimitMovingDataset dataset, which contains high-resolution images of size 1920 × 1080.
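Two of the modifications above can be illustrated with small, self-contained functions. The sketch below is a hypothetical illustration, not the authors' code: `map_box_to_original` assumes each axis is scaled independently by the ratio of original to resized size, and `binary_focal_loss` uses the alpha-balanced focal loss of Lin et al. (2017), FL(p_t) = -α_t (1 - p_t)^γ log(p_t), with the commonly used defaults α = 0.25 and γ = 2.0. The abstract does not state which focal loss variant or hyperparameters the framework uses, so those are assumptions.

```python
import math


def map_box_to_original(box, resized_size, original_size):
    """Scale an (x1, y1, x2, y2) box found on a resized frame back to the
    original frame. Sizes are (width, height). Assumes plain per-axis scaling
    (hypothetical; the paper's exact mapping is not given)."""
    rw, rh = resized_size
    ow, oh = original_size
    sx, sy = ow / rw, oh / rh
    x1, y1, x2, y2 = box
    return (x1 * sx, y1 * sy, x2 * sx, y2 * sy)


def binary_focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-7):
    """Alpha-balanced focal loss for one binary prediction (Lin et al., 2017).
    p: predicted probability of the positive class; y: label in {0, 1}.
    alpha and gamma defaults are assumptions, not values from the paper."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    p_t = p if y == 1 else 1.0 - p  # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # (1 - p_t)^gamma shrinks the loss of easy, well-classified samples
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    # A box found on a hypothetical 480 x 270 copy of a 1920 x 1080 frame
    print(map_box_to_original((10, 20, 30, 40), (480, 270), (1920, 1080)))
    # An easy positive (p = 0.9) contributes far less loss than a hard one (p = 0.6)
    print(binary_focal_loss(0.9, 1), binary_focal_loss(0.6, 1))
```

With γ = 0 and α = 0.5 the focal loss reduces to half the ordinary cross-entropy; increasing γ down-weights the abundant easy negatives, which is how it addresses the positive/negative imbalance.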
In large fields of view over open country, real-time, high-accuracy detection and identification of moving objects is very challenging because of the excessive amount of data. This paper proposes a novel framework that consists of a coarse-grained detection stage and a fine-grained detection stage. To solve the problem of noise-induced object fracture during coarse-grained detection, we present a low-complexity connected region detection algorithm to extract moving regions. In the fine-grained detection stage, deep convolutional neural networks are leveraged to obtain more precise coordinates and identify object categories. To the best of our knowledge, this is the first work to propose a coarse-to-fine-grained framework for detecting moving objects in high-resolution scenes. Experimental results show that the proposed framework works robustly on high-resolution video frames (1920 × 1080) in complex situations, faster and more accurately than existing methods.
INDEX TERMS Connected region detection, deep convolutional neural networks, foreground extraction, high resolution, moving object detection.
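The coarse-grained stage can be pictured as grouping foreground pixels into connected regions. The sketch below is a generic substitute, not the authors' low-complexity algorithm, which the abstract does not detail: it assumes a binary foreground mask (a list of lists of 0/1), applies a square binary dilation so that fragments split by noise merge back together, and then labels 8-connected regions with breadth-first search. The function names and the `radius` and `min_area` parameters are hypothetical.

```python
from collections import deque


def dilate(mask, radius=1):
    """Binary dilation with a (2r+1) x (2r+1) square structuring element.
    Bridges small gaps so noise-fractured fragments of one object merge."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                for dy in range(-radius, radius + 1):
                    for dx in range(-radius, radius + 1):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w:
                            out[ny][nx] = 1
    return out


def connected_regions(mask, min_area=1):
    """Label 8-connected foreground regions with BFS and return bounding
    boxes (x1, y1, x2, y2) in inclusive pixel coordinates, in scan order."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            queue = deque([(y, x)])
            seen[y][x] = True
            x1 = x2 = x
            y1 = y2 = y
            area = 0
            while queue:
                cy, cx = queue.popleft()
                area += 1
                x1, x2 = min(x1, cx), max(x2, cx)
                y1, y2 = min(y1, cy), max(y2, cy)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            if area >= min_area:  # drop tiny noise blobs
                boxes.append((x1, y1, x2, y2))
    return boxes
```

On a mask where one object appears as two blobs separated by a one-pixel gap, `connected_regions(mask)` reports two boxes, while `connected_regions(dilate(mask))` reports one. Note that the merged box includes the dilation margin; a real pipeline might use the dilated mask only for grouping and compute boxes from the original pixels.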
Although there are well-established object detection methods for static images, applying them to video data frame by frame faces two shortcomings: (i) lack of computational efficiency, due to redundancy across frames or failure to exploit the temporal and spatial correlation of features across frames, and (ii) lack of robustness to real-world conditions such as motion blur and occlusion. Since the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) introduced its video object detection task in 2015, a growing number of video object detection methods have appeared in the literature, many of which use deep learning models. This paper reviews these methods. An overview of existing video object detection datasets and commonly used evaluation metrics is presented first. Video object detection methods are then categorized and each is described. Two comparison tables contrast the methods in terms of both accuracy and computational efficiency. Finally, future trends for addressing the remaining challenges in video object detection are noted.