Object Tracking (OT) with a moving camera, often called Moving Object Tracking (MOT), is an important problem in computer vision. Conventional tracking methods based on a fixed camera can only track objects within the camera's field of view, whereas a moving camera can overcome this limitation by following the objects. A single tracker is widely used for tracking, but it is not effective with a moving camera because of challenges such as sudden movements, blurring, and pose variation. This paper proposes a method based on the tracking-by-detection approach, integrating a single tracker with an object detection method. The proposed system tracks objects efficiently and effectively because the detector can relocate the tracked object whenever the single tracker loses it. The paper makes three main contributions. First, the proposed unified vision-based MOT system performs localization, 3D environment reconstruction, and tracking using a stereo camera and an Inertial Measurement Unit (IMU). Second, it takes camera motion and moving objects into account to improve the precision of localization and tracking. Third, the proposed tracking system integrates a single tracker (a Deep Particle Filter) with an object detector (YOLOv3). The overall system is evaluated on the KITTI 2012 dataset and achieves good accuracy in real time.
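The core of this tracking-by-detection loop can be illustrated with a short sketch. The ParticleFilterTracker below is a toy particle filter over box centers (resampling omitted for brevity) and detect_yolov3 is a hypothetical hook; both merely stand in for the paper's Deep Particle Filter and YOLOv3 components rather than reproducing them.

```python
import numpy as np

class ParticleFilterTracker:
    """Toy particle filter over (x, y) box centers; resampling omitted."""
    def __init__(self, init_box, n_particles=100, motion_std=5.0):
        x, y, w, h = init_box
        self.size = (w, h)
        self.particles = np.random.normal([x, y], motion_std, (n_particles, 2))
        self.weights = np.full(n_particles, 1.0 / n_particles)

    def predict(self, motion_std=5.0):
        # Diffuse particles to absorb unknown camera and object motion.
        self.particles += np.random.normal(0.0, motion_std, self.particles.shape)

    def update(self, likelihood_fn):
        # Re-weight particles by appearance likelihood; return the mean
        # likelihood as a rough tracking-confidence score.
        w = np.array([likelihood_fn(p) for p in self.particles])
        total = w.sum()
        if total == 0.0:                       # every particle lost the target
            self.weights = np.full(len(w), 1.0 / len(w))
            return 0.0
        self.weights = w / total
        return float(total / len(w))

    def estimate(self):
        c = np.average(self.particles, weights=self.weights, axis=0)
        return (c[0], c[1], *self.size)

def detect_yolov3(frame):
    """Hypothetical re-detection hook; a real system would run YOLOv3 here."""
    return None  # (x, y, w, h) for the target, or None if not found

def track(frames, init_box, likelihood_fn, conf_threshold=0.05):
    tracker = ParticleFilterTracker(init_box)
    boxes = []
    for frame in frames:
        tracker.predict()
        conf = tracker.update(lambda p: likelihood_fn(frame, p))
        if conf < conf_threshold:              # tracker drifted or lost target:
            box = detect_yolov3(frame)         # fall back to the detector
            if box is not None:
                tracker = ParticleFilterTracker(box)  # re-initialize tracker
        boxes.append(tracker.estimate())
    return boxes
```

The design point is the fallback: the particle filter is cheap and runs on every frame, while the detector is invoked only when tracking confidence collapses, which is what keeps the combined system real-time.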
Recognizing human actions is valuable for many real-world applications such as video surveillance, human-computer interaction, smart homes, and gaming. In this paper, we present an action recognition method based on the hypothesis that action classification can be boosted by motion information from optical flow. With the emergence of automatic RGB-D video analysis, we propose fusing optical flow extracted from both the RGB and depth channels for action representation. First, we extract optical flow from the RGB and depth data. Second, a motion descriptor with a spatial pyramid is computed from the histograms of optical flow of RGB and depth. Then, a feature-pooling technique accumulates the RGB and depth features into a set of feature vectors for each action. Finally, we use the Multiple Kernel Learning (MKL) technique at the kernel level to classify actions from the pooled RGB and depth features. To demonstrate generalizability, the proposed method has been systematically evaluated on two benchmark datasets and shown to be more effective and accurate for action recognition than previous work. We obtain overall accuracies of 97.5% and 92.8% on the 3D ActionPairs and MSR-Daily Activity 3D datasets, respectively.
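As an illustration of the motion-descriptor step, the sketch below computes dense Farneback optical flow and a spatial-pyramid histogram of flow orientations (HOF); the flow method, bin count, and pyramid levels here are illustrative assumptions, not the paper's settings.

```python
import cv2
import numpy as np

def hof_descriptor(prev_gray, curr_gray, n_bins=8, pyramid=(1, 2)):
    # Dense optical flow between two consecutive grayscale frames.
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])  # ang in radians
    h, w = mag.shape
    feats = []
    for g in pyramid:                       # e.g. 1x1 and 2x2 spatial grids
        for i in range(g):
            for j in range(g):
                cm = mag[i*h//g:(i+1)*h//g, j*w//g:(j+1)*w//g]
                ca = ang[i*h//g:(i+1)*h//g, j*w//g:(j+1)*w//g]
                hist, _ = np.histogram(ca, bins=n_bins, range=(0, 2*np.pi),
                                       weights=cm)  # magnitude-weighted HOF
                feats.append(hist / (hist.sum() + 1e-8))
    return np.concatenate(feats)
```

The same descriptor is computed on both the RGB (converted to grayscale) and depth streams; pooling the per-frame vectors over time then yields the fixed-length RGB and depth features that feed the MKL classifier.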
Recently, the Microsoft Kinect sensor has provided a whole new type of data for computer vision: depth information. The most important contribution of depth information is to simplify one of the hardest parts of visual information extraction, the segmentation process. In human action recognition in particular, depth data help reduce the noise and the variance in background and illumination of real-world environments. Nevertheless, most state-of-the-art approaches still rely on complex feature representations with rather long feature vectors, which in turn require additional processing to reduce the complexity of the overall system. In this paper, we address this problem with the Elliptical Density Shape (EDS) model, which provides a simplified geometric shape feature for any complex-shaped object across time sequences while remaining robust enough for the recognition process.
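As one way to picture such a simplified shape feature, the sketch below reduces a segmented silhouette to a covariance ellipse via image moments; this is only an assumed illustration of an ellipse-style shape summary, not the paper's actual EDS formulation.

```python
import numpy as np

def ellipse_from_mask(mask):
    """Fit a covariance ellipse to the foreground pixels of a binary mask."""
    ys, xs = np.nonzero(mask)
    center = np.array([xs.mean(), ys.mean()])
    cov = np.cov(np.stack([xs, ys]))               # 2x2 spatial covariance
    eigvals, eigvecs = np.linalg.eigh(cov)         # ascending eigenvalues
    axes = 2.0 * np.sqrt(eigvals)                  # minor/major axis lengths
    angle = np.arctan2(eigvecs[1, -1], eigvecs[0, -1])  # major-axis direction
    return center, axes, angle
```

Tracking just (center, axes, angle) across the depth frames of an action gives a five-dimensional per-frame feature instead of a long silhouette descriptor, which is the kind of simplification the abstract argues for.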
Nowadays, digital video collections grow both in number and in the storage they require, which calls for efficient management techniques and methods that allow video documents to be retrieved efficiently. This paper presents a method for automatic structural analysis of digital videos that generates a table of contents and an index table for the given videos. These tables allow videos to be stored in a hierarchy of video, clusters of shots, shots, key-frames of shots, and clusters of regions. A method for retrieving videos using visual features and semantic concepts as input is then presented. Video retrieval is performed in two steps, an off-line mode and an on-line mode. The off-line step decomposes a video sequence into elementary segments, classifies these segments with a hierarchical clustering algorithm, and finally generates a table of contents and an index table for the given video sequence. The on-line step retrieves videos from a hierarchical database using a video clip, shot, key-frames, or representatives of region clusters as input; the retrieved results are then filtered by semantic concepts. The obtained results show that the proposed model is more efficient than traditional systems based only on global or local visual features or keywords.
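A minimal sketch of the off-line step under illustrative assumptions: shots are cut where the color-histogram difference between consecutive frames spikes, the first frame of each shot serves as its key-frame, and agglomerative clustering groups similar shots into the table-of-contents hierarchy. The threshold, histogram, and distance choices are placeholders, not the paper's.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def frame_hist(frame, bins=16):
    # frame: HxWx3 uint8; coarse per-channel color histogram, L1-normalized.
    h = np.concatenate([np.histogram(frame[..., c], bins=bins,
                                     range=(0, 256))[0] for c in range(3)])
    return h / h.sum()

def segment_shots(frames, cut_threshold=0.4):
    hists = [frame_hist(f) for f in frames]
    cuts = [0]
    for t in range(1, len(hists)):
        # Half the L1 distance between normalized histograms lies in [0, 1].
        if 0.5 * np.abs(hists[t] - hists[t - 1]).sum() > cut_threshold:
            cuts.append(t)                  # large change => shot boundary
    return cuts                             # start index of each shot

def build_toc(frames, n_clusters=5):
    starts = segment_shots(frames)
    keyframes = [frames[s] for s in starts]     # first frame as key-frame
    feats = np.stack([frame_hist(k) for k in keyframes])
    labels = fcluster(linkage(feats, method="average"),
                      n_clusters, criterion="maxclust")
    return list(zip(starts, labels))            # (shot start, cluster id)
```

The on-line step would then match a query clip's histograms against this hierarchy from the cluster level down to the key-frame level before the semantic-concept filtering.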