Video object detection is more challenging than image object detection because of deteriorated frame quality. To enhance feature representations, state-of-the-art methods propagate temporal information into the deteriorated frame by aligning and aggregating entire feature maps from multiple nearby frames. However, restricted by the feature maps' low storage efficiency and vulnerable content-address allocation, long-term temporal information is not fully exploited by these methods. In this work, we propose the first object-guided external memory network for online video object detection. Storage efficiency is handled by object-guided hard attention that selectively stores valuable features, and long-term information is preserved when stored in an addressable external data matrix. A set of read/write operations is designed to accurately propagate/allocate and delete multi-level memory features under object guidance. We evaluate our method on the ImageNet VID dataset and achieve state-of-the-art performance as well as a good speed-accuracy trade-off. Furthermore, by visualizing the external memory, we show the detailed object-level reasoning process across frames.
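The abstract gives no implementation details, so the following is only a minimal PyTorch sketch of the general idea: an addressable external data matrix with confidence-gated (hard-attention) writes and soft-attention reads. The `Memory` class, its `write`/`read` interface, and the confidence threshold are illustrative assumptions, not the paper's actual design; the object-guided deletion operation is omitted for brevity.

```python
import torch
import torch.nn.functional as F

class Memory:
    """Sketch of an addressable external memory (assumed interface)."""
    def __init__(self, slots: int, dim: int):
        self.keys = torch.zeros(slots, dim)   # addressable data matrix
        self.vals = torch.zeros(slots, dim)
        self.used = torch.zeros(slots, dtype=torch.bool)

    def write(self, feat: torch.Tensor, score: float, thresh: float = 0.5):
        """Hard-attention write: keep a feature only if its detection
        confidence passes a threshold (the 'object guidance')."""
        if score < thresh:
            return
        free = (~self.used).nonzero()
        slot = free[0, 0] if len(free) > 0 else torch.randint(len(self.used), (1,))[0]
        self.keys[slot] = feat
        self.vals[slot] = feat
        self.used[slot] = True

    def read(self, query: torch.Tensor) -> torch.Tensor:
        """Soft-attention read over stored slots to propagate long-term cues
        into the current (possibly deteriorated) frame's feature."""
        if not self.used.any():
            return query
        k, v = self.keys[self.used], self.vals[self.used]
        attn = F.softmax(k @ query / k.shape[-1] ** 0.5, dim=0)
        return query + attn @ v
```

Storing only high-confidence object features, rather than entire feature maps, is what makes the memory compact enough to retain long-term information.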
With the rapid growth of video data, video summarization plays a key role in reducing the effort needed to explore video content by generating concise but informative summaries. Although supervised video summarization approaches have been well studied and achieve state-of-the-art performance, unsupervised methods remain in high demand due to the intrinsic difficulty of obtaining high-quality annotations. In this paper, we propose a novel yet simple unsupervised video summarization method with attentive conditional Generative Adversarial Networks (GANs). First, we build our framework upon GANs in an unsupervised manner: the generator produces high-level weighted frame features and predicts frame-level importance scores, while the discriminator tries to distinguish between weighted frame features and raw frame features. Furthermore, we utilize a conditional feature selector to guide the GAN model to focus on the more important temporal regions of the video. Second, we are the first to introduce frame-level multi-head self-attention for video summarization, which learns long-range temporal dependencies along the whole video sequence and overcomes the local constraints of recurrent units, e.g., LSTMs. Extensive evaluations on two datasets, SumMe and TVSum, show that our proposed framework surpasses state-of-the-art unsupervised methods by a large margin and even outperforms most supervised methods. Additionally, we conduct an ablation study to unveil the influence of each component and of the parameter settings in our framework.
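To make the generator's role concrete, here is a hedged PyTorch sketch of frame-level multi-head self-attention producing per-frame importance scores and weighted frame features, as the abstract describes. The layer sizes, the sigmoid score head, and the class name `AttentiveScorer` are assumptions for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class AttentiveScorer(nn.Module):
    """Sketch: self-attention over all frames, then a per-frame score head."""
    def __init__(self, dim: int = 1024, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.score = nn.Sequential(nn.Linear(dim, 1), nn.Sigmoid())

    def forward(self, frames: torch.Tensor):
        """frames: (batch, T, dim) CNN features for T video frames."""
        ctx, _ = self.attn(frames, frames, frames)  # long-range dependencies
        s = self.score(ctx).squeeze(-1)             # (batch, T) importance scores
        weighted = frames * s.unsqueeze(-1)         # weighted frame features
        return s, weighted

scores, weighted = AttentiveScorer()(torch.randn(1, 120, 1024))
```

In the adversarial setup described above, `weighted` would be passed to the discriminator alongside the raw frame features.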
This paper tackles online video object segmentation with weak supervision, i.e., labeling the target object and background with pixel-level accuracy in unconstrained videos, given only a single bounding box in the first frame. We present a novel tracking-assisted visual object segmentation framework to achieve this. On the one hand, initialized with the given bounding box in the first frame, an auxiliary object tracking module guides the segmentation module frame by frame by providing motion and region information, which is usually missing in semi-supervised methods. Moreover, compared with unsupervised approaches, our approach with such minimal supervision can focus on the target object without bringing unrelated objects into the final results. On the other hand, the video object segmentation module also improves the robustness of the visual object tracking module through pixel-level localization and objectness information. Thus, segmentation and tracking in our framework mutually help each other in an online manner. To verify the generality and effectiveness of the proposed framework, we evaluate our weakly supervised method on two cross-domain datasets, DAVIS and VOT2016, with the same configuration and parameter settings. Experimental results show the top performance of our method, which is even better than that of the leading semi-supervised methods. Furthermore, we conduct an extensive ablation study to investigate the influence of each component and of the main parameters.
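The mutual tracking/segmentation loop can be summarized in a short sketch. Everything here is an assumption made for illustration: `track` and `segment` stand in for the paper's learned tracking and segmentation modules, and `mask_to_box` is a hypothetical helper showing how a pixel-level mask can feed back into box-level tracking.

```python
import numpy as np

def mask_to_box(mask: np.ndarray) -> tuple:
    """Refine the tracker's box from the segmentation mask (objectness cue)."""
    ys, xs = np.nonzero(mask)
    return (xs.min(), ys.min(), xs.max(), ys.max())

def run(frames, init_box, track, segment):
    """Online loop: tracking guides segmentation, segmentation refines tracking."""
    box, masks = init_box, []
    for frame in frames:
        box = track(frame, box)       # motion/region prior for segmentation
        mask = segment(frame, box)    # pixel-accurate target vs. background
        if mask.any():
            box = mask_to_box(mask)   # segmentation feeds back into the tracker
        masks.append(mask)
    return masks
```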
In the pursuit of ubiquitous computing, distributed computing systems spanning the cloud, edge devices, and Internet-of-Things devices are in high demand. However, existing distributed frameworks are not tailored to the rapid development of Deep Neural Networks (DNNs), the key technique behind many of today's intelligent applications. Building on prior work on distributed deep neural networks (DDNNs), we propose the Heterogeneous Distributed Deep Neural Network (HDDNN) over the distributed hierarchy, targeting ubiquitous intelligent computing. While supporting the basic functionality of DNNs, our framework is optimized for various types of heterogeneity, including heterogeneous computing nodes, heterogeneous neural networks, and heterogeneous system tasks. In addition, our framework features parallel computing, privacy protection, and robustness, with further considerations for combining heterogeneous distributed systems and DNNs. Extensive experiments demonstrate that our framework better utilizes hierarchical distributed systems for DNNs and properly tailors DNNs for real-world distributed systems, achieving low response time, high performance, and a better user experience.
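For intuition about hierarchical edge/cloud inference in the DDNN style this work builds on, here is a minimal sketch of confidence-gated early exit: the edge runs shallow layers and answers locally when confident, otherwise it offloads its intermediate features to the cloud. `edge_net`, `edge_exit`, `cloud_net`, and the threshold `tau` are illustrative names, not HDDNN's actual API.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def infer(x, edge_net, edge_exit, cloud_net, tau: float = 0.9):
    """Hierarchical inference sketch for a single input (batch size 1)."""
    feat = edge_net(x)                      # shallow layers on the edge device
    p = F.softmax(edge_exit(feat), dim=-1)  # local early-exit classifier
    conf, pred = p.max(dim=-1)
    if conf.item() >= tau:                  # confident: answer locally, low latency
        return pred
    return cloud_net(feat).argmax(dim=-1)   # else send compact features to cloud
```

Sending intermediate features rather than raw inputs is also what gives such hierarchies their privacy and bandwidth benefits.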