Generating temporal action proposals remains a challenging problem: the main difficulty lies in predicting precise temporal proposal boundaries and reliable action confidence in long, untrimmed real-world videos. In this paper, we propose an efficient and unified temporal action proposal generation framework named Dense Boundary Generator (DBG), which draws inspiration from boundary-sensitive methods and performs boundary classification and action completeness regression for densely distributed proposals. In particular, DBG consists of two modules: temporal boundary classification (TBC) and action-aware completeness regression (ACR). TBC provides two temporal boundary confidence maps from low-level two-stream features, while ACR generates an action completeness score map from high-level action-aware features. Moreover, we introduce a dual-stream BaseNet (DSB) to encode RGB and optical-flow information, which helps capture discriminative boundary and actionness features. Extensive experiments on the popular benchmarks ActivityNet-1.3 and THUMOS14 demonstrate the superiority of DBG over state-of-the-art proposal generators (e.g., MGG and BMN).
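To make the described module layout concrete, the following is a minimal, illustrative PyTorch sketch of how a dual-stream BaseNet, a TBC head, and an ACR head could be wired together. It is not the authors' released code: the layer sizes, the 400-dimensional snippet features, and the way the dense start-end map is built by broadcasting are assumptions made purely for illustration.

```python
# Illustrative sketch only (not the DBG authors' code); shapes and layer sizes are assumed.
import torch
import torch.nn as nn

class DualStreamBaseNet(nn.Module):
    """Encodes RGB and optical-flow snippet features with two 1D conv streams."""
    def __init__(self, in_dim=400, hid_dim=256):
        super().__init__()
        self.rgb = nn.Sequential(nn.Conv1d(in_dim, hid_dim, 3, padding=1), nn.ReLU())
        self.flow = nn.Sequential(nn.Conv1d(in_dim, hid_dim, 3, padding=1), nn.ReLU())
        # Extra conv standing in for the deeper, action-aware part of the backbone.
        self.action = nn.Sequential(nn.Conv1d(2 * hid_dim, 2 * hid_dim, 3, padding=1), nn.ReLU())

    def forward(self, rgb_feat, flow_feat):
        low = torch.cat([self.rgb(rgb_feat), self.flow(flow_feat)], dim=1)  # low-level two-stream features
        high = self.action(low)                                             # high-level action-aware features
        return low, high

class DBG(nn.Module):
    def __init__(self, feat_dim=512):
        super().__init__()
        self.base = DualStreamBaseNet()
        # TBC head: per-location start/end boundary confidences (two temporal maps).
        self.tbc = nn.Conv1d(feat_dim, 2, kernel_size=1)
        # ACR head: a completeness score for every (start, end) pair in a dense T x T map.
        self.acr = nn.Sequential(nn.Conv2d(feat_dim, 128, 1), nn.ReLU(), nn.Conv2d(128, 1, 1))

    def forward(self, rgb_feat, flow_feat):                         # both (B, 400, T)
        low, high = self.base(rgb_feat, flow_feat)                  # (B, 512, T) each
        boundary_maps = torch.sigmoid(self.tbc(low))                # (B, 2, T): start/end confidence
        T = high.size(-1)
        dense = high.unsqueeze(-1).expand(-1, -1, -1, T)            # (B, 512, T, T) dense proposal map
        completeness = torch.sigmoid(self.acr(dense)).squeeze(1)    # (B, T, T) completeness scores
        return boundary_maps, completeness

# Toy usage: 2 videos, 100 snippets each.
boundaries, scores = DBG()(torch.randn(2, 400, 100), torch.randn(2, 400, 100))
```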
For video super-resolution, current state-of-the-art approaches either process multiple low-resolution (LR) frames to produce each output high-resolution (HR) frame separately in a sliding-window fashion, or recurrently exploit the previously estimated HR frames to super-resolve the following frame. The main weaknesses of these approaches are: 1) generating each output frame separately may yield high-quality HR estimates but results in unsatisfactory flickering artifacts, and 2) combining previously generated HR frames can produce temporally consistent results when the information flow is short, but causes significant jitter and jagged artifacts because earlier super-resolution errors are constantly accumulated into subsequent frames. In this paper, we propose a fully end-to-end trainable frame- and feature-context video super-resolution (FFCVSR) network that consists of two key sub-networks: a local network and a context network. The former explicitly utilizes a sequence of consecutive LR frames to generate local features and a local SR frame, while the latter combines the outputs of the local network with the previously estimated HR frames and features to super-resolve the subsequent frame. Our approach takes full advantage of the inter-frame information from multiple LR frames and the context information from previously predicted HR frames, producing temporally consistent, high-quality results while maintaining real-time speed by directly reusing previous features and frames. Extensive evaluations and comparisons demonstrate that our approach achieves state-of-the-art results on a standard benchmark dataset, with advantages in accuracy, efficiency, and visual quality over existing approaches.
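The local/context recurrence described above can be sketched roughly as below. This is not the authors' FFCVSR implementation: the layer sizes, the 4x upscaling factor, the residual fusion of the previous HR frame, and the 3-frame window are all assumptions chosen only to show how previous features and frames could be reused across time steps.

```python
# Illustrative sketch only (not the FFCVSR authors' code); sizes and fusion scheme are assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocalNet(nn.Module):
    """Maps a window of consecutive LR frames to local features and a local SR frame."""
    def __init__(self, n_frames=3, feat=64, scale=4):
        super().__init__()
        self.scale = scale
        self.body = nn.Sequential(nn.Conv2d(3 * n_frames, feat, 3, padding=1), nn.ReLU(),
                                  nn.Conv2d(feat, feat, 3, padding=1), nn.ReLU())
        self.to_sr = nn.Conv2d(feat, 3 * scale * scale, 3, padding=1)

    def forward(self, lr_window):                                  # (B, 3*n_frames, H, W)
        feat = self.body(lr_window)
        sr = F.pixel_shuffle(self.to_sr(feat), self.scale)         # (B, 3, scale*H, scale*W)
        return feat, sr

class ContextNet(nn.Module):
    """Fuses the local outputs with the previously estimated HR frame and features."""
    def __init__(self, feat=64, scale=4):
        super().__init__()
        self.scale = scale
        self.fuse = nn.Sequential(nn.Conv2d(feat * 2 + 3, feat, 3, padding=1), nn.ReLU())
        self.to_sr = nn.Conv2d(feat, 3 * scale * scale, 3, padding=1)

    def forward(self, local_feat, local_sr, prev_feat, prev_hr):
        prev_hr_lr = F.avg_pool2d(prev_hr, self.scale)             # bring previous HR back to the LR grid
        feat = self.fuse(torch.cat([local_feat, prev_feat, prev_hr_lr], dim=1))
        hr = local_sr + F.pixel_shuffle(self.to_sr(feat), self.scale)   # refine the local SR frame
        return feat, hr

# Toy recurrence over a short clip: previous features and frames are reused at each step.
local_net, context_net = LocalNet(), ContextNet()
lr_clip = torch.randn(1, 5, 3, 32, 32)                             # (B, T, C, H, W)
prev_feat, prev_hr = torch.zeros(1, 64, 32, 32), torch.zeros(1, 3, 128, 128)
for t in range(1, 4):                                              # 3-frame sliding window
    window = lr_clip[:, t - 1:t + 2].flatten(1, 2)                 # (B, 9, H, W)
    local_feat, local_sr = local_net(window)
    prev_feat, prev_hr = context_net(local_feat, local_sr, prev_feat, prev_hr)
```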
Supervised deep learning techniques have achieved great success in various fields because they overcome the limitations of handcrafted representations. However, most previous image retargeting algorithms still rely on fixed design principles, such as using gradient maps or handcrafted features to compute a saliency map, which inevitably restricts their generality. Deep learning techniques may help to address this issue, but training deep retargeting models requires a large-scale image retargeting dataset, and building such a dataset demands enormous human effort. In this paper, we propose a novel deep cyclic image retargeting approach, called Cycle-IR, which is the first to perform image retargeting with a single deep model without relying on any explicit user annotations. Our idea is built on the reverse mapping from the retargeted images to the given input images: if the retargeted image suffers serious distortion or excessive loss of important visual information, the reverse mapping is unlikely to restore the input image well. We constrain this forward-reverse consistency by introducing a cyclic perception coherence loss. In addition, we propose a simple yet effective image retargeting network (IRNet) to implement the image retargeting process. IRNet contains a spatial and channel attention layer, which effectively discriminates visually important regions of input images, especially in cluttered images. Given input images of arbitrary size and desired aspect ratios, Cycle-IR produces visually pleasing target images directly. Extensive experiments on the standard RetargetMe dataset show the superiority of Cycle-IR. In addition, Cycle-IR outperforms the Multiop method and obtains the best result in the user study. Code is available at https://github.com/mintanwei/Cycle-IR.
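A rough sketch of the forward-reverse cycle with a perceptual coherence loss is given below. It is not the authors' Cycle-IR code: the use of torchvision's VGG-16 as the perceptual feature extractor, the L1 distance, and the `irnet(image, target_size)` interface are hypothetical choices used only to illustrate the cycle-consistency idea.

```python
# Illustrative sketch only (not the Cycle-IR authors' code); the feature extractor,
# the loss, and the irnet(image, target_size) interface are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import vgg16

class CyclePerceptionLoss(nn.Module):
    """Compares the input image and its cyclic reconstruction in a pretrained feature space."""
    def __init__(self):
        super().__init__()
        self.features = vgg16(weights="IMAGENET1K_V1").features[:16].eval()
        for p in self.features.parameters():
            p.requires_grad_(False)                     # the perceptual extractor stays fixed

    def forward(self, x, x_restored):
        return F.l1_loss(self.features(x_restored), self.features(x))

def cycle_ir_step(irnet, x, target_size, loss_fn):
    """One training step of the cyclic scheme: retarget, map back, and penalize the mismatch."""
    retargeted = irnet(x, target_size)                  # forward retargeting to the desired size
    restored = irnet(retargeted, x.shape[-2:])          # reverse mapping back to the input size
    return loss_fn(x, restored)                         # cyclic perception coherence loss
```

The intuition matches the abstract: if the retargeted image has lost or distorted important content, the reverse mapping cannot reproduce the input well, so the coherence loss pushes the retargeting network to preserve visually important regions without any annotated supervision.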