To date, image-based inverse tone mapping (iTM) models have been widely investigated, whereas video-based iTM methods have received little attention. It is therefore appealing to leverage existing image-based models for the video iTM task. However, directly transferring image-based iTM models to video data without modeling spatio-temporal information remains nontrivial and challenging. Considering both the intra-frame quality and the inter-frame consistency of a video, this article presents a new video iTM method based on a kernel prediction network (KPN), which takes advantage of a multi-frame interaction (MFI) module to capture spatio-temporal information in video data. Specifically, a basic encoder-decoder KPN, essentially designed for image iTM, is trained to guarantee the mapping quality within each frame. More importantly, the MFI module is incorporated to capture spatio-temporal context and preserve inter-frame consistency by exploiting the correlation between adjacent frames. Notably, any existing image iTM model can be readily extended to a video iTM one by incorporating the proposed MFI module. Furthermore, we propose an inter-frame brightness consistency loss function based on the Gaussian pyramid to reduce temporal inconsistency in videos. Extensive experiments demonstrate that our model outperforms state-of-the-art image- and video-based methods. The code is available at https://github.com/caogaofeng/KPNMFI.
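The abstract does not give the exact form of the Gaussian-pyramid brightness consistency loss, but the idea can be sketched as comparing mean brightness across pyramid levels of adjacent frames. The following is a minimal, dependency-free illustration; the function names, the use of a 2x2 box average in place of a true Gaussian blur, and the per-level mean-absolute-difference aggregation are all assumptions, not the authors' implementation.

```python
import numpy as np

def gaussian_pyramid(img, levels=3):
    # Build a pyramid by repeated downsampling. A 2x2 box average
    # stands in for a proper Gaussian blur to keep the sketch simple.
    pyr = [img]
    for _ in range(levels - 1):
        h, w = pyr[-1].shape
        down = pyr[-1][:h - h % 2, :w - w % 2] \
            .reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
        pyr.append(down)
    return pyr

def brightness_consistency_loss(frame_t, frame_t1, levels=3):
    # Penalize differences in per-level mean brightness between
    # the pyramids of two adjacent frames (hypothetical formulation).
    pyr_t = gaussian_pyramid(frame_t, levels)
    pyr_t1 = gaussian_pyramid(frame_t1, levels)
    return float(np.mean([abs(a.mean() - b.mean())
                          for a, b in zip(pyr_t, pyr_t1)]))

# Identical frames incur zero loss; a global brightness shift is penalized.
a = np.full((8, 8), 0.5)
b = np.full((8, 8), 0.7)
print(brightness_consistency_loss(a, a))  # 0.0
```

Comparing brightness statistics at multiple scales, rather than pixel-wise, makes such a loss tolerant to local motion between frames while still discouraging global flicker.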
Recent works on single-image high dynamic range (HDR) reconstruction fail to hallucinate plausible textures, resulting in missing information and artifacts in large under/over-exposed regions. In this paper, a decoupled kernel prediction network is proposed to infer an HDR image from a low dynamic range (LDR) image. Specifically, we first adopt a simple module to generate a preliminary result, which can precisely estimate well-exposed HDR regions. Meanwhile, an encoder-decoder backbone network with a soft mask guidance module is presented to predict pixel-wise kernels, which are then convolved with the preliminary result to obtain the final HDR output. Unlike traditional kernels, our predicted kernels are decoupled along the spatial and channel dimensions. The advantages of our method are at least threefold. First, our model is guided by the soft mask, so it can focus on the information most relevant to under/over-exposed regions. Second, pixel-wise kernels can adaptively handle the different degradations of differently exposed regions. Third, decoupled kernels avoid information redundancy across channels and reduce the solution space of our model. Thus, our method hallucinates fine details in under/over-exposed regions and renders visually pleasing results. Extensive experiments demonstrate that our model outperforms state-of-the-art ones.
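One plausible reading of spatially/channel-decoupled pixel-wise kernels is that each pixel predicts a KxK spatial kernel shared across channels plus a per-channel scale, instead of a full KxKxC kernel. The sketch below illustrates only that factorization; the function name, tensor layout, and exact decoupling scheme are assumptions for illustration, not the paper's architecture.

```python
import numpy as np

def apply_decoupled_kernels(img, spatial_k, channel_w, k=3):
    # img:       (H, W, C) preliminary HDR estimate
    # spatial_k: (H, W, K*K) one spatial kernel per pixel, shared over channels
    # channel_w: (H, W, C) one scale per pixel and channel
    H, W, C = img.shape
    pad = k // 2
    padded = np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode='edge')
    out = np.zeros_like(img)
    for y in range(H):
        for x in range(W):
            patch = padded[y:y + k, x:x + k, :]      # (K, K, C) neighborhood
            kern = spatial_k[y, x].reshape(k, k, 1)  # broadcast over channels
            # Spatial aggregation first, then per-channel rescaling.
            out[y, x] = (patch * kern).sum(axis=(0, 1)) * channel_w[y, x]
    return out

# With an identity spatial kernel (center tap = 1) and unit channel
# weights, the operation reduces to a pass-through.
H, W, C, K = 4, 4, 3, 3
img = np.random.rand(H, W, C)
sk = np.zeros((H, W, K * K)); sk[..., K * K // 2] = 1.0
cw = np.ones((H, W, C))
assert np.allclose(apply_decoupled_kernels(img, sk, cw), img)
```

The factorization is what shrinks the prediction space: a network emits K*K + C values per pixel rather than K*K*C, which matches the abstract's claim of reduced redundancy across channels.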