In recent years, optical flow methods have developed rapidly, achieving unprecedentedly high performance. Most methods consider only single-modal optical flow under the well-known brightness-constancy assumption. However, many application systems must align images of different modalities, which requires estimating cross-modal flow between cross-modal image pairs. Many cross-modal matching methods have been designed for specific cross-modal scenarios. We argue that the prior knowledge embedded in advanced optical flow models can be transferred to cross-modal flow estimation, offering a simple but unified solution for diverse cross-modal matching tasks. To verify this hypothesis, we design a self-supervised framework that adapts single-modal optical flow networks to diverse cross-modal flow estimation. Moreover, we add a Cross-Modal-Adapter block as a plugin to the state-of-the-art optical flow model RAFT for better performance in cross-modal scenarios. Our proposed Modality Promotion Framework and Cross-Modal Adapter offer multiple advantages over existing methods. Experiments demonstrate that our method is effective on multiple datasets covering different cross-modal scenarios.
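As context for why single-modal models break down across modalities, the brightness-constancy assumption referenced above is the standard one from the optical flow literature (the notation below is the textbook form, not taken from this paper): a pixel is assumed to keep its intensity as it moves, which clearly fails when the two frames come from different sensors.

```latex
% Brightness constancy: a point at (x, y) in frame t moves by flow (u, v)
% and retains its intensity in frame t+1.
I(x, y, t) = I(x + u, \; y + v, \; t + 1)
% First-order Taylor expansion yields the classical optical flow constraint:
I_x u + I_y v + I_t = 0
```

For a cross-modal pair (e.g., visible/infrared), the two images measure physically different quantities, so this constraint no longer holds directly, motivating the transfer approach described in the abstract.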
The limited spatial and angular resolutions in multi-view multimedia applications restrict the visual experience in practical use. In this paper, we first formulate the space-angle super-resolution (SASR) problem for irregularly arranged multi-view images. It aims to jointly increase the spatial resolution of the source views and synthesize arbitrary virtual high-resolution (HR) views between them. One feasible solution is to perform super-resolution (SR) and view synthesis (VS) separately. However, this cannot fully exploit the interplay between the SR and VS tasks. Intuitively, multi-view images provide more angular references, and higher resolution provides more high-frequency details. Therefore, we propose a one-stage space-angle super-resolution network, called SASRnet, which simultaneously synthesizes real and virtual HR views. Extensive experiments on several benchmarks demonstrate that our proposed method outperforms two-stage methods and show that SR and VS can promote each other. To our knowledge, this work is the first to address the SASR problem for unstructured multi-view images in an end-to-end learning-based manner.
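One way to state the SASR problem formally, restating only what the abstract specifies (the symbols $f_\theta$, $p_i$, $p_t$ are illustrative notation, not the paper's):

```latex
% Given N irregularly arranged low-resolution source views with camera poses,
% SASR learns a single mapping that renders an HR image at any target pose:
f_\theta\big(\{(I_i^{\mathrm{LR}}, p_i)\}_{i=1}^{N}, \; p_t\big) = I_t^{\mathrm{HR}}
% When p_t coincides with a source pose, this reduces to super-resolution (SR);
% when p_t lies between source poses, it performs view synthesis (VS).
```

The one-stage design means both cases are handled by the same network rather than by chaining separate SR and VS models.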
CCS CONCEPTS: • Computing methodologies → Image-based rendering; Reconstruction; Image processing.