We consider the problem of localizing a spatio-temporal tube in a video corresponding to a given text query. This is a challenging task that requires the joint and efficient modeling of temporal, spatial and multi-modal interactions. To address this task, we propose TubeDETR, a transformer-based architecture inspired by the recent success of such models for text-conditioned object detection. Our model notably includes: (i) an efficient video and text encoder that models spatial multi-modal interactions over sparsely sampled frames and (ii) a space-time decoder that jointly performs spatio-temporal localization. We demonstrate the advantage of our proposed components through an extensive ablation study. We also evaluate our full approach on the spatio-temporal video grounding task and demonstrate improvements over the state of the art on the challenging VidSTG and HC-STVG benchmarks.
The RGB and YCrCb color spaces are both widely used in video image processing, and with the wide application of FPGAs in this field, RGB to YCrCb color space conversion is frequently needed on FPGA. This paper analyzes the process of RGB to YCrCb color space conversion on FPGA and proposes a fast conversion method using look-up tables and pipelining. First, while preserving accuracy, floating-point numbers are scaled to integers, which are convenient for FPGA processing. Second, to address the speed limitation of multiplication in the conversion, multiplications are replaced with look-up tables and additions. Finally, pipelining is applied to the numerous addition operations to further improve the operating speed. The proposed method, implemented on an XC4VLX15 chip for color space conversion, achieves a maximum operating frequency of 358 MHz, 3.5 times faster than the direct method. Experimental results demonstrate that the proposed method effectively improves the speed of RGB to YCrCb color space conversion compared with existing methods.
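The first two steps of the abstract above (scaling floating-point values to integers, then replacing multiplications with look-up tables and additions) can be illustrated with a small software sketch. This is not the authors' FPGA design: the abstract does not state the conversion matrix or the fixed-point width, so the full-range BT.601 (JFIF) coefficients and the 16-bit scale below are assumptions, and the names `SHIFT`, `LUTS` and `rgb_to_ycbcr_lut` are hypothetical. Pipelining is a hardware concern and is not modeled here.

```python
# Assumed fixed-point scale: each coefficient is multiplied by 2**16 and rounded.
SHIFT = 16
HALF = 1 << (SHIFT - 1)  # rounding term added before the final right shift

# Assumed full-range BT.601 (JFIF) coefficients; the abstract does not give its matrix.
COEFFS = {
    "Y": (0.299, 0.587, 0.114),
    "Cb": (-0.168736, -0.331264, 0.5),
    "Cr": (0.5, -0.418688, -0.081312),
}
OFFSETS = {"Y": 0, "Cb": 128, "Cr": 128}


def build_luts():
    """Precompute coefficient * value for every 8-bit input value.

    Each output channel gets three 256-entry tables (one per R, G, B),
    so the per-pixel work needs no multiplications.
    """
    luts = {}
    for name, coeffs in COEFFS.items():
        luts[name] = tuple(
            [round(c * (1 << SHIFT)) * v for v in range(256)] for c in coeffs
        )
    return luts


LUTS = build_luts()


def clamp8(x):
    """Clamp an integer to the 8-bit range 0..255."""
    return max(0, min(255, x))


def rgb_to_ycbcr_lut(r, g, b):
    """Convert one 8-bit RGB pixel using only table look-ups, adds and a shift."""
    out = []
    for name in ("Y", "Cb", "Cr"):
        lut_r, lut_g, lut_b = LUTS[name]
        acc = lut_r[r] + lut_g[g] + lut_b[b] + (OFFSETS[name] << SHIFT) + HALF
        out.append(clamp8(acc >> SHIFT))
    return tuple(out)
```

Under these assumptions, `rgb_to_ycbcr_lut(255, 255, 255)` returns `(255, 128, 128)`. In hardware, the three-term sum per channel would typically be split across registered adder stages, which is where pipelining would apply.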