Hongxu Jiang scite author profile

We consider the problem of localizing a spatio-temporal tube in a video corresponding to a given text query. This is a challenging task that requires the joint and efficient modeling of temporal, spatial and multi-modal interactions. To address this task, we propose TubeDETR, a transformer-based architecture inspired by the recent success of such models for text-conditioned object detection. Our model notably includes: (i) an efficient video and text encoder that models spatial multi-modal interactions over sparsely sampled frames and (ii) a space-time decoder that jointly performs spatio-temporal localization. We demonstrate the advantage of our proposed components through an extensive ablation study. We also evaluate our full approach on the spatio-temporal video grounding task and demonstrate improvements over the state of the art on the challenging VidSTG and HC-STVG benchmarks.


Remote-Sensing Image Compression Using Two-Dimensional Oriented Wavelet Transform

Yang, Jiang. 2011. IEEE Trans. Geosci. Remote Sensing


Optimized-SSIM Based Quantization in Optical Remote Sensing Image Compression

Yang, Jiang. 2011


A fast method for RGB to YCrCb conversion based on FPGA

Jiang, Liu, et al. 2013


The RGB and YCrCb color spaces are both widely used in video image processing, and with the wide application of FPGAs in this field, RGB to YCrCb color space conversion is frequently needed on FPGA. This paper analyzes the process of RGB to YCrCb color space conversion on FPGA and proposes a fast conversion method using look-up tables and pipelining. First, floating-point numbers are scaled to integers, which are convenient for FPGA processing, while preserving accuracy. Second, because multiplication limits the speed of the conversion, multiplications are replaced with look-up tables and additions. Finally, the many resulting addition operations are pipelined to further improve the operating speed. Implemented on an XC4VLX15 chip, the proposed method reaches a maximum operating frequency of 358 MHz, 3.5 times faster than the direct method. Experimental results demonstrate that the proposed method effectively improves the speed of RGB to YCrCb color space conversion compared with the existing method.
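The first two steps of the abstract (integer scaling, then look-up tables plus additions) can be illustrated in software. The following is a minimal sketch, not the paper's implementation: the full-range BT.601 coefficients, the 16-bit scaling factor, and the names `SHIFT`, `build_lut`, and `rgb_to_ycrcb` are all assumptions chosen for illustration, since the abstract does not state them. The pipelining step is a hardware technique and is not modeled here.

```python
# Assumed fixed-point precision: coefficients are scaled by 2**16.
SHIFT = 16
ROUND = 1 << (SHIFT - 1)  # added before the shift to round to nearest

# Assumed full-range BT.601 coefficients (not given in the abstract):
# (weight for R, weight for G, weight for B, offset)
COEFFS = {
    "Y":  (0.299, 0.587, 0.114, 0),
    "Cr": (0.5, -0.418688, -0.081312, 128),
    "Cb": (-0.168736, -0.331264, 0.5, 128),
}


def build_lut(coeff):
    """Precompute coeff * v for every 8-bit value v, as scaled integers."""
    scaled = round(coeff * (1 << SHIFT))
    return [scaled * v for v in range(256)]


# One table per (output channel, input channel) pair: nine tables in total.
LUTS = {name: tuple(build_lut(c) for c in cs[:3]) for name, cs in COEFFS.items()}


def rgb_to_ycrcb(r, g, b):
    """Convert one 8-bit RGB pixel to (Y, Cr, Cb) using only lookups and adds."""
    out = []
    for name in ("Y", "Cr", "Cb"):
        lut_r, lut_g, lut_b = LUTS[name]
        acc = lut_r[r] + lut_g[g] + lut_b[b] + ROUND  # no multiplications here
        value = (acc >> SHIFT) + COEFFS[name][3]      # drop the fraction bits
        out.append(min(255, max(0, value)))           # clamp to the 8-bit range
    return tuple(out)


print(rgb_to_ycrcb(255, 0, 0))  # pure red
```

The per-pixel work reduces to nine table reads and a handful of additions and shifts, which is the property the abstract relies on to avoid slow multipliers. In this sketch the tables cost 9 × 256 entries of memory in exchange.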


Highly Paralleled Low-Cost Embedded HEVC Video Encoder on TI KeyStone Multicore DSP

Jiang, Fan, Zhang, et al. 2019. IEEE Trans. Circuits Syst. Video Technol.




Hongxu Jiang

Human-Centric Spatio-Temporal Video Grounding With Visual Transformers
