Runnan Chen scite author profile

We study the problem of weakly supervised grounded image captioning: given an image, the goal is to automatically generate a sentence describing the context of the image, with each noun grounded to the corresponding region in the image. The task is challenging because there are no explicit fine-grained region-word alignments to serve as supervision. Previous weakly supervised methods mainly explore various regularization schemes to improve attention accuracy, but their performance is still far from that of fully supervised methods. One main issue that has been ignored is that the attention for generating visually groundable words may focus only on the most discriminative parts and fail to cover the whole object. We refer to this as the partial grounding problem and propose a simple yet effective method to alleviate it. Specifically, we design a distributed attention mechanism that forces the network to aggregate information from multiple spatially different regions with consistent semantics while generating the words. As a result, the union of the focused region proposals should form a visual region that completely encloses the object of interest. Extensive experiments demonstrate the superiority of the proposed method over state-of-the-art methods.
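The last step described in the abstract, taking the union of the focused region proposals, can be illustrated with a minimal sketch. It assumes that each of several attention distributions selects its highest-weighted region proposal and that the grounded region is the smallest box enclosing those selections. The names `union_box` and `ground_word`, the box format, and the sample values are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

# Assumed box format: (x1, y1, x2, y2) in pixel coordinates.
Box = Tuple[float, float, float, float]


def union_box(boxes: List[Box]) -> Box:
    """Return the smallest axis-aligned box that encloses all given boxes."""
    x1 = min(b[0] for b in boxes)
    y1 = min(b[1] for b in boxes)
    x2 = max(b[2] for b in boxes)
    y2 = max(b[3] for b in boxes)
    return (x1, y1, x2, y2)


def ground_word(proposals: List[Box], attentions: List[List[float]]) -> Box:
    """Pick the top-attended proposal of each attention distribution
    and return the union of the picked proposals (illustrative only)."""
    picked = []
    for weights in attentions:
        best = max(range(len(proposals)), key=lambda i: weights[i])
        picked.append(proposals[best])
    return union_box(picked)


# Hypothetical data: three region proposals and two attention distributions.
proposals = [(10, 10, 50, 60), (40, 20, 90, 80), (200, 200, 220, 230)]
attentions = [[0.7, 0.2, 0.1], [0.3, 0.6, 0.1]]
grounded = ground_word(proposals, attentions)
```

Here the two distributions attend to different parts of the same object, so the union covers more of it than either proposal alone. A single attention distribution would return only one of the two boxes.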


PR-Net: Preference Reasoning for Personalized Video Highlight Detection

Chen, Zhou, Wang, et al. 2021


Structure-Aware Long Short-Term Memory Network for 3D Cephalometric Landmark Detection

Chen, Chen³, et al. 2022, IEEE Trans. Med. Imaging




Runnan Chen

TSegNet: An efficient and accurate tooth segmentation network on 3D dental model

TANet: Towards Fully Automatic Tooth Arrangement

Distributed Attention for Grounded Image Captioning

