State-of-the-art object detection models rely on large-scale datasets to achieve high precision; without sufficient samples, they suffer from severe overfitting. Current work on few-shot object detection falls mainly into meta-learning-based and fine-tuning-based methods. However, existing models do not consider how feature maps should be processed to produce more accurate regions of interest (RoIs), which leads to many RoIs belonging to non-support classes. These non-support RoIs increase the burden on subsequent classification and can even cause misclassification. In addition, catastrophic forgetting is unavoidable in both types of few-shot object detection methods. Many models also classify directly in low-dimensional spaces because of limited resources, and this transformation of the data space can confuse certain categories and lead to misclassification. To address these problems, the Feature Reconstruction Detector (FRDet) is proposed, a simple yet effective fine-tuning-based approach for few-shot object detection. FRDet comprises a region proposal network (RPN) based on channel and spatial attention, called Multi-Attention RPN (MARPN), and a head based on feature reconstruction, called Feature Reconstruction Head (FRHead). Building on Attention RPN, MARPN uses channel attention to suppress non-support classes and spatial attention to enhance support classes, yielding fewer but more accurate RoIs. Meanwhile, FRHead uses support features to reconstruct query RoI features through a closed-form solution, enabling a comprehensive and fine-grained comparison. The model was validated on the PASCAL VOC, MS COCO, FSOD, and CUB200 datasets and achieved improved results compared with existing methods.
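One common closed-form formulation of feature reconstruction of this kind is ridge regression from support features to query RoI features. The sketch below illustrates that reading only; it is not the authors' implementation, and the shapes, the regularization weight `lam`, and the function name are assumptions.

```python
import torch

def reconstruct_query(support, query, lam=0.1):
    """Ridge-regression reconstruction of query RoI features (illustrative).

    support: (k, d) pooled support features for a single class
    query:   (n, d) query RoI features
    Returns the reconstructed features and the per-RoI reconstruction error.
    """
    k = support.size(0)
    # Closed-form solution: W = Q S^T (S S^T + lam * I)^(-1)
    gram = support @ support.t() + lam * torch.eye(k, device=support.device)
    weights = query @ support.t() @ torch.inverse(gram)  # (n, k)
    recon = weights @ support                             # (n, d)
    error = ((query - recon) ** 2).mean(dim=1)            # (n,)
    return recon, error
```

Under this reading, a query RoI would be assigned to the class whose support set reconstructs it with the smallest error.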
Dense video captioning aims to localize multiple events in an untrimmed video and generate a caption for each event. Previous methods had difficulty establishing the multimodal feature relationship between frames and captions, resulting in low accuracy of the generated captions. To address this problem, a novel Dense Video Captioning Model Based on Local Attention (DVCL) is proposed. DVCL employs a 2D temporal differential CNN to extract video features, followed by feature encoding with a deformable transformer that establishes global feature dependencies across the input sequence. DIoU and TIoU are then incorporated into the event proposal matching and evaluation algorithms during training, yielding more accurate event proposals and hence higher-quality captions. Furthermore, an LSTM based on local attention is designed to generate captions, enabling each word in a caption to correspond to the relevant frame. Extensive experimental results demonstrate the effectiveness of DVCL. On the ActivityNet Captions dataset, DVCL performs significantly better than other baselines, improving on the best baseline by 5.6%, 8.2%, and 15.8% in BLEU-4, METEOR, and CIDEr, respectively.
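As a concrete illustration of one matching ingredient, the sketch below (not from the DVCL paper; the segment format and function name are assumptions) computes a 1D Distance-IoU between a temporal event proposal and a ground-truth segment: the usual IoU minus a penalty based on the distance between segment centers, normalized by the smallest enclosing segment.

```python
def temporal_diou(seg_a, seg_b):
    """1D DIoU between two temporal segments given as (start, end) in seconds."""
    s1, e1 = seg_a
    s2, e2 = seg_b
    inter = max(0.0, min(e1, e2) - max(s1, s2))
    union = (e1 - s1) + (e2 - s2) - inter
    iou = inter / union if union > 0 else 0.0
    # Penalty: squared distance between segment centers, normalized by
    # the squared length of the smallest enclosing segment.
    center_dist_sq = ((s1 + e1) / 2.0 - (s2 + e2) / 2.0) ** 2
    enclose = max(e1, e2) - min(s1, s2)
    penalty = center_dist_sq / (enclose ** 2) if enclose > 0 else 0.0
    return iou - penalty

# A proposal shifted relative to the ground truth is penalized beyond its overlap.
print(temporal_diou((10.0, 20.0), (12.0, 22.0)))  # ≈ 0.64
```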