Semantic‐meshed and content‐guided transformer for image captioning

Li, Xuan; Zhang, Wenkai; Sun, Xian; Gao, Xin

doi:10.1049/cvi2.12099

Cited by 5 publications

(1 citation statement)

References 46 publications

“…Computer vision is an important branch of computer technology, and it is a complex field. An important task of computer vision is to process the collected image and video information, and the processing effect can be similar to that of human processing [1][2][3]. This technology has broad applications, which involve high processing requirements, and some tasks require human assistance.…”

Section: Introduction (mentioning)

confidence: 99%

The Application of FLICSP Algorithm Based on Multi-Feature Fusion in Image Saliency Detection

Li, Wu, Wang

2024

IEEE Access

To improve the performance of current image saliency detection methods and the extraction of salient regions in complex images, this study fuses color and texture features on the basis of fast linear iterative clustering with active search to obtain a saliency detection algorithm and, from it, a fused saliency map. Algorithm performance is optimized using a deep prior information-assisted image enhancement model. The results show that, compared with other algorithms, the improved algorithm has a smaller mean absolute error, higher precision and F-value, lower missed detection and false detection rates, and a higher peak signal-to-noise ratio and structural similarity index. On the JUDD dataset, the minimum mean absolute error of the improved algorithm is 0.143, which is 0.34 less than that of the original algorithm. On the PASCAL dataset, the improved algorithm has the highest precision and F-value, 0.786 and 0.754 respectively, while the F-value of the pre-improvement algorithm is 0.678. The missed detection rate of the improved algorithm is 4.7%, which is 2.5% lower than that of the previous algorithm; the false detection rates of the pre- and post-improvement algorithms are 3.2% and 5.1%, respectively. For peak signal-to-noise ratio, the improved algorithm reaches a maximum of 39.45 dB, which is 6.82 dB higher than the previous algorithm, and it has the highest similarity index value among the compared algorithms, 0.892. The proposed method can effectively detect saliency in complex images.

Image captioning using transformer-based double attention network

Parvin, Naghsh-Nilchi, Mohammadi

2023

Engineering Applications of Artificial Intelligence

Tag‐inferring and tag‐guided Transformer for image captioning

Yi, Liang, Kong et al.

2024

IET Computer Vision

Image captioning is an important task for understanding images. Recently, many studies have used tags to build alignments between image information and language information. However, existing methods ignore the problem that simple semantic tags have difficulty expressing the detailed semantics for different image contents. Therefore, the authors propose a tag‐inferring and tag‐guided Transformer for image captioning to generate fine‐grained captions. First, a tag‐inferring encoder is proposed, which uses the tags extracted by the scene graph model to infer tags with deeper semantic information. Then, with the obtained deep tag information, a tag‐guided decoder that includes short‐term attention to improve the features of words in the sentence and gated cross‐modal attention to combine image features, tag features and language features to produce informative semantic features is proposed. Finally, the word probability distribution of all positions in the sequence is calculated to generate descriptions for the image. The experiments demonstrate that the authors’ method can combine tags to obtain precise captions and that it achieves competitive performance with a 40.6% BLEU‐4 score and 135.3% CIDEr score on the MSCOCO data set.
