2022
DOI: 10.3390/s22041338

Cross-Modal Object Detection Based on a Knowledge Update

Abstract: As an important field of computer vision, object detection has been studied extensively in recent years. However, existing object detection methods use only the visual information of the image and fail to mine the high-level semantic information of the object, which severely limits them. To take full advantage of multi-source information, a knowledge update-based multimodal object recognition model is proposed in this paper. Specifically, our method initially uses Faster R-CNN to regionalize the im…

Cited by 2 publications (3 citation statements)
References 23 publications
“…Several studies have developed diverse deep cross-modal representation learning methods that utilize the neural networks to extract image and text features and build relationships between different modalities. For example, Gao et al. [27] proposed an image encoder, text encoder, and multi-modal encoder to extract text features and image features and mine rich feature information. Viviana et al. [28] used convolutional neural networks to deeply extract features from images and text in the stage of establishing connections between different modal data.…”
Section: Related Work (mentioning)
confidence: 99%
“…Target detection technology has a very important role in the field of machine vision and practical applications in life, so the automatic detection of targets is an important research task [24–27]. The performance of target detection can be seriously affected by complex background, target occlusion, noise interference, low resolution, scale and pose variation and so forth [28]. Among the many instruments in substations, the appearance and expression of instruments with different functions are very different, but there are similar characteristics for different categories of industrial instruments [29–32].…”
Section: Introduction (mentioning)
confidence: 99%
“…[24–27] The performance of target detection can be seriously affected by complex background, target occlusion, noise interference, low resolution, scale and pose variation and so forth [28]. Among the many instruments in substations, the appearance and expression of instruments with different functions are very different, but there are similar characteristics for different categories of industrial instruments [29–32]. At the same time, when acquiring the image information of substation meters, the different acquisition methods of substation inspection robots will also lead to some different types of interference information, such as picture noise, which will affect the accuracy of detection results to a certain extent.…”
Section: Introduction (mentioning)
confidence: 99%