2021
DOI: 10.1109/tpami.2019.2950025
Visual Semantic Information Pursuit: A Survey

Abstract: Visual semantic information comprises two important parts: the meaning of each visual semantic unit and the coherent visual semantic relation conveyed by these visual semantic units. Essentially, the former is a visual perception task while the latter corresponds to visual context reasoning. Remarkable advances in visual perception have been achieved due to the success of deep learning. In contrast, visual semantic information pursuit, a visual scene semantic interpretation task combining visual percep…

Cited by 30 publications (15 citation statements). References 102 publications.
“…In the context of Image Understanding, another key aspect is linking the linguistic, encyclopaedic and commonsense textual sources discussed in the previous Sections with imagery. A set of relevant KBs for Image Understanding can be derived from (Wu et al, 2017) and (Liu et al, 2019). Here we focus on the image collections, among those identified in the last two surveys, which have been mapped to the taxonomies discussed in the previous sections, to facilitate entity resolution across different knowledge sources.…”
Section: Knowledge Representation for Service Robots
confidence: 99%
“…In 2005, the 3DSV (3D Story Visualiser) system [22] was proposed by Zeng et al. They implemented an interactive 3D animation interface that builds a scene from a story text with simple constraints. Spika et al. [23] describe AVDT (Automatic Visualization of Descriptive Texts) for advanced scene production. Chang et al. (2014) [24] developed a text-to-scene system focused on learning spatial knowledge from a large number of indoor scenes.…”
Section: Related Work
confidence: 99%
“…$z^* = \arg\max_z p(z|x)$, while the variational learning step tries to fit the model posterior $p(z|x)$ to the underlying ground-truth posterior $p_r(z|x)$ by maximizing the conditional likelihood. Such a VB framework is implemented in current SGG models [7], [8], [9], [10], [11], [12], [13], [14] by constructing two fundamental modules, namely visual perception and visual context reasoning [15], as shown in Fig. 1.…”
Section: Introduction
confidence: 99%
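A note on the variational Bayesian (VB) view invoked in the excerpt above: the link between maximizing the conditional likelihood and fitting the model posterior to the ground-truth posterior follows from a standard identity (a sketch added here for clarity, not quoted from the survey; $p_r$ denotes the ground-truth posterior and $H$ its entropy):

$$\mathbb{E}_{p_r(z \mid x)}\!\left[\log p(z \mid x)\right] = -\,\mathrm{KL}\!\left(p_r(z \mid x) \,\|\, p(z \mid x)\right) - H\!\left(p_r(z \mid x)\right)$$

Since $H(p_r(z \mid x))$ does not depend on the model, maximizing the expected conditional log-likelihood of the ground-truth labels is equivalent to minimizing the KL divergence from $p_r(z \mid x)$ to $p(z \mid x)$; the MAP inference step then returns $z^* = \arg\max_z p(z \mid x)$ under the fitted posterior.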