2017
DOI: 10.1007/978-3-319-68288-4_4

Improving Visual Relationship Detection Using Semantic Modeling of Scene Descriptions

Abstract: Structured scene descriptions of images are useful for the automatic processing and querying of large image databases. We show how the combination of a semantic and a visual statistical model can improve on the task of mapping images to their associated scene description. In this paper we consider scene descriptions which are represented as a set of triples (subject, predicate, object), where each triple consists of a pair of visual objects, which appear in the image, and the relationship between them (e.g. ma…
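As a rough illustration of the abstract's idea, the sketch below ranks a candidate triple by blending a visual detector's confidence with a semantic prior learned from known scene descriptions. Every name and number in it is an invented stand-in, not the paper's actual model.

```python
def combined_score(triple, visual_score, semantic_score, alpha=0.5):
    """Blend visual evidence with a semantic prior for one triple."""
    return alpha * visual_score(triple) + (1 - alpha) * semantic_score(triple)

# Toy scorers over a single known triple; everything else gets a low score.
visual = lambda t: {("person", "rides", "bike"): 0.7}.get(t, 0.1)
semantic = lambda t: {("person", "rides", "bike"): 0.9}.get(t, 0.05)

print(combined_score(("person", "rides", "bike"), visual, semantic))  # 0.8
```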

Cited by 41 publications (49 citation statements) | References 35 publications

“…These scores are combined with a language prior score (based on word embeddings) that models the semantics of the visual relationships. The method in [2] also combines visual and semantic information. However, link prediction methods (RESCAL, MultiwayNN, ComplEx, DistMult) are used for modelling the visual relationship semantics in place of word embeddings.…”
Section: Methods
confidence: 99%
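For reference, DistMult, one of the link-prediction scorers named in the statement above, scores a triple as a trilinear product of subject, predicate, and object embeddings. The sketch below uses random embeddings as stand-ins; in [2] they would be learned from the training triples.

```python
import numpy as np

# DistMult scores a triple (s, p, o) as sum_i e_s[i] * w_p[i] * e_o[i].
# Embeddings here are random stand-ins, not learned parameters.
rng = np.random.default_rng(0)
dim = 8
entity_emb = {"person": rng.normal(size=dim), "bike": rng.normal(size=dim)}
relation_emb = {"rides": rng.normal(size=dim)}

def distmult_score(subj, pred, obj):
    """Trilinear DistMult score of a (subject, predicate, object) triple."""
    return float(np.sum(entity_emb[subj] * relation_emb[pred] * entity_emb[obj]))

print(distmult_score("person", "rides", "bike"))
```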
“…The visual knowledge consists of the features of the union of the subject and object bounding boxes. In [2] the background knowledge is statistical information (learnt with statistical link prediction methods [21]) about the training set triples. Contextual information between objects is also used in [23], [35], with different learning methods.…”
Section: Related Work
confidence: 99%
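The simplest form of such statistical background knowledge is a smoothed frequency table over the training triples; [2] learns it with link-prediction models instead, but raw counts convey the idea. A minimal sketch over a toy training set:

```python
from collections import Counter

# Relative frequency of (subject, predicate, object) triples in a toy
# training set, used as a semantic prior. Purely illustrative.
train_triples = [
    ("person", "rides", "bike"),
    ("person", "rides", "horse"),
    ("person", "wears", "hat"),
    ("person", "rides", "bike"),
]
counts = Counter(train_triples)
total = sum(counts.values())

def triple_prior(s, p, o, smoothing=1e-6):
    """Smoothed relative frequency of a triple in the training set."""
    return (counts[(s, p, o)] + smoothing) / (total + smoothing * len(counts))

print(triple_prior("person", "rides", "bike"))  # ~0.5 (2 of 4 triples)
```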
“…An example of this is Logic Tensor Networks [46], where the authors show that encoding prior knowledge in symbolic form allows for better learning results with less training data, as well as more robustness against noise. A similar example is given in [47], where knowledge graphs are successfully used as priors in a scene description task, and in [48], where logical rules are used as background knowledge for a gradient-descent learning task in a high-dimensional real-valued vector space.…”
Section: Learning With Symbolic Information As a Prior
confidence: 99%
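One common way to inject such symbolic priors into gradient-descent learning is to add a soft penalty whenever the model's beliefs violate a rule. The sketch below is a generic hinge-style version under an invented rule (rides(x, y) implies above(x, y)); it is not the specific formulation of [46]-[48].

```python
# Soft-logic regularizer: penalize violations of the implication
# rides(x, y) -> above(x, y). Rule, confidences, and weights are
# invented for illustration.
def rule_penalty(p_rides, p_above):
    """Positive whenever the model believes rides(x, y) more than above(x, y)."""
    return max(0.0, p_rides - p_above)

data_loss = 0.42             # stand-in supervised loss term
p_rides, p_above = 0.9, 0.3  # stand-in model confidences
total_loss = data_loss + 0.1 * rule_penalty(p_rides, p_above)
print(total_loss)  # 0.48
```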
“…Visual Appearance Features are extracted from the predicate box, i.e. the minimum rectangle that encompasses the subject box and the object box [1,12,2,13,14,3], the separate subject-object boxes [5,15,16,17], or both [18,19,4,20,21]. All the above train a single branch with visual features, while we jointly train two separate branches with different features, a predicate feature branch (P-branch) and an object-subject branch (OS-branch), and employ Deep Supervision to align their scores into a common space.…”
Section: Related Work
confidence: 99%
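The predicate box described in the quote above is straightforward to compute: it is the tightest rectangle enclosing both the subject and object boxes. A minimal sketch, assuming (x_min, y_min, x_max, y_max) box coordinates:

```python
def predicate_box(subject_box, object_box):
    """Minimum rectangle enclosing the subject and object boxes.

    Boxes are (x_min, y_min, x_max, y_max) tuples; the exact coordinate
    convention varies between papers.
    """
    sx1, sy1, sx2, sy2 = subject_box
    ox1, oy1, ox2, oy2 = object_box
    return (min(sx1, ox1), min(sy1, oy1), max(sx2, ox2), max(sy2, oy2))

print(predicate_box((10, 20, 60, 90), (40, 50, 120, 110)))  # (10, 20, 120, 110)
```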
“…Linguistic and Semantic Features are employed in a feature-level integration with word embeddings [1,13,4,12,3], encoding of statistics [18,13,4,14], late fusion with subject-object classemes (score vectors) [22,5,14,19], and loss-level fusion as regularization terms [1,3] or adaptive margins [4,19,12]. Closest to us, [2] uses subject-object embeddings to train context-aware classifiers and [20] trains multimodal embeddings by projecting visual and linguistic features into a common space.…”
Section: Related Work
confidence: 99%
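Late fusion with classemes, one of the integration strategies listed above, can be pictured as adding a learned linear map of the subject and object class-score vectors to the visual predicate scores. The shapes and fusion rule below are assumptions for illustration; real systems learn W jointly with the rest of the model.

```python
import numpy as np

def late_fusion(pred_scores, subj_classeme, obj_classeme, W):
    """Fuse visual predicate scores with subject/object classemes.

    pred_scores:   (P,) visual predicate scores
    subj_classeme: (C,) subject class-score vector
    obj_classeme:  (C,) object class-score vector
    W:             (P, 2C) fusion weights (random stand-in here)
    """
    context = np.concatenate([subj_classeme, obj_classeme])
    return pred_scores + W @ context

rng = np.random.default_rng(0)
P, C = 5, 10
fused = late_fusion(rng.random(P), rng.random(C), rng.random(C),
                    rng.normal(size=(P, 2 * C)))
print(fused.shape)  # (5,)
```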