2019
DOI: 10.1111/coin.12202

A hierarchical recurrent approach to predict scene graphs from a visual‐attention‐oriented perspective

Abstract: A scene graph provides a powerful intermediate knowledge structure for various visual tasks, including semantic image retrieval, image captioning, and visual question answering. In this paper, the task of predicting a scene graph for an image is formulated as two connected problems, i.e., recognizing the relationship triplets, structured as subject-predicate-object, and constructing the scene graph from the recognized relationship triplets. For relationship triplet recognition, we develop a novel hierarchical …

Cited by 19 publications (9 citation statements)
References 45 publications
“…Artificial neural networks have been developed as an effective statistical technique in the last 40 years (Dayhoff and DeLeo, 2001). They have been used in many fields and established as viable computational methodologies in computer science, biochemical and medical fields (Baxt and Skora, 1996; Milik et al, 1998; Gao et al, 2019; Yin et al, 2019; Deng et al, 2020; Yu et al, 2020). The network itself consists of an input layer, one or more hidden layers, and an output layer.…”
Section: Discussion (mentioning)
confidence: 99%
“…In this work, we implemented sixteen widely used features, of which nine had morphological characteristics and seven had statistical characteristics (Bodzas, 2019, p. 51). Another approach to extract features is the use of a convolution neural network model, which extracts a collection of feature vectors (Gao et al, 2019). In contrast to our approach, this feature space does not carry fully comprehensible information, and therefore cannot be interpreted in deep detail.…”
Section: Methods (mentioning)
confidence: 99%
“…In any type of developing pattern classification algorithms, evaluation of network performance is essential in order to access the system accuracy [41]-[43]. In the present study, the system performance was evaluated using k-fold cross-validation [44], [45].…”
Section: E System Performance Evaluation (mentioning)
confidence: 99%