SpatialVOC2K: A Multilingual Dataset of Images with Annotations and Features for Spatial Relations between Objects

Belz, Anja; Muscat, Adrian; Anguill, Pierre; Sow, Mouhamadou; Vincent, Gaetan; Zinessabah, Yassine

doi:10.18653/v1/w18-6516

Cited by 8 publications

(9 citation statements)

References 15 publications


“…SpatialVOC2K [158] is the first multilingual image dataset with spatial relation annotations and object features for image-to-text generation. It consists of all 2,026 images…”

Section: D Datasets (mentioning)

confidence: 99%

Scene Graph Generation: A Comprehensive Survey

Zhu, Zhang, Jiang et al., 2022 (Preprint)


Deep learning techniques have led to remarkable breakthroughs in generic object detection and have spawned many scene-understanding tasks in recent years. Scene graphs have become a focus of research because of their powerful semantic representation and their applications to scene understanding. Scene Graph Generation (SGG) refers to the task of automatically mapping an image into a semantic structural scene graph, which requires correctly labeling the detected objects and their relationships. Although this is a challenging task, the community has proposed many SGG approaches and achieved good results. In this paper, we provide a comprehensive survey of recent achievements in this field brought about by deep learning techniques. We review 138 representative works that cover different input modalities, and systematically summarize existing methods of image-based SGG from the perspective of feature extraction and fusion. We attempt to connect and systematize the existing visual relationship detection methods, and to summarize and interpret the mechanisms and strategies of SGG comprehensively. Finally, we conclude with an in-depth discussion of open problems and future research directions. This survey will help readers develop a better understanding of the current state of research and ideas in the field.


“…The first dataset used is one containing spatial relations in images [9,10,11], consisting of 20 different objects and 17 different target values. It is split into two parts, of which the entitled best part is used, containing 5317 examples, further subdivided into five stratified folds.…”

Section: SpatialVOC2K (mentioning)

confidence: 99%

“…Two datasets are used to evaluate the paradigms against a baseline. The first is a real-world dataset of spatial relations in images, SpatialVOC2K [9,10,11]. The second is a synthetic dataset consisting of nine clusters, which can be seen in Figure 1, used to experiment freely with its characteristics.…”

Section: Introduction (mentioning)

confidence: 99%

One vs Previous and Similar Classes Learning -- A Comparative Study

Cauchi, Muscat, 2021 (Preprint; self-cite)


When dealing with multi-class classification problems, it is common practice to build a model consisting of a series of binary classifiers, using a learning paradigm that dictates how the classifiers are built and combined to discriminate between the individual classes. As new data enters the system and the model needs updating, such models often have to be retrained from scratch. This work proposes three learning paradigms that allow trained models to be updated without retraining from scratch. A comparative analysis evaluates them against a baseline. Results show that the proposed paradigms update faster than the baseline, and two of them also train from scratch faster, especially on larger datasets, while retaining comparable classification performance.
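The first excerpt from this paper states that the 5,317 SpatialVOC2K examples used are subdivided into five stratified folds, meaning each fold keeps roughly the same proportion of each target value as the full set. The Python sketch below shows one standard way to build such folds using only the standard library. It is not taken from the paper; the helper names `stratified_folds` and `train_test_splits` and the toy relation labels are hypothetical.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Split example indices into k folds with (near-)equal class proportions.

    Hypothetical helper for illustration; labels are assumed to be sortable
    (e.g. strings naming spatial relations).
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    folds = [[] for _ in range(k)]
    position = 0  # round-robin position shared across classes keeps fold sizes balanced
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        for idx in indices:
            folds[position % k].append(idx)
            position += 1
    return folds


def train_test_splits(folds):
    """Yield (train, test) index lists, using each fold once as the test set."""
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Toy example with hypothetical relation labels (not real SpatialVOC2K data).
labels = ["left of"] * 10 + ["above"] * 5 + ["near"] * 7
for train, test in train_test_splits(stratified_folds(labels, k=5)):
    print(len(train), len(test))
```

Dealing each class's shuffled indices round-robin, and continuing from where the previous class stopped, keeps every fold within one example of the others both in total size and in per-class count.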


“…The SpatialVOC2K (Belz et al, 2018) dataset is used to train and test the pattern recognition models. This dataset consists of 2,026 images with object labels, bounding boxes annotations extracted from the PASCAL VOC2008 challenge dataset (Everingham et al, 2007), to which relations between objects and depth values were added (Belz et al, 2018).…”

Section: Dataset (mentioning)

confidence: 99%

Predicting Relative Depth between Objects from Semantic Features

Cassar, Muscat, Seychell, 2021 (Preprint; self-cite)


Vision and language tasks such as Visual Relation Detection and Visual Question Answering benefit from semantic features that afford proper grounding of language. The 3D depth of objects depicted in 2D images is one such feature. However, it is very difficult to obtain accurate depth information without learning the appropriate features, which are scene dependent. The state of the art in this area consists of complex neural network models trained on stereo image data to predict per-pixel depth. Fortunately, some tasks require only the relative depth between objects. This paper investigates the extent to which semantic features can predict coarse relative depth. The problem is cast as a classification task: geometrical features based on object bounding boxes, object labels and scene attributes are computed and used as inputs to pattern recognition models that predict relative depth, i.e., behind, in-front and neutral. The results are compared with those obtained by averaging the output of the monodepth neural network model, which represents the state of the art. An overall increase of 14% in relative depth accuracy is achieved over relative depth derived from the monodepth model's output.
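The abstract casts relative depth prediction as a three-way classification (behind, in-front, neutral) over geometrical features computed from object bounding boxes. As a minimal sketch, assuming boxes are given in pixel coordinates as (x_min, y_min, x_max, y_max) with y growing downward, the Python below computes a few plausible features for an ordered object pair. The specific features and the names `Box`, `iou` and `pair_features` are hypothetical and are not the feature set used in the paper; the resulting vectors would be passed to whichever classifier is chosen.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    """Axis-aligned bounding box in pixel coordinates (y grows downward)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    @property
    def area(self) -> float:
        return max(0.0, self.x_max - self.x_min) * max(0.0, self.y_max - self.y_min)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes (0.0 when they do not overlap)."""
    inter_w = max(0.0, min(a.x_max, b.x_max) - max(a.x_min, b.x_min))
    inter_h = max(0.0, min(a.y_max, b.y_max) - max(a.y_min, b.y_min))
    inter = inter_w * inter_h
    union = a.area + b.area - inter
    return inter / union if union > 0 else 0.0


def pair_features(a: Box, b: Box, img_w: float, img_h: float) -> List[float]:
    """Hypothetical geometric features for the ordered object pair (a, b)."""
    return [
        a.area / b.area if b.area > 0 else 0.0,  # relative size of a versus b
        iou(a, b),  # how much the two boxes overlap
        (a.y_max - b.y_max) / img_h,  # bottom-edge offset; > 0 if a reaches lower in the image
        ((a.x_min + a.x_max) - (b.x_min + b.x_max)) / (2 * img_w),  # horizontal centre offset
    ]


# Hypothetical boxes for a person and a car in a 500 x 375 image.
person = Box(40, 60, 120, 300)
car = Box(100, 150, 400, 260)
print(pair_features(person, car, img_w=500, img_h=375))
```

The bottom-edge offset encodes the heuristic that, on a ground plane, an object whose base appears lower in the frame tends to be closer to the camera; this is an illustrative assumption, not a result reported in the paper.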
