2017 13th International Conference on Signal-Image Technology & Internet-Based Systems (SITIS)
DOI: 10.1109/sitis.2017.27
Deep Interactive Region Segmentation and Captioning

Abstract: With recent innovations in dense image captioning, it is now possible to describe every object in the scene with a caption, with objects localized by bounding boxes. However, interpreting such output is not trivial because many of the bounding boxes overlap. Furthermore, in current captioning frameworks, the user cannot apply personal preferences to exclude areas that are not of interest. In this paper, we propose a novel hybrid deep learning architecture for interactive region segme…

Cited by 7 publications (4 citation statements)
References 64 publications (114 reference statements)
“…A fully convolutional network (FCN) is trained to predict the foreground/background from image-user interaction pairs. With similar image-user interaction pairs as input to the network, Boroujerdi et al. [17] use a lyncean fully convolutional network to predict foreground/background. This network replaces the last two convolutional layers of the FCN in [15] with three convolutional layers of gradually decreasing kernel size to better capture the geometry of objects.…”
Section: Related Work
confidence: 99%
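The image-user interaction pairs mentioned above are typically built by converting sparse user clicks into dense distance maps. The following is a minimal illustrative sketch of that encoding in numpy; the exact truncation value and channel conventions are assumptions for illustration, not details taken from the cited papers.

```python
import numpy as np

def click_distance_map(h, w, clicks, truncate=255.0):
    """Encode user clicks as a truncated Euclidean distance map:
    each pixel holds the distance to its nearest click, so sparse
    clicks become a dense channel that can be paired with the image."""
    ys, xs = np.mgrid[0:h, 0:w]
    d = np.full((h, w), truncate, dtype=np.float32)
    for (cy, cx) in clicks:
        d = np.minimum(d, np.sqrt((ys - cy) ** 2 + (xs - cx) ** 2))
    return np.minimum(d, truncate)

# Two interaction channels: foreground clicks and background clicks.
fg = click_distance_map(64, 64, [(32, 32)])
bg = click_distance_map(64, 64, [(5, 5), (60, 60)])
```

A network then receives the image together with these two maps, so the clicked locations appear as zero-valued minima in the extra channels.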
“…This algorithm leads to improved performance, since it is more closely aligned with the patterns of real users. Essentially, all these networks [15,17,16,18,19,21,28] adopt early-fusion structures: they combine the image and the user-interaction features from the first layer of the DNN.…”
Section: Related Work
confidence: 99%
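Concretely, early fusion means the image and the user-interaction maps are concatenated along the channel axis before the first layer, so the network consumes a single multi-channel tensor. A minimal sketch, assuming an RGB image plus two click-map channels (the 3+2 layout is illustrative; the cited networks differ in details):

```python
import numpy as np

# Early fusion: stack RGB and interaction channels into one input tensor.
image = np.random.rand(64, 64, 3).astype(np.float32)   # H x W x RGB
fg_map = np.zeros((64, 64, 1), dtype=np.float32)        # foreground-click map
bg_map = np.zeros((64, 64, 1), dtype=np.float32)        # background-click map

# The first convolution of the network would then take 5 input channels.
fused = np.concatenate([image, fg_map, bg_map], axis=-1)
```

The alternative, late fusion, would process the image and the interaction maps in separate branches and merge their features deeper in the network.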
“…In these cases the cost of pixel-level annotation can be reduced by automating part of the task. Matting and object selection [50,33,34,6,58,57,10,30,59] generate tight boundaries from loosely annotated boundaries or a few inside/outside clicks and scribbles. [44,38] introduced a predictive method that automatically infers a foreground mask from 4 boundary clicks, which was extended to full-image segmentation in [2].…”
Section: Related Work
confidence: 99%
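To see why four boundary clicks are such a strong cue: even before any learned prediction, clicks on the object's extremes pin down a tight bounding box that a predictor can refine into a mask. This is only a crude illustrative baseline, not the method of [44,38]:

```python
# Four boundary clicks as (row, col) pairs, one per object extreme.
def box_from_boundary_clicks(clicks):
    """Return the tight bounding box (y0, x0, y1, x1) implied by
    clicks placed on the top, bottom, left and right of an object."""
    ys = [y for y, x in clicks]
    xs = [x for y, x in clicks]
    return min(ys), min(xs), max(ys), max(xs)

# Hypothetical clicks on an object's four extremes.
box = box_from_boundary_clicks([(10, 30), (50, 28), (31, 5), (29, 58)])
```

A mask predictor restricted to this box has a far smaller search space than one operating on the full image, which is the intuition behind click-based annotation schemes.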