2019
DOI: 10.1109/access.2019.2957513
Fully Convolutional CaptionNet: Siamese Difference Captioning Attention Model

Abstract: The generation of the textual description of the differences in images is a relatively new concept that requires the fusion of both computer vision and natural language techniques. In this paper, we present a novel Fully Convolutional CaptionNet (FCC) that employs an encoder-decoder framework to perform visual feature extractions, compute the feature distances, and generate new sentences describing the measured distances. After extracting the features of the images, a contrastive function is used to compute th…
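A minimal sketch, assuming PyTorch, of the pipeline the abstract describes: a shared (Siamese) fully convolutional encoder extracts features from both images, an elementwise distance is computed over the two feature vectors, and a decoder generates the difference caption. The module names, layer sizes, and the plain L1-style distance below are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class SiameseEncoder(nn.Module):
    """Shared fully convolutional encoder applied to both images."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(128, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )

    def forward(self, x):
        return self.backbone(x).flatten(1)  # (B, feat_dim)

class DifferenceCaptioner(nn.Module):
    """Encode both images with the same weights, compute a feature
    distance, and decode it into a sentence with an LSTM."""
    def __init__(self, vocab_size, feat_dim=256, hidden=512):
        super().__init__()
        self.encoder = SiameseEncoder(feat_dim)
        self.embed = nn.Embedding(vocab_size, hidden)
        self.init_h = nn.Linear(feat_dim, hidden)
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, img_a, img_b, captions):
        fa, fb = self.encoder(img_a), self.encoder(img_b)
        diff = torch.abs(fa - fb)            # contrastive feature distance (assumed L1-style)
        h0 = self.init_h(diff).unsqueeze(0)  # seed the decoder with the difference signal
        c0 = torch.zeros_like(h0)
        emb = self.embed(captions)           # (B, T, hidden)
        out, _ = self.lstm(emb, (h0, c0))
        return self.out(out)                 # (B, T, vocab_size) logits

# Toy forward pass on random data
model = DifferenceCaptioner(vocab_size=1000)
a, b = torch.randn(2, 3, 64, 64), torch.randn(2, 3, 64, 64)
caps = torch.randint(0, 1000, (2, 12))
print(model(a, b, caps).shape)  # torch.Size([2, 12, 1000])
```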



Cited by 26 publications (20 citation statements)
References 41 publications
“…To validate the generalization ability of the proposed method, we conduct experiments on the recently published Spot-the-Diff dataset, where the image pairs are mostly well aligned and there is no viewpoint change. We compare with eight SOTA methods, most of which do not handle viewpoint changes: DDLA (Jhamtani and Berg-Kirkpatrick, 2018), DDUA (Park et al., 2019), SDCM (Oluwasanmi et al., 2019a), FCC (Oluwasanmi et al., 2019b), static rel-att / dynamic rel-att (Tan et al., 2019), and M-VAM / M-VAM+RAF (Shi et al., 2020).…”
Section: Results on Spot-the-Diff
confidence: 99%
“…Furthermore, the Siamese Difference Captioning Model (SDCM) also combined techniques from a deep Siamese convolutional neural network, a soft attention mechanism, word embedding, and bidirectional long short-term memory [167]. The features in each image input are computed using the Siamese network, and their differences are obtained using a weighted L1 distance function.…”
Section: Unsupervised or Semi-supervised Captioning
confidence: 99%
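A small sketch of the weighted L1 distance step the excerpt above describes, again assuming PyTorch. The learnable per-channel weight vector is one plausible reading of "weighted"; the class name and dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class WeightedL1Distance(nn.Module):
    """Elementwise |f1 - f2| scaled by a learnable per-channel weight,
    one plausible realization of a 'weighted L1 distance' between the
    two Siamese feature vectors."""
    def __init__(self, feat_dim):
        super().__init__()
        self.w = nn.Parameter(torch.ones(feat_dim))

    def forward(self, f1, f2):
        return self.w * torch.abs(f1 - f2)  # (B, feat_dim)

# Usage on two batches of feature vectors
dist = WeightedL1Distance(feat_dim=8)
f1, f2 = torch.randn(4, 8), torch.randn(4, 8)
print(dist(f1, f2).shape)  # torch.Size([4, 8])
```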
“…LOF achieves this by introducing a MinDist (k) parameter representing neighboring data in a particular region of consideration. In several other clustering algorithms, such as K-means and fuzzy C-means, techniques such as Euclidean distance or squared Euclidean distance [14] are used to compute the distance between data points [15].…”
Section: Related Work
confidence: 99%
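For concreteness, a minimal NumPy sketch of the two distance measures the excerpt above names; the function names are illustrative.

```python
import numpy as np

def euclidean(a, b):
    """Straight-line distance between points a and b."""
    return np.sqrt(np.sum((a - b) ** 2))

def squared_euclidean(a, b):
    """Squared variant; K-means minimizes this form, which avoids the
    square root and penalizes distant points more strongly."""
    return np.sum((a - b) ** 2)

a, b = np.array([1.0, 2.0]), np.array([4.0, 6.0])
print(euclidean(a, b))          # 5.0
print(squared_euclidean(a, b))  # 25.0
```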