2023
DOI: 10.1109/tip.2023.3268004
Viewpoint-Adaptive Representation Disentanglement Network for Change Captioning

Cited by 16 publications (2 citation statements). References 32 publications.
“…While all methods suffer significant performance drops when evaluated on a dataset from a different domain, VIXEN-Q shows a better ability to generalize to new data by scoring the highest on the PSBattles dataset. After finetuning the model on Image Editing Request, VIXEN-C outperforms previous methods on most metrics, except B@4 of VARD (Tu et al 2023a).…”
Section: Results
Confidence: 96%
“…DUDA (Park, Darrell, and Rohrbach 2019) instead computes image difference at CNN semantic level, improving the robustness against slight global changes. In M-VAM (Shi et al 2020) and VACC (Kim et al 2021), a view-point encoder is proposed to mitigate potential view-point difference and VARD (Tu et al 2023a) proposes a viewpoint invariant representation network to explicitly capture the change. Meanwhile, (Sun et al 2022) uses bidirectional encoding to improve change localization and NCT (Tu et al 2023b) aggregates neighboring features with a transformer.…”
Section: Related Work
Confidence: 99%