2020
DOI: 10.1109/lra.2020.3010735

Alleviating the Burden of Labeling: Sentence Generation by Attention Branch Encoder–Decoder Network

Cited by 12 publications (18 citation statements) · References 32 publications
“…A representative work on the latter strand is [14], which describes a comprehension module, trained on human-generated expressions, that serves as a "critic" of referring expression generation and re-evaluates the generated expressions. ABEN [15] is an encoder–decoder model that generates instructions about objects in an image. This model performs the processing in the opposite direction to our method.…”
Section: Related Work (mentioning)
confidence: 99%
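
To make the excerpt's architecture description concrete, here is a minimal encoder–decoder sketch of the kind it names: pooled image features condition a recurrent decoder that emits instruction tokens. This is not the actual ABEN model (ABEN additionally uses attention branches); the layer names and sizes are illustrative assumptions.

```python
# Minimal encoder-decoder sentence generator of the kind the excerpt
# describes (image features in, instruction tokens out). NOT the actual
# ABEN architecture: ABEN adds attention branches. Sizes are assumptions.
import torch
import torch.nn as nn

class EncoderDecoder(nn.Module):
    def __init__(self, feat_dim=2048, hidden_dim=512, vocab_size=1000):
        super().__init__()
        self.encode = nn.Linear(feat_dim, hidden_dim)      # project visual features
        self.embed = nn.Embedding(vocab_size, hidden_dim)  # token embeddings
        self.decoder = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)       # per-step token logits

    def forward(self, img_feat, tokens):
        # img_feat: (B, feat_dim) pooled features of the target region
        # tokens:   (B, T) ground-truth tokens for teacher forcing
        h0 = torch.tanh(self.encode(img_feat)).unsqueeze(0)  # (1, B, H) initial state
        c0 = torch.zeros_like(h0)
        out, _ = self.decoder(self.embed(tokens), (h0, c0))  # (B, T, H)
        return self.out(out)                                 # (B, T, vocab)

model = EncoderDecoder()
logits = model(torch.randn(2, 2048), torch.randint(0, 1000, (2, 12)))
print(logits.shape)  # torch.Size([2, 12, 1000])
```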
“…In the field of VRE, datasets using real images include RefCLEF [22], RefCOCO [22], and GuessWhat [23], while those using synthetic images include CLEVR-Ref+ [24]. Public datasets for MLU-FI include PFN-PIC [21] and WRS-PV [15], both of which consist of images and fetching instructions for objects in those images. The PFN-PIC dataset contains real images with objects scattered across four boxes, whereas the WRS-PV dataset contains images collected in the standard simulator of the World Robot Summit Partner Robot Challenge [25] by DSRs moving around a room.…”
Section: Related Work (mentioning)
confidence: 99%
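
The excerpt describes these datasets only loosely; as a hedged illustration, one such sample pairs an image with a human-written fetching instruction and the regions it refers to. The field names below are assumptions, not PFN-PIC's or WRS-PV's actual schemas.

```python
# Illustrative shape of one MLU-FI sample as the excerpt describes it
# (image + fetching instruction for an object). Field names are
# assumptions, not the datasets' actual schemas.
from dataclasses import dataclass

@dataclass
class FetchingSample:
    image_path: str     # scene image (real in PFN-PIC, simulated in WRS-PV)
    target_bbox: tuple  # (x, y, w, h) of the object to fetch
    destination: str    # e.g. "the lower left box"
    instruction: str    # human-written fetching instruction

sample = FetchingSample(
    image_path="scene_0001.png",
    target_bbox=(120, 80, 40, 40),
    destination="the lower left box",
    instruction="Move the blue flip flop to the lower left box.",
)
print(sample.instruction)
```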
“…Indeed, the quality of sentences generated for the FIG task was far worse than that for simple image captioning tasks [2]. In fact, there is a large gap in quality between reference sentences and the sentences generated by conventional methods, as shown in Section V. Although the FIG task was handled in previous studies [2], existing models did not handle the target object and the destination simultaneously [2]. Therefore, they could not generate instruction sentences such as "Move the blue flip flop to the lower left box," where the target object is "the blue flip flop" and the destination is "the lower left box."…”
Section: Introduction (mentioning)
confidence: 98%
“…The FIG task is challenging because the ambiguity of a sentence depends not only on the target object but also on the surrounding objects. Indeed, the quality of sentences generated for the FIG task was far worse than that for simple image captioning tasks [2]. In fact, there is a large gap in quality between reference sentences and the sentences generated by conventional methods, as shown in Section V. Although the FIG task was handled in previous studies [2], existing models did not handle the target object and the destination simultaneously [2].…”
Section: Introduction (mentioning)
confidence: 99%
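
The two excerpts above argue that a FIG model must condition generation on the target object and the destination at the same time. A hedged sketch of one way to do that (not the cited paper's actual method) is to fuse both region features into a single context vector that initializes the decoder from the earlier sketch:

```python
# Hedged sketch of conditioning generation on BOTH the target object and
# the destination, the capability the excerpt says prior models lacked.
# Illustrative only; not the cited paper's actual fusion mechanism.
import torch
import torch.nn as nn

class TargetDestinationFusion(nn.Module):
    def __init__(self, feat_dim=2048, hidden_dim=512):
        super().__init__()
        self.target_proj = nn.Linear(feat_dim, hidden_dim)  # target-region features
        self.dest_proj = nn.Linear(feat_dim, hidden_dim)    # destination-region features
        self.fuse = nn.Linear(2 * hidden_dim, hidden_dim)   # joint context vector

    def forward(self, target_feat, dest_feat):
        t = torch.relu(self.target_proj(target_feat))
        d = torch.relu(self.dest_proj(dest_feat))
        # The fused vector can serve as the decoder's initial hidden state,
        # so every generated token sees both regions.
        return torch.tanh(self.fuse(torch.cat([t, d], dim=-1)))

ctx = TargetDestinationFusion()(torch.randn(2, 2048), torch.randn(2, 2048))
print(ctx.shape)  # torch.Size([2, 512])
```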