Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 2017
DOI: 10.18653/v1/p17-2066

STAIR Captions: Constructing a Large-Scale Japanese Image Caption Dataset

Abstract: In recent years, automatic generation of image descriptions (captions), that is, image captioning, has attracted a great deal of attention. In this paper, we particularly consider generating Japanese captions for images. Since most available caption datasets have been constructed for the English language, there are few datasets for Japanese. To tackle this problem, we construct a large-scale Japanese image caption dataset based on images from MS-COCO, which is called STAIR Captions. STAIR Captions consists of 820,…
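
The abstract describes a caption dataset built on MS-COCO images; as a rough illustration, the sketch below groups captions by image id from a COCO-style annotation file. This is a minimal sketch that assumes STAIR Captions is distributed in the MS-COCO annotation JSON format (an "annotations" list whose entries carry "image_id" and "caption" fields); the file name stair_captions_v1.2_train.json and the helper load_captions are illustrative, not taken from the paper.

import json
from collections import defaultdict

def load_captions(annotation_path):
    """Return a dict mapping image_id to its list of captions,
    read from a COCO-format annotation file (assumed layout)."""
    with open(annotation_path, encoding="utf-8") as f:
        data = json.load(f)
    captions = defaultdict(list)
    for ann in data["annotations"]:  # each entry: {"image_id": ..., "caption": ...}
        captions[ann["image_id"]].append(ann["caption"])
    return captions

if __name__ == "__main__":
    # Hypothetical file name; adjust to the actual STAIR Captions release.
    caps = load_captions("stair_captions_v1.2_train.json")
    some_id = next(iter(caps))
    print(some_id, caps[some_id])  # citing papers report 5 Japanese captions per image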

Cited by 95 publications (76 citation statements); references 16 publications.

“…The Japanese corpus we use is based on the newly created STAIR dataset [6]. Using the same methodology as [2], [6] collected 5 Japanese captions for each image of the original MSCOCO dataset. As for the original MSCOCO dataset, Japanese captions were written by native Japanese speakers.…”
Section: English and Japanese Corpora (mentioning)
confidence: 99%
“…For image captioning, we utilize the multi30k (Elliott et al 2016), COCO (Chen et al 2015) and STAIR (Yoshikawa, Shigeto, and Takeuchi 2017) datasets. The multi30k dataset contains 30k images and annotations under two tasks.…”
Section: Datasets (mentioning)
confidence: 99%
“…MS-COCO (Lin et al, 2014) contains 123'287 images and five English captions per image. Yoshikawa et al (2017) proposed a model which generates Japanese descriptions for images. We divide the dataset based on .…”
Section: Datasets (mentioning)
confidence: 99%
“…Previous works in image-caption task and learning a joint embedding space for texts and images are mostly related to English language, however, recently there is a large amount of research in other languages due to the availability of multilingual datasets (Funaki and Nakayama, 2015; Rajendran et al., 2015; Miyazaki and Shimizu, 2016; Young et al., 2014; Hitschler and Riezler, 2016; Yoshikawa et al., 2017). The aim of these models is to map images and their captions in a single language into a joint embedding space (Rajendran et al., 2015; Calixto et al., 2017).…”
Section: Introduction (mentioning)
confidence: 99%