2020
DOI: 10.1007/978-3-030-58520-4_25
Captioning Images Taken by People Who Are Blind

Cited by 121 publications (72 citation statements)
References 47 publications
“…To recognize icons, previous work [50,76,77] trained image classification models on UI design datasets [27]. To describe content in pictures, prior work used deep learning models to generate natural language descriptions of images [44,46], and some accessibility research has also leveraged crowdsourcing to generate image captions [35,39,40]. We use an existing Icon Recognition engine and Image Descriptions feature in iOS [4] to generate alternative text for detected icons and pictures, respectively.…”
Section: Understanding UI Semantics
confidence: 99%
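The captioning step described in this excerpt can be approximated with an off-the-shelf vision-language model. The sketch below uses the Hugging Face BLIP checkpoint purely as a stand-in; the model name, the input file, and the API shown are assumptions of this sketch, not the Icon Recognition or Image Descriptions pipeline the cited work actually uses.

```python
# Minimal sketch: generating alternative text for a picture with a
# pretrained captioning model. BLIP is an assumed stand-in, not the
# iOS pipeline referenced in the excerpt.
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

image = Image.open("ui_screenshot_crop.png").convert("RGB")  # hypothetical input
inputs = processor(images=image, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=30)
alt_text = processor.decode(output_ids[0], skip_special_tokens=True)
print(alt_text)  # e.g. a short natural-language description of the crop
```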
“…A particular challenge in this area has been the lack of an authentic dataset of photos taken by the blind. To address the issue, Gurari et al. (2020) created VizWiz-Captions, a dataset that consists of descriptions of images taken by people who are blind. In addition, they analyzed how state-of-the-art image captioning algorithms perform on this dataset.…”
Section: Related Work
confidence: 99%
“…The VizWiz-Captions dataset (Gurari et al., 2020) consists of over 39,000 images taken by people who are blind, each paired with five captions. The dataset comprises 23,431 training images, 7,750 validation images, and 8,000 test images.…”
Section: Dataset
confidence: 99%
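As a concrete illustration of the split structure described above, here is a minimal sketch that reads caption annotations and groups the five captions per image. The file name and the COCO-style JSON layout are assumptions of this sketch, not details stated in the excerpt.

```python
# Minimal sketch: loading VizWiz-Captions-style annotations and grouping
# captions by image. Assumes COCO-style JSON (file path hypothetical).
import json
from collections import defaultdict

def load_captions(path):
    with open(path) as f:
        data = json.load(f)
    captions = defaultdict(list)  # image_id -> list of caption strings
    for ann in data["annotations"]:
        captions[ann["image_id"]].append(ann["caption"])
    return data["images"], captions

images, captions = load_captions("annotations/train.json")  # hypothetical path
print(len(images))                         # expected: 23,431 training images
print(len(captions[images[0]["id"]]))      # expected: 5 captions per image
```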
“…Thus, these models perform poorly on images taken by blind people, largely because such images differ dramatically from those present in the standard datasets. To encourage work on this problem, Gurari et al. (2020) released the VizWiz dataset, comprising images taken by the blind. Current work on captioning images for the blind does not use the text detected in the image when generating captions (Figures 1a and 1b show two images from the VizWiz dataset that contain text).…”
Section: Introduction
confidence: 99%
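One way to incorporate detected text, as this excerpt suggests, is to run OCR on the image and pass the recognized tokens to the captioner alongside the visual features. The sketch below covers only that OCR preprocessing step; pytesseract and the confidence threshold are assumed stand-ins, since the cited work does not specify a text detector.

```python
# Minimal sketch: extracting OCR tokens from an image so they can be fed
# to a caption model as extra input. pytesseract is an assumed stand-in.
from PIL import Image
import pytesseract

def ocr_tokens(image_path, min_confidence=60):
    """Return recognized words whose confidence exceeds the threshold."""
    data = pytesseract.image_to_data(
        Image.open(image_path), output_type=pytesseract.Output.DICT
    )
    return [
        word for word, conf in zip(data["text"], data["conf"])
        if word.strip() and float(conf) >= min_confidence
    ]

print(ocr_tokens("vizwiz_example.jpg"))  # hypothetical image path
```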