Proceedings of the 12th International ACM SIGACCESS Conference on Computers and Accessibility 2010
DOI: 10.1145/1878803.1878833

Are synthesized video descriptions acceptable?

Abstract: We conducted a series of experiments to assess the feasibility of using synthesized narration to describe online videos. To reduce cultural bias, we included adult blind or low-vision participants from Japan and the U.S. in the main study. Our research also includes a follow-up study, conducted in Japan, to assess the effectiveness of synthesized video descriptions in realistic situations. The results showed that synthesized video descriptions were generally accepted in both countries. We also found that appro…

Cited by 28 publications
(6 citation statements)
References 7 publications
“…Rapid successive shots from players may have led to short time intervals between bounces of the ball, which in turn would reduce the time between descriptions. Another issue may be the synthesized nature of descriptions: although generally considered acceptable, human speech is both preferred and easier to understand for audio descriptions [23]. We used a generic HRTF for rendering 3D audio because it was unfeasible to capture individual HRTFs.…”
Section: Discussion (mentioning)
confidence: 99%
“…Gagnon et al's tool also provides authors with timeline-based visualisations tailored to the production of cinematic audio descriptions including recognition of scenes, characters, and important locations [18]. While prior work suggests that people prefer human-narrated audio descriptions to speech-to-text audio descriptions when available [15,28], the aforementioned systems do not support narrated audio descriptions. Two tools that do allow for spoken audio descriptions are LiveDescribe [9] and YouDescribe [25].…”
Section: Describing Videos (mentioning)
confidence: 99%
“…To simplify audio descriptions, we follow an approach similar to prior work in sentence simplification by first generating simplified candidates for each description based on the parse tree and later ranking the simplified candidates [57]. We constrain our generated candidate descriptions to ones that contain a subset of the words contained in the original description, such that we can later automatically generate the audio for a candidate description by concatenating existing audio for each word (i.e., without re-recording), as human narrated descriptions are preferred by users [28]. In order to generate subset candidate descriptions, we first parse the descriptions to find parts of speech and dependencies for each word in the description (using SpaCy's part of speech and dependency Figure 6.…”
Section: Generating Candidate Descriptions (mentioning)
confidence: 99%
“…However, this involves two problems: First, assuming presence of audio description for the videos is unrealistic. Second, approaches that automatically generate audio description (Kobayashi et al 2010) wrongly assume any video as a candidate to their approach. For instance, a video showing a robot machine in response to the query : "Artificial Intelligence" is of little use to the visually impaired user even with audio description.…”
Section: Introduction (mentioning)
confidence: 99%