Proceedings of the SIGCHI Conference on Human Factors in Computing Systems 2004
DOI: 10.1145/985692.985730

Improving speech playback using time-compression and speech recognition

Abstract: Despite the ready availability of digital recording technology and the continually decreasing cost of digital storage, browsing audio recordings remains a tedious task. This paper presents evidence in support of a system designed to assist with information comprehension and retrieval tasks from a large collection of recorded speech. Two techniques are employed to assist users with these tasks. First, a speech recognizer creates necessarily error-laden transcripts of the recorded speech. Second, audio playback …

Citations: cited by 30 publications (24 citation statements)
References: 12 publications
“…In Vermuri et al [9], an audio playback interface was tested using recognition results with and without confidence visualization. No difference in users' comprehension rate was found.…”
Section: Related Work (mentioning)
confidence: 99%
“…[1,8,9]), in this paper, we focus on the first part of the correction problem only: finding errors. Detection of errors can be tricky for users as errors made by a recognizer are all valid words in a language.…”
Section: Introduction (mentioning)
confidence: 99%
“…In discussing recorded speech, Vemuri and colleagues discuss one reason why: aural speech delivery presents unique challenges [17]. The average speech rate of an English speaker is over twice as slow as the average reading rate.…”
Section: Introduction (mentioning)
confidence: 99%
“…This large disparity suggests that automatically transcribing audio and then accessing it as a written document would be most effective for information retrieval tasks. However, in reading a text transcript, the prosodic cues, which make speech rich in meaning and subtlety, are lost [17].…”
Section: Introduction (mentioning)
confidence: 99%