Proceedings of the 2020 International Conference on Multimodal Interaction 2020
DOI: 10.1145/3382507.3420054

Speech, Voice, Text, and Meaning

Abstract: Interview data is multimodal data: it consists of speech sound, facial expression and gestures, captured in a particular situation, and containing textual information and emotion. This workshop shows how a multidisciplinary approach may exploit the full potential of interview data. The workshop first gives a systematic overview of the research fields working with interview data. It then presents the speech technology currently available to support transcribing and annotating interview data, such as automatic sp…

Citations: cited by 2 publications (1 citation statement)
References: 1 publication
“…Even in recent times, a comparatively high word error rate (WER) on the oral history domain, compared to other ASR tasks, characterizes most works. Hessen et al (2013) describe the use of ASR to transcribe Dutch oral history archives. The authors state the WER is above 40 % for Dutch oral history interviews at the time of publishing.…”
Section: Related Work (mentioning)
confidence: 99%