2021
DOI: 10.48550/arxiv.2101.03769
Preprint

A Review of Evaluation Practices of Gesture Generation in Embodied Conversational Agents

Pieter Wolfert,
Nicole Robinson,
Tony Belpaeme

Abstract: Embodied Conversational Agents (ECAs) take on different forms, including virtual avatars or physical agents, such as a humanoid robot. ECAs are often designed to produce nonverbal behaviour to complement or enhance their verbal communication. One form of nonverbal behaviour is co-speech gesturing, which involves movements that the agent makes with its arms and hands that are paired with verbal communication. Co-speech gestures for ECAs can be created using different generation methods, such as rule-based and data-…

Citations: Cited by 4 publications (8 citation statements)
References: References 52 publications
“…Human gesture perception is highly subjective, and there are currently no widely accepted objective measures of gesture perception, so most publications have conducted human assessments instead. However, previous subjective evaluations, as reviewed in [59], have several drawbacks, with major ones being the coverage of systems being compared and the scale of the studies. Like in [2,30,31,45], proposed models are at most compared to one or two prior approaches (often a highly similar baseline) or possibly only to ablated versions of the same model.…”
Section: Related Work (mentioning)
confidence: 99%
“…[6,12,21]) and pairwise preference tests (cf. [2,13]) are commonplace; see [20] for a comprehensive review. In MOS tests, participants rate individual stimuli (in our case, videos) on a discrete scale, e.g., 1 through 5.…”
Section: Related Work (mentioning)
confidence: 99%
“…After that, the evaluation of the proposed work has been performed relying on subjective evaluation metrics: in particular, a questionnaire based on [45] and [46] has been developed to assess how the cultural background of the user may impact on the way in which the robot's gestures are perceived (Sect. 4.2).…”
Section: Evaluation and Discussion (mentioning)
confidence: 99%
“…It must be here pointed out that one of the main problems emerged in similar research works is the difficulty of finding evaluation metrics, i.e. quantitative measurements to assess the naturalness or dynamism of the generated gestures [45]. However, the average jerk of keypoints is usually considered in Literature as a reliable indicator for evaluating autonomous generated gestures [16,45,47].…”
Section: Ablation Study (mentioning)
confidence: 99%
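
The last statement above names the average jerk of keypoints as an objective indicator but does not say how it is computed. The sketch below shows one common reading, offered as an assumption rather than as the cited authors' method: jerk is taken as the third time derivative of each keypoint's position, estimated with a third-order forward difference, and its magnitude is averaged over all keypoints and frames. The function name `average_jerk`, the frame layout, and the `fps` parameter are hypothetical.

```python
import math


def average_jerk(frames, fps):
    """Mean jerk magnitude over all keypoints and time steps.

    frames: list of frames; each frame is a list of keypoints, and each
            keypoint is a sequence of coordinates such as (x, y, z).
    fps:    frame rate of the motion data, in frames per second.

    Assumption: jerk is the third time derivative of position, estimated
    with a forward finite difference over four consecutive frames.
    """
    if len(frames) < 4:
        raise ValueError("at least 4 frames are needed for a third difference")

    dt = 1.0 / fps
    total = 0.0
    count = 0
    for t in range(len(frames) - 3):
        for k in range(len(frames[t])):
            p0, p1, p2, p3 = (frames[t + i][k] for i in range(4))
            # Third-order forward difference, one value per coordinate.
            jerk = [
                (d - 3.0 * c + 3.0 * b - a) / dt ** 3
                for a, b, c, d in zip(p0, p1, p2, p3)
            ]
            total += math.sqrt(sum(j * j for j in jerk))
            count += 1
    return total / count
```

Under this definition, motion at constant velocity or constant acceleration yields an average jerk of zero, so lower values correspond to smoother motion. Published evaluations may normalise or aggregate differently, so values are only comparable when the same definition and frame rate are used.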