22nd International Conference on Human-Computer Interaction With Mobile Devices and Services 2020
DOI: 10.1145/3379503.3403535
Augmenting Conversational Agents with Ambient Acoustic Contexts

Abstract: Conversational agents are rich in content today. However, they are entirely oblivious to users' situational context, limiting their ability to adapt their response and interaction style. To this end, we explore the design space for a context augmented conversational agent, including analysis of input segment dynamics and computational alternatives. Building on these, we propose a solution that redesigns the input segment intelligently for ambient context recognition, achieved in a two-step inference pipeline. …
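To make the abstract's "two-step inference pipeline" more concrete, here is a minimal sketch, assuming the two steps are (1) a cheap gate that decides whether a captured audio segment is usable for ambient analysis and (2) a classifier that assigns an ambient-context label. All function names, class labels, and the stub model are illustrative assumptions, not taken from the paper.

```python
from typing import Optional

import numpy as np

# Hypothetical two-step ambient-context pipeline. Names, labels, and the
# stub "model" are illustrative assumptions, not the authors' implementation.

AMBIENT_CLASSES = ["home", "office", "street", "cafe", "transport"]  # placeholder labels


def is_usable_segment(audio: np.ndarray, energy_threshold: float = 1e-4) -> bool:
    """Step 1: a cheap gate deciding whether the captured segment carries
    enough ambient signal to be worth classifying (here, a simple energy check)."""
    return float(np.mean(audio ** 2)) > energy_threshold


def classify_ambient_context(audio: np.ndarray) -> str:
    """Step 2: assign an ambient-context label. A real system would run a
    trained audio model; this stub scores fixed random projections of the spectrum."""
    spectrum = np.abs(np.fft.rfft(audio))[:128]
    weights = np.random.default_rng(0).standard_normal((len(AMBIENT_CLASSES), spectrum.size))
    return AMBIENT_CLASSES[int(np.argmax(weights @ spectrum))]


def infer_context(audio: np.ndarray) -> Optional[str]:
    """Two-step inference: gate first, classify only if the gate passes."""
    if not is_usable_segment(audio):
        return None
    return classify_ambient_context(audio)


if __name__ == "__main__":
    segment = np.random.default_rng(1).standard_normal(16_000)  # 1 s of fake audio at 16 kHz
    print(infer_context(segment))
```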

Cited by 8 publications (12 citation statements). References: 30 publications.
“…For all datasets, we use the suggested train/test split for comparability purposes. For ambient sound classification, we use the Ambient Acoustic Contexts dataset [33], in which sounds from ten distinct events are present. For the keyword spotting task, we use the second version of the Speech Commands dataset [41], where the objective is to detect when a particular keyword is spoken out of a set of twelve target classes.…”
Section: Datasets and Audio Pre-processing
confidence: 99%
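The quoted passage fixes the datasets and their suggested train/test splits; the pre-processing it refers to typically turns each waveform into log-mel features. The sketch below illustrates that step with librosa; the sample rate, window sizes, and mel-band count are assumptions rather than the cited papers' exact settings.

```python
import numpy as np
import librosa

# Illustrative audio pre-processing (parameters are assumptions, not the
# cited papers' exact settings): waveform file -> log-mel spectrogram.


def log_mel_features(path: str, sr: int = 16_000, n_mels: int = 64,
                     n_fft: int = 1024, hop_length: int = 160) -> np.ndarray:
    y, _ = librosa.load(path, sr=sr, mono=True)  # resample to a fixed rate
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=n_fft,
                                         hop_length=hop_length, n_mels=n_mels)
    return librosa.power_to_db(mel, ref=np.max)  # shape: (n_mels, frames)


# Example (hypothetical file): features = log_mel_features("example.wav")
```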
“…Ambient Context [33]: event classification, 10 classes; Speech Commands [41]: keyword spotting, 12 classes; VoxForge [29]: language identification, 6 classes…”
Section: Dataset
confidence: 99%
“…Note that modelling all these context primitives, especially around the head, is not possible with other wearables. The acoustic channel of an earable enables us to understand environment ambience and audio events [19,23], thereby modelling a patient's proxemic, social context as well as the patient's interaction with physical objects [10]. Combining these primitives and their thoughtful synthesis is key in modelling patient activities and creating digital memories through encoding for future recall.…”
Section: Earable As Memory Aid
confidence: 99%
“…• We introduce adaptive confidence thresholding for generating pseudo-labels, which effectively reduces the number of inaccurate predicted annotations.
• We demonstrate through extensive evaluation that our technique is able to effectively learn generalizable audio models under a variety of federated settings and label availability on diverse public datasets, namely Speech Commands [40], Ambient Context [32] and VoxForge [28].
• We exploit self-supervised models pre-trained on FSD-50K corpus [6] for significantly improving training convergence in federated settings.…”
Section: Introduction
confidence: 99%
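The "adaptive confidence thresholding" mentioned in this quote can be illustrated with a generic sketch: accept a model prediction on unlabelled audio as a pseudo-label only if its confidence clears a per-class threshold, where each threshold adapts toward the mean confidence recently observed for that class. The scheme and every name below are assumptions for illustration, not the cited paper's exact method.

```python
import numpy as np

# Generic confidence-thresholded pseudo-labelling (an illustrative sketch,
# not the cited paper's exact adaptive scheme).


def pseudo_label(probs: np.ndarray, thresholds: np.ndarray, momentum: float = 0.9):
    """probs: (num_samples, num_classes) softmax outputs on unlabelled data.
    Returns indices and labels of accepted samples plus the updated thresholds."""
    labels = probs.argmax(axis=1)
    confidences = probs.max(axis=1)

    # Adapt each per-class threshold toward the mean confidence seen for that class.
    new_thresholds = thresholds.copy()
    for c in range(probs.shape[1]):
        mask = labels == c
        if mask.any():
            new_thresholds[c] = momentum * thresholds[c] + (1 - momentum) * confidences[mask].mean()

    accepted = confidences >= new_thresholds[labels]
    return np.flatnonzero(accepted), labels[accepted], new_thresholds


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    logits = rng.standard_normal((8, 12))  # e.g. 12 keyword-spotting classes
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    idx, labels, thr = pseudo_label(probs, thresholds=np.full(12, 0.5))
    print(idx, labels)
```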