2019
DOI: 10.48550/arxiv.1907.09238
Preprint

Crowdsourcing a Dataset of Audio Captions

Abstract: Audio captioning is a novel field of multi-modal translation: the task of creating a textual description of the content of an audio signal (e.g. "people talking in a big room"). Creating a dataset for this task requires a considerable amount of work, which makes crowdsourcing a very attractive option. In this paper we present a three-step framework for crowdsourcing an audio captioning dataset, based on concepts and practices followed for the creation of widely used image captioning an…

Cited by 1 publication (2 citation statements)
References 13 publications
“…The audio captioning task is firstly introduced in [1], which proposed the commercial ProSound Effects [6] audio corpus as a proof of concept. The paper proposed a BiGRU [7] based encoder-decoder model to generate audio captions.…”
Section: Related Work (mentioning)
confidence: 99%
“…Even for people, precisely distinguishing events in audio can be difficult, let alone effectively describing the contents of given audio, because the description is often dependent on the situation or context as much as the audio itself. Therefore, due to the ambiguity of audio, different persons may have varying perceptions of the same audio, which will result in the semantic disparity of audio captions [2], for example, a thin plastic rattling could be perceived as a fire crackling [6] (as shown in Fig. 1).…”
Section: Introduction (mentioning)
confidence: 99%