2022
DOI: 10.48550/arxiv.2206.01948
Preprint

STARSS22: A dataset of spatial recordings of real scenes with spatiotemporal annotations of sound events

Cited by 4 publications
(8 citation statements)
“…We adopted the development sets of DCASE Task 3 from 2020 to 2022 [14][15][16] to compare the proposed method with other SELD approaches [6,11,12]. Each includes 14, 12, and 13 sound event classes respectively, which are loosely shared.…”
Section: Experimental Setups (mentioning)
confidence: 99%
“…We adapt the framework of "You Only Look Once" (YOLO) [13], renowned for multiple object detection from images, to the SELD by using the notion of angular distance, namely proposing angular-distance-based YOLO (AD-YOLO). The results of an experiment using the series of DCASE 2020-2022 Task 3 (SELD) datasets [14][15][16] demonstrated that AD-YOLO outperformed existing SELD formats in both overall evaluation and polyphonic circumstances.…”
Section: Introduction (mentioning)
confidence: 99%
“…DCASE 2019–2021 were conducted using the synthesized data. However, DCASE 2022 Task 3 differs from previous competitions in that it includes a relatively small amount of real spatial acoustic scene data and a relatively large amount of synthetic data generated using specific indoor impulse responses [17]. There is a difference with the dataset.…”
Section: Introduction (mentioning)
confidence: 99%
“…Especially, labeling DOAs cannot be achieved using audio input alone but requires additional inputs such as optical tracking data and 360° videos. A new real-recorded SELD dataset has been released for the DCASE2022 SELD Challenge [65]. However, the size of this dataset is still small, with a total recording length of around 7 hours.…”
Section: Challenges in SELD (mentioning)
confidence: 99%
“…More importantly, multichannel audio data are dependent on the array geometry and cannot easily be shared among different applications. Examples of some publicly available datasets for multichannel SED are TUT Sound Events 2016 [77], TAU-NIGENS Spatial Sound Events 2021 [48], and Sony-TAU Realistic Spatial Soundscapes 2022 [65]. One method to improve the multichannel SED performance is transfer learning from single-channel SED models [78].…”
Section: Network Architecture and Datasets (mentioning)
confidence: 99%