ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
DOI: 10.1109/icassp43922.2022.9746872

L3DAS22 Challenge: Learning 3D Audio Sources in a Real Office Environment

Abstract: The L3DAS22 Challenge is aimed at encouraging the development of machine learning strategies for 3D speech enhancement and 3D sound localization and detection in office-like environments. This challenge improves and extends the tasks of the L3DAS21 edition. We generated a new dataset, which maintains the same general characteristics of the L3DAS21 datasets, but with an extended number of data points and added constraints that improve the baseline model's efficiency and overcome the major difficulties encountere…

Cited by 33 publications (6 citation statements)
References 26 publications
“…We demonstrate the improved performance, low data bias, and environment bias of the proposed model through various simulated datasets. Performance comparison with state-of-the-art models on three datasets (spatialized WSJCAM0 [8], spatialized DNS challenge [5], and L3DAS22 [52]) confirms that our proposed model has lower computational complexity and higher performance enhancement. To further investigate the real-world applicability and scalability of our model, we conduct experiments on real noisy and reverberant speech recorded in an office environment.…”
Section: Introduction (mentioning)
confidence: 56%
“…Since the original DNS challenge dataset contains single-channel data, we spatialized both speeches and noises using a similar procedure as in the spatialized WSJCAM0 dataset described in [5]. 3) L3DAS22 Challenge dataset: The last dataset used for evaluation is the L3DAS22 challenge [52] dataset proposed as part of the recent ICASSP 2022 challenges. This dataset includes speech recordings simulated in 3D office environments with varying speaker positions.…”
Section: A Datasets (mentioning)
confidence: 99%
“…We believe this shortage of studies to be at least in part due to the lack of an architecture capable of incorporating the scene's metadata, which is addressed by our proposed DI-NN. We also refer to the recent L3DAS22 challenge [24], where practitioners were invited to develop 3D PSSL algorithms for a realistic office environment containing a pair of microphone arrays.…”
Section: Neural-based Methods (mentioning)
confidence: 99%
“…For the second edition of this project, L3DAS22 [6], we maintained a similar setting to that proposed in L3DAS21 but with some substantial improvements. Firstly, we generated a new dataset containing an augmented number of datapoints, increasing the total length of the dataset from 65 to more than 94 hours.…”
Section: Introduction (mentioning)
confidence: 99%