Jorge Bennasar Vázquez scite author profile

Target sound extraction consists of extracting the sound of a target acoustic event (AE) class from a mixture of AE sounds. It can be realized using a neural network that extracts the target sound conditioned on a 1-hot vector that represents the desired AE class. With this approach, the embedding vectors associated with the AE classes are directly optimized for extracting the sound classes seen during training. However, it is not easy to extend this framework to new AE classes, i.e., classes unseen during training. Recently, speech, music, or AE sound extraction based on enrollment audio of the desired sound has offered the potential to extract any target sound in a mixture given only a short audio signal of a similar sound. In this work, we propose combining 1-hot- and enrollment-based target sound extraction, allowing optimal performance for seen AE classes and simple extension to new classes. In experiments with sound mixtures synthesized from the Freesound Dataset (FSD) datasets, we demonstrate the benefit of the combined framework for both seen and new AE classes. In addition, we propose adapting the embedding vectors obtained from a few enrollment audio samples (few-shot) to further improve performance on new classes.
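The label-based mechanism can be sketched with plain Python lists: multiplying a 1-hot vector by a learned embedding table selects one class embedding, so a new class can be registered by appending a row derived from enrollment audio. This is a minimal sketch under stated assumptions: the enrollment embeddings are taken as already produced by some encoder (not modeled here), and averaging them in the hypothetical `few_shot_embedding` helper is a simple stand-in, not the few-shot adaptation proposed in the work. All names and values are illustrative.

```python
from typing import List

Vector = List[float]
Table = List[Vector]  # one embedding row per AE class


def one_hot(index: int, num_classes: int) -> Vector:
    """Return a 1-hot vector selecting class `index`."""
    return [1.0 if c == index else 0.0 for c in range(num_classes)]


def label_embedding(one_hot_vec: Vector, table: Table) -> Vector:
    """Multiply the 1-hot vector by the embedding table (equivalent to a row lookup)."""
    dim = len(table[0])
    return [sum(one_hot_vec[c] * table[c][d] for c in range(len(table))) for d in range(dim)]


def few_shot_embedding(enrollment_embeddings: List[Vector]) -> Vector:
    """Hypothetical stand-in: average the embeddings of a few enrollment clips."""
    n = len(enrollment_embeddings)
    dim = len(enrollment_embeddings[0])
    return [sum(e[d] for e in enrollment_embeddings) / n for d in range(dim)]


def register_new_class(table: Table, new_embedding: Vector) -> Table:
    """Return a new table with one extra row, selectable by a longer 1-hot vector."""
    return table + [list(new_embedding)]


# Two seen classes with 3-dimensional embeddings (toy values).
seen_table: Table = [[0.5, -0.2, 0.1], [0.0, 0.3, 0.9]]
# Embeddings of two enrollment clips of an unseen class, assumed to come from an encoder.
new_class = few_shot_embedding([[1.0, 0.0, 2.0], [3.0, 2.0, 0.0]])
table = register_new_class(seen_table, new_class)
print(label_embedding(one_hot(2, len(table)), table))  # [2.0, 1.0, 1.0]
```

Because the new row lives in the same table as the trained ones, the extractor can be asked for the new class exactly as for a seen class; how well that works depends on the embedding space, which this sketch does not model.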


SoundBeam: Target Sound Extraction Conditioned on Sound-Class Labels and Enrollment Clues for Increased Performance and Continuous Learning

Delcroix, Vázquez, Ochiai, et al. 2023

IEEE/ACM Trans. Audio Speech Lang. Process.


SoundBeam: Target sound extraction conditioned on sound-class labels and enrollment clues for increased performance and continuous learning

Delcroix, Vázquez, Ochiai, et al. 2022

Preprint


In many situations, we would like to hear desired sound events (SEs) while being able to ignore interference. Target sound extraction (TSE) aims at tackling this problem by estimating the sound of target SE classes in a mixture while suppressing all other sounds. This can be achieved with a neural network that extracts the target SEs by conditioning it on clues representing the target SE classes. Two types of clues have been proposed: target SE class labels and enrollment sound samples similar to the target sound. Systems based on SE class labels can directly optimize the embedding vectors representing the SE classes, resulting in high extraction performance. However, extending these systems to the extraction of new SE classes not encountered during training is not easy. Enrollment-based approaches extract SEs by finding sounds in the mixture that share characteristics with the enrollment. These approaches do not explicitly rely on SE class definitions and can thus handle new SE classes. In this paper, we introduce a TSE framework, SoundBeam, that combines the advantages of both approaches. We also perform an extensive evaluation of the different TSE schemes using synthesized and real mixtures, which shows the potential of SoundBeam.
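Both clue types can drive the same extractor if they are mapped into a shared embedding space. The toy sketch below assumes element-wise multiplicative conditioning of hidden features followed by a sigmoid mask, which is one common design choice; the abstract does not state which conditioning layer SoundBeam uses, and `choose_clue`, `extract`, and all values are hypothetical. A real extractor would have learned layers before and after the conditioning step.

```python
import math
from typing import List, Optional, Sequence

Vector = List[float]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def choose_clue(label_emb: Optional[Vector], enrollment_emb: Optional[Vector]) -> Vector:
    """Prefer the class-label embedding (seen classes); fall back to the enrollment embedding."""
    if label_emb is not None:
        return label_emb
    if enrollment_emb is not None:
        return enrollment_emb
    raise ValueError("need a class label or an enrollment clue")


def extract(mixture: Sequence[Vector], hidden: Sequence[Vector], clue: Vector) -> List[Vector]:
    """Toy masking: scale hidden features element-wise by the clue, squash to [0, 1], apply."""
    output = []
    for mix_frame, hid_frame in zip(mixture, hidden):
        mask = [sigmoid(h * c) for h, c in zip(hid_frame, clue)]
        output.append([m * x for m, x in zip(mask, mix_frame)])
    return output


mixture = [[2.0, 4.0], [1.0, 0.0]]    # toy mixture features (frames x bins)
hidden = [[1.0, -1.0], [3.0, 2.0]]    # toy hidden features with the same shape
clue = choose_clue(None, [0.0, 0.0])  # only an enrollment clue is available
print(extract(mixture, hidden, clue))  # [[1.0, 2.0], [0.5, 0.0]]
```

With an all-zero clue every mask value is 0.5, so the output is simply half the mixture; a trained clue would instead push mask values toward 1 for the target and 0 for interference.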


Spectral information in noise-mapping: An exploratory study

Pasch, Mosconi, Yanitelli, et al. 2002


International standards such as ISO 1996 and ISO 717, as well as noise regulations in several countries, increasingly rely on spectral information to assess the acoustical behavior of materials and structures and the effects of noise on people. Nevertheless, the new European Union Directive on the assessment and management of environmental noise reinforces the A-weighted equivalent level (with appropriate evening and night corrections) as the preferred indicator for noise mapping. Considering that noise maps are a powerful zoning and planning resource, the idea of reporting the mean noise spectrum at each selected location at different times is proposed and thoroughly justified. Arguments in favor of its feasibility are given, showing that, contrary to widespread opinion, cost and required time can be reduced considerably by using low-priced, new-technology auxiliary equipment. An exploratory study is then reported, in which (a) the spectrum of traffic noise in Rosario (Argentina) is compared with the internationally standardized traffic noise spectrum, and (b) the noise spectrum in an open street is compared with the noise spectrum in a street with a U-profile, both produced by the same vehicles.
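The indicators discussed in this abstract can be made concrete with a short sketch: an A-weighted level combined from octave-band levels, an energy-averaged ("mean") spectrum over several measurement times, and Lden, the day-evening-night level of Directive 2002/49/EC with its default 12/4/8 h periods and +5 dB evening and +10 dB night penalties. The octave-band A-weighting values are the standard nominal ones; the helper names and measurement values are hypothetical, and energy averaging is an assumption about how a "mean spectrum" would be computed, since the abstract does not specify it.

```python
import math
from typing import Dict, Sequence

# Octave-band A-weighting corrections in dB (standard nominal values).
A_WEIGHTING_DB: Dict[int, float] = {
    63: -26.2, 125: -16.1, 250: -8.6, 500: -3.2,
    1000: 0.0, 2000: 1.2, 4000: 1.0, 8000: -1.1,
}


def energy_average(levels_db: Sequence[float]) -> float:
    """Average sound levels on an energy basis (not arithmetically)."""
    return 10.0 * math.log10(sum(10.0 ** (lvl / 10.0) for lvl in levels_db) / len(levels_db))


def mean_spectrum(spectra: Sequence[Dict[int, float]]) -> Dict[int, float]:
    """Energy-average octave-band spectra measured at one location at different times."""
    return {band: energy_average([s[band] for s in spectra]) for band in spectra[0]}


def a_weighted_level(spectrum: Dict[int, float]) -> float:
    """Combine octave-band levels into a single A-weighted level in dB(A)."""
    return 10.0 * math.log10(
        sum(10.0 ** ((lvl + A_WEIGHTING_DB[band]) / 10.0) for band, lvl in spectrum.items())
    )


def lden(l_day: float, l_evening: float, l_night: float) -> float:
    """Day-evening-night level with default 12/4/8 h periods and +5/+10 dB penalties."""
    return 10.0 * math.log10(
        (12 * 10.0 ** (l_day / 10.0)
         + 4 * 10.0 ** ((l_evening + 5.0) / 10.0)
         + 8 * 10.0 ** ((l_night + 10.0) / 10.0)) / 24.0
    )


morning = {500: 62.0, 1000: 65.0, 2000: 60.0}  # hypothetical band levels in dB
evening = {500: 58.0, 1000: 61.0, 2000: 57.0}
spectrum = mean_spectrum([morning, evening])
print({band: round(lvl, 1) for band, lvl in spectrum.items()})
print(round(a_weighted_level(spectrum), 1))
print(round(lden(65.0, 60.0, 55.0), 1))  # 65.0
```

Keeping the band levels alongside the single-number result is the point of the proposal: two locations can share the same dB(A) value while having quite different spectra, which the A-weighted figure alone hides.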

