Degramnet: effective audio analysis based on a fully learnable time–frequency representation

Foggia, Pasquale; Greco, Antonio; Roberto, Antonio; Saggese, Alessia; Vento, Mario

doi:10.1007/s00521-023-08849-7

Neural Comput & Applic

2023

DOI: 10.1007/s00521-023-08849-7

|View full text |Cite

Degramnet: effective audio analysis based on a fully learnable time–frequency representation

Pasquale Foggia

Antonio Greco

Antonio Roberto

et al.

Abstract: Current state-of-the-art audio analysis algorithms based on deep learning rely on hand-crafted Spectrogram-like audio representations, that are more compact than descriptors obtained from the raw waveform; the latter are, in turn, far from achieving good generalization capabilities when few data are available for the training. However, Spectrogram-like representations have two main limitations: (1) The parameters of the filters are defined a priori, regardless of the specific audio analysis task; (2) such repr… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...

Citation Types

Supporting

Mentioning

Contrasting

Year Published

2024

Publication Types

Select...

Article1

Relationship

Self Cite0

Independent1

Authors

Journals

Cited by 1 publication

References 30 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

Identity, Gender, Age, and Emotion Recognition from Speaker Voice with Multi-task Deep Networks for Cognitive Robotics

Foggia,

Greco,

Roberto

et al. 2024

Cogn Comput

View full text Add to dashboard Cite

This paper presents a study on the use of multi-task neural networks (MTNs) for voice-based soft biometrics recognition, e.g., gender, age, and emotion, in social robots. MTNs enable efficient analysis of audio signals for various tasks on low-power embedded devices, thus eliminating the need for cloud-based solutions that introduce network latency. However, the strict dataset requirements for training limit the potential of MTNs, which are commonly used to optimize a single reference problem. In this paper, we propose three MTN architectures with varying accuracy-complexity trade-offs for voice-based soft biometrics recognition. In addition, we adopt a learnable voice representation, that allows to adapt the specific cognitive robotics application to the environmental conditions. We evaluate the performance of these models on standard large-scale benchmarks, and our results show that the proposed architectures outperform baseline models for most individual tasks. Furthermore, one of our proposed models achieves state-of-the-art performance on three out of four of the considered benchmarks. The experimental results demonstrate that the proposed MTNs have the potential for being part of effective and efficient voice-based soft biometrics recognition in social robots.

show abstract

Identity, Gender, Age, and Emotion Recognition from Speaker Voice with Multi-task Deep Networks for Cognitive Robotics

Foggia,

Greco,

Roberto

et al. 2024

Cogn Comput

View full text Add to dashboard Cite

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

customersupport@researchsolutions.com

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Degramnet: effective audio analysis based on a fully learnable time–frequency representation

Cited by 1 publication

References 30 publications

Identity, Gender, Age, and Emotion Recognition from Speaker Voice with Multi-task Deep Networks for Cognitive Robotics

Identity, Gender, Age, and Emotion Recognition from Speaker Voice with Multi-task Deep Networks for Cognitive Robotics

Contact Info

Product

Resources

About