Pruning Deep Neural Network Models of Guitar Distortion Effects

Südholt, David; Wright, Alec; Erkut, Cumhur; Välimäki, Vesa

doi:10.1109/taslp.2022.3223257

IEEE/ACM Trans. Audio Speech Lang. Process.

2023

Abstract: Deep neural networks have been successfully used in the task of black-box modeling of analog audio effects such as distortion. Improving the processing speed and memory requirements of the inference step is desirable to allow such models to be used on a wide range of hardware and concurrently with other software. In this paper, we propose a new application of recent advancements in neural network pruning methods to recurrent black-box models of distortion effects using a Long Short-Term Memory architecture. We…

Cited by 2 publications

References 24 publications

Sampling the user controls in neural modeling of audio devices

Mikkonen, Wright, Välimäki

2024

J AUDIO SPEECH MUSIC PROC.

This work studies neural modeling of nonlinear parametric audio circuits, focusing on how the diversity of settings of the target device user controls seen during training affects network generalization. To study the problem, a large corpus of training datasets is synthetically generated using SPICE simulations of two distinct devices, an analog equalizer and an analog distortion pedal. A proven recurrent neural network architecture is trained using each dataset. The difference in the datasets is in the sampling resolution of the device user controls and in their overall size. Based on objective and subjective evaluation of the trained models, a sampling resolution of five for the device parameters is found to be sufficient to capture the behavior of the target systems for the types of devices considered during the study. This result is desirable, since a dense sampling grid can be impractical to realize in the general case when no automated way of setting the device parameters is available, while collecting large amounts of data using a sparse grid only incurs small additional costs. Thus, the result provides guidance for efficient collection of training data for neural modeling of other similar audio devices.

Reimagining speech: a scoping review of deep learning-based methods for non-parallel voice conversion

Bargum, Serafin, Erkut

2024

Front. Signal Process.

Research on deep learning-powered voice conversion (VC) in speech-to-speech scenarios is gaining increasing popularity. Although many of the works in the field of voice conversion share a common global pipeline, there is considerable diversity in the underlying structures, methods, and neural sub-blocks used across research efforts. Thus, obtaining a comprehensive understanding of the reasons behind the choice of the different methods included when training voice conversion models can be challenging, and the actual hurdles in the proposed solutions are often unclear. To shed light on these aspects, this paper presents a scoping review that explores the use of deep learning in speech analysis, synthesis, and disentangled speech representation learning within modern voice conversion systems. We screened 628 publications from more than 38 venues between 2017 and 2023, followed by an in-depth review of a final database of 130 eligible studies. Based on the review, we summarise the most frequently used approaches to voice conversion based on deep learning and highlight common pitfalls. We condense the knowledge gathered to identify main challenges, supply solutions grounded in the analysis and provide recommendations for future research directions.
