Comparison of wav2vec 2.0 models on three speech processing tasks
Marie Kunešová,
Zbyněk Zajíc,
Luboš Šmídl
et al.
Abstract:The current state-of-the-art for various speech processing problems is a sequence-to-sequence model based on a self-attention mechanism known as transformer. The widely used wav2vec 2.0 is a self-supervised transformer model pre-trained on large amounts of unlabeled speech and then fine-tuned for a specific task. The data used for training and fine-tuning, along with the size of the transformer model, play a crucial role in both of these training steps. The most commonly used wav2vec 2.0 models are trained on … Show more
Set email alert for when this publication receives citations?
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.