2022
DOI: 10.48550/arxiv.2212.04356
Preprint

Robust Speech Recognition via Large-Scale Weak Supervision

Cited by 162 publications (179 citation statements)
References 0 publications
“…When the recordings contained speech from the investigator at the beginning and end, we trimmed the recordings. All recordings from the SS task were automatically transcribed using OpenAI's Whisper (40), an Automatic Speech Recognition (ASR) system trained on 680,000 h of multilingual and multitask supervised data collected from the web.…”
Section: Data Pre-processing
confidence: 99%
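The statement above describes transcribing task recordings with OpenAI's Whisper. A minimal sketch of such a transcription step, assuming the openai-whisper Python package; the file path and model size are illustrative placeholders, not details taken from the citing study:

```python
# Minimal sketch: transcribing a recording with OpenAI's Whisper
# (assumes: pip install openai-whisper; "recording.wav" is a hypothetical path).
import whisper

model = whisper.load_model("base")          # any released size: tiny/base/small/medium/large
result = model.transcribe("recording.wav")  # runs the full ASR pipeline on the audio file
print(result["text"])                       # plain-text transcript
```

Whisper detects the spoken language automatically unless one is passed explicitly to transcribe().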
“…We hereby introduce the multilingual multitask acoustic model Whisper [44] and the self-supervised (SSL) XLSR-53 model [45] as pre-trained encoders to explore whether cross-lingual speech representations can advance Dutch dysarthric speech SLU. Moreover, since the cross-lingual implementations do not explicitly exploit language properties, our results should hold for dysarthric SLU in languages other than Dutch, though we do not formally evaluate this.…”
Section: Contributions
confidence: 99%
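Here Whisper serves as a pre-trained encoder rather than an end-to-end transcriber. A minimal sketch of extracting encoder representations for a downstream SLU model, assuming the openai-whisper package; the audio path is a hypothetical placeholder and the Dutch dysarthric SLU classifier itself is not shown:

```python
# Minimal sketch: using Whisper's audio encoder as a pre-trained feature extractor
# (assumes the openai-whisper package; "utterance.wav" is a hypothetical path and
# the downstream SLU classifier is omitted).
import torch
import whisper

model = whisper.load_model("base")
audio = whisper.load_audio("utterance.wav")
audio = whisper.pad_or_trim(audio)                         # fixed 30 s input window
mel = whisper.log_mel_spectrogram(audio).to(model.device)  # (n_mels, n_frames)

with torch.no_grad():
    features = model.encoder(mel.unsqueeze(0))             # (1, frames, d_model)
print(features.shape)  # frame-level features to feed an SLU head
```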
“…Whisper is a recent speech recognition model trained on 680,000 h of multilingual and multitask supervised data acquired from the internet. Whisper achieves high speech recognition accuracy, although none of the released model sizes are suitable for MCUs [25]. Large Language Models (LLMs), which were originally developed in the context of Natural Language Processing and Understanding, have recently become very popular as foundation models for various downstream tasks [26].…”
Section: Utilizing Audio Signals for Classification Problems
confidence: 99%
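The remark about MCUs can be made concrete by counting the parameters of a released checkpoint; a minimal sketch, assuming the openai-whisper package (the smallest released size, tiny, has roughly 39M parameters):

```python
# Minimal sketch: why released Whisper checkpoints do not fit on microcontrollers
# (assumes the openai-whisper package). Even "tiny", the smallest released size,
# has tens of millions of parameters.
import whisper

model = whisper.load_model("tiny")
n_params = sum(p.numel() for p in model.parameters())
# At float32 (4 bytes per parameter) the weights alone take ~150 MB,
# orders of magnitude beyond typical MCU flash/RAM budgets.
print(f"tiny: {n_params/1e6:.1f}M parameters (~{n_params*4/1e6:.0f} MB in fp32)")
```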