2022
DOI: 10.48550/arxiv.2204.05419
Preprint
A Wav2vec2-Based Experimental Study on Self-Supervised Learning Methods to Improve Child Speech Recognition

Cited by 1 publication (3 citation statements)
References 0 publications
“…All the audio data was converted to 16-bit mono with a 16 kHz sampling rate and saved as '.wav' audio files, while the transcriptions were saved as '.txt' files. Child data-specific cleaning methodology was kept consistent with [34]. Given the low-resource nature of non-native child speech datasets, we opted to split the available data into 80% for testing and 20% for training.…”
Section: A. Dataset Cleaning and Description
confidence: 99%
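The quoted passage describes a standard preprocessing step: converting audio to 16-bit mono at a 16 kHz sampling rate and saving it as '.wav'. A minimal standard-library sketch of that step is below; it is an illustration under stated assumptions, not the cited authors' actual pipeline, and the helper names (`to_mono`, `resample_linear`, `write_wav_16bit_mono`) are hypothetical.

```python
# Minimal sketch (assumption: not the authors' actual pipeline) of the
# preprocessing the quote describes: interleaved stereo PCM at an arbitrary
# rate is averaged down to mono, linearly resampled to 16 kHz, and written
# out as a 16-bit '.wav' file using only the standard library.
import struct
import wave

TARGET_RATE = 16000  # 16 kHz, as in the quoted description


def to_mono(interleaved):
    """Average interleaved stereo sample pairs into a single channel."""
    return [(l + r) // 2 for l, r in zip(interleaved[::2], interleaved[1::2])]


def resample_linear(samples, src_rate, dst_rate=TARGET_RATE):
    """Naive linear-interpolation resampler (illustrative, not production DSP)."""
    if src_rate == dst_rate:
        return list(samples)
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate  # fractional index into the source
        j = int(pos)
        frac = pos - j
        a = samples[j]
        b = samples[min(j + 1, len(samples) - 1)]
        out.append(int(a + (b - a) * frac))
    return out


def write_wav_16bit_mono(path, samples, rate=TARGET_RATE):
    """Save integer samples as a 16-bit mono '.wav' file."""
    with wave.open(path, "wb") as w:
        w.setnchannels(1)   # mono
        w.setsampwidth(2)   # 16-bit
        w.setframerate(rate)
        w.writeframes(struct.pack(f"<{len(samples)}h", *samples))
```

In practice a corpus-scale pipeline would use a dedicated tool such as ffmpeg or sox for the rate conversion rather than this toy resampler; the sketch only makes the quoted target format concrete.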
“…2) My Science Tutor (MyST) Corpus [29] is an American English child speech dataset containing over 393 hours of audio data, of which 197 hours are fully transcribed. We use the cleaned version of this dataset (as described in [34]), with 65 hours of speech divided into two subsets: 55 hours for training, called 'MyST_train', and 10 hours for testing, called 'MyST_test'.…”
Section: A. Dataset Cleaning and Description
confidence: 99%