2020
DOI: 10.3389/frai.2020.00010

Clearing the Transcription Hurdle in Dialect Corpus Building: The Corpus of Southern Dutch Dialects as Case Study

Abstract: This paper discusses how the transcription hurdle in dialect corpus building can be cleared. While corpus analysis has strongly gained in popularity in linguistic research, dialect corpora are still relatively scarce. This scarcity can be attributed to several factors, one of which is the challenging nature of transcribing dialects, given a lack of both orthographic norms for many dialects and speech technological tools trained on dialect data. This paper addresses the questions (i) how dialects can be transcr…

Citations: cited by 7 publications (8 citation statements)
References: 42 publications
“…Tatman (2017) documents lower ASR performance for women, as well as speakers of non-American dialects like New Zealand English. Ghyselen et al (2020) found that commercial ASR had a 7% word error rate for Standard Dutch, versus 66% for the southern West Flemish dialect. Koenecke et al (2020) compared industrial ASR systems and found a 35% word error rate for African American English compared to 19% for white speakers of American English.…”
Section: Automatic Speech Recognition For Sociophonetics
Citation type: mentioning
confidence: 98%
“…Ghyselen et al. (2020) found that commercial ASR had a 7% word error rate for Standard Dutch, versus 66% for the southern West Flemish dialect. Koenecke et al.…”
Section: Automatic Speech Recognition For Sociophonetics
Citation type: mentioning
confidence: 99%
“…Due to ethical concerns, it is not always feasible for researchers to use the latter option. Even though some transcription applications achieve the desired level of accuracy in transcribing standard major languages such as English, their capacity in transcribing dialects of different languages varies significantly (e.g., Ghyselen et al, 2020; MacKenzie & Turton, 2020). Indeed, it is not hard to recognize the uneven progress that has been made toward embracing linguistic diversity.…”
Section: Folding and Unfolding Across Online Language Organization
Citation type: mentioning
confidence: 99%
“…Usually, these interviews are accompanied by field notes, interview questions, questionnaires, reports, and other materials. It is desirable to maintain them in a format where in a given point of an audio or video file the correlated notes, materials, and metadata are made available to the researcher [39]. The sheer size of the archive makes modern computer-based search and retrieval applications and interactive interfaces designed for multimedia retrieval very relevant.…”
Section: Introduction
Citation type: mentioning
confidence: 99%