2014
DOI: 10.1007/978-3-319-10816-2_51

Development of a Large Spontaneous Speech Database of Agglutinative Hungarian Language

Abstract: In this paper, a large Hungarian spoken language database is introduced. This phonetically-based multi-purpose database contains various types of spontaneous and read speech from 333 monolingual speakers (about 50 minutes of speech sample per speaker). This study presents the background and motivation of the development of the BEA Hungarian database, describes its protocol and the transcription procedure, and also presents existing and proposed research using this database. Due to its recording protocol and th…

Cited by 26 publications (18 citation statements)
References 14 publications
“…10 conversations and 10 narratives were selected for the study from a Hungarian Database called BEA (Neuberger et al, 2014). Three speakers participated in each conversation; the interviewer (Int) and one speaker (henceforth: the second speaker S2) were constant.…”
Section: Methods
Citation type: mentioning
confidence: 99%
“…For English, we used the TEDLium dataset (Rousseau et al, 2012); we made use of the utterances of 100 speakers (approximately 15 hours of recordings). For Hungarian, we chose the BEA Database (Neuberger et al, 2014); we trained our DNNs on the speech of 116 subjects (44 hours of recordings overall). We made sure that the annotation suited our needs for both corpora, i.e.…”
Section: ASR Parameters
Citation type: mentioning
confidence: 99%
“…For this, we collected the spontaneous speech of English-speaking and Hungarian-speaking MCI patients and healthy controls. Then we trained two ASR models for the automatic speech analysis step: for English, we used a subset of the TEDLium corpus (Rousseau et al, 2012), while for Hungarian we used a subset of the BEA Hungarian Spoken Language Database (Neuberger et al, 2014). We carried out classification experiments to determine the indicativeness of the different attributes.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…The x-vectors scores are given in accord with the corpus used to train the DNN they were extracted with. … utterances) of the BEA Corpus, which contains Hungarian spontaneous speech (for more details, see [19]). This corpus has a relevant size (in comparison with the SLEEP Corpus), which is convenient when training DNNs.…”
Section: DNN Training Data
Citation type: mentioning
confidence: 99%