2013
DOI: 10.1007/978-3-319-01931-4_42

Speech and Language Resources within Speech Recognition and Synthesis Systems for Serbian and Kindred South Slavic Languages

Cited by 9 publications (9 citation statements)
References 3 publications
“…Speech technologies for Serbian have been developed over the past decade and a half at the Faculty of Technical Sciences of the University of Novi Sad, in cooperation with the company “AlfaNum” from Novi Sad. During this period, a respectable amount of data for training ASR and TTS systems has been acquired [25], and both technologies are still being constantly improved by introducing new techniques and gathering new resources. The quality of ASR and TTS that is sufficient for most practical applications has been reached several years ago and, even though further research and development are needed in order to follow the state of the art, the research is now mainly focused on creating natural-like dialogue systems.…”
Section: Speech Technologies For Serbian
Citation type: mentioning
confidence: 99%
“…The feature vector, which defines the acoustic models of triphones, consists of 15 mel-frequency cepstral coefficients (MFCCs), normalized energy, and their derivatives. Feature vectors are extracted every 10 ms from 30 ms speech segments centered around extraction time instants [25].…”
Section: Speech Technologies For Serbian
Citation type: mentioning
confidence: 99%
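
The framing described in the statement above (30 ms segments centered on extraction instants spaced 10 ms apart) can be illustrated with a minimal sketch. It assumes a 16 kHz sampling rate and zero padding at the signal edges, neither of which is stated in the source. The function names are hypothetical, and the MFCC computation itself is omitted. Because the statement does not say how many derivative orders are used, `feature_dim` takes that as a parameter instead of fixing a value.

```python
def frame_bounds(num_samples, sample_rate=16000, win_ms=30, hop_ms=10):
    """Return (start, end) sample indices of windows centered on each hop instant.

    sample_rate=16000 is an assumption; the source gives only the 30 ms / 10 ms figures.
    Indices may fall outside [0, num_samples) at the signal edges.
    """
    win = sample_rate * win_ms // 1000   # 480 samples at 16 kHz
    hop = sample_rate * hop_ms // 1000   # 160 samples at 16 kHz
    half = win // 2
    bounds = []
    center = 0
    while center < num_samples:
        bounds.append((center - half, center + half))
        center += hop
    return bounds


def extract_frames(signal, sample_rate=16000, win_ms=30, hop_ms=10):
    """Cut the signal into centered frames, zero-padding out-of-range samples (assumed)."""
    n = len(signal)
    frames = []
    for start, end in frame_bounds(n, sample_rate, win_ms, hop_ms):
        frames.append([signal[i] if 0 <= i < n else 0.0 for i in range(start, end)])
    return frames


def feature_dim(n_mfcc=15, delta_orders=1):
    """Size of a vector of n_mfcc MFCCs plus one energy term, with delta_orders derivative sets."""
    static = n_mfcc + 1
    return static * (1 + delta_orders)


if __name__ == "__main__":
    one_second = [0.0] * 16000
    frames = extract_frames(one_second)
    print(len(frames), len(frames[0]), feature_dim(15, 1))
```

With these assumptions, each window overlaps its neighbors by 20 ms, so every sample away from the edges contributes to three consecutive frames.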
“…The feature vectors are extracted from 30 ms speech segments, every 10 ms. The training set for the acoustic model contains recordings of both scripted and spontaneous utterances produced by several dozen speakers, with a total duration of about 200 hours [23].…”
Section: Speech Recognition
Citation type: mentioning
confidence: 99%
“…Our language model is a combination of 3 N-gram models. The first model is based on tokens (surface forms), the second on lemmata, and the third on classes [23]. The size of vocabulary causes data sparsity problems, resulting in the need for significantly greater language corpora, sufficient for obtaining a robust language model.…”
Section: Speech Recognition
Citation type: mentioning
confidence: 99%
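
The statement above says the language model combines three N-gram models (tokens, lemmata, classes) but does not say how. One common way to combine such models is linear interpolation, sketched below purely as an illustration. The interpolation scheme, the weights, the class-based factorization, and all names here are assumptions, not the authors' documented method.

```python
def interpolate(probs, weights):
    """Linearly interpolate component model probabilities for one word in one context.

    probs   -- probabilities from the token, lemma, and class models (hypothetical values)
    weights -- non-negative interpolation weights that must sum to 1
    """
    if len(probs) != len(weights):
        raise ValueError("probs and weights must have the same length")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(w * p for w, p in zip(weights, probs))


def class_bigram_prob(prev_word, word, word_to_class, class_bigram, word_given_class):
    """Class-based bigram estimate: P(class | previous class) * P(word | class)."""
    c_prev = word_to_class[prev_word]
    c = word_to_class[word]
    return class_bigram[(c_prev, c)] * word_given_class[(word, c)]


if __name__ == "__main__":
    # Toy numbers chosen for illustration only.
    p_class = class_bigram_prob(
        "nova", "kuca",
        word_to_class={"nova": "ADJ", "kuca": "NOUN"},
        class_bigram={("ADJ", "NOUN"): 0.5},
        word_given_class={("kuca", "NOUN"): 0.25},
    )
    print(interpolate([0.5, 0.25, p_class], [0.5, 0.25, 0.25]))
```

Backing off to lemma and class statistics is one way to soften the data sparsity the statement mentions, since a rare surface form still shares counts with its lemma and its class.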