Automatic Close Captioning for Live Hungarian Television Broadcast Speech: A Fast and Resource-Efficient Approach

Varga, Ádám; Tarján, Balázs; Tobler, Zoltán; Szaszák, György; Fegyó, Tibor; Bordás, Csaba; Mihajlik, Péter

doi:10.1007/978-3-319-23132-7_13

Cited by 12 publications

(8 citation statements)

References 5 publications


“…This dataset contains various genres (weather forecasts, news, conversations, magazines, sport). The dataset used for pre-training the character-and word-level models is a subset with manual transcription including punctuation containing 12M, 3M and 136k words for the train, validation and test sets, respectively [20]. The punctuation marks addressed in the experiments include commas, periods, question marks and exclamation marks.…”

Section: Data (mentioning)

confidence: 99%

Leveraging a Character, Word and Prosody Triplet for an ASR Error Robust and Agglutination Friendly Punctuation Approach

Szaszák

Tündik

2019

Interspeech 2019

Self Cite


Punctuating ASR transcript has received increasing attention recently, and well-performing approaches were presented based on sequence-to-sequence modelling, exploiting textual (word and character) and/or acoustic-prosodic features. In this work we propose to consider character, word and prosody based features all at once to provide a robust and highly language independent platform for punctuation recovery, which can deal also well with highly agglutinating languages with less constrained word order. We demonstrate that using such a feature triplet improves ASR error robustness of punctuation in two quite differently organized languages, English and Hungarian. Moreover, in the highly agglutinating Hungarian, where word-based approaches suffer from the exploding vocabulary (poorer semantic representation through embeddings) and less constrained word order, we show that prosodic cues and the character-based model can powerfully counteract this loss of information. We also perform a deep analysis of punctuation w.r.t. both ASR errors and agglutination to explain the improvements we observed on a solid basis.


“…We have overall 500 sentences and 8k word tokens in total. We use the Kaldi version of the ASR in [17] (with Kaldi decoder) by 6.8%, 10.1%, and 21.4% Word Error Rates (WER) on weather forecasts, broadcast news and sport news, respectively. For AP (automatic punctuation) we use the model from [10] and obtain F1-measures in the range of 60-70% on MT (manual transcript) and 45-50% on AT (ASR transcript).…”

Section: Datasets (mentioning)

confidence: 99%

Assessing the Semantic Space Bias Caused by ASR Error Propagation and its Effect on Spoken Document Summarization

2019

Self Cite


Ambitions in artificial intelligence involve machine understanding of human language. The state-of-the-art approach for Spoken Language Understanding is using an Automatic Speech Recognizer (ASR) to generate transcripts, which are further processed with text-based tools. ASR yields error prone transcripts, these errors then propagate further into the processing pipeline. Subjective tests show on the other hand, that humans understand quite well ASR closed captions despite the word and punctuation errors. Our goal is to assess and quantify the loss in the semantic space resulting from error propagation and also analyze error propagation into speech summarization as a special use-case. We show, that word errors cause a slight shift in the semantic space, which is fairly below the average semantic distance between the sentences within a document. We also show, that punctuation errors have higher impact on summarization performance, which suggests that proper sentence level tokenization is crucial for this task.
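The Word Error Rates quoted in the citation statement above (6.8%, 10.1% and 21.4%) follow the standard definition: the word-level edit distance between reference and ASR hypothesis, divided by the number of reference words. The sketch below illustrates that definition only; it is not the scoring tool used by the cited authors, and the function name `word_error_rate` and the whitespace tokenization are assumptions made for the example.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Assumption: words are separated by whitespace and compared exactly
    (no case folding or punctuation stripping).
    """
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix seen so far
    # and the first j hypothesis words (row-by-row Levenshtein).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions are counted against the reference length, WER can exceed 100% when the hypothesis is much longer than the reference.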


“…In left-marked style (+m), a subword is prefixed with a character to indicate that there was no word boundary directly preceding the subword. This style has been used for Turkish [5] and Hungarian [6,7]. In [7], it was shown to outperform word boundary tags.…”

Section: Boundary Markers (mentioning)

confidence: 99%

“…This style has been used for Turkish [5] and Hungarian [6,7]. In [7], it was shown to outperform word boundary tags. In right-marked style (m+), a suffix marker is added to a subword if there is no word boundary after it.…”

Section: Boundary Markers (mentioning)

confidence: 99%

Improved Subword Modeling for WFST-Based Speech Recognition

Smit

Virpioja

Kurimo

2017

Interspeech 2017


Because in agglutinative languages the number of observed word forms is very high, subword units are often utilized in speech recognition. However, the proper use of subword units requires careful consideration of details such as silence modeling, position-dependent phones, and combination of the units. In this paper, we implement subword modeling in the Kaldi toolkit by creating modified lexicon by finite-state transducers to represent the subword units correctly. We experiment with multiple types of word boundary markers and achieve the best results by adding a marker to the left or right side of a subword unit whenever it is not preceded or followed by a word boundary, respectively. We also compare three different toolkits that provide data-driven subword segmentations. In our experiments on a variety of Finnish and Estonian datasets, the best subword models do outperform word-based models and naive subword implementations. The largest relative reduction in WER is a 23% over word-based models for a Finnish read speech dataset. The results are also better than any previously published ones for the same datasets, and the improvement on all datasets is more than 5%.
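The left-marked (+m) and right-marked (m+) styles described in the citation statements and the abstract above can be illustrated with a small sketch. It assumes `+` as the marker character and takes words that are already split into subwords; the function names and the Hungarian example segmentation are hypothetical and are not taken from the cited papers or from Kaldi.

```python
from typing import List

MARKER = "+"  # assumed marker character, chosen for illustration


def mark_left(words: List[List[str]], marker: str = MARKER) -> List[str]:
    """Left-marked style (+m): prefix a subword when no word boundary precedes it."""
    tokens = []
    for subwords in words:
        for i, subword in enumerate(subwords):
            tokens.append(subword if i == 0 else marker + subword)
    return tokens


def mark_right(words: List[List[str]], marker: str = MARKER) -> List[str]:
    """Right-marked style (m+): suffix a subword when no word boundary follows it."""
    tokens = []
    for subwords in words:
        last = len(subwords) - 1
        for i, subword in enumerate(subwords):
            tokens.append(subword + marker if i < last else subword)
    return tokens


def join_left(tokens: List[str], marker: str = MARKER) -> List[str]:
    """Rebuild words from a left-marked token sequence."""
    words: List[str] = []
    for token in tokens:
        if token.startswith(marker) and words:
            words[-1] += token[len(marker):]  # glue onto the previous subword
        else:
            words.append(token)
    return words


# Illustrative segmentation of "házakban van" (hypothetical, not from the source).
segmented = [["ház", "ak", "ban"], ["van"]]
left_marked = mark_left(segmented)    # ['ház', '+ak', '+ban', 'van']
right_marked = mark_right(segmented)  # ['ház+', 'ak+', 'ban', 'van']
```

In both styles the word boundaries stay recoverable from the token sequence alone, which is what lets the recognizer output be joined back into full words.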

