2002
DOI: 10.1016/s0167-6393(01)00060-7

Automatic transcription of Broadcast News

Cited by 30 publications (31 citation statements)
References 20 publications
“…Speech segmentation algorithms can be typically categorised into decoder-guided, model-based, and metric-based approaches (Chen and Gopalakrishnan, 1998; Chen et al, 2002). In the decoder-guided method, the speech changes are determined according to information provided by a speech recognition system, which decodes the spoken audio stream at first (Woodland et al, 1997).…”
Section: Phonemic Segmentation (mentioning)
confidence: 99%
“…The BIC scheme presented here, while inherently threshold-free, can also be viewed as a dynamic thresholding scheme on the log-likelihood distance. There is still a penalty factor λ that depends on the type of analysed data and must be estimated heuristically (Chen et al, 2002; Tritschler and Gopinath, 1999). This allows for reducing Type II errors without increasing the number of Type I…”
Section: Phonemic Segmentation Using the Bayesian Information Criterion (mentioning)
confidence: 99%
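
The BIC test described in the statement above can be sketched as follows. This is a minimal illustration, not the implementation from any of the cited papers: it assumes one-dimensional features modelled by a single Gaussian per segment (real systems typically use multivariate cepstral vectors), and the names `delta_bic`, `best_change_point`, `lam`, and `margin` are hypothetical. A change point is accepted only when the likelihood gain from splitting the window exceeds the λ-weighted model-complexity penalty.

```python
import math


def _log_var(xs):
    """Log of the maximum-likelihood variance, floored to avoid log(0)."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return math.log(max(var, 1e-12))


def delta_bic(xs, split, lam=1.0):
    """Delta-BIC for splitting xs into xs[:split] and xs[split:].

    Assumption: 1-D Gaussian segments, so the two-segment model has two
    extra parameters (one mean, one variance) compared with one segment.
    """
    n, n1, n2 = len(xs), split, len(xs) - split
    # Log-likelihood gain of two Gaussians over a single Gaussian.
    gain = 0.5 * (n * _log_var(xs)
                  - n1 * _log_var(xs[:split])
                  - n2 * _log_var(xs[split:]))
    # Penalty: lam * (1/2) * (number of extra parameters) * log(n).
    penalty = 0.5 * lam * 2 * math.log(n)
    return gain - penalty


def best_change_point(xs, lam=1.0, margin=5):
    """Return (split, score); split is None when no candidate has score > 0.

    `margin` (hypothetical) keeps at least that many samples per segment.
    """
    if len(xs) < 2 * margin:
        raise ValueError("window too short for the requested margin")
    candidates = range(margin, len(xs) - margin + 1)
    split = max(candidates, key=lambda i: delta_bic(xs, i, lam))
    score = delta_bic(xs, split, lam)
    return (split, score) if score > 0 else (None, score)
```

Raising `lam` makes the test more conservative (fewer detected changes), while lowering it has the opposite effect, which is why the statement notes that λ has to be tuned to the data.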
“…Most conventional methods [1]-[8] belong to a family of batch processing methods that require all utterances to be present before clustering can be executed, and thus, they do not meet the requirements for real-time applications. Furthermore, they often require a large memory capacity since they have to store all speech utterances in memory.…”
Section: Introduction (mentioning)
confidence: 99%
“…Some speech segments, especially when the commentators are engaged in heated discussion, can be wrongly classified as crowd cheering. Also, for future indexing such as speech transcription, distinction between speech segments with different background audio environments is important in order to improve word error rates [5].…”
Section: Pattern Classes (mentioning)
confidence: 99%
“…MFCC's are well represented by multivariate Gaussian distributions and have been shown to be robust to noise. When applied in a statistical based framework, MFCC's are effective in discriminating between speech and other sound classes such as crowd cheering, music and speech [2,5]. Hence, our feature set consisted of 14 uncorrelated MFCC coefficients and the Log Energy [10].…”
Section: Feature Set (mentioning)
confidence: 99%