Proceedings of the Fourth International Conference on Spoken Language Processing (ICSLP '96)
DOI: 10.1109/icslp.1996.607750

On the robust automatic segmentation of spontaneous speech

Abstract: The results from applying an improved algorithm to the task of automatic segmentation of spontaneous telephone-quality speech are presented, and compared to those resulting from superimposing white noise. Three segmentation algorithms are compared, all based on variants of the Spectral Variation Function. Experimental results are obtained on the OGI multi-language telephone speech corpus (OGI TS). We show that the use of the auditory forward and backward masking effects prior to the SV…
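
The segmentation algorithms discussed in the abstract are built on the Spectral Variation Function (SVF), a frame-to-frame measure of spectral change whose peaks are taken as candidate segment boundaries. The sketch below is only a minimal illustration of that idea, assuming plain log-magnitude spectra, a cosine-distance change measure, and arbitrary frame/hop sizes; the paper's actual variants (including the auditory forward and backward masking applied before the SVF) are not reproduced here.

import numpy as np

def frame_signal(x, frame_len=256, hop=128):
    # Slice a 1-D signal into overlapping frames (frame_len and hop are assumed values).
    x = np.asarray(x, dtype=float)
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def spectral_variation(x, frame_len=256, hop=128, lag=2):
    # SVF-style measure: spectral change between frames separated by `lag`,
    # computed here as 1 - cosine similarity of log-magnitude spectra.
    frames = frame_signal(x, frame_len, hop) * np.hanning(frame_len)
    spec = np.log1p(np.abs(np.fft.rfft(frames, axis=1)))
    a, b = spec[:-lag], spec[lag:]
    cos = np.sum(a * b, axis=1) / (
        np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1) + 1e-9)
    return 1.0 - cos  # large values indicate rapid spectral change

def detect_boundaries(svf, threshold=0.15):
    # Local maxima of the SVF above a (tunable) threshold become candidate boundaries.
    peaks = (svf[1:-1] > svf[:-2]) & (svf[1:-1] > svf[2:]) & (svf[1:-1] > threshold)
    return np.where(peaks)[0] + 1  # frame indices of candidate boundaries

Boundary frame indices can be converted to times by multiplying by hop / sample_rate.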

Cited by 15 publications (9 citation statements)
References 8 publications
“…A detected boundary is labelled correct if it is placed within a range of ±20 ms around a manual boundary. The objective measures for evaluation are [38, 39]: Over-segmentation coefficient: C_OVER = N_DET / N_MAN − 1. False alarm rate: FAR = N_FA / (N_SUBS − N_MAN). Missed detection rate: MS = N_MISS / N_MAN. Correct detection rate: PC = N_CORR / N_MAN. In which N_SUBS, N_DET, and N_MAN stand for the number of sub-segments, detected and manual boundaries, respectively. Moreover, the number of missed and correctly detected boundaries are quantified through N_MISS and N_CORR.…”
Section: Segmentation Procedures and Simulation Results
confidence: 99%
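
The measures quoted above map directly onto code once detected boundaries are matched to manual ones within the ±20 ms tolerance. The sketch below is an illustrative implementation under assumed conventions (boundary times in seconds, a greedy one-to-one matching); the function name and matching strategy are not taken from the cited works.

import numpy as np

def evaluate_boundaries(detected, manual, n_subsegments, tol=0.020):
    # Computes C_OVER, FAR, MS and PC as defined in the quotation above.
    detected = np.asarray(sorted(detected), dtype=float)
    manual = np.asarray(sorted(manual), dtype=float)
    matched = np.zeros(len(detected), dtype=bool)
    n_corr = 0
    for m in manual:
        # Closest still-unmatched detected boundary within +/- tol of this manual boundary.
        cand = np.where(~matched & (np.abs(detected - m) <= tol))[0]
        if cand.size:
            matched[cand[np.argmin(np.abs(detected[cand] - m))]] = True
            n_corr += 1
    n_det, n_man = len(detected), len(manual)
    n_fa = n_det - n_corr          # detections not assigned to any manual boundary
    n_miss = n_man - n_corr
    return {
        "C_OVER": n_det / n_man - 1,            # over-segmentation coefficient
        "FAR": n_fa / (n_subsegments - n_man),  # false alarm rate
        "MS": n_miss / n_man,                   # missed detection rate
        "PC": n_corr / n_man,                   # correct detection rate
    }

# Example with hypothetical values:
# evaluate_boundaries([0.10, 0.31, 0.55], [0.11, 0.30, 0.52, 0.80], n_subsegments=6)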
“…A detected boundary is labelled correct if it is placed within a range of ±20 ms around a manual boundary. The objective measures for evaluation are [38, 39]: over-segmentation coefficient C_OVER = N_DET / N_MAN − 1, false alarm rate FAR = N_FA / (N_SUBS − N_MAN), missed detection rate MS = N_MISS / N_MAN, and correct detection rate PC = N_CORR / N_MAN, in which N_SUBS, N_DET, and N_MAN stand for the number of sub-segments, detected and manual boundaries, respectively. Moreover, the number of missed and correctly detected boundaries are quantified through N_MISS and N_CORR. A detected boundary which is not assigned to a manual boundary is considered a false alarm, counted in N_FA.…”
Section: Objective Evaluation on TIMIT Database
confidence: 99%
“…HR is inversely proportional to the miss (or error) rate, which is also sometimes used to indicate segmentation accuracy. Another central measure, especially in the case of blind methods, is the over-segmentation (OS) rate (7), which can be obtained if the total number of algorithmically produced boundaries N_f is included in the analysis (Petek et al., 1996). Different authors have used varying symbols for the above measures, originating from, e.g., signal detection theory.…”
Section: Evaluation Measures
confidence: 99%
“…Some of them include detection of variations/similarities in spectral (Svendsen and Soong 1987; Dalsgaard et al. 1991; van Hemert 1991; Grayden and Scordilis 1994; Petek et al. 1996; Aversano et al. 2001) or prosodic (Adami and Hermansky 2003) parameters of speech, template matching using dynamic programming and/or synthetic speech (Bajwa et al. 1996; Paulo and Oliveira 2003; Malfrere et al. 2003) and discriminative learning segmentation (Keshet et al. 2007).…”
Section: Introduction
confidence: 99%