1995
DOI: 10.1007/978-3-642-79980-8_7

Adaptiver stochastischer Sprache/Pause-Detektor (Adaptive Stochastic Speech/Pause Detector)

Cited by 2 publications
(4 citation statements)
References 1 publication
“…pants for completeness. 1 The main result to note from Table 1 is that overall word error rates are not dramatically worse than for Switchboard-style data. This is particularly impressive since, as described earlier, no meeting data were used in training, and no modifications of the acoustic or language models were made.…”
Section: Recognition Results and Discussion (mentioning)
confidence: 91%
“…Segment boundary times were determined either by an automatic segmentation of the mixed signal followed by hand-correction, or by hand-correction alone. For the automatic case, the data was segmented with a speech/nonspeech detector consisting of an extension of an approach using an ergodic hidden Markov model (HMM) [1]. In this approach, the HMM consists of two main states, one representing "speech" and one representing "nonspeech" and a number of intermediate states that are used to model the time constraints of the transitions between the two main states.…”
Section: Speech Segmentation (mentioning)
confidence: 99%
“…The S/NS detection module is based on a hidden Markov model (HMM) S/NS detector designed for automatic speech recognition on close-talking microphone data of a single speaker [2]. The baseline detector is similar to the one used in [3], and consists of an ergodic HMM with two main states ("speech" and "nonspeech") and a number of intermediate state pairs to impose time constraints on transitions between the two main states.…”
Section: Baseline Architecture (mentioning)
confidence: 99%
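
The topology described in the two statements above (two main states plus intermediate states that constrain transition timing) can be illustrated with a small sketch. This is not the cited detector: the chain layout, the function names `build_predecessors` and `segment`, the `min_frames` parameter, and the use of plain per-frame log scores with no transition weights are all assumptions made for illustration. Each class gets a left-to-right chain of `min_frames` states, so that once a class is entered, the path must stay in it for at least `min_frames` frames before switching back.

```python
NEG_INF = float("-inf")


def build_predecessors(min_frames):
    """Allowed predecessors per state (hypothetical minimum-duration topology).

    States 0..min_frames-1 form the nonspeech chain and
    min_frames..2*min_frames-1 the speech chain. The last state of each
    chain plays the role of the "main" state and may loop on itself.
    """
    n = 2 * min_frames
    preds = [[] for _ in range(n)]
    for cls in (0, 1):
        base = cls * min_frames
        main = base + min_frames - 1
        other_main = (1 - cls) * min_frames + min_frames - 1
        # A chain is entered only from the other class's main state.
        preds[base].append(other_main)
        # Intermediate states force a left-to-right walk through the chain.
        for i in range(1, min_frames):
            preds[base + i].append(base + i - 1)
        # Only the main state has a self-loop.
        preds[main].append(main)
    return preds


def segment(frame_scores, min_frames=3):
    """Viterbi decoding over the topology above.

    frame_scores: list of (log_score_nonspeech, log_score_speech) per frame.
    Returns one label per frame: 0 for nonspeech, 1 for speech.
    """
    if not frame_scores:
        return []
    preds = build_predecessors(min_frames)
    n = 2 * min_frames
    # Assumption: decoding may start in either main state.
    delta = [NEG_INF] * n
    for cls in (0, 1):
        delta[cls * min_frames + min_frames - 1] = frame_scores[0][cls]
    backptrs = []
    for scores in frame_scores[1:]:
        new_delta = [NEG_INF] * n
        ptr = [0] * n
        for s in range(n):
            best = max(preds[s], key=lambda p: delta[p])
            ptr[s] = best
            # All states of a chain share the emission score of their class.
            new_delta[s] = delta[best] + scores[s // min_frames]
        delta = new_delta
        backptrs.append(ptr)
    # Backtrace from the best final state.
    state = max(range(n), key=lambda s: delta[s])
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return [s // min_frames for s in path]
```

With `min_frames=1` the sketch reduces to a plain two-state ergodic model that follows every frame-level flip; larger values suppress isolated short segments, which is one plausible reading of "time constraints" in the quoted text.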
“…The peak normalized short-time crosscorrelation, (2) between the active channels i and j are used to estimate the similarity between the two signals. For "real" overlaps (two speakers speaking at the same time) the crosscorrelation is expected to be lower than for "false" overlaps (one speaker coupled into both microphones).…”
Section: Crosscorrelation Analysis (mentioning)
confidence: 99%
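
The similarity measure mentioned in the statement above can be sketched as follows. The excerpt refers to an equation (2) that is not reproduced here, so the exact definition is unknown; this sketch assumes one common form, the maximum absolute cross-correlation over a small lag range, divided by the geometric mean of the two frame energies. The function name `peak_normalized_xcorr` and the `max_lag` parameter are hypothetical.

```python
import math


def peak_normalized_xcorr(x, y, max_lag):
    """Peak of the energy-normalized cross-correlation of two frames.

    Assumed form (not taken from the cited paper):
        max over |lag| <= max_lag of |sum_t x[t] * y[t + lag]| / sqrt(Ex * Ey)
    Values lie in [0, 1]; silent frames yield 0.0.
    """
    n = min(len(x), len(y))
    x, y = x[:n], y[:n]
    energy_x = sum(v * v for v in x)
    energy_y = sum(v * v for v in y)
    if energy_x == 0 or energy_y == 0:
        return 0.0
    norm = math.sqrt(energy_x * energy_y)
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        acc = 0.0
        for t in range(n):
            u = t + lag
            if 0 <= u < n:
                acc += x[t] * y[u]
        best = max(best, abs(acc) / norm)
    return best


# Toy frames: one source coupled into a second channel (attenuated, delayed
# by two samples) versus an unrelated tone on the second channel.
frame = [math.sin(2 * math.pi * 5 * t / 64) for t in range(64)]
coupled = [0.0, 0.0] + [0.5 * v for v in frame[:-2]]
unrelated = [math.sin(2 * math.pi * 11 * t / 64) for t in range(64)]

coupled_score = peak_normalized_xcorr(frame, coupled, max_lag=4)
unrelated_score = peak_normalized_xcorr(frame, unrelated, max_lag=4)
```

In this toy setup the coupled channel scores higher than the unrelated one, matching the expectation in the quoted text that "false" overlaps show higher cross-correlation than "real" ones. Pure tones are used only to keep the example deterministic; real speech frames and any decision threshold would need to be chosen empirically.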