2012
DOI: 10.1121/1.4763995

Automatic measurement of voice onset time using discriminative structured prediction

Abstract: A discriminative large-margin algorithm for automatic measurement of voice onset time (VOT) is described, considered as a case of predicting structured output from speech. Manually labeled data are used to train a function that takes as input a speech segment of an arbitrary length containing a voiceless stop, and outputs its VOT. The function is explicitly trained to minimize the difference between predicted and manually measured VOT; it operates on a set of acoustic feature functions designed based on spectr…
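
The quantity the abstract says the function is trained to minimize, the difference between predicted and manually measured VOT, can be illustrated with a small sketch. This is not the authors' implementation. It assumes VOT is taken as voicing onset minus burst onset, and the names `Onsets`, `vot_error_ms`, `mean_vot_error_ms`, and the `tolerance_ms` parameter are hypothetical, introduced here only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Onsets:
    """Burst and voicing onset times (ms) for one stop token (hypothetical container)."""
    burst_ms: float
    voicing_ms: float

    @property
    def vot_ms(self) -> float:
        # Assumption: VOT = voicing onset minus burst onset.
        return self.voicing_ms - self.burst_ms


def vot_error_ms(pred: Onsets, gold: Onsets, tolerance_ms: float = 0.0) -> float:
    """Absolute VOT difference, ignoring errors up to tolerance_ms (an assumed parameter)."""
    diff = abs(pred.vot_ms - gold.vot_ms)
    return max(diff - tolerance_ms, 0.0)


def mean_vot_error_ms(pairs: Iterable[Tuple[Onsets, Onsets]], tolerance_ms: float = 0.0) -> float:
    """Average the per-token error over (predicted, manual) pairs."""
    errors = [vot_error_ms(p, g, tolerance_ms) for p, g in pairs]
    if not errors:
        raise ValueError("at least one (predicted, manual) pair is required")
    return sum(errors) / len(errors)
```

A predicted token with VOT 52 ms scored against a manual measurement of 48 ms gives an error of 4 ms with no tolerance, and 0 ms if a 5 ms tolerance is allowed.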

Citations: cited by 40 publications (48 citation statements)
References: 25 publications
“…This shows that, using our method, swift correction can easily be transferred to a new annotator, without introducing bias into the analysis. The overall performance of Auto-VOT on the Glaswegian corpus was good and similar to the results presented in Sonderegger and Keshet (2012), who carried out an evaluation of the performance of the algorithm on four different datasets comparing it to that of human transcribers for the same data. The results for the two datasets closest to our sample are the Switchboard corpus of American speech and the Big Brother UK dataset of spontaneous British speech; for both corpora, VOT detection windows were placed manually, rather than using force-aligned segment boundaries.…”
Section: Discussion (mentioning)
confidence: 74%
“…We developed a semi-automatic procedure to process large numbers of reliable VOT measures, using the Auto-VOT algorithm developed by Sonderegger and Keshet (2012). It was trained on an initial set of some 500 hand-measured tokens, applied to the force-aligned data, and then the algorithm's VOT predictions when applied to the full dataset were manually checked.…”
Section: Discussion (mentioning)
confidence: 99%
“…Methods for the measurement of VOT fall into two categories: (a) those which explicitly identify the locations of the burst and voicing onsets through a set of customized acoustic-phonetic rules (knowledge-based) [4,6], and (b) those which train a learning machine (such as random forest, support vector machine) to estimate the VOT using some acoustic features corresponding to the stop-to-voiced-phone transition event [8,9]. Many of the high performing methods require phonetic transcription either to identify the segment of the speech signal containing the stop consonant through forced-alignment [4,9] or to focus the analysis on segments of the signal containing only one stop consonant [8]. Such methods are difficult to employ in a scenario where there is no transcription available.…”
Section: Motivation (mentioning)
confidence: 99%