Using LSTM neural networks for cross‐lingual phonetic speech segmentation with an iterative correction procedure

Hanzlíček, Zdeněk; Matoušek, Jindřich; Vít, Jakub

doi:10.1111/coin.12602

Computational Intelligence

2023


Abstract: This article describes experiments on speech segmentation using long short‐term memory (LSTM) recurrent neural networks. The main part of the paper deals with multi‐lingual and cross‐lingual segmentation; in the cross‐lingual case, segmentation is performed on a language different from the one on which the model was trained. The experimental data consist of large Czech, English, German, and Russian speech corpora intended for speech synthesis. For optimal multi‐lingual modeling, a compact phonetic alphabet was proposed by sharing and clustering…


References 95 publications

Cited by 1 publication

The Mason-Alberta Phonetic Segmenter: a forced alignment system based on deep neural networks and interpolation

Kelley, Perry, Tucker

2024

Phonetica


Given an orthographic transcription, forced alignment systems automatically determine boundaries between segments in speech, facilitating the use of large corpora. In the present paper, we introduce a neural network-based forced alignment system, the Mason-Alberta Phonetic Segmenter (MAPS). MAPS serves as a testbed for two possible improvements we pursue for forced alignment systems. The first is treating the acoustic model as a tagger, rather than a classifier, motivated by the common understanding that segments are not truly discrete and often overlap. The second is an interpolation technique to allow more precise boundaries than the typical 10 ms limit in modern systems. During testing, all system configurations we trained significantly outperformed the state-of-the-art Montreal Forced Aligner in the 10 ms boundary placement tolerance threshold. The greatest difference achieved was a 28.13 % relative performance increase. The Montreal Forced Aligner began to slightly outperform our models at around a 30 ms tolerance. We also reflect on the training process for acoustic modeling in forced alignment, highlighting how the output targets for these models do not match phoneticians’ conception of similarity between phones and that reconciling this tension may require rethinking the task and output targets or how speech itself should be segmented.
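The abstract above compares aligners by the share of boundaries placed within a tolerance threshold (10 ms, 30 ms) and reports a relative performance increase. A minimal sketch of how such numbers are commonly computed is given below, assuming that predicted and reference boundaries are paired one-to-one (as in forced alignment, where both derive from the same phone sequence) and that "relative increase" means (new − baseline) / baseline. The function names and the example boundary times are hypothetical illustrations, not the evaluation code or data of either paper.

```python
def boundary_accuracy(predicted, reference, tolerance_ms=10.0):
    """Percentage of boundaries whose predicted time lies within tolerance_ms
    of the paired reference time.

    Assumption: both lists hold boundary times in milliseconds and are paired
    one-to-one, in the same order.
    """
    if len(predicted) != len(reference):
        raise ValueError("expected one predicted boundary per reference boundary")
    if not reference:
        raise ValueError("no boundaries to evaluate")
    # Count boundaries whose absolute deviation does not exceed the tolerance.
    hits = sum(1 for p, r in zip(predicted, reference) if abs(p - r) <= tolerance_ms)
    return 100.0 * hits / len(reference)


def relative_increase(new, baseline):
    """Relative improvement in percent, assumed to be (new - baseline) / baseline."""
    return 100.0 * (new - baseline) / baseline


# Hypothetical boundary times (ms) for one utterance.
reference = [120.0, 250.0, 410.0, 600.0]
predicted = [125.0, 262.0, 408.0, 640.0]  # deviations: 5, 12, 2, 40 ms

print(boundary_accuracy(predicted, reference, tolerance_ms=10.0))  # 50.0
print(boundary_accuracy(predicted, reference, tolerance_ms=30.0))  # 75.0
print(relative_increase(64.0, 50.0))                               # 28.0
```

Because accuracy is cumulative in the tolerance, one system can lead at a tight threshold (10 ms) while another catches up or overtakes at a looser one (30 ms), which is the pattern the abstract describes.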
