We present the Montreal Forced Aligner (MFA), a new open-source system for speech-text alignment. MFA is an update to the Prosodylab-Aligner: it maintains that system's key functionality of trainability on new data while incorporating an improved architecture (triphone acoustic models and speaker adaptation) and other features. MFA uses Kaldi instead of HTK, which allows it to be distributed as a stand-alone package and to exploit parallel processing for computationally intensive training and for scaling to larger datasets. We evaluate MFA's performance on aligning word and phone boundaries in English conversational and laboratory speech, relative to human-annotated boundaries, focusing on the effects of aligner architecture and of training on the data to be aligned. MFA performs well relative to two existing open-source aligners with simpler architectures (Prosodylab-Aligner and FAVE), and both its improved architecture and training on the data to be aligned generally result in more accurate boundaries.
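The evaluation compares aligner output with human-annotated word and phone boundaries. As a minimal sketch, assuming that boundaries from the two sources can be paired one-to-one and that error is measured as absolute time displacement (a common choice, not necessarily the exact metric used in this evaluation), the comparison could be computed as follows. The function names and tolerance thresholds are hypothetical and are not part of MFA.

```python
from statistics import mean, median


def boundary_errors(aligned, reference):
    """Absolute differences (in seconds) between paired boundary times.

    Assumes both lists describe the same segment sequence in the same order,
    as when an aligner and a human annotator work from one transcript.
    """
    if len(aligned) != len(reference):
        raise ValueError("boundary lists must have the same length")
    return [abs(a - r) for a, r in zip(aligned, reference)]


def summarize_boundary_errors(aligned, reference, tolerances=(0.010, 0.025, 0.050)):
    """Mean and median absolute error, plus the share of boundaries within each tolerance."""
    errors = boundary_errors(aligned, reference)
    summary = {"mean": mean(errors), "median": median(errors)}
    for tol in tolerances:
        # Hypothetical key format, e.g. "within_25ms" for a 0.025 s tolerance.
        key = f"within_{round(tol * 1000)}ms"
        summary[key] = sum(e <= tol for e in errors) / len(errors)
    return summary
```

Reporting both a central tendency and the proportion of boundaries within fixed tolerances separates typical accuracy from the rate of large misalignments; which thresholds are appropriate depends on the speech style being aligned.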
Studies on perceptual learning are motivated by the phonetic variation that listeners encounter across speakers, items, and contexts. In this study, the authors investigate what control listeners have over the perceptual learning of ambiguous /s/ pronunciations by inducing changes in their attentional set. Listeners' attention is manipulated during a lexical decision exposure task so that it is directed either at the word level (comprehension-oriented listening) or toward the signal (perception-oriented listening). In a categorization task with novel words, listeners in the condition that most strongly biased them toward a comprehension-oriented attentional set showed the most perceptual learning. Focus on higher levels of linguistic meaning facilitated generalization to new words. These results suggest that the way in which listeners attend to the speech stream affects how linguistic categories are updated, providing insight into the qualitative differences in perceptual learning between findings in the psychophysics and language-focused literatures.
This study examines spontaneous phonetic accommodation to a dialect with distinct categories by speakers who are in the process of merging those categories. We focus on the merger of the NEAR and SQUARE lexical sets in New Zealand English, presenting New Zealand participants with an unmerged speaker of Australian English. Mergers-in-progress are a uniquely interesting sound change because they showcase the asymmetry between speech perception and production. Yet we examine mergers using spontaneous phonetic imitation, a behavior in which perceptual input necessarily influences speech production. Phonetic imitation is quantified by a perceptual measure and by an acoustic measure of mergedness based on the Pillai-Bartlett trace. The results of both analyses indicate that spontaneous phonetic imitation is moderated by extra-linguistic factors such as the valence of the assigned conditions and social bias. We also find evidence of a decrease in the degree of mergedness in post-exposure productions. Taken together, our results suggest that under the appropriate conditions New Zealanders phonetically accommodate to Australian English, and that in the process of speech imitation, mergers-in-progress can, but do not consistently, become less merged.
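The abstract names the Pillai-Bartlett trace as the acoustic measure of mergedness but does not specify the model behind it. The sketch below assumes a one-way, two-group MANOVA on per-token acoustic measurements (for example, F1 and F2 of NEAR versus SQUARE tokens) with no covariates; the measures, the absence of covariates, and the function name `pillai_trace` are all assumptions. Pillai's trace is computed as V = tr(H(H + E)^-1), where H is the between-group and E the within-group sums-of-squares-and-cross-products (SSCP) matrix.

```python
import numpy as np


def pillai_trace(group_a, group_b):
    """Pillai-Bartlett trace for a two-group, one-way MANOVA (no covariates).

    group_a, group_b: array-likes of shape (n_tokens, n_measures),
    e.g. one row per vowel token with columns F1 and F2 (an assumption).
    """
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    grand_mean = np.vstack([a, b]).mean(axis=0)
    p = a.shape[1]
    H = np.zeros((p, p))  # between-group SSCP matrix
    E = np.zeros((p, p))  # within-group (error) SSCP matrix
    for g in (a, b):
        group_mean = g.mean(axis=0)
        d = group_mean - grand_mean
        H += len(g) * np.outer(d, d)
        centered = g - group_mean
        E += centered.T @ centered
    # V = trace(H (H + E)^-1); H + E is the total SSCP matrix.
    return float(np.trace(H @ np.linalg.inv(H + E)))
```

With two groups, V lies between 0 and 1: values near 0 mean the two vowel distributions overlap heavily (more merged), and values near 1 mean they are well separated (less merged). Under this reading, a higher V in post-exposure productions would correspond to a decrease in mergedness.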
Background/Aims: Lexically guided perceptual learning in speech is the updating of linguistic categories based on novel input disambiguated by the structure provided in a recognized lexical item. We test the range of variation that supports perceptual learning by presenting listeners with items that vary from subtle within-category variation to fully remapped cross-category variation. Methods: Experiment 1 uses a lexically guided perceptual learning paradigm with words containing noncanonical /s/ realizations from s/ʃ continua that correspond to “typical,” “ambiguous,” “atypical,” and “remapped” steps. Perceptual learning is tested in an s/ʃ categorization task. Experiment 2 addresses listener sensitivity to variation in the exposure items using AX discrimination tasks. Results: Listeners in Experiment 1 showed perceptual learning with the maximally ambiguous tokens. Performance of listeners in Experiment 2 suggests that the tokens that showed the most perceptual learning were not perceptually salient on their own. Conclusion: These results demonstrate that perceptual learning is enhanced with maximally ambiguous stimuli. Excessively atypical pronunciations show attenuated perceptual learning, while typical pronunciations show no evidence of perceptual learning. AX discrimination illustrates that the maximally ambiguous stimuli are not perceptually unique. Together, these results suggest that perceptual learning relies on an interplay between confidence in phonetic and lexical predictions and category typicality.
Prosody simultaneously encodes different kinds of information about an utterance, including the type of speech act (which, in English, often affects the choice of intonational tune), the syntactic constituent structure (which mainly affects prosodic phrasing), and the location of semantic focus (which mainly affects the relative prosodic prominence between words). The syntactic and semantic functional dimensions (speech act, constituency, focus) are orthogonal to each other, but the extent to which their prosodic correlates are also orthogonal remains controversial. This paper reports on a production experiment that crosses these three dimensions to look for interactions, concentrating on interactions between focus prominence and phrasing. The results provide evidence that interactions are more limited than many current theories of sentence prosody would predict, and support a theory that keeps different prosodic dimensions representationally separate.