Machine learning, including neural network techniques, has been applied to virtually every domain in natural language processing. One problem that has been somewhat resistant to effective machine learning solutions is text normalization for speech applications such as text-to-speech synthesis (TTS). In this application, one must decide, for example, that 123 is verbalized as one hundred twenty three in 123 pages but as one twenty three in 123 King Ave. For this task, state-of-the-art industrial systems depend heavily on hand-written language-specific grammars. We propose neural network models that treat text normalization for TTS as a sequence-to-sequence problem, in which the input is a text token in context, and the output is the verbalization of that token. We find that the most effective model, in accuracy and efficiency, is one where the sentential context is computed once and the results of that computation are combined with the computation of each token in sequence to compute the verbalization. This model allows for a great deal of flexibility in terms of representing the context, and also allows us to integrate tagging and segmentation into the process. These models perform very well overall, but occasionally they will predict wildly inappropriate verbalizations, such as reading 3 cm as three kilometers. Although rare, such verbalizations are a major issue for TTS applications. We thus use finite-state covering grammars to guide the neural models, either during training and decoding, or just during decoding, away from such “unrecoverable” errors. Such grammars can largely be learned from data.
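The covering-grammar constraint described above can be illustrated with a toy sketch (the grammar, token classes, and hypothesis list here are all invented for illustration; the actual system uses finite-state transducers): the grammar enumerates the verbalizations it licenses for a token, and decoding keeps the model's best-ranked hypothesis that the grammar accepts.

```python
# Toy sketch of covering-grammar-constrained decoding.
# The "grammar" below is a hypothetical stand-in for a finite-state
# covering grammar; only its licensing behavior matters here.

def covering_grammar(token):
    """Return the set of verbalizations the grammar licenses for a
    measure token like '3 cm', or None if the grammar does not cover it."""
    units = {"cm": "centimeters", "km": "kilometers", "kg": "kilograms"}
    digits = {"1": "one", "2": "two", "3": "three", "4": "four"}
    number, _, unit = token.partition(" ")
    if unit in units and number in digits:
        return {f"{digits[number]} {units[unit]}"}
    return None  # uncovered token: trust the neural model as-is

def constrained_decode(token, model_hypotheses):
    """Pick the model's best hypothesis (assumed sorted best-first)
    that the covering grammar licenses."""
    allowed = covering_grammar(token)
    for hyp in model_hypotheses:
        if allowed is None or hyp in allowed:
            return hyp
    return next(iter(allowed))  # no hypothesis licensed: back off to grammar

# The model wrongly ranks "three kilometers" first for "3 cm",
# but the grammar rules that reading out:
hyps = ["three kilometers", "three centimeters"]
print(constrained_decode("3 cm", hyps))  # prints "three centimeters"
```

When the grammar does not cover a token, the model's output is passed through unchanged, so the constraint only rules out "unrecoverable" errors such as misread units.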
It is standard practice in speech and language technology to rank systems according to performance on a test set held out for evaluation. However, few researchers apply statistical tests to determine whether differences in performance are likely to arise by chance, and few examine the stability of system ranking across multiple training-testing splits. We conduct replication and reproduction experiments with nine part-of-speech taggers published between 2000 and 2018, each of which reports state-of-the-art performance on a widely used "standard split". We fail to reliably reproduce some rankings using randomly generated splits. We suggest that randomly generated splits should be used in system comparison.
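A paired permutation (randomization) test of the kind the abstract calls for can be sketched with the standard library alone; the per-sentence scores below are invented for illustration, not data from the paper.

```python
import random

def paired_permutation_test(scores_a, scores_b, trials=10000, seed=0):
    """Two-sided paired permutation test on per-item scores.
    Under the null hypothesis the two systems are exchangeable, so we
    randomly relabel which system produced each paired score."""
    rng = random.Random(seed)
    observed = abs(sum(scores_a) - sum(scores_b))
    count = 0
    for _ in range(trials):
        diff = 0
        for a, b in zip(scores_a, scores_b):
            if rng.random() < 0.5:
                a, b = b, a  # swap the pair's system labels
            diff += a - b
        if abs(diff) >= observed:
            count += 1
    return (count + 1) / (trials + 1)  # add-one smoothing avoids p = 0

# Correct-token counts per test sentence for two hypothetical taggers:
tagger_a = [97, 95, 99, 96, 98, 94, 97, 95]
tagger_b = [96, 95, 98, 97, 97, 95, 96, 94]
p = paired_permutation_test(tagger_a, tagger_b)
print(p)  # a large p-value: this ranking could easily arise by chance
```

Running such a test on each of several randomly generated splits, rather than on one standard split, is the comparison protocol the abstract advocates.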
How language-agnostic are current state-of-the-art NLP tools? Are there some types of language that are easier to model with current methods? In prior work (Cotterell et al., 2018) we attempted to address this question for language modeling, and observed that recurrent neural network language models do not perform equally well over all the high-resource European languages found in the Europarl corpus. We speculated that inflectional morphology may be the primary culprit for the discrepancy. In this paper, we extend these earlier experiments to cover 69 languages from 13 language families using a multilingual Bible corpus. Methodologically, we introduce a new paired-sample multiplicative mixed-effects model to obtain language difficulty coefficients from at-least-pairwise parallel corpora. In other words, the model is aware of inter-sentence variation and can handle missing data. Exploiting this model, we show that "translationese" is not any easier to model than natively written language in a fair comparison. Trying to answer the question of what features difficult languages have in common, we try and fail to reproduce our earlier (Cotterell et al., 2018) observation about morphological complexity and instead reveal far simpler statistics of the data that seem to drive complexity in a much larger sample.
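A simplified sketch of the kind of multiplicative mixed-effects model the abstract describes, with one per-language and one per-sentence effect (the notation here is illustrative, not the paper's exact specification):

```latex
% Sketch (assumed notation): y_{n,l} is the modeling cost, e.g. total
% surprisal, of sentence n in language l.
\[
  y_{n,l} \;=\; \exp\bigl(\mu + \delta_l + \sigma_n + \varepsilon_{n,l}\bigr)
\]
% \delta_l : per-language difficulty coefficient (the quantity of interest)
% \sigma_n : per-sentence random effect, shared across all translations of n
% \varepsilon_{n,l} : residual noise
```

Because the sentence effect is shared across every language's translation of sentence n, the model accounts for inter-sentence variation, and a missing translation for some pair (n, l) simply contributes no term to the likelihood.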
Background: A subgroup of young children with autism spectrum disorders (ASD) has significant language impairments (phonology, grammar, vocabulary), although such impairments are not considered to be core symptoms of and are not unique to ASD. Children with specific language impairment (SLI) display similar impairments in language. Given evidence for phenotypic and possibly etiologic overlap between SLI and ASD, it has been suggested that language-impaired children with ASD (ASD + language impairment, ALI) may be characterized as having both ASD and SLI. However, the extent to which the language phenotypes in SLI and ALI can be viewed as similar or different depends in part upon the age of the individuals studied. The purpose of the current study is to examine differences in memory abilities, specifically those that are key “markers” of heritable SLI, among young school-age children with SLI, ALI, and ALN (ASD + language normal). Methods: In this cross-sectional study, three groups of children between ages 5 and 8 years participated: SLI (n = 18), ALI (n = 22), and ALN (n = 20). A battery of cognitive, language, and ASD assessments was administered as well as a nonword repetition (NWR) test and measures of verbal memory, visual memory, and processing speed. Results: NWR difficulties were more severe in SLI than in ALI, with the largest effect sizes in response to nonwords with the shortest syllable lengths. Among children with ASD, NWR difficulties were not associated with the presence of impairments in multiple ASD domains, as reported previously. Verbal memory difficulties were present in both SLI and ALI groups relative to children with ALN.
Performance on measures related to verbal but not visual memory or processing speed was significantly associated with the relative degree of language impairment in children with ASD, supporting the role of verbal memory difficulties in language impairments among early school-age children with ASD. Conclusions: The primary difference between children with SLI and ALI was in NWR performance, particularly in repeating two- and three-syllable nonwords, suggesting that shared difficulties in early language learning found in previous studies do not necessarily reflect the same underlying mechanisms. Electronic supplementary material: The online version of this article (doi:10.1186/s11689-015-9111-z) contains supplementary material, which is available to authorized users.
Atypical pragmatic language is often present in individuals with autism spectrum disorders (ASD), along with delays or deficits in structural language. This study investigated the use of the “fillers” uh and um by children ages 4–8 during the Autism Diagnostic Observation Schedule (ADOS). Fillers reflect speakers’ difficulties with planning and delivering speech, but they also serve communicative purposes, such as negotiating control of the floor or conveying uncertainty. We hypothesized that children with ASD would use different patterns of fillers compared to peers with typical development or with specific language impairment (SLI), reflecting differences in social ability and communicative intent. Regression analyses revealed that children in the ASD group were much less likely to use um than children in the other two groups. Filler use is an easy-to-quantify feature of behavior that, in concert with other observations, may help to distinguish ASD from SLI.
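The group effect described above is the kind of quantity a logistic regression on filler use estimates; a minimal odds-ratio sketch makes it concrete. The counts below are invented for illustration, not data from the study.

```python
# Hypothetical illustration (invented counts, not the study's data):
# the odds ratio for producing "um", group A versus group B, is the
# effect a logistic regression with a group predictor would report.
def odds_ratio(used_a, total_a, used_b, total_b):
    """Odds of 'um' use in group A divided by odds in group B."""
    odds_a = used_a / (total_a - used_a)
    odds_b = used_b / (total_b - used_b)
    return odds_a / odds_b

# Suppose 8 of 20 children in one group produced "um" during the
# session, versus 16 of 20 peers in a comparison group:
or_a_vs_b = odds_ratio(8, 20, 16, 20)
print(round(or_a_vs_b, 3))  # prints 0.167: far lower odds of "um" in group A
```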