Most computational models of word segmentation are trained and tested on transcripts of speech, rather than the speech itself, and assume that speech is converted into a sequence of symbols prior to word segmentation. We present a way of representing speech corpora that avoids this assumption, and preserves acoustic variation present in speech. We use this new representation to re-evaluate a key computational model of word segmentation. One finding is that high levels of phonetic variability degrade the model's performance. While robustness to phonetic variability may be intrinsically valuable, this finding needs to be complemented by parallel studies of the actual abilities of children to segment phonetically variable speech.
Several computational simulations of how children solve the word segmentation problem have been proposed, but most have been applied only to a limited number of languages. One model with some experimental support uses distributional statistics of sound sequence predictability (Saffran et al. 1996). However, the experimental design does not fully specify how predictability is best measured or modeled in a simulation. Saffran et al. (1996) assume transitional probability, but Brent (1999a) claims mutual information (MI) is more appropriate. Both assume predictability is measured locally, relative to neighboring segment-pairs. This paper replicates Brent's (1999a) mutual-information model on a corpus of child-directed speech in Modern Greek, and introduces a variant model using a global threshold. Brent's finding regarding the superiority of MI is confirmed; the relative performance of local comparisons and global thresholds depends on the evaluation metric.
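The contrast drawn above between transitional probability and mutual information, and between local comparison and a global threshold, can be made concrete with a small sketch. It is not the implementation used in the paper: the function names, the relative-frequency estimates, the strict local-minimum rule, and the toy input are all assumptions made for illustration.

```python
import math
from collections import Counter


def bigram_stats(utterances):
    """Count segments and adjacent segment pairs over a list of utterances."""
    unigrams, bigrams = Counter(), Counter()
    for utt in utterances:
        unigrams.update(utt)
        bigrams.update(zip(utt, utt[1:]))
    return unigrams, bigrams


def transitional_probability(x, y, unigrams, bigrams):
    # P(y | x) estimated as count(xy) / count(x).
    # Simplification: count(x) also includes utterance-final occurrences of x.
    return bigrams[(x, y)] / unigrams[x] if unigrams[x] else 0.0


def mutual_information(x, y, unigrams, bigrams):
    # Pointwise MI: log2(P(xy) / (P(x) * P(y))), relative-frequency estimates.
    if bigrams[(x, y)] == 0:
        return float("-inf")
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    p_xy = bigrams[(x, y)] / n_bi
    return math.log2(p_xy / ((unigrams[x] / n_uni) * (unigrams[y] / n_uni)))


def split_at(utt, cuts):
    """Split a segment sequence at the given boundary positions."""
    pieces, start = [], 0
    for cut in cuts:
        pieces.append(utt[start:cut])
        start = cut
    pieces.append(utt[start:])
    return pieces


def segment_local(utt, score):
    """Assumed local rule: cut where a pair scores below both neighboring pairs."""
    scores = [score(a, b) for a, b in zip(utt, utt[1:])]
    cuts = [i + 1 for i in range(1, len(scores) - 1)
            if scores[i] < scores[i - 1] and scores[i] < scores[i + 1]]
    return split_at(utt, cuts)


def segment_global(utt, score, threshold):
    """Global variant: cut wherever a pair scores below one fixed threshold."""
    scores = [score(a, b) for a, b in zip(utt, utt[1:])]
    cuts = [i + 1 for i, s in enumerate(scores) if s < threshold]
    return split_at(utt, cuts)


# Toy input (hypothetical, not from the Greek corpus).
corpus = [list("abab")]
uni, bi = bigram_stats(corpus)
tp = lambda a, b: transitional_probability(a, b, uni, bi)
mi = lambda a, b: mutual_information(a, b, uni, bi)
print(segment_local(list("abab"), tp))
print(segment_global(list("abab"), tp, 0.75))
```

Either scoring function can be passed to either segmenter, which is the two-by-two comparison the abstract describes: the choice of predictability measure is independent of whether boundaries are decided locally or against a global threshold.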
The “Did You Mean...?” system, described in this article, is a spelling corrector for Arabic that is designed specifically for L2 learners of dialectal Arabic in the context of dictionary lookup. The authors use an orthographic density metric to motivate the need for a finer-grained ranking method for candidate words than unweighted Levenshtein edit distance. The Did You Mean...? architecture is described, and the authors show that mean reciprocal rank can be improved by tuning operation weights according to sound confusions, and by anticipating likely spelling variants.
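As a rough illustration of why weighted edit operations can raise mean reciprocal rank, the sketch below ranks dictionary candidates by a Levenshtein distance with per-pair substitution costs. It is not the Did You Mean...? architecture: the function names, the Latin-letter toy lexicon, and the cost of 0.2 for a z/s confusion are hypothetical values chosen only to show the effect.

```python
def weighted_edit_distance(source, target, sub_cost=None, indel_cost=1.0):
    """Levenshtein distance where substitution cost can vary per character pair."""
    sub_cost = sub_cost or {}
    m, n = len(source), len(target)
    dist = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dist[i][0] = i * indel_cost
    for j in range(1, n + 1):
        dist[0][j] = j * indel_cost
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            a, b = source[i - 1], target[j - 1]
            # Unlisted substitutions fall back to the unweighted cost of 1.0.
            sub = 0.0 if a == b else sub_cost.get((a, b), 1.0)
            dist[i][j] = min(
                dist[i - 1][j] + indel_cost,   # deletion
                dist[i][j - 1] + indel_cost,   # insertion
                dist[i - 1][j - 1] + sub,      # substitution or match
            )
    return dist[m][n]


def rank_candidates(query, lexicon, sub_cost=None):
    """Sort lexicon entries by distance to the query; ties break alphabetically."""
    return sorted(
        lexicon,
        key=lambda word: (weighted_edit_distance(query, word, sub_cost), word),
    )


def mean_reciprocal_rank(queries, lexicon, sub_cost=None):
    """Average of 1/rank of the intended word over (typed, intended) pairs."""
    total = 0.0
    for typed, intended in queries:
        ranked = rank_candidates(typed, lexicon, sub_cost)
        if intended in ranked:
            total += 1.0 / (ranked.index(intended) + 1)
    return total / len(queries)


# Hypothetical toy data: a learner types "zad" but means "sad".
lexicon = ["sad", "bad", "tad"]
queries = [("zad", "sad")]
sound_confusions = {("z", "s"): 0.2}  # assumed weight for a likely sound confusion

print(mean_reciprocal_rank(queries, lexicon))
print(mean_reciprocal_rank(queries, lexicon, sound_confusions))
```

With unweighted costs all three candidates tie at distance 1, so the intended word's rank depends on an arbitrary tie-break; lowering the cost of the plausible confusion moves it to the top. This is the kind of finer-grained ranking the orthographic density argument motivates.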
Social media users are often exposed to cute content that evokes emotional reactions and influences them to feel or behave in certain ways. The cuteness phenomenon in social media has been scarcely studied despite its prevalence and potential to spread quickly and affect large audiences. The main framework for understanding cuteness and emotions related to cuteness outside of social media is baby schema (having juvenile characteristics), which triggers parental instincts. We propose that baby schema is a necessary but not sufficient component of explaining what constitutes cuteness and how people react to it in the social media context. Cute social media content may also have characteristics that evoke approach motivations (a desire to interact with an entity, generally with the expectation of having a positive experience) that can manifest behaviorally in sharing and other prosocial online behaviors. We developed and performed initial validation for measures in social media contexts of: (1) cute attributes that encompass both baby schema and other proposed cuteness characteristics (the Cuteness Attributes Taxonomy, CAT) and (2) the emotional reactions they trigger (Heartwarming Social Media, HSM). To help validate our newly developed measures, we used two existing instruments: the Kama Muta Multiplex Scale (KAMMUS Two), a previously validated measure of kama muta (an emotion akin to tenderness; from Sanskrit, “moved by love”), as a measure of emotional reaction to cute stimuli, and the Cute Content dimension of the Social Media Emotions Annotation Guide (SMEmo-Cute Content), as a developed measure of gestalt cute content. Using 1,875 Polish tweets, our results confirmed that cute social media content predicted a kama muta response, but not all KAMMUS Two subscales were sensitive to cute content, and that the HSM measure was a better indicator of the presence of cute content. Further, the CAT measure is an effective means of categorizing cute attributes of social media content.
These results suggest potential differences between in-person, online, and social media experiences evoking cute emotional reactions, and the need for metrics that are developed and validated for use in social media contexts.