Blinded listener ratings are essential for valid assessment of interventions for speech disorders, but collecting these ratings can be time-intensive and costly. This study evaluated the validity of speech ratings obtained through online crowdsourcing, a potentially more efficient approach. One hundred words produced by children with /r/ misarticulation were presented electronically for binary rating by 35 phonetically trained listeners and 205 naïve listeners recruited through the Amazon Mechanical Turk (AMT) crowdsourcing platform. Bootstrapping was used to compare different-sized samples of AMT listeners against a “gold standard” (the mode across all trained listeners) and an “industry standard” (the mode across bootstrapped samples of 3 trained listeners). There was strong overall agreement between trained and AMT listeners. The “industry standard” level of performance was matched by bootstrapped samples of n = 9 AMT listeners. These results support the hypothesis that valid ratings of speech data can be obtained efficiently through AMT. Researchers in communication disorders could benefit from increased awareness of this method.
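The bootstrapping comparison described above can be illustrated with a minimal Python sketch; the ratings matrices below are random placeholders, and the tie-breaking rule and agreement metric are assumptions for illustration rather than the study's actual procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

def modal_rating(ratings):
    """Item-wise mode of binary ratings (ties broken toward 'correct')."""
    return (ratings.mean(axis=0) >= 0.5).astype(int)

def bootstrap_agreement(pool, gold, n_listeners, n_boot=1000):
    """Mean agreement with a gold standard across bootstrapped listener samples."""
    agreements = []
    for _ in range(n_boot):
        idx = rng.choice(pool.shape[0], size=n_listeners, replace=True)
        sample_mode = modal_rating(pool[idx])
        agreements.append(np.mean(sample_mode == gold))
    return np.mean(agreements)

# Placeholder data: rows = listeners, columns = the 100 rated words.
trained = rng.integers(0, 2, size=(35, 100))   # stand-in for trained-listener ratings
amt = rng.integers(0, 2, size=(205, 100))      # stand-in for AMT ratings

gold = modal_rating(trained)                       # "gold standard": mode of all trained listeners
industry = bootstrap_agreement(trained, gold, 3)   # "industry standard": samples of 3 trained listeners
for n in (1, 3, 5, 7, 9):                          # how large an AMT sample matches the industry standard?
    print(n, bootstrap_agreement(amt, gold, n), industry)
```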
Recent research has demonstrated that perceptual ratings aggregated across multiple non-expert listeners can reveal gradient degrees of covert contrast between target and error sounds that listeners might transcribe identically. Aggregated ratings have been found to correlate strongly with acoustic gold standard measures both when individual raters use a continuous rating scale such as visual analog scaling (Munson, Johnson, & Edwards, 2012) and when individual raters provide binary ratings (McAllister Byun, Halpin, & Szeredi, 2015). In light of evidence that inexperienced listeners use continuous scales less consistently than experienced listeners, this study investigated the relative merits of binary versus continuous rating scales when aggregating responses over large numbers of naïve listeners recruited through online crowdsourcing. Stimuli were words produced by children in treatment for misarticulation of North American English /r/. Each listener rated the same 40 tokens twice: once using visual analog scaling (VAS) and once using a binary rating scale. The gradient rhoticity of each item was then estimated using (a) VAS click location, averaged across raters, and (b) the proportion of raters who assigned the “correct /r/” label to the item in the binary rating task (p̂). First, we validate these two measures of rhoticity against each other and against an acoustic gold standard. Second, we explore the range of variability in the individual response patterns that underlie these group-level data. Third, we integrate statistical, theoretical, and practical considerations to offer guidelines for determining which measure to use in a given situation.
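To make the two aggregation schemes concrete, here is a small sketch computing the mean VAS click location and the binary “correct /r/” proportion (p̂) per item from long-format responses; the column names and toy values are assumptions for the example, not the original data format.

```python
import pandas as pd

# Hypothetical long-format responses: one row per (rater, item) judgment in each task.
responses = pd.DataFrame({
    "item":   ["w01", "w01", "w02", "w02"],
    "rater":  ["r1", "r2", "r1", "r2"],
    "vas":    [0.82, 0.64, 0.15, 0.30],   # VAS click location, scaled 0-1
    "binary": [1, 1, 0, 0],               # 1 = "correct /r/" label assigned
})

# Gradient rhoticity estimates per item:
vas_mean = responses.groupby("item")["vas"].mean()    # (a) mean VAS click location
p_hat = responses.groupby("item")["binary"].mean()    # (b) proportion labeling the item "correct /r/"

print(pd.concat({"vas_mean": vas_mean, "p_hat": p_hat}, axis=1))
```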
Structured Abstract
Background: Maintaining an external direction of focus during practice is reported to facilitate acquisition of nonspeech motor skills, but it is not known whether these findings also apply to treatment for speech errors. This question has particular relevance for treatment incorporating visual biofeedback, where clinician cueing can direct the learner’s attention either internally (i.e., to the movements of the articulators) or externally (i.e., to the visual biofeedback display).
Aims: This study addressed two objectives. First, it aimed to use single-subject experimental methods to collect additional evidence regarding the efficacy of visual-acoustic biofeedback treatment for children with /r/ misarticulation. Second, it compared the efficacy of this biofeedback intervention under two cueing conditions. In the external focus (EF) condition, participants’ attention was directed exclusively to the external biofeedback display. In the internal focus (IF) condition, participants viewed a biofeedback display but also received articulatory cues encouraging an internal direction of attentional focus.
Methods & Procedures: Nine school-aged children were pseudorandomly assigned to receive either internal or external focus cues during eight weeks of visual-acoustic biofeedback intervention. Accuracy in /r/ production at the word level was probed in three to five pre-treatment baseline sessions and three post-treatment maintenance sessions. Outcomes were assessed using visual inspection and calculation of effect sizes for individual treatment trajectories. In addition, a mixed logistic model was used to examine across-subjects effects including phase (pre/post-treatment), /r/ variant (treated/untreated), and focus cue condition (internal/external).
Outcomes & Results: Six of nine participants showed sustained improvement on at least one treated /r/ variant; these six participants were evenly divided across the EF and IF treatment groups. Regression results indicated that /r/ productions were significantly more likely to be rated accurate post-treatment than pre-treatment. Internal versus external direction of focus cues was not a significant predictor of accuracy, nor did it interact significantly with other predictors.
Conclusions: The present results are consistent with previous literature reporting that visual-acoustic biofeedback can produce measurable treatment gains in children who have not responded to previous intervention. These findings are also in keeping with previous research suggesting that biofeedback may be sufficient to establish an external attentional focus, independent of the verbal cues provided. The finding that explicit articulator placement cues were not necessary for progress in treatment has implications for intervention practices for speech sound disorders in children.
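The across-subjects analysis could be sketched roughly as follows, using statsmodels’ variational-Bayes mixed GLM as a stand-in for the mixed logistic model; the variable names, the random-intercept-per-participant structure, and the synthetic data are assumptions for illustration, not the study’s actual specification.

```python
import numpy as np
import pandas as pd
from statsmodels.genmod.bayes_mixed_glm import BinomialBayesMixedGLM

rng = np.random.default_rng(1)

# Synthetic trial-level data: one row per probed /r/ production (placeholder outcomes).
df = pd.DataFrame([
    {"subject": s, "phase": ph, "variant": v, "focus": f,
     "accurate": rng.integers(0, 2)}
    for s in range(1, 10)                         # nine participants
    for ph in ("pre", "post")
    for v in ("treated", "untreated")
    for f in ("IF" if s <= 4 else "EF",)          # focus condition assigned per participant
    for _ in range(15)                            # trials per cell
])

# Mixed logistic model: fixed effects for phase, /r/ variant, and focus cue condition,
# with a random intercept for each participant (variational Bayes approximation).
model = BinomialBayesMixedGLM.from_formula(
    "accurate ~ C(phase) * C(variant) * C(focus)",
    {"subject": "0 + C(subject)"},
    df,
)
result = model.fit_vb()
print(result.summary())
```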
Perceptual ratings aggregated across multiple non-expert listeners can be used to measure covert contrast in child speech. Online crowdsourcing provides access to a large pool of raters, but for practical purposes researchers may wish to use smaller samples, and ratings obtained from these smaller samples may not maintain the high levels of validity seen in larger samples. This study aims to measure the validity and reliability of crowdsourced continuous ratings of child speech obtained through visual analog scaling (VAS) and to identify ways to improve these measurements. First, we assess overall validity and interrater reliability for measurements obtained from a large set of raters. Second, we investigate two rater-level measures of quality, individual validity and intrarater reliability, and examine the relationship between them. Third, we show that these estimates can be used to establish guidelines for the inclusion of raters, thereby improving the quality of results obtained when smaller samples are used.
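One way the rater-level screening might look in practice is sketched below: individual validity is approximated by correlation with the leave-one-out group mean, and intrarater reliability by correlation between two passes over the same items; the synthetic data, column names, and inclusion thresholds are all hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)

# Hypothetical long-format VAS data: each rater rated each item twice (two passes).
raters = [f"r{i}" for i in range(20)]
items = [f"w{j}" for j in range(40)]
df = pd.DataFrame(
    [{"rater": r, "item": it, "pass": p, "vas": rng.uniform()}
     for r in raters for it in items for p in (1, 2)]
)

# Leave-one-out item mean: the consensus each single rating is compared against.
item_mean = df.groupby("item")["vas"].transform("mean")
item_n = df.groupby("item")["vas"].transform("count")
loo_mean = (item_mean * item_n - df["vas"]) / (item_n - 1)

rows = []
for rater, ratings in df.groupby("rater"):
    # Individual validity, approximated here as agreement with the rest of the crowd.
    validity = ratings["vas"].corr(loo_mean.loc[ratings.index])
    # Intrarater reliability: consistency between the two passes over the same items.
    wide = ratings.pivot(index="item", columns="pass", values="vas")
    reliability = wide[1].corr(wide[2])
    rows.append({"rater": rater, "validity": validity, "reliability": reliability})

quality = pd.DataFrame(rows)
# Illustrative inclusion rule: keep raters exceeding assumed thresholds on both measures.
print(quality[(quality["validity"] > 0.3) & (quality["reliability"] > 0.3)])
```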
In Hungarian, stems containing only front unrounded (neutral) vowels fall into two groups: one group taking front suffixes, the other taking back suffixes in vowel harmony. The distinction is traditionally thought of as purely lexical. Beňuš and Gafos (2007) have recently challenged this position, claiming that there are significant articulatory differences between the vowels in the two groups. Neutral vowels also occur in vacillating stems. These typically contain one back vowel and one or more neutral vowels, and accept both front and back suffixes, with extensive inter- and intra-speaker variation. Based on Beňuš and Gafos's line of argument, the expectation is that vacillating stems will display a phonetic realisation that is distinct from both harmonic and anti-harmonic stems. We present the results of an ongoing acoustic study of neutral vowels, partly recreating Beňuš and Gafos's conditions but also including vacillating stems. To map the extent of individual and dialectal variation regarding vacillating stems, a grammaticality judgement test was also carried out with speakers of two dialects of Hungarian that crucially differ in the surface inventory of neutral vowels. We present our first findings about how this phonetic difference influences the phonological behaviour of vacillating stems.