Purpose: This study examined the accuracy and potential clinical utility of two expedited transcription methods for narrative language samples elicited from school-age children (7;5–11;10 [years;months]) with developmental language disorder. Transcription methods included real-time transcription produced by speech-language pathologists (SLPs) and trained transcribers (TTs), as well as Google Cloud Speech automatic speech recognition.

Method: The accuracy of each transcription method was evaluated against a gold-standard reference corpus. Clinical utility was examined by determining the reliability of scores calculated from the transcripts produced by each method on several language sample analysis (LSA) measures. Participants included seven certified SLPs and seven TTs. Each participant was asked to produce a set of six transcripts in real time, out of a total of 42 language samples. The same 42 samples were transcribed using Google Cloud Speech. Transcription accuracy was evaluated through word error rate. Reliability of LSA scores was determined using correlation analysis.

Results: Results indicated that Google Cloud Speech was significantly more accurate than real-time transcription in transcribing narrative samples and was not affected by the speech rate of the narrator. In contrast, SLP and TT transcription accuracy decreased as a function of increasing speech rate. LSA metrics generated from Google Cloud Speech transcripts were also more reliably calculated.

Conclusions: Automatic speech recognition showed greater accuracy and clinical utility as an expedited transcription method than real-time transcription. Though there is room for improvement in the accuracy of speech recognition for the purpose of clinical transcription, it produced highly reliable scores on several commonly used LSA metrics.

Supplemental Material: https://doi.org/10.23641/asha.15167355
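Word error rate, the metric used above, is the standard edit-distance measure of transcription accuracy. As a minimal illustration (not the study's actual scoring pipeline), the sketch below computes WER by aligning reference and hypothesis word sequences with Levenshtein distance; the sample sentences are invented.

```python
# Minimal word error rate (WER) sketch: Levenshtein distance over words.
# Illustrative only; the reference/hypothesis strings are made up.

def wer(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("the boy ran to the park", "the boy runs to park"))  # 2/6 ≈ 0.33
```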
Background: The SF-36 questionnaire is perhaps the most widely used quality of life instrument in the world today, while the PROMIS instruments continue to gain popularity. Given their continued use in chiropractic research and practice, we examined their latent domain structure using exploratory factor analysis (EFA).

Methods: To uncover latent structures of a large series of measured variables from the PROMIS-29, PROMIS Global Health and RAND SF-36 domains, we defined a factor analysis model represented by the equation $X = \mu + \Lambda F + \epsilon$, where $X = (X_1, \ldots, X_p)^T$ is the vector of random variables corresponding to the domains, with mean $\mu$ and covariance matrix $\Sigma$; $\Lambda = \{l_{jk}\}_{p \times m}$ denotes the matrix of factor loadings; $F = (F_1, \ldots, F_m)^T$ denotes the vector of unobserved latent variables that influence the collection of domains; and $\epsilon = (\epsilon_1, \ldots, \epsilon_p)^T$ is the vector of latent error terms. The matrix of item responses $X$ was the only observed quantity, with restrictions such that variable scores were uncorrelated and of unit variance, and the latent errors were independent with variance vector $\psi$. The inherited covariance structure of $X$ was expressed simply by $\Sigma = \Lambda \Lambda^T + \psi$. Orthogonal and oblique rotations were performed on the $\Lambda$ matrix under this equation to improve the clarity of the latent structure. Model parameters $(\mu, \Lambda, \psi)$ were optimized using the method of minimum residuals. Each EFA model was constructed with both Pearson and polychoric correlations.

Results: For the PROMIS-29, domains were confirmed to be strongly correlated with Factor 1 (i.e., mental health) or Factor 2 (i.e., physical health). Satisfaction with participation in social roles was highly correlated with a third factor (i.e., social health). For the PROMIS Global Health scale, a 2-factor EFA confirmed the GPH and GMH domains. For the RAND SF-36, an apparent lack of definable structure was observed, except for physical function, which was highly correlated with Factor 2. The remaining domains lacked correlation with any factor.

Conclusion: Distinct separation in the latent factors between the presumed physical, mental, and social health domains was found with the PROMIS instruments, but the domains of the RAND SF-36 were relatively indistinguishable. We encourage continued efforts in this area of research to improve patient-reported outcomes.
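To make the model above concrete, the sketch below fits an EFA by minimum residuals with an oblique rotation using the Python factor_analyzer package. This is a hedged illustration of the general technique, not the study's code: the file promis29.csv and its contents are hypothetical placeholders, and the default Pearson correlation is used (the polychoric variant is not shown).

```python
# Sketch of the EFA described above: minres extraction + oblique rotation.
# `promis29.csv` is a hypothetical placeholder (rows = respondents,
# columns = domain scores), not the study's actual data.
import pandas as pd
from factor_analyzer import FactorAnalyzer

X = pd.read_csv("promis29.csv")

# Fit X = mu + Lambda F + eps, implying Sigma = Lambda Lambda^T + psi.
fa = FactorAnalyzer(n_factors=3, method="minres", rotation="oblimin")
fa.fit(X)

print(fa.loadings_)           # Lambda: domain-by-factor loading matrix
print(fa.get_uniquenesses())  # psi: latent error variances
```

Swapping rotation="oblimin" for rotation="varimax" gives the orthogonal counterpart, mirroring the orthogonal and oblique rotations the abstract describes.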
The accuracy of four machine learning methods in predicting narrative macrostructure scores was compared to scores obtained by human raters using a criterion-referenced progress-monitoring rubric. The methods explored included those that rely on hand-engineered features as well as those that learn directly from the raw text. The predictive models were trained on a corpus of 414 narratives from a normative sample of school-age children (5;0–9;11 [years;months]) who were given a standardized measure of narrative proficiency. Performance was measured using Quadratic Weighted Kappa, a metric of inter-rater reliability. The results indicated that one model, BERT, not only achieved significantly higher scoring accuracy than the other methods but was also consistent with scores obtained by human raters using a valid and reliable rubric. The findings from this study suggest that one machine learning method in particular, BERT, shows promise as a way to automate the scoring of narrative macrostructure for potential use in clinical practice.
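Quadratic Weighted Kappa penalizes larger rating disagreements more heavily than near misses, which suits ordinal rubric scores. As a minimal sketch (the rating values below are invented, not the study's data), scikit-learn computes it directly:

```python
# Quadratic Weighted Kappa (QWK): chance-corrected agreement in which the
# penalty grows with the squared distance between ratings.
# Illustrative only; these score vectors are made up.
from sklearn.metrics import cohen_kappa_score

human = [3, 1, 2, 0, 3, 2, 1, 2]   # rubric scores from a human rater
model = [3, 1, 1, 0, 3, 2, 2, 2]   # scores predicted by a model

qwk = cohen_kappa_score(human, model, weights="quadratic")
print(f"QWK = {qwk:.3f}")
```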
Language sample analysis (LSA) is an important practice for providing a culturally sensitive and accurate assessment of a child's language abilities. A child's use of literate language devices in narrative samples has been shown to be a critical target for evaluation. While automated scoring systems have begun to appear in the field, no such system exists for conducting progress monitoring of literate language use within narratives. The current study aimed to develop a hard-coded scoring system, the Literate Language Use in Narrative Assessment (LLUNA), to automatically evaluate six aspects of literate language in non-coded narrative transcripts. LLUNA was designed to individually score six literate language elements: coordinating and subordinating conjunctions, metalinguistic and metacognitive verbs, adverbs, and elaborated noun phrases. The interrater reliability of LLUNA with an expert scorer, as well as its reliability relative to certified undergraduate scorers, was calculated using quadratic weighted kappa (Kqw). Results indicated that LLUNA reached strong levels of interrater reliability with an expert scorer on all six elements. LLUNA also surpassed the reliability levels of certified, but non-expert, scorers on four of the six elements and came close to matching their reliability on the remaining two. LLUNA shows promise as a means of automating the scoring of literate language in LSA and narrative samples for the purposes of assessment and progress monitoring.
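To give a flavor of what a hard-coded scorer of this kind does, the sketch below counts two of the six elements (coordinating and subordinating conjunctions) with simple word lists. This is emphatically not LLUNA's implementation, only a hypothetical illustration of the rule-based approach; the word lists and sample sentence are assumptions.

```python
# Hypothetical rule-based scoring of two literate language elements.
# NOT the LLUNA implementation; the conjunction lists are illustrative.
import re

COORDINATING = {"and", "but", "or", "so", "yet", "for", "nor"}
SUBORDINATING = {"because", "although", "while", "since",
                 "unless", "when", "if", "after", "before"}

def count_conjunctions(transcript: str) -> dict:
    """Tally coordinating and subordinating conjunctions in a transcript."""
    words = re.findall(r"[a-z']+", transcript.lower())
    return {
        "coordinating": sum(w in COORDINATING for w in words),
        "subordinating": sum(w in SUBORDINATING for w in words),
    }

sample = "The dog barked because it was scared, but the boy stayed calm."
print(count_conjunctions(sample))  # {'coordinating': 1, 'subordinating': 1}
```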