The purpose of this study was to explore developmental changes in spectral fluctuations and temporal periodicity in the speech of Japanese- and English-learning infants. Three age groups (15, 20, and 24 months) were selected because infants diversify their phonetic inventories with age. Natural speech from the infants was recorded and passed through a critical-band filter bank that simulated the frequency resolution of the adult auditory periphery. First, correlations between the power fluctuations of the critical-band outputs were summarized by factor analysis to examine how the critical bands would need to be grouped if a listener is to differentiate sounds in infants' speech. We then analyzed the temporal fluctuations of the factor scores by computing autocorrelations. By 24 months of age, the analysis identified the same three factors observed in adult speech in both linguistic environments, shifted to a higher frequency range consistent with the infants' smaller vocal tracts. These results suggest that the infants' vocal tract structures had developed toward an adult-like configuration by 24 months of age in both language environments. The proportion of utterances with shorter-period temporal structure increased with age in both environments, and this trend was clearer in the Japanese environment.
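The analysis pipeline described in this abstract (critical-band filtering, factor analysis of band-power fluctuations, and autocorrelation of the factor scores) can be sketched as follows. This is a minimal illustration, not the study's actual implementation: the band edges, filter order, frame length, and number of factors are all assumed parameters.

```python
# Minimal sketch of the abstract's pipeline. Band edges, filter order,
# frame length, and n_factors are illustrative assumptions only.
import numpy as np
from scipy.signal import butter, sosfiltfilt
from sklearn.decomposition import FactorAnalysis

def critical_band_powers(x, fs, band_edges, frame_len):
    """Band-pass x into critical bands; return framewise log power."""
    bands = []
    for lo, hi in band_edges:
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        y = sosfiltfilt(sos, x)
        n = len(y) // frame_len
        p = (y[: n * frame_len].reshape(n, frame_len) ** 2).mean(axis=1)
        bands.append(np.log(p + 1e-12))
    return np.stack(bands, axis=1)            # shape: (frames, bands)

def factor_score_autocorr(powers, n_factors=3, max_lag=100):
    """Factor-analyse band-power fluctuations, then autocorrelate scores."""
    fa = FactorAnalysis(n_components=n_factors)
    scores = fa.fit_transform(powers)          # (frames, n_factors)
    acfs = []
    for k in range(n_factors):
        s = scores[:, k] - scores[:, k].mean()
        ac = np.correlate(s, s, mode="full")[len(s) - 1 : len(s) - 1 + max_lag]
        acfs.append(ac / ac[0])                # normalize to lag 0
    return fa.components_, np.stack(acfs)
```

The factor loadings (`fa.components_`) indicate which critical bands fluctuate together, and the autocorrelation of each factor score exposes any temporal periodicity in those fluctuations.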
The present study aimed to explore listener responses in conversational speech between parents and toddlers, focusing on children's responses to parents and parents' responses to children. Participants were five dyads each of typically developing two-year-old toddlers and their parents from Japanese- and English-speaking families. Responses of a mother/father toward a child, or of a child toward a mother/father, were classified into three categories: non-lexical backchannels (e.g., hoo, nn, hai), phrasal backchannels (e.g., hontoo "really," soo desu ka "is that right?"), and repetitions. The results showed that the average ratio of overall backchannels and repetitions produced by parents was quite similar across the two languages and was much greater than that produced by children. Japanese-speaking parents preferred non-lexical backchannels and repetitions to phrasal backchannels, whereas English-speaking parents used non-lexical backchannels most frequently. Among Japanese-speaking parents, almost half of the repetitions were exact repetitions; they frequently repeated what a child had said and added the sentence-final particle "ne" or content words. These findings should be useful for understanding response behaviors in spoken communication between parents and their children.
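The three-way coding scheme above can be illustrated with a small sketch. The token lists here are assumed examples taken only from the abstract, not the study's full coding manual, and the repetition test is deliberately simplified.

```python
# Illustrative sketch of the response coding scheme; token lists are
# assumed examples from the abstract, not the study's coding manual.
NON_LEXICAL = {"hoo", "nn", "hai"}
PHRASAL = {"hontoo", "soo desu ka", "really", "is that right?"}

def classify_response(response, previous_utterance):
    """Assign a response turn to one of the abstract's three categories."""
    r = response.strip().lower()
    prev = previous_utterance.strip().lower()
    if r in NON_LEXICAL:
        return "non-lexical backchannel"
    if r in PHRASAL:
        return "phrasal backchannel"
    if r == prev:
        return "repetition (exact)"
    if set(r.split()) & set(prev.split()):
        return "repetition (modified)"   # e.g., child's words plus "ne"
    return "other"
```

Category ratios per speaker would then follow by counting labels over all coded turns in a dyad's transcript.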
This study examines connections between semantic structure and speech units, and the characteristics of facial movements, in English as a Foreign Language (EFL) learners' public speaking. The data were obtained from a multimodal corpus of English public speaking constructed from digital audio and video recordings of an English speech contest held at a Japanese high school; the contest judges' evaluation data were also included. For the audio data, speech pauses were extracted using acoustic analysis software, and the spoken content (i.e., text) of each speech unit between two pauses was annotated. The semantic structures of the speech units were analysed on the basis of segmental clause chunks. Motion capture was applied to the video data, with 42 tracking points set on each speaker's eyes, eyebrows, nose, lips, and jawline. The results indicated that: (1) speakers with higher evaluations showed similar semantic-structure patterns in their speech units, and pause patterns correlated strongly with evaluation scores; (2) the frequencies and angles of face-roll movements for eye contact suggest that speakers with higher performance evaluations shared characteristic facial-movement frequencies and degrees. These results may allow us to define model patterns for inserting pauses into public speech and to develop facial-movement criteria that effectively describe good eye contact in public speaking.
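Pause extraction of the kind described here is commonly done by framewise energy thresholding. The sketch below shows one way to segment speech units between pauses; the threshold, frame size, and minimum pause duration are assumptions, and the study's own acoustic analysis software may use a different criterion.

```python
# Hedged sketch of pause extraction by framewise energy thresholding;
# thresh_db, frame_ms, and min_pause_ms are assumed values.
import numpy as np

def extract_pauses(x, fs, frame_ms=10, thresh_db=-35.0, min_pause_ms=200):
    """Return (start_s, end_s) intervals where framewise RMS energy stays
    below thresh_db (relative to the utterance peak) for at least
    min_pause_ms, i.e., candidate pauses between speech units."""
    frame = int(fs * frame_ms / 1000)
    n = len(x) // frame
    rms = np.sqrt((x[: n * frame].reshape(n, frame) ** 2).mean(axis=1))
    db = 20 * np.log10(rms / (rms.max() + 1e-12) + 1e-12)
    silent = db < thresh_db
    pauses, start = [], None
    for i, s in enumerate(silent):
        if s and start is None:
            start = i
        elif not s and start is not None:
            if (i - start) * frame_ms >= min_pause_ms:
                pauses.append((start * frame / fs, i * frame / fs))
            start = None
    if start is not None and (n - start) * frame_ms >= min_pause_ms:
        pauses.append((start * frame / fs, n * frame / fs))
    return pauses
```

Each interval between consecutive pauses then defines one speech unit whose text can be annotated and analysed for semantic structure.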
Public speaking is an essential skill in a wide variety of professions and in everyday life, yet it is difficult to master. This paper focuses on the automatic assessment of nonverbal facial behavior in public speaking and proposes a simple and efficient method for head pose estimation and motion analysis. We collected nine speech scenes from a recitation contest at a Japanese high school and applied the proposed method to evaluate performance. The head pose estimation achieved acceptable accuracy for the speech scenes, and the proposed motion analysis method computed the frequencies and moving ranges of head motion. As a result, we found a correlation between the moving range and the eye contact score.
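Given a head-pose time series from the estimation step, the motion analysis reduces to computing a frequency and a moving range per scene and correlating the latter with the eye contact scores. A minimal sketch, assuming a yaw-angle series in degrees and a peak-counting frequency estimate (both assumptions, as the paper's exact definitions are not given in the abstract):

```python
# Sketch of the motion analysis: frequency and moving range of head yaw,
# then correlation with eye contact scores. The peak-counting frequency
# estimate and variable names are assumptions.
import numpy as np
from scipy.signal import find_peaks
from scipy.stats import pearsonr

def head_motion_features(yaw_deg, fps):
    """Estimate motion frequency (Hz) and moving range (deg) of head yaw."""
    centered = yaw_deg - yaw_deg.mean()
    peaks, _ = find_peaks(centered)        # roughly one peak per oscillation
    freq_hz = len(peaks) / (len(yaw_deg) / fps)
    moving_range = yaw_deg.max() - yaw_deg.min()
    return freq_hz, moving_range

# Hypothetical usage across scenes, with `data` as (yaw_series, score) pairs:
# ranges, scores = zip(*[(head_motion_features(y, 30)[1], s) for y, s in data])
# r, p = pearsonr(ranges, scores)          # correlation reported in the paper
```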