Measurements of the physical outputs of speech (vocal tract geometry and acoustic energy) are high-dimensional, but linguistic theories posit a low-dimensional set of categories such as phonemes and phrase types. How can one determine when and where high-dimensional articulatory and acoustic signals carry information about these theoretical categories? Directly quantifying mutual information between hypothesized categories and signals is problematic for a variety of reasons. To address this issue, a multi-scale analysis method is proposed that uses machine learning algorithms to localize category-related information in an ensemble of speech signals. By analyzing how classification accuracy on unseen data varies as the temporal extent of the training input is systematically restricted, inferences can be drawn about the temporal distribution of category-related information. The method can also be used to investigate redundancy between subsets of signal dimensions. Two types of theoretical categories are examined in this paper, phonemic/gestural categories and syntactic relative clause categories, along with two machine learning algorithms: linear discriminant analysis and neural networks with long short-term memory units. Both algorithms detected category-related information earlier and later in the signals than would be expected given standard theoretical assumptions about when linguistic categories should influence speech, and the neural networks identified category-related information to a greater extent than the discriminant analyses.
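The core of the proposed method can be illustrated with a minimal sketch: train a classifier on temporally restricted windows of the signal ensemble and record held-out accuracy as a function of window position. The sketch below uses linear discriminant analysis, one of the two algorithms named in the abstract; the array names, windowing scheme, and cross-validation setup are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal sketch of the windowed-classification analysis described above.
# Assumes `signals` is an array of shape (n_trials, n_timepoints, n_dims)
# and `labels` holds one category label per trial; these names and the
# sliding-window scheme are hypothetical.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

def windowed_accuracy(signals, labels, win_len, step):
    """Held-out LDA accuracy as a function of temporal window position."""
    n_trials, n_time, n_dims = signals.shape
    scores = []
    for start in range(0, n_time - win_len + 1, step):
        # Restrict the classifier's input to one temporal window,
        # flattening time x dimension into a single feature vector.
        X = signals[:, start:start + win_len, :].reshape(n_trials, -1)
        acc = cross_val_score(LinearDiscriminantAnalysis(), X, labels, cv=5)
        scores.append((start, acc.mean()))
    # Windows where accuracy exceeds chance are inferred to contain
    # category-related information.
    return scores
```

Restricting `win_len` and sweeping `start` is what makes the analysis multi-scale: the same procedure can be run at several window lengths to localize information coarsely or finely in time.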
An experiment was conducted to assess phonetic evidence for categorically distinct prosodic structures associated with two types of relative clauses in English. Non-restrictive relative clauses (NRRCs) and restrictive relative clauses (RRCs) have been argued to be typically produced with different prosodic phrase structures. To test whether there is evidence for this, productions of the two relative clause types were elicited from twelve participants, from whom acoustic and articulatory data were collected. A wide range of speech rates was elicited using a moving visual analogue that cued participants to vary their rate. We assessed whether the functional relations between speech rate and various phonetic measures at phrase boundaries differed by syntactic context. In addition, linear and sigmoidal models were fit to each articulatory and acoustic measure within each syntactic context, and the corrected Akaike Information Criterion (AICc) was used to determine whether the sigmoidal model provided a substantially better fit than the linear model. Although most of the phonetic measures showed a significant difference between the two syntactic structures, providing some evidence for distinct prosodic categories, the non-linearity analyses showed only weak evidence for categorical variation in prosodic structure in either syntactic context.
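The model-comparison step named in the abstract (linear vs. sigmoidal fits evaluated with AICc) can be sketched as follows. The variable names (`rate`, `measure`), the four-parameter logistic form, and the least-squares AICc convention are assumptions for illustration; the study's exact model parameterizations may differ.

```python
# Minimal sketch: fit linear and sigmoidal models of a phonetic measure as
# a function of speech rate, then compare them with corrected AIC (AICc).
import numpy as np
from scipy.optimize import curve_fit

def linear(x, a, b):
    return a * x + b

def sigmoid(x, lo, hi, x0, k):
    # Four-parameter logistic: floor, ceiling, midpoint, slope.
    return lo + (hi - lo) / (1.0 + np.exp(-k * (x - x0)))

def aicc(rss, n, k):
    # AICc from the residual sum of squares for a least-squares fit;
    # k counts model parameters plus one for the error variance.
    aic = n * np.log(rss / n) + 2 * k
    return aic + 2 * k * (k + 1) / (n - k - 1)

def compare_models(rate, measure):
    rate, measure = np.asarray(rate), np.asarray(measure)
    n = len(rate)
    p_lin, _ = curve_fit(linear, rate, measure)
    p_sig, _ = curve_fit(sigmoid, rate, measure,
                         p0=[measure.min(), measure.max(),
                             np.median(rate), 1.0], maxfev=10000)
    rss_lin = np.sum((measure - linear(rate, *p_lin)) ** 2)
    rss_sig = np.sum((measure - sigmoid(rate, *p_sig)) ** 2)
    # Lower AICc is better; a substantially lower sigmoidal AICc would
    # indicate categorical (non-linear) variation with speech rate.
    return aicc(rss_lin, n, 3), aicc(rss_sig, n, 5)
```

Under this setup, "weak evidence for categorical variation" corresponds to the sigmoidal model failing to beat the linear model by a substantial AICc margin for most measures.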