2020
DOI: 10.5334/labphon.216
From categories to gradience: Auto-coding sociophonetic variation with random forests

Abstract: The time-consuming nature of coding sociophonetic variables that are typically treated as categorical represents an impediment to addressing research questions around these variables that require large volumes of data. In this paper, we apply a machine learning method, random forest classification (Breiman, 2001), to automate coding (categorical prediction) of two English sociophonetic variables traditionally treated as categorical, non-prevocalic /r/ and word-medial intervocalic /t/, based on tokens' acoustic…
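The workflow the abstract describes, training a random forest on acoustic measures from hand-coded tokens and then predicting codes for uncoded tokens, can be sketched roughly as below. This is a minimal illustration only: the data are synthetic, the two features (stand-ins for measures like an F3 minimum) are hypothetical, and scikit-learn is used here regardless of what implementation the authors actually employed. `predict_proba` shows how the same classifier also yields the gradient, probabilistic output the paper's title alludes to.

```python
# Minimal sketch of random-forest auto-coding of a categorical variable.
# Synthetic data; the two features are hypothetical stand-ins for the
# paper's acoustic measures (e.g. formant values around the /r/ target).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(42)

# Simulated hand-coded training tokens: two acoustic measures per token.
n = 200
present = rng.normal(loc=[1700.0, 1300.0], scale=[150.0, 120.0], size=(n, 2))
absent = rng.normal(loc=[2500.0, 1100.0], scale=[200.0, 120.0], size=(n, 2))
X = np.vstack([present, absent])
y = np.array(["Present"] * n + ["Absent"] * n)

clf = RandomForestClassifier(n_estimators=500, random_state=0)
clf.fit(X, y)

# Categorical auto-coding of new, uncoded tokens...
new_tokens = np.array([[1750.0, 1280.0], [2550.0, 1120.0]])
print(clf.predict(new_tokens))

# ...and gradient output: per-class probabilities rather than hard codes.
print(clf.predict_proba(new_tokens))
```

The rows of `predict_proba` give per-class probabilities in the order of `clf.classes_`, which is the hook for treating the variable as gradient rather than categorical.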

Cited by 18 publications (16 citation statements)
References 59 publications
“…In this section we return to the idea implemented by Liberman (2009, 2011a) and McLarty et al. (2019), where a variable classifier might be trained on related, non-variable but "variable-adjacent" phonetic material. As discussed earlier, this is a novel suggestion with much promise, although, as raised by Villarreal et al. (2020), it is one that needs extensive validation before we know how much we might trust automated coding procedures that are not trained on data from the same variable contexts that they are used to classify. There are reasons to expect that these non-variable words will not form perfect approximations of the pronunciation of the -in and -ing variants of (ING); however, their basic phonological forms are close to the realizations relevant to variable (ING).…”
Section: (ING) Classification Using Variable-Adjacent Productions As Training Data
confidence: 99%
“…For variable (ING), a feature without standard acoustic measures, we believe that MFCCs are a useful acoustic representation, but we also acknowledge that further testing, into both other potential acoustic measures and the parameters for MFCC extraction, would be beneficial. Additionally, while many of the previous studies emphasize the role of gradience in assigning values to sociolinguistic variables through the use of probability estimates of token classification (Yuan & Liberman, 2011b; McLarty et al., 2019; Villarreal et al., 2020), we limit our investigation to binary classification of (ING) in order to assess the general utility of different automated methods.…”
Section: Automated Approaches To Coding Pronunciation Variables
confidence: 99%
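The MFCC-based pipeline the excerpt describes (variable-length tokens reduced to fixed-length feature vectors, then a binary classifier, with probability estimates available when gradience is wanted) might look like the sketch below. All specifics here are assumptions, not the cited studies' actual settings: the "MFCCs" are synthetic, and the pooling step (per-coefficient mean and standard deviation across frames) is one illustrative way to get a fixed-length vector from tokens of differing duration.

```python
# Sketch: variable-length MFCC frame matrices -> fixed-length vectors ->
# binary (ING) classification. Synthetic data; the pooling choice is an
# illustrative assumption, not the cited studies' configuration.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
N_COEF = 12  # number of MFCC coefficients per frame (assumed)

def summarize(mfcc_frames):
    """Pool an (n_frames, N_COEF) matrix into one fixed-length vector."""
    return np.concatenate([mfcc_frames.mean(axis=0), mfcc_frames.std(axis=0)])

def fake_token(label):
    """Synthetic token: -ing tokens are shifted in coefficient space."""
    n_frames = int(rng.integers(20, 60))
    shift = 0.0 if label == "in" else 1.5
    return rng.normal(loc=shift, scale=1.0, size=(n_frames, N_COEF))

labels = ["in", "ing"] * 100
X = np.array([summarize(fake_token(lab)) for lab in labels])
y = np.array(labels)

clf = RandomForestClassifier(n_estimators=300, random_state=1)
clf.fit(X, y)

# Hard binary codes, as in the excerpt's evaluation...
token = fake_token("ing")
print(clf.predict([summarize(token)]))

# ...or probability estimates, if gradient values are wanted instead.
print(clf.predict_proba([summarize(token)]))
```

With real speech, the synthetic `fake_token` step would be replaced by MFCC extraction from audio; the classification half of the pipeline is unchanged.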