Using LSTMs to Assess the Obligatoriness of Phonological Distinctive Features for Phonotactic Learning

Mirea, Nicole; Bicknell, Klinton

doi:10.18653/v1/p19-1155

Cited by 9 publications

(16 citation statements)

References 22 publications

“…The goal of this paper is to expand on the successes of this ongoing collective research programme. The algorithm described below shares many aspects with past work, such as vector embedding (Powers 1997, Calderone 2009, Goldsmith & Xanthos 2009, Nazarov 2014, 2016, Silfverberg et al 2018, Mirea & Bicknell 2019), normalisation (Powers 1997, Silfverberg et al 2018), matrix decomposition (Powers 1997, Calderone 2009, Goldsmith & Xanthos 2009, Silfverberg et al 2018) and clustering algorithms (Powers 1997, Nazarov 2014, 2016, Mirea & Bicknell 2019). The innovations that will be presented below are largely in the combination and extension of these techniques, but the clustering methodology presented is relatively novel.…”

Section: Previous Work (mentioning)

confidence: 98%

“…Their use of maximum entropy Hidden Markov Models also involves a kind of one-dimensional clustering on emission probability ratios, setting a threshold of 0 as the boundary between clusters. Powers (1997) and Mirea & Bicknell (2019) both use hierarchical clustering to extract classes from embeddings. Hierarchical clustering is simple, but not well suited to phonological class discovery: it cannot find multiple partitions of the same set of sounds, and requires the number of classes to be decided by an analyst.…”

Section: K-means Clustering (mentioning)

confidence: 99%
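The limitation noted in the statement above, that hierarchical clustering leaves the number of classes to the analyst, can be illustrated with a minimal sketch. This is not the procedure used by Powers (1997) or Mirea & Bicknell (2019): the two-dimensional toy vectors, the average-linkage criterion, and the function names are all assumptions chosen for illustration.

```python
import math
from itertools import combinations

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def average_linkage(c1, c2, vectors):
    """Mean pairwise distance between the members of two clusters."""
    dists = [euclidean(vectors[a], vectors[b]) for a in c1 for b in c2]
    return sum(dists) / len(dists)

def agglomerative(vectors, k):
    """Merge the closest pair of clusters until only k clusters remain.

    The stopping point k is supplied by the caller: the procedure itself
    does not say how many classes there should be.
    """
    clusters = [frozenset([name]) for name in vectors]
    while len(clusters) > k:
        c1, c2 = min(
            combinations(clusters, 2),
            key=lambda pair: average_linkage(pair[0], pair[1], vectors),
        )
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return set(clusters)

# Hypothetical embeddings: vowels near (1, 0), consonants near (0, 1).
toy = {
    "a": (1.0, 0.1), "i": (0.9, 0.2), "u": (0.8, 0.0),
    "p": (0.0, 1.0), "t": (0.1, 0.9), "k": (0.2, 1.1),
}

two_classes = agglomerative(toy, k=2)    # analyst asks for two classes
three_classes = agglomerative(toy, k=3)  # same data, different cut
```

With `k=2` the toy data splits into a vowel set and a consonant set. Each call returns a single flat partition, so a sound cannot belong to several overlapping classes at once, which is the second shortcoming the statement raises.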

“…I include Samoan because it has a relatively small segmental inventory and fairly restrictive phonotactics, providing a simple test case. English, French and Finnish are included for continuity with previous studies (Calderone 2009, Goldsmith & Xanthos 2009, Silfverberg et al 2018, Mirea & Bicknell 2019). In addition to simply exploring which classes are distributionally salient in each language, my… [Footnote 14: A reviewer wonders, following Archangeli et al (2011), whether certain types of noise are more disruptive to this algorithm than others, and whether these correspond to what we see in natural language.…]”

Section: Testing the Algorithm On Real Language Data (mentioning)

confidence: 99%

“…Their non-neural comparison model employs vector embedding, singular value decomposition, and normalisation using positive pointwise mutual information, all of which are used in the algorithm presented in this paper, but it does not take phoneme ordering into account. Similarly, Mirea & Bicknell (2019) generate phoneme embeddings using a long short-term memory neural network, and perform hierarchical clustering on these embeddings. This clustering does not cleanly separate consonants and vowels, though some suggestive groupings are present.…”

Section: Previous Work (mentioning)

confidence: 99%
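The positive pointwise mutual information (PPMI) normalisation mentioned in the statement above can be sketched as follows. This is a toy illustration, not the implementation from any of the cited papers: the choice of "preceding segment" as the only context, the four-word corpus, and the function name `ppmi_embeddings` are all assumptions.

```python
import math
from collections import Counter

def ppmi_embeddings(words):
    """Build PPMI-weighted context vectors for the segments in `words`.

    Assumption: the context of a segment is simply the segment that
    immediately precedes it. PPMI(s, c) = max(0, log2(P(s, c) / (P(s) P(c)))).
    """
    pair_counts = Counter()
    for word in words:
        for prev, cur in zip(word, word[1:]):
            pair_counts[(cur, "prev=" + prev)] += 1

    total = sum(pair_counts.values())
    seg_counts, ctx_counts = Counter(), Counter()
    for (seg, ctx), n in pair_counts.items():
        seg_counts[seg] += n
        ctx_counts[ctx] += n

    contexts = sorted(ctx_counts)
    table = {}
    for seg in sorted(seg_counts):
        row = []
        for ctx in contexts:
            n = pair_counts.get((seg, ctx), 0)
            if n == 0:
                row.append(0.0)  # unseen pair: PMI is -inf, clipped to 0
            else:
                pmi = math.log2(n * total / (seg_counts[seg] * ctx_counts[ctx]))
                row.append(max(0.0, pmi))
        table[seg] = row
    return contexts, table

# Hypothetical corpus in which "p" and "t" have identical distributions.
contexts, table = ppmi_embeddings(["pa", "ta", "ap", "at"])
```

In this corpus "p" and "t" receive identical rows, while "a" receives a different one, so distributional similarity alone groups the two consonants together. A fuller pipeline in the style the statement describes would apply singular value decomposition to the resulting matrix before clustering.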

“…It is likely that considering additional aspects of context would improve performance on the real languages, although simply increasing the size of the contexts considered in an n-gram model will lead to issues of data paucity. Using sequential neural networks to generate phoneme embeddings is a particularly promising possibility, since they can produce vector representations of sounds without being explicitly told which features of the context to attend to (Silfverberg et al 2018, Mirea & Bicknell 2019, Mayer & Nelson 2020). Alternatively, integrating a mechanism for tier projection (e.g.…”

mentioning

confidence: 99%

An algorithm for learning phonological classes from distributional similarity

Mayer

2020

Phonology

An important question in phonology is to what degree the learner uses distributional information rather than substantive properties of speech sounds when learning phonological structure. This paper presents an algorithm that learns phonological classes from only distributional information: the contexts in which sounds occur. The input is a segmental corpus, and the output is a set of phonological classes. The algorithm is first tested on an artificial language, with both overlapping and nested classes reflected in the distribution, and retrieves the expected classes, performing well as distributional noise is added. It is then tested on four natural languages. It distinguishes between consonants and vowels in all cases, and finds more detailed, language-specific structure. These results improve on past approaches, and are encouraging, given the paucity of the input. More refined models may provide additional insight into which phonological classes are apparent from the distributions of sounds in natural languages.

Gender classification of Korean personal names: Deep neural networks versus human judgments

Cho

2024

Lingua

The function/content word distinction and eye movements in reading.

Staub

2024

Journal of Experimental Psychology: Learning, Memory, and Cogni

A substantial quantity of research has explored whether readers’ eye movements are sensitive to the distinction between function and content words. No clear answer has emerged, in part due to the difficulty of accounting for differences in length, frequency, and predictability between the words in the two classes. Based on evidence that readers differentially overlook function word errors, we hypothesized that function words may be more frequently skipped or may receive shorter fixations. We present two very large-scale eyetracking experiments using selected sentences from a corpus of natural text, with each sentence containing a target function or content word. The target words in the two classes were carefully matched on length, frequency, and predictability, with the latter variable operationalized in terms of next-word probability obtained from the large language model GPT-2. While the experiments replicated a range of expected effects, word class did not have any clear influence on target word skipping probability, and there was some evidence for a content word advantage in fixation duration measures. These results indicate that readers’ tendency to overlook function word errors is not due to reduced time spent encoding these words. The results also broadly support the implicit assumption in prominent models of eye movement control in reading that a word’s syntactic category does not play an important role in decisions about when and where to move the eyes.
