Word prediction in computational historical linguistics

Dekker, Peter; Zuidema, Willem

doi:10.15398/jlm.v8i2.268

Cited by 6 publications

(7 citation statements)

References 17 publications

“…The new framework has the advantage of being easy to use, easy to extend, and fast to apply, while at the same time yielding promising results on a newly compiled collection of datasets from three different language families. Given that our framework can be easily extended, by varying the individual components of the workflow, we hope that it will provide a solid basis for future work on phonological reconstruction, as well as the prediction of words from cognate reflexes (Bodt and List, 2022; Dekker and Zuidema, 2021; Beinborn et al, 2013; Fourrier et al, 2021) in computational historical linguistics.…”

Section: Discussion (mentioning)

confidence: 99%

A New Framework for Fast Automated Phonological Reconstruction Using Trimmed Alignments and Sound Correspondence Patterns

List, Forkel, Hill

2022

Proceedings of the 3rd Workshop on Computational Approaches to Historical Language Change

Computational approaches in historical linguistics have been increasingly applied during the past decade and many new methods that implement parts of the traditional comparative method have been proposed. Despite these increased efforts, there are not many easy-to-use and fast approaches for the task of phonological reconstruction. Here we present a new framework that combines state-of-the-art techniques for automated sequence comparison with novel techniques for phonetic alignment analysis and sound correspondence pattern detection to allow for the supervised reconstruction of word forms in ancestral languages. We test the method on a new dataset covering six groups from three different language families. The results show that our method yields promising results while at the same time being not only fast but also easy to apply and expand.

“…Automatic cognate prediction has been studied using character-level machine translation techniques (Beinborn et al, 2013; Wu and Yarowsky, 2018; Dekker, 2018; Hämäläinen and Rueter, 2019; Fourrier and Sagot, 2020a). Dekker and Zuidema (2021) provide an overview of the different neural approaches used to solve this task (including their own), as well as its applications to other historical linguistic tasks (such as phylogeny reconstruction). However, the current paper follows specifically the tracks of two previous works studying encoder-decoder models for Romance cognate prediction.…”

Section: Automatic Cognate Prediction (mentioning)

confidence: 99%

“…The cognate prediction task aims at predicting, from a phonetised word, the plausible phonetic form of its cognate in a related language, according to known sound correspondence patterns; this has many applications, from identifying new words with field linguists (Bodt et al, 2018; Bodt and List, 2019) to inducing translation lexicons for low-resourced languages (Mann and Yarowsky, 2001). This task has been modelled as a sequence to sequence character level machine translation task in the most recent papers studying it (see the survey on cognate prediction in Dekker and Zuidema (2021)), which drew linguistic conclusions on the latent information learnt by such models by studying their outputs in a 'black-box' fashion. However, no paper that we know of tried to confirm or inform these conclusions by using modern interpretability tools, such as probing tasks, hidden representation analysis, or inner components analysis.…”

Section: Introduction (mentioning)

confidence: 99%

Probing Multilingual Cognate Prediction Models

Fourrier, Sagot

2022

Findings of the Association for Computational Linguistics: ACL 2022

Character-based neural machine translation models have become the reference models for cognate prediction, a historical linguistics task. So far, all linguistic interpretations about latent information captured by such models have been based on external analysis (accuracy, raw results, errors). In this paper, we investigate what probing can tell us about both models and previous interpretations, and learn that though our models store linguistic and diachronic information, they do not achieve it in previously assumed ways.

“…The task can be seen as a form of zero-shot learning (Xian et al, 2018), where a model must learn to predict the "reflexes" of a potentially unknown ancestral word form, with no examples of the relevant cognate set provided during the training phase. When considering the landscape of machine learning methods available and the approaches so far proposed (Dinu and Ciobanu, 2014; Bodt and List, 2022; Meloni et al, 2021; Beinborn et al, 2013; Dekker and Zuidema, 2021; Fourrier et al, 2021; List et al, forthcoming(b)), including other submissions to this challenge (Jäger, 2022; Celano, 2022; Kirov et al, 2022), it is possible to identify two main strategies for the task. The first one treats the problem as one of classification, potentially refining sequence results with probabilities from a character model, while the second employs sequence transformation methods, especially those akin to seq2seq approaches (Sutskever et al, 2014), making the task one analogous to that of "translation".…”

Section: Introduction (mentioning)

confidence: 99%

Approaching Reflex Predictions as a Classification Problem Using Extended Phonological Alignments

Tresoldi

2022

Preprint

This work describes an implementation of the "extended alignment" (or "multitiers") approach for cognate reflex prediction, submitted to the "Prediction of Cognate Reflexes" shared task. Similarly to List et al. forthcoming(b), the technique involves an automatic extension of sequence alignments with multilayered vectors that encode informational tiers on both site-specific traits, such as sound classes and distinctive features, as well as contextual and suprasegmental ones, conveyed by cross-site referrals and replication. The method makes it possible to generalize the problem of cognate reflex prediction as a classification problem, with models trained using a parallel corpus of cognate sets. A model using random forests is trained and evaluated on the shared task for reflex prediction, and the experimental results are presented and discussed along with some differences to other implementations.
