incom.py – A Toolbox for Calculating Linguistic Distances and Asymmetries between Related Languages

Mosbach, Marius; Stenger, Irina; Avgustinova, Tania; Klakow, Dietrich

doi:10.26615/978-954-452-056-4_094

Cited by 6 publications (27 citation statements)

References 21 publications

“…As a representation of the (dis-)similarity of the PL stimulus toward CS, a measure referred to as total pronunciation-based distance is determined for the whole sentence, the final 3-g, the final 2-g, and the target word, and is examined for correlations with intelligibility. The distances are calculated automatically for each word with the help of the incom.py toolbox (Mosbach et al., 2019). Distances of the 2-g, 3-g, and sentences are the mean distances of the individual words they consist of.…”

Section: Methods (mentioning)

On the Correlation of Context-Aware Language Models With the Intelligibility of Polish Target Words to Czech Readers

et al. 2021

Self Cite

This contribution seeks to provide a rational probabilistic explanation for the intelligibility of words in a genetically related language that is unknown to the reader, a phenomenon referred to as intercomprehension. In this research domain, linguistic distance, among other factors, has been shown to correlate well with the mutual intelligibility of individual words. However, the role of context in the intelligibility of target words in sentences has been the subject of very few studies. To address this, we analyze data from web-based experiments in which Czech (CS) respondents were asked to translate highly predictable target words at the final position of Polish sentences. We compare correlations of target word intelligibility with data from 3-g language models (LMs) to their correlations with data obtained from context-aware LMs. More specifically, we evaluate two context-aware LM architectures: Long Short-Term Memory networks (LSTMs), which can, theoretically, take infinitely long-distance dependencies into account, and Transformer-based LMs, which can access the whole input sequence at the same time. We investigate how their use of context affects surprisal and its correlation with intelligibility.
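The averaging described in the quotation above (span distances as the mean of the distances of their words) can be sketched in a few lines. This is a minimal Python illustration, not incom.py code: it assumes the per-word distances of a sentence are already available as a list in sentence order, with the target word in final position as in the experiments described in the abstract. The helper name `span_distances` is hypothetical.

```python
from statistics import mean


def span_distances(word_distances: list[float]) -> dict[str, float]:
    """Aggregate per-word distances into span-level distances.

    Hypothetical helper: assumes the list is in sentence order and that the
    target word is the last word, so the final 2-g and 3-g end at the target.
    """
    if len(word_distances) < 3:
        raise ValueError("need at least three words to form a final 3-g")
    return {
        # Each span distance is the mean of the distances of its words.
        "sentence": mean(word_distances),
        "3-g": mean(word_distances[-3:]),
        "2-g": mean(word_distances[-2:]),
        # The target word is a single word, so no averaging is needed.
        "target": word_distances[-1],
    }


# Illustrative (made-up) per-word distances for a four-word sentence.
print(span_distances([0.2, 0.4, 0.0, 0.6]))
```

Because every span ends at the target word, the four values nest: the target distance is part of every average, which is one reason such measures tend to be correlated with one another.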

“…In the present study we extend the incom.py toolbox (Mosbach et al., 2019), focusing on mutual intelligibility aspects in oral intercomprehension. First, we compare the available measuring methods for linguistic distances and asymmetries, i.e., Levenshtein distance and word adaptation surprisal, as predictors of mutual intelligibility in auditory perception and add word adaptation entropy as an additional metric for asymmetric intelligibility.…”

Section: This Paper (mentioning)

incom.py 2.0 – Calculating Linguistic Distances and Asymmetries in Auditory Perception of Closely Related Languages

Mosbach, Stenger, Avgustinova, et al. 2021

Proceedings of the Conference Recent Advances in Natural Language Processing - Deep Learning for Natural Language Processing Me

Self Cite

We present an extended version of a tool developed for calculating linguistic distances and asymmetries in auditory perception of closely related languages. Along with evaluating the metrics available in the initial version of the tool, we introduce word adaptation entropy as an additional metric of linguistic asymmetry. Potential predictors of speech intelligibility are validated with human performance in spoken cognate recognition experiments for Bulgarian and Russian. Special attention is paid to the possibly different contributions of vowels and consonants in oral intercomprehension. Using incom.py 2.0, it is possible to calculate, visualize, and validate three measurement methods of linguistic distances and asymmetries, as well as to carry out regression analyses of speech intelligibility between related languages.
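The abstract introduces word adaptation entropy without defining it. The sketch below rests on our own assumptions, not on the text above: the entropy of a correspondence is the Shannon entropy (in bits) of the distribution of language-A characters aligned with a given language-B character, and the word-level value sums these entropies over the aligned positions, in parallel with how word adaptation surprisal sums character values. The input is a list of word pairs already aligned to equal length with `-` as a gap symbol; the function names and the toy pairs are hypothetical.

```python
import math
from collections import Counter, defaultdict


def correspondence_counts(aligned_pairs):
    """Count, for each language-B character, the aligned language-A characters.

    Hypothetical input format: (word_a, word_b) tuples of equal length,
    with '-' marking a gap.
    """
    counts = defaultdict(Counter)
    for word_a, word_b in aligned_pairs:
        if len(word_a) != len(word_b):
            raise ValueError("word pairs must be aligned to equal length")
        for char_a, char_b in zip(word_a, word_b):
            counts[char_b][char_a] += 1
    return counts


def character_entropy(counter):
    """Shannon entropy in bits of one conditional distribution."""
    total = sum(counter.values())
    return -sum((n / total) * math.log2(n / total) for n in counter.values())


def word_adaptation_entropy(word_b, counts):
    """Assumed definition: sum of conditional entropies over word_b's positions."""
    return sum(character_entropy(counts.get(char_b, Counter())) for char_b in word_b)


pairs = [("ta", "ta"), ("to", "ta")]  # toy alignments, not real data
counts = correspondence_counts(pairs)
# 't' always maps to 't' (0 bits); 'a' maps to 'a' or 'o' equally (1 bit).
print(word_adaptation_entropy("ta", counts))  # prints 1.0
```

Conditioning on the language-B side is a design choice: swapping the two words in every pair gives the opposite direction, and the two results generally differ, which is what makes the measure usable for asymmetric intelligibility.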

“…Employing a modified Levenshtein algorithm [Levenshtein 1965], which disallows matching between a vowel and a consonant, we have calculated the orthographic and the phonetic distances between 120 BG-RU cognate pairs. We calculated this objective measure automatically using the incom.py tool of [Mosbach et al. 2019]. While in the basic form of the algorithm all string operations have the same cost, we use 0 for the cost of mapping a character/sound to itself, e.g.…”

Section: Predictors of Mutual Intelligibility, 4.1 Levenshtein Distance (mentioning)
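The modified Levenshtein algorithm in the quotation above can be sketched as a standard dynamic program with one change: a substitution between a vowel and a consonant is forbidden, so such a pair can only be handled as a deletion plus an insertion. This is a minimal sketch, not the incom.py implementation. The quotation breaks off before giving the full cost scheme, so unit costs for insertions, deletions, and permitted substitutions are an assumption, as is the Latin vowel set, which stands in for the real orthographic or phonetic inventory. No length normalization is applied.

```python
import math

# Hypothetical vowel inventory for illustration only; a real setup would use
# the orthographic or phonetic vowel inventory of the languages compared.
VOWELS = frozenset("aeiou")


def substitution_cost(a: str, b: str, vowels=VOWELS) -> float:
    if a == b:
        return 0  # mapping a character/sound to itself is free
    if (a in vowels) != (b in vowels):
        return math.inf  # vowel-consonant matching is disallowed
    return 1  # assumed unit cost for vowel-vowel or consonant-consonant


def modified_levenshtein(source: str, target: str, vowels=VOWELS) -> int:
    """Edit distance with unit insertions/deletions and restricted substitutions."""
    rows, cols = len(source) + 1, len(target) + 1
    dist = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dist[i][0] = i  # delete all of source[:i]
    for j in range(1, cols):
        dist[0][j] = j  # insert all of target[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            dist[i][j] = min(
                dist[i - 1][j] + 1,  # deletion
                dist[i][j - 1] + 1,  # insertion
                dist[i - 1][j - 1]
                + substitution_cost(source[i - 1], target[j - 1], vowels),
            )
    # Deletion/insertion paths are always finite, so the result is an int.
    return dist[-1][-1]


print(modified_levenshtein("kot", "kit"))  # vowel-vowel substitution: 1
print(modified_levenshtein("a", "b"))      # vowel vs. consonant: 2, not 1
```

The second call shows where the modification matters: plain Levenshtein distance would align `a` with `b` at cost 1, while the restricted version must delete and insert.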

“…For example, CAS is defined as in (1). Since the WAS between two words is computed by summing the CAS and SAS values of the characters and sounds in the aligned word pair, it strongly depends on the number of available word pairs (for more details, see [Mosbach et al. 2019], [Stenger 2019]). Finally, we normalize the WAS based on the set of 120 BG-RU cognates.…”

Section: Word Adaptation Surprisal (mentioning)
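The quotation above refers to a definition of CAS in (1) that is not reproduced here. The sketch below assumes that CAS is the surprisal (negative base-2 logarithm) of a language-A character given the aligned language-B character, with probabilities estimated by relative frequency over the aligned word list. WAS is then the sum over aligned positions, as the quotation states. Dividing by the alignment length is our assumed normalization, not necessarily the one used in the study. Since the probabilities come from the word list itself, the values shift as pairs are added or removed, which is the dependence on the number of word pairs that the quotation mentions. Function names, input format, and toy pairs are hypothetical.

```python
import math
from collections import Counter


def estimate_cas(aligned_pairs):
    """Map (char_a, char_b) to -log2 P(char_a | char_b), estimated by counting.

    Hypothetical input: (word_a, word_b) tuples aligned to equal length.
    """
    joint = Counter()
    given_b = Counter()
    for word_a, word_b in aligned_pairs:
        if len(word_a) != len(word_b):
            raise ValueError("word pairs must be aligned to equal length")
        for char_a, char_b in zip(word_a, word_b):
            joint[(char_a, char_b)] += 1
            given_b[char_b] += 1
    return {(a, b): -math.log2(n / given_b[b]) for (a, b), n in joint.items()}


def word_adaptation_surprisal(word_a, word_b, cas):
    """Sum of the character surprisals over the aligned positions."""
    return sum(cas[(a, b)] for a, b in zip(word_a, word_b))


def normalized_was(word_a, word_b, cas):
    """WAS divided by the alignment length (assumed normalization)."""
    return word_adaptation_surprisal(word_a, word_b, cas) / len(word_a)


pairs = [("ta", "ta"), ("to", "ta")]  # toy alignments, not real data
cas_ab = estimate_cas(pairs)
cas_ba = estimate_cas([(b, a) for a, b in pairs])  # opposite direction
print(word_adaptation_surprisal("to", "ta", cas_ab))  # prints 1.0
print(word_adaptation_surprisal("ta", "to", cas_ba))  # prints 0.0
```

The two calls show the asymmetry the measure is meant to capture: in this toy data `a` corresponds to either `a` or `o` (1 bit of surprisal), while `o` only ever corresponds to `a` (0 bits).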

Visual vs. Auditory Perception of Bulgarian Stimuli by Russian Native Speakers

Stenger, Avgustinova 2020

Computational Linguistics and Intellectual Technologies

This study contributes to a better understanding of receptive multilingualism by determining similarities and differences in successful processing of written and spoken cognate words in an unknown but (closely) related language. We investigate two Slavic languages with regard to their mutual intelligibility. The current focus is on the recognition of isolated Bulgarian words by Russian native speakers in a cognate guessing task, considering both written and audio stimuli. The experimentally obtained intercomprehension scores show a generally high degree of intelligibility of Bulgarian cognates to Russian subjects, as well as processing difficulties in the case of visual vs. auditory perception. In search of an explanation, we examine the linguistic factors that can contribute to various degrees of written and spoken word intelligibility. The intercomprehension scores obtained in the online word translation experiments are correlated with (i) the identical and mismatched correspondences on the orthographic and phonetic level, (ii) the word length of the stimuli, and (iii) the frequency of Russian cognates. Additionally, we validate two measuring methods: the Levenshtein distance and the word adaptation surprisal as potential pr…
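The abstract correlates intercomprehension scores with several predictors but does not name the correlation coefficient. A minimal sketch, assuming Pearson's r for illustration. The helper `pearson_r` and the example values are hypothetical, not study data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and sums of squares, all taken around the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical values: word length in characters vs. share of correct translations.
lengths = [3, 4, 5, 6, 7]
scores = [0.9, 0.85, 0.7, 0.6, 0.5]
print(round(pearson_r(lengths, scores), 3))
```

In Python 3.10 and later, `statistics.correlation` computes the same coefficient. The explicit version keeps the formula visible and rejects constant inputs with a clear error.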