“…However, there are multiple ways to improve the results, as our models incorporate little cross-lingual signal. In the future, it would be worth integrating such cross-lingual signal in the form of pretrained cross-lingual word embeddings (Artetxe et al., 2016; Lample et al., 2018) or sub-word (e.g., character) embeddings (Chaudhary et al., 2018; Sofroniev and Çöltekin, 2018), as this could lead to better generalization to new languages. Similarly, the typological distance between source and target language often correlates with performance (Cotterell and Heigold, 2017), which could be exploited to weight the contribution of source-language examples when learning a multilingual model.…”
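The last idea lends itself to a short illustration. The sketch below shows one plausible way to turn typological distances into per-language example weights for multilingual training; it is not the quoted authors' method. The language codes, the distance values, and the `temperature` parameter are all assumptions made for illustration (in practice such distances might come from a typological resource like URIEL/lang2vec).

```python
# Hypothetical sketch: weight source-language examples by typological
# distance to the target language when training a multilingual model.
# Distance values below are made up for illustration only.
import numpy as np

# Assumed typological distances between each source language and the
# target language (smaller = typologically closer).
distances = {"deu": 0.32, "nld": 0.28, "fin": 0.71, "tur": 0.85}
temperature = 0.5  # assumed knob: smaller values favour close languages more sharply

langs = list(distances)
d = np.array([distances[lang] for lang in langs])

# Convert distances to weights via a softmax over negative scaled distances,
# so typologically closer source languages contribute more.
weights = np.exp(-d / temperature)
weights /= weights.sum()

for lang, w in zip(langs, weights):
    print(f"{lang}: weight {w:.3f}")

# During training, each source-language example's loss would be scaled by
# its language's weight, e.g.:
#   loss = sum(weights[lang_of(x)] * per_example_loss(x) for x in batch)
```

With these assumed numbers, Dutch and German examples receive the largest weights while Turkish examples contribute least, which is the intended effect of exploiting typological similarity to the target language.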