Proceedings of the Fifteenth Workshop on Computational Research in Phonetics, Phonology, and Morphology 2018
DOI: 10.18653/v1/w18-5812

Phonetic Vector Representations for Sound Sequence Alignment

Abstract: This study explores a number of data-driven vector representations of IPA-encoded sound segments for the purpose of sound sequence alignment. We test the alternative representations based on alignment accuracy in the context of computational historical linguistics. We show that the data-driven methods consistently do better than linguistically-motivated articulatory-acoustic features. The similarity scores obtained using the data-driven representations in a monolingual context, however, perform worse t…
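To make the alignment setting concrete, here is a minimal sketch of how vector representations of sounds can drive pairwise sequence alignment: a standard Needleman-Wunsch aligner whose substitution score is the cosine similarity between phone embeddings. The two-dimensional vectors and the gap penalty below are toy assumptions for illustration, not the representations or scores learned in the paper.

```python
# Sketch: Needleman-Wunsch alignment of two phone sequences, scored by
# cosine similarity between (hypothetical) phone embeddings.
import numpy as np

PHONE_VECS = {  # toy 2-d embeddings; real ones would be learned from data
    "p": np.array([1.0, 0.1]),
    "b": np.array([0.9, 0.3]),
    "t": np.array([0.1, 1.0]),
    "d": np.array([0.2, 0.9]),
    "a": np.array([-1.0, 0.0]),
}

def sim(x, y):
    """Cosine similarity between the embeddings of phones x and y."""
    vx, vy = PHONE_VECS[x], PHONE_VECS[y]
    return float(vx @ vy / (np.linalg.norm(vx) * np.linalg.norm(vy)))

def align_score(s1, s2, gap=-0.5):
    """Needleman-Wunsch over two phone sequences; returns the best score."""
    n, m = len(s1), len(s2)
    dp = np.zeros((n + 1, m + 1))
    dp[:, 0] = np.arange(n + 1) * gap  # leading gaps in s2
    dp[0, :] = np.arange(m + 1) * gap  # leading gaps in s1
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i, j] = max(dp[i - 1, j - 1] + sim(s1[i - 1], s2[j - 1]),
                           dp[i - 1, j] + gap,   # gap in s2
                           dp[i, j - 1] + gap)   # gap in s1
    return dp[n, m]

print(align_score(["p", "a", "t"], ["b", "a", "d"]))
```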

Cited by 6 publications (6 citation statements)
References 13 publications
“…Asr and Jones (2017) use artificial language experiments to study the difference between similarity and relatedness in evaluating distributed semantic models. Phone embeddings induced from phonetic corpora have been used in tasks such as word inflection (Silfverberg et al., 2018) and sound sequence alignment (Sofroniev and Çöltekin, 2018). Silfverberg et al. (2018) show that dense vector representations of phones learnt using various techniques are able to solve analogies such as p is to b as t is to X, where X = d. They also show that there is a significant correlation between the distinctive feature space and the phone embedding space.…”
Section: Introduction (citation type: mentioning; confidence: 99%)
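The analogy test this statement describes is conventionally solved by vector offset: compute b − p + t and return the nearest phone by cosine similarity, excluding the query terms. A sketch under that assumption, with toy three-dimensional vectors in place of learned embeddings:

```python
# Solving the phone analogy "p is to b as t is to X" by vector offset:
# X = argmax over phones of cosine(v, b - p + t), excluding p, b, t.
import numpy as np

vecs = {  # toy vectors; the third dimension loosely mimics "voicing"
    "p": np.array([1.0, 0.0, 0.2]),
    "b": np.array([1.0, 0.0, 0.8]),
    "t": np.array([0.0, 1.0, 0.2]),
    "d": np.array([0.0, 1.0, 0.8]),
    "a": np.array([-1.0, -1.0, 0.5]),
}

def solve_analogy(a, b, c, vecs):
    """Return the phone X such that a : b :: c : X, by cosine similarity."""
    target = vecs[b] - vecs[a] + vecs[c]
    target /= np.linalg.norm(target)
    best, best_sim = None, -np.inf
    for phone, v in vecs.items():
        if phone in (a, b, c):
            continue  # exclude the query terms themselves
        s = float(v @ target / np.linalg.norm(v))
        if s > best_sim:
            best, best_sim = phone, s
    return best

print(solve_analogy("p", "b", "t", vecs))  # -> "d"
```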
“…However, there are multiple ways to improve the results, as our models do not incorporate much in terms of cross-lingual signal. In the future, it would be worth integrating this cross-lingual signal in the form of pretrained cross-lingual word embeddings (Artetxe et al., 2016; Lample et al., 2018) or sub-word, e.g., character, embeddings (Chaudhary et al., 2018; Sofroniev and Çöltekin, 2018), as this could lead to better generalization to new languages. Similarly, typological distance between source and target language often correlates with performance (Cotterell and Heigold, 2017), which could be exploited for weighting the contribution of source-language examples when learning a multilingual model.…”
Section: Discussion (citation type: mentioning; confidence: 99%)
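One simple way the weighting idea mentioned above could be realized, sketched here as an assumption rather than the authors' proposal, is to down-weight each source-language example by a decaying function of its language's typological distance to the target. The distances and the temperature tau below are illustrative placeholders.

```python
# Hypothetical weighting of source-language training examples by
# typological distance to the target language.
import math

def example_weight(distance, tau=1.0):
    """Exponentially down-weight examples from typologically distant languages."""
    return math.exp(-distance / tau)

# Toy typological distances from a hypothetical target language:
distances = {"lang_A": 0.2, "lang_B": 0.9, "lang_C": 1.7}
weights = {lang: example_weight(d) for lang, d in distances.items()}
print(weights)  # closer languages contribute more to the multilingual loss
```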
“…We suspect that the multilingual training with phonological supervision is a necessary ingredient for this to work: characters from different scripts are never mixed within a single sample, so the grapheme contexts in which they occur are completely disjoint. This idea differs from work on phoneme embeddings (Silfverberg et al., 2018; Sofroniev and Çöltekin, 2018) in that the focus is explicitly on the graphemes. Grapheme embeddings learned for phonological tasks may prove useful for transliteration, or for processing informally romanized text (Irvine et al., 2012) jointly with data from the official orthography.…”
Section: Crosslingual Character Embeddings (citation type: mentioning; confidence: 98%)