Pronunciation-Enhanced Chinese Word Embedding

Yang, Qinjuan; Xie, Haoran; Cheng, Gary; Wang, Fu Lee; Rao, Yanghui

doi:10.1007/s12559-021-09850-9

Cited by 9 publications

(8 citation statements)

References 51 publications

“…Irregularities in the phoneme-to-grapheme direction are even more prevalent: on average, five different characters represent one tone syllable (Chen and Pasquarella, 2017). Furthermore, there are characters with more than one pronunciation called polyphonic characters (Yang et al., 2021). Examples of polyphonic characters include “了,” which can be pronounced /le4/ or /liao/, and “差,” which can be pronounced in four ways: /cha1/, /cha4/, /chai1/, and /ci1/.…”

Section: Literature Review (mentioning)

confidence: 99%

Effects of orthographic transparency on rhyme judgement

et al. 2023

This study investigated the influence of multiliteracy in opaque orthographies on phonological awareness. Using a visual rhyme judgement task in English, we assessed phonological processing in three multilingual and multiliterate populations who were distinguished by the transparency of the orthographies they can read in (N = 135; ages 18–40). The first group consisted of 45 multilinguals literate in English and a transparent Latin orthography like Malay; the second group consisted of 45 multilinguals literate in English and transparent orthographies like Malay and Arabic; and the third group consisted of 45 multilinguals literate in English, transparent orthographies, and Mandarin Chinese, an opaque orthography. Results showed that all groups had poorer performance in the two opaque conditions: rhyming pairs with different orthographic endings and non-rhyming pairs with similar orthographic endings, with the latter posing the greatest difficulty. Subjects whose languages consisted of half or more opaque orthographies performed significantly better than subjects who knew more transparent orthographies than opaque orthographies. The findings are consistent with past studies that used the visual rhyme judgement paradigm and suggest that literacy experience acquired over time relating to orthographic transparency may influence performance on phonological awareness tasks.

“…Due to its success in modelling English documents, word embedding has been applied to Chinese text. Benefiting from the internal structural information of Chinese characters, many studies tried to enhance the quality of Chinese word embeddings with radicals [30][31][32], subword components [33,34], glyph features [35], strokes [36], and pronunciation [37]. To limit the scope of this paper, we choose Skip-gram because, after comparing the word embedding model established by the two corpora used in this experiment, we found Skip-gram to have the best performance on average.…”

Section: The Model Architectures For Word Embedding (mentioning)

confidence: 99%

An Evaluation Dataset for Legal Word Embedding: A Case Study on Chinese Codex

Lin, Cheng

2022

Embedded Systems and Applications

Word embedding is a modern distributed word representation approach that is widely used in many natural language processing tasks. Converting the vocabulary of a legal document into a word embedding model makes legal documents amenable to machine learning, deep learning, and other algorithms, and to downstream natural language processing tasks such as document classification, contract review, and machine translation. The most common and practical way to evaluate the accuracy of a word embedding model uses a benchmark set built from linguistic rules or relationships between words to perform analogy reasoning via algebraic calculation. This paper proposes establishing a 1,134-question Legal Analogical Reasoning Questions Set (LARQS) from the 2,388 Chinese Codex corpus using five kinds of legal relations, which is then used to evaluate the accuracy of the Chinese word embedding model. Moreover, we discovered that legal relations might be ubiquitous in the word embedding model.
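
The abstract above mentions evaluating an embedding by "analogy reasoning via algebraic calculation". A minimal sketch of one common form of that calculation (vector offset with cosine similarity) follows. The toy vectors, the English example words, and the function names `solve_analogy` and `analogy_accuracy` are illustrative assumptions, not taken from the LARQS paper or its benchmark.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def solve_analogy(emb, a, b, c):
    """Answer "a is to b as c is to ?" with the vector-offset method.

    The target vector is emb[b] - emb[a] + emb[c]; the answer is the
    vocabulary word closest to it by cosine similarity, excluding the
    three question words themselves.
    """
    target = [y - x + z for x, y, z in zip(emb[a], emb[b], emb[c])]
    candidates = [w for w in emb if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(emb[w], target))


def analogy_accuracy(emb, questions):
    """Fraction of (a, b, c, expected) questions answered correctly."""
    correct = sum(solve_analogy(emb, a, b, c) == d for a, b, c, d in questions)
    return correct / len(questions)


# Hypothetical 3-dimensional embedding, invented purely for illustration.
TOY_EMBEDDING = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man": [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.5, 0.5, 0.5],
}

# Each question is (a, b, c, expected answer).
TOY_QUESTIONS = [
    ("man", "king", "woman", "queen"),
    ("man", "woman", "king", "queen"),
]

if __name__ == "__main__":
    print(solve_analogy(TOY_EMBEDDING, "man", "king", "woman"))
    print(analogy_accuracy(TOY_EMBEDDING, TOY_QUESTIONS))
```

A real evaluation would load trained vectors and a benchmark question file in place of the toy dictionary and list; the scoring loop would stay the same.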

“…Due to its success in modelling English documents, word embedding has been applied to Chinese text. Benefiting from the internal structural information of Chinese characters, many studies tried to enhance the quality of Chinese word embeddings with radicals [30][31][32], sub-word components [33,34], glyph features [35], strokes [36], and pronunciation [37]. To limit the scope of this paper, we choose Skip-gram because, after comparing the word embedding model established by the two corpora used in this experiment, we found Skip-gram to have the best performance on average.…”

Section: The Model Architectures For Word Embedding (mentioning)

confidence: 99%

Untitled

2022

IJNLC

Applying natural language processing algorithms is currently popular in legal applications such as legal document classification, contract review, and machine translation. All of these machine learning algorithms need the words in a document to be encoded as vectors. The word embedding model is a modern distributed word representation approach and the most common unsupervised word encoding method. It makes documents amenable to other algorithms and to the downstream tasks of natural language processing. The most common and practical way to evaluate the accuracy of a word embedding model uses a benchmark set built from linguistic rules or relationships between words to perform analogy reasoning via algebraic calculation. This paper proposes establishing a 1,256-question Legal Analogical Reasoning Questions Set (LARQS) from the 2,388 Chinese Codex corpus using five kinds of legal relations, which is then used to evaluate the accuracy of the Chinese word embedding model. Moreover, we discovered that legal relations might be ubiquitous in the word embedding model.
