Syllable-aware Neural Language Models: A Failure to Beat Character-aware Ones

Assylbekov, Zhenisbek; Takhanov, Rustem; Myrzakhmetov, Bagdat; Washington, Jonathan North

doi:10.18653/v1/d17-1199

Cited by 10 publications

(27 citation statements)

References 18 publications

“…However, external fragmentation of words into morphemes propagate errors into the models, affecting the quality of the word embeddings [29]. Our work is similar to those of Assylbekov et al, Yu et al and Mikolov et al [9,21,30] on the basis of learning syllable and word representation. However, we utilized a defined syllabic alphabet instead of an external hyphenation algorithm to divide the words into syllables, which we hypothesize that may introduce errors.…”

Section: Introduction (supporting)

confidence: 65%

“…This is motivated by the CNN's ability to extract high quality features, leading to CNN models posting significant results in sentiment analysis [49], parsing [50], search query retrieval [51] and part-of-speech tagging [52]. The recent trend is to combine the strengths of the CNN and the RNN to design superior models for NLP [21,44,49,53].…”

Section: Deep Learning (mentioning)

confidence: 99%

“…For this reason, we propose syllabic based word embeddings (WEFSE) to match Swahili's complex word morphology, as opposed to using characters or morphemes. This study generated word embeddings from syllable embeddings (WEFSE) but differently from Assylbekov et al [21], who used an external hyphenator to segment the words into syllables. We hypothesize that learning word representations from syllabic alphabets captures both semantic meaning of words and handles new words.…”

Section: Introduction (mentioning)

confidence: 99%

“…The architecture of our model resembles that of Assylbekov et al [21]. Both models apply a convolutional neural network [31] to extract features and compose the word embeddings, a highway network [32] to model interactions among the syllables and finally a recurrent neural network language model [3].…”

Section: Introduction (mentioning)

confidence: 99%
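
The statement above names a highway network [32] as the component that sits between the CNN features and the recurrent language model. A minimal sketch of one highway layer follows, assuming the commonly used formulation y = t * g(W_H x + b_H) + (1 - t) * x with a sigmoid gate t and ReLU as g. The function names, the choice of ReLU, and the toy weights are illustrative assumptions and are not taken from either paper's code.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def affine(w: Matrix, x: Vector, b: Vector) -> Vector:
    """Compute W x + b for a row-major weight matrix."""
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi
            for row, bi in zip(w, b)]


def highway(x: Vector, w_h: Matrix, b_h: Vector,
            w_t: Matrix, b_t: Vector) -> Vector:
    """One highway layer: a gated mix of a transformed input and the input itself."""
    # Candidate transform g(W_H x + b_H); ReLU is an assumption made for this sketch.
    h = [max(0.0, v) for v in affine(w_h, x, b_h)]
    # Transform gate t = sigmoid(W_T x + b_T), one value in (0, 1) per dimension.
    t = [1.0 / (1.0 + math.exp(-v)) for v in affine(w_t, x, b_t)]
    # The carry gate is (1 - t), so the output keeps the input's dimensionality.
    return [ti * hi + (1.0 - ti) * xi for ti, hi, xi in zip(t, h, x)]


# Toy usage: zero weights and zero gate bias give t = 0.5 in every dimension.
zeros = [[0.0, 0.0], [0.0, 0.0]]
mixed = highway([1.0, -2.0], zeros, [3.0, 3.0], zeros, [0.0, 0.0])
# mixed == [2.0, 0.5]
```

A strongly negative gate bias pushes t toward 0, so the layer passes its input through almost unchanged.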

Better Word Representation Vectors Using Syllabic Alphabet: A Case Study of Swahili

et al. 2019

Featured Application: This work is applicable in computer science, software engineering and computational linguistics, specifically in natural language processing.

Abstract: Deep learning has been used extensively in natural language processing, with sub-word representation vectors playing a critical role. However, this cannot be said of Swahili, which is a low-resource language widely spoken in East and Central Africa. This study proposed novel word embeddings from syllable embeddings (WEFSE) for Swahili to address the concern of word representation for agglutinative and syllabic-based languages. Inspired by the learning methodology of Swahili in beginner classes, we encoded the respective syllables instead of characters, character n-grams or morphemes of words and generated quality word embeddings using a convolutional neural network. The quality of WEFSE was demonstrated by the state-of-the-art results in the syllable-aware language model on both the small dataset (31.229 perplexity value) and the medium dataset (45.859 perplexity value), outperforming character-aware language models. We further evaluated the word embeddings using a word analogy task. To the best of our knowledge, syllabic alphabets have not been used to compose word representation vectors. Therefore, the main contributions of the study are a syllabic alphabet, WEFSE, a syllable-aware language model and a word analogy dataset for Swahili.

“…• CharCNN (Kim et al, 2016) is a character-aware convolutional model, which performs on par with the 2014-2015 state-of-the-art word-level LSTM model (Zaremba et al, 2014) despite having 60% fewer parameters. • SylConcat is a simple concatenation of syllable embeddings suggested by Assylbekov et al (2017), which underperforms CharCNN but has fewer parameters and is trained faster. • MorphSum is a summation of morpheme embeddings, which is similar to the approach of Botha and Blunsom (2014) with one important difference: the embedding of the word itself is not included into the sum.…”

Section: Data Set (mentioning)

confidence: 99%
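
The statement above contrasts two ways of composing a word vector from subword embeddings: concatenation (SylConcat) and summation (MorphSum, which leaves the word's own embedding out of the sum). A minimal sketch of both compositions follows. The function names, the toy embedding table, and the zero-padding and truncation to a fixed number of units are assumptions made for illustration, not details confirmed by the quoted text.

```python
from typing import Dict, List

Vector = List[float]


def concat_embed(units: List[str], table: Dict[str, Vector],
                 max_units: int, dim: int) -> Vector:
    """Concatenate subword embeddings into one vector of length max_units * dim."""
    # Assumption: words with more than max_units units are truncated.
    vecs = [table[u] for u in units[:max_units]]
    # Assumption: shorter words are padded with zero vectors to a fixed length.
    vecs += [[0.0] * dim for _ in range(max_units - len(vecs))]
    return [value for vec in vecs for value in vec]


def sum_embed(units: List[str], table: Dict[str, Vector], dim: int) -> Vector:
    """Sum subword embeddings only; no whole-word embedding is added."""
    out = [0.0] * dim
    for u in units:
        for i, value in enumerate(table[u]):
            out[i] += value
    return out


# Hypothetical 2-dimensional embedding table for three subword units.
table = {"syl": [1.0, 2.0], "la": [0.5, -1.0], "ble": [0.0, 3.0]}

concatenated = concat_embed(["syl", "la"], table, max_units=3, dim=2)
# concatenated == [1.0, 2.0, 0.5, -1.0, 0.0, 0.0]

summed = sum_embed(["syl", "la", "ble"], table, dim=2)
# summed == [1.5, 4.0]
```

Concatenation preserves the order of the units but ties the output size to the maximum unit count, while summation yields a fixed-size vector that ignores order.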

Reusing Weights in Subword-Aware Neural Language Models

Assylbekov, Takhanov 2018

Proceedings of the 2018 Conference of the North American Chapter Of the Association for Computational Linguistics: Hu

Self Cite

We propose several ways of reusing subword embeddings and other weights in subword-aware neural language models. The proposed techniques do not benefit a competitive character-aware model, but some of them improve the performance of syllable- and morpheme-aware models while showing significant reductions in model sizes. We discover a simple hands-on principle: in a multilayer input embedding model, layers should be tied consecutively bottom-up if reused at output. Our best morpheme-aware model with properly reused weights beats the competitive word-level model by a large margin across multiple languages and has 20%-87% fewer parameters.

Major–Minor Long Short-Term Memory for Word-Level Language Model

Shuang et al. 2020

IEEE Trans. Neural Netw. Learning Syst.

Language models play an important role in natural language processing (NLP) systems such as machine translation, speech recognition, learning token embeddings, natural language generation and text classification. Recently, multi-layer Long Short-Term Memory (LSTM) models have been demonstrated to achieve promising performance on word-level language modeling. For each LSTM layer, a larger hidden size usually means more diverse semantic features, which enables the language model to perform better. However, we have observed that once an LSTM layer reaches a sufficiently large scale, the overall improvement slows down as its hidden size increases. In this paper, we find that an important factor behind this phenomenon is the high correlation between the newly extended hidden states and the original hidden states, which hinders diverse feature expression in the LSTM. As a result, when the scale is large enough, simply lengthening the LSTM hidden states costs a tremendous number of extra parameters but has little effect. We propose a simple yet effective improvement on each LSTM layer, consisting of a large-scale Major LSTM and a small-scale Minor LSTM, to break the high correlation between the two parts of the hidden states, which we call Major-Minor LSTMs (MMLSTMs). In experiments, we demonstrate that the language model with MMLSTMs surpasses the existing state-of-the-art model on the Penn Treebank (PTB) and WikiText-2 (WT2) datasets, and outperforms the baseline by 3.3 points in perplexity on the WikiText-103 dataset without increasing model parameter counts.
