2020
DOI: 10.1609/aaai.v34i05.6389
Lexical Simplification with Pretrained Encoders

Abstract: Lexical simplification (LS) aims to replace complex words in a given sentence with simpler alternatives of equivalent meaning. Recent unsupervised lexical simplification approaches rely only on the complex word itself, regardless of the given sentence, to generate candidate substitutions, which inevitably produces a large number of spurious candidates. We present a simple LS approach that makes use of Bidirectional Encoder Representations from Transformers (BERT), which can consider both the given…
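The mechanism behind this, elaborated in the citation statements below, is to mask the complex word and let BERT's masked language model propose in-context substitutes. The following is a minimal sketch of that idea using the Hugging Face transformers library; the sentence-pair encoding (original sentence paired with a masked copy) follows the paper's description, while the model checkpoint, example sentence, and top-k cutoff are illustrative assumptions.

```python
import torch
from transformers import BertForMaskedLM, BertTokenizer

# Stock pretrained BERT with a masked-LM head (checkpoint is an assumption).
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

sentence = "the cat perched upon the mat"  # illustrative example
complex_word = "perched"

# Sentence-pair input: the original sentence, then a copy with the complex
# word masked, so BERT conditions on both the word and its context.
masked = sentence.replace(complex_word, tokenizer.mask_token)
inputs = tokenizer(sentence, masked, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Distribution over the vocabulary at the [MASK] position.
mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0].item()
probs = logits[0, mask_pos].softmax(dim=-1)
top = probs.topk(10)

# Drop the original word itself; the rest are substitution candidates.
candidates = tokenizer.convert_ids_to_tokens(top.indices.tolist())
print([c for c in candidates if c != complex_word])
```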

Cited by 62 publications (57 citation statements: 0 supporting, 57 mentioning, 0 contrasting). Years published: 2020–2024. References 12 publications.
“…Our approach to the MWLS problem, called the Plainifier, is an extension of the unsupervised method for single word lexical simplification by Qiang et al (2020). Following their work, we generate candidate replacements using BERT predictions for a given context and rank them according to language-model probability, simplicity and similarity of meaning to the original text.…”
Section: Simplification Methods (mentioning, confidence: 99%)
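As a rough illustration of that ranking step, the sketch below combines the three signals (language-model probability, simplicity, and similarity of meaning) into a single weighted score. The Candidate fields, feature values, and equal weights are hypothetical; they illustrate the idea, not the Plainifier's actual scoring function.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    word: str
    lm_prob: float     # language-model probability of the word in context
    simplicity: float  # e.g. normalized corpus frequency (more frequent = simpler)
    similarity: float  # cosine similarity to the original word's embedding

def rank(candidates, w_lm=1.0, w_simp=1.0, w_sim=1.0):
    """Order candidates by a weighted sum of the three features.
    Equal default weights are an assumption, not taken from the paper."""
    def score(c):
        return w_lm * c.lm_prob + w_simp * c.simplicity + w_sim * c.similarity
    return sorted(candidates, key=score, reverse=True)

# Hypothetical candidates for replacing "perched" in a sentence.
cands = [
    Candidate("sat",    lm_prob=0.31, simplicity=0.92, similarity=0.74),
    Candidate("stood",  lm_prob=0.12, simplicity=0.88, similarity=0.51),
    Candidate("rested", lm_prob=0.08, simplicity=0.65, similarity=0.69),
]
print([c.word for c in rank(cands)])  # best substitution first
```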
“…In step (a), a gap created by removing the replaced token is filled with a single [MASK], for which the predictions are acquired from Terse-BERT. This method is used by Qiang et al (2020) to obtain one-word candidates, such as The cat sleeps on the mat, and their likelihoods. The multi-word setting in Plainifier requires step (b), in which the gap is filled with two [MASK] elements and the best (according to the ranking described in Section 4.3) K predictions for the first position are obtained.…”
Section: Candidate Generation (mentioning, confidence: 99%)
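The masking mechanics of steps (a) and (b) can be sketched with a stock BERT masked LM (the cited work uses Terse-BERT; bert-base-uncased stands in here as an assumption). The gap marker, function name, and k are illustrative, and the expansion of the top-K first-position predictions into full two-word candidates is omitted.

```python
import torch
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def first_mask_topk(context, gap="___", n_masks=1, k=5):
    """Fill the gap with n_masks [MASK] tokens and return the top-k
    tokens, with probabilities, predicted for the first mask position."""
    fill = " ".join([tokenizer.mask_token] * n_masks)
    inputs = tokenizer(context.replace(gap, fill), return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    mask_positions = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
    probs = logits[0, mask_positions[0]].softmax(dim=-1)
    top = probs.topk(k)
    return list(zip(tokenizer.convert_ids_to_tokens(top.indices.tolist()),
                    top.values.tolist()))

# Step (a): a single [MASK] yields one-word candidates and their likelihoods.
print(first_mask_topk("The cat ___ on the mat.", n_masks=1))
# Step (b): two [MASK]s; keep the best K first-position predictions.
print(first_mask_topk("The cat ___ on the mat.", n_masks=2))
```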
“…They also tend to change the meaning of the sentence and have problems dealing with ambiguous words [30][31][32]. Recently, however, unsupervised approaches have improved in this regard by drawing on more detailed context information [33]. Hybrid strategies, on the other hand, employ methods from both of the previous approaches; for example, [34] combines a corpus-based approach with a free lexicon, decision trees, and context-based rules.…”
Section: NLP Approaches to Lexical Simplification (mentioning, confidence: 99%)
“…(Petersen and Ostendorf 2007) …

(Zhu et al. 2010): PWKP / WikiSmall, 108K, Wikipedia
(Coster and Kauchak 2011): 137K, Wikipedia
(Xu, Callison-Burch, and Napoles 2015): Newsela, 96K
(Hwang, Hajishirzi, Ostendorf, and Wu 2015): 392K, Wikipedia
(Kajiwara and Komachi 2016): 493K, Wikipedia
(Zhang and Lapata 2017): WikiLarge, 286K, Wikipedia
(Maruyama and Yamamoto 2018): SNOW T15, 50K
(Katsuta and Yamamoto 2018): SNOW T23, 35K
(Paetzold and Specia 2017b; Qiang, Li, Zhu, Yuan, and Wu 2020): LSeval

Footnotes:
3. http://www.dianamccarthy.co.uk/task10index.html
4. https://www.cs.york.ac.uk/semeval-2012/task1/
5. http://people.cs.kuleuven.be/~jan.debelder/lseval.zip
6. https://simple.wikipedia.org/wiki/Wikipedia:Basic_English_combined_wordlist
7. https://cs.pomona.edu/~dkauchak/simplification/
…”
(mentioning, confidence: 99%)