We present the first attempt at using sequence to sequence neural networks to model text simplification (TS). Unlike the previously proposed automated TS systems, our neural text simplification (NTS) systems are able to simultaneously perform lexical simplification and content reduction. An extensive human evaluation of the output has shown that NTS systems achieve almost perfect grammaticality and meaning preservation of output sentences and higher level of simplification than the state-of-the-art automated TS systems.
Simplification of lexically complex texts, by replacing complex words with their simpler synonyms, helps non-native speakers, children, and language-impaired people understand text better. Recent lexical simplification methods rely on manually simplified corpora, which are expensive and time-consuming to build. We present an unsupervised approach to lexical simplification that makes use of the most recent word vector representations and requires only regular corpora. Results of both automated and human evaluation show that our simple method is as effective as systems that rely on simplified corpora.
We report the findings of the second Complex Word Identification (CWI) shared task organized as part of the BEA workshop colocated with NAACL-HLT'2018. The second CWI shared task featured multilingual and multi-genre datasets divided into four tracks: English monolingual, German monolingual, Spanish monolingual, and a multilingual track with a French test set, and two tasks: binary classification and probabilistic classification. A total of 12 teams submitted their results in different task/track combinations and 11 of them wrote system description papers that are referred to in this report and appear in the BEA workshop proceedings.
The way in which a text is written can be a barrier for many people. Automatic text simplification is a natural language processing technology that, when mature, could be used to produce texts that are adapted to the specific needs of particular users. Most research in the area of automatic text simplification has dealt with the English language. In this article, we present results from the Simplext project, which is dedicated to automatic text simplification for Spanish. We present a modular system with dedicated procedures for syntactic and lexical simplification that are grounded on the analysis of a corpus manually simplified for people with special needs. We carried out an automatic evaluation of the system’s output, taking into account the interaction between three different modules dedicated to different simplification aspects. One evaluation is based on readability metrics for Spanish and shows that the system is able to reduce the lexical and syntactic complexity of the texts. We also show, by means of a human evaluation, that sentence meaning is preserved in most cases. Our results, even if our work represents the first automatic text simplification system for Spanish that addresses different linguistic aspects, are comparable to the state of the art in English Automatic Text Simplification.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.