Proceedings of the Thirteenth Workshop on Innovative Use of NLP For Building Educational Applications 2018
DOI: 10.18653/v1/w18-0540

NILC at CWI 2018: Exploring Feature Engineering and Feature Learning

Abstract: This paper describes the results of the NILC team at CWI 2018. We developed solutions following three approaches: (i) a feature engineering method using lexical, n-gram, and psycholinguistic features; (ii) a shallow neural network method using only word embeddings; and (iii) a Long Short-Term Memory (LSTM) language model, pre-trained on a large text corpus to produce contextualized word vectors. The feature engineering method obtained our best results for the classification task and the LSTM model achieve…
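Of the three approaches in the abstract, the shallow-network method (ii) is the simplest to illustrate. The NumPy sketch below is a hypothetical reconstruction, not the authors' implementation: a word's pre-trained embedding is fed through one hidden layer to produce a complexity probability. The embedding table, dimensions, and weights here are random stand-ins for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a pre-trained embedding table (the paper used real
# word embeddings; these random vectors are for illustration only).
EMB_DIM = 50
embeddings = {w: rng.normal(size=EMB_DIM) for w in ["the", "cat", "sesquipedalian"]}

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class ShallowComplexityNet:
    """One hidden layer over a word embedding -> P(word is complex)."""

    def __init__(self, emb_dim, hidden=16, seed=1):
        r = np.random.default_rng(seed)
        self.W1 = r.normal(scale=0.1, size=(emb_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = r.normal(scale=0.1, size=hidden)
        self.b2 = 0.0

    def predict_proba(self, word):
        x = embeddings[word]
        h = np.tanh(x @ self.W1 + self.b1)      # hidden layer
        return sigmoid(h @ self.W2 + self.b2)   # scalar probability

net = ShallowComplexityNet(EMB_DIM)
p = net.predict_proba("sesquipedalian")
print(f"P(complex) = {p:.3f}")  # untrained weights, so the value is near 0.5
```

In the paper's setting the weights would be trained on the CWI shared-task labels; here they are untrained, so the output is only a placeholder probability.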


Cited by 21 publications (22 citation statements). References 20 publications.
“…In the BEA workshop [39], on the CWI task for uni/multiword phrase classification, most participating teams preferred ML approaches for their systems. For example, [40] presented three approaches for CWI, one using traditional ML classification algorithms based on lexical features (word length, number of syllables, and others) and n-gram features (n-gram probabilities). Other works outside workshops have also been carried out, such as [41], which used the task dataset to train a convolutional neural network (CNN) with word embeddings and engineered features.…”
Section: Nlp Approaches To Lexical Simplificationmentioning
confidence: 99%
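The lexical and n-gram features named in the statement above reduce to simple counts and corpus statistics. The sketch below is an illustrative reconstruction, not the system's code: the syllable counter is a crude vowel-group heuristic, and the bigram probability is a maximum-likelihood estimate over a toy corpus.

```python
import re
from collections import Counter

def count_syllables(word):
    # Crude heuristic: count groups of consecutive vowels.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def lexical_features(word):
    # Lexical features mentioned above: word length and syllable count.
    return {"length": len(word), "syllables": count_syllables(word)}

def bigram_prob(corpus_tokens, w1, w2):
    # Maximum-likelihood estimate of P(w2 | w1) over the given tokens.
    bigrams = Counter(zip(corpus_tokens, corpus_tokens[1:]))
    unigrams = Counter(corpus_tokens)
    if unigrams[w1] == 0:
        return 0.0
    return bigrams[(w1, w2)] / unigrams[w1]

corpus = "the cat sat on the mat the cat ran".split()
print(lexical_features("ubiquitous"))      # {'length': 10, 'syllables': 4}
print(bigram_prob(corpus, "the", "cat"))   # 2/3: "the cat" occurs 2x, "the" 3x
```

Real CWI systems estimate such n-gram probabilities from large corpora with smoothing; the unsmoothed toy estimate here only shows the shape of the feature.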
“…Some of these works found WordNet (Miller, 1998) to be a valuable source of lexical features. The main extracted feature is the number of synsets, but information on hypernyms, hyponyms, holonyms, and meronyms is also useful (Gooding and Kochmar, 2018; Hartmann and Dos Santos, 2018; Wani et al., 2018).…”
Section: Related Workmentioning
confidence: 99%
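The WordNet-derived features mentioned above (numbers of synsets, hypernyms, hyponyms, holonyms, meronyms) amount to counts over a sense inventory. In the sketch below a tiny hand-written inventory with invented entries stands in for WordNet, so the block stays self-contained; a real system would query `nltk.corpus.wordnet` instead.

```python
# Toy sense inventory standing in for WordNet (entries are invented).
# Each synset records its hypernyms and hyponyms; real systems would
# query nltk.corpus.wordnet for these relations (plus holonyms/meronyms).
INVENTORY = {
    "dog": [
        {"hypernyms": ["canine"], "hyponyms": ["puppy", "hound"]},
        {"hypernyms": ["person"], "hyponyms": []},  # invented slang sense
    ],
    "canine": [{"hypernyms": ["carnivore"], "hyponyms": ["dog", "wolf"]}],
}

def wordnet_features(word):
    # Complexity cues: polysemous, well-connected words tend to be simpler.
    synsets = INVENTORY.get(word, [])
    return {
        "n_synsets": len(synsets),
        "n_hypernyms": sum(len(s["hypernyms"]) for s in synsets),
        "n_hyponyms": sum(len(s["hyponyms"]) for s in synsets),
    }

print(wordnet_features("dog"))  # {'n_synsets': 2, 'n_hypernyms': 2, 'n_hyponyms': 2}
```

Words absent from the inventory get all-zero counts, which is itself a useful signal: rare, unlisted words are more likely to be complex.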
“…Based on our experience at CWI 2018 (Hartmann & dos Santos, 2018), in which we obtained the second-best placement in the classification task and the third-best placement in the probabilistic classification task for English (Yimam et al., 2018). Any intersections between the dictionaries' lexicons were handled: if a word is complex for school year T+2, it is naturally also complex for school years T and T+1.…”
Section: Lexical Simplification
unclassified
“…Regarding the Complex Word Identification and Lexical Simplification steps, recent work has shown that methods based on feature learning are outperforming methods based on feature engineering (Glavaš & Štajner, 2015; Paetzold & Specia, 2017; Hartmann & dos Santos, 2018; Štajner et al., 2019). This scenario is aligned with the results obtained in this evaluation.…”
Section: Evaluation of the Proposed Methods for Complex Word Identification
unclassified