Linguistic ambiguity is, and has always been, one of the main challenges for Natural Language Processing (NLP) systems. Modern Transformer architectures such as BERT, T5 or, more recently, InstructGPT have achieved impressive improvements in many NLP fields, but there is still plenty of work to do. Motivated by the uproar caused by ChatGPT, in this paper we provide an introduction to linguistic ambiguity, its varieties and their relevance in modern NLP, and perform an extensive empirical analysis. ChatGPT's strengths and weaknesses are revealed, as well as strategies to get the most out of this model.
Dictionaries are one of the oldest and most widely used linguistic resources. Building them is a complex task that, to the best of our knowledge, has yet to be explored with generative Large Language Models (LLMs). We introduce the "Spanish Built Factual Freectianary" (Spanish-BFF) as the first Spanish AI-generated dictionary. This first-of-its-kind free dictionary uses GPT-3. We also define the future steps we aim to follow to improve this initial commitment to the field, such as adding more languages.
This paper presents a novel, linguistically driven system for the Spanish Reverse Dictionary task of SemEval-2022 Task 1. The aim of this task is the automatic generation of a word from its gloss. We conclude that results on this task could improve if the quality of the dataset were improved by incorporating high-quality lexicographic data. Therefore, in this paper we analyze the main gaps in the proposed dataset and describe how these limitations could be tackled.