Abstract:Recognizing textual entailment comprises the task of determining semantic entailment relations between text fragments. A text fragment entails another text fragment if, from the meaning of the former, one can infer the meaning of the latter. If such relation is bidirectional, then we are in the presence of a paraphrase. Automatically recognizing textual entailment relations captures major semantic inference needs in several natural language processing (NLP) applications. As in many NLP tasks, textual entailment corpora for English abound, while the same is not true for more resource-scarce languages such as Portuguese. Exploiting what seems to be the only Portuguese corpus for textual entailment and paraphrases (the ASSIN corpus), in this paper, we address the task of automatically recognizing textual entailment (RTE) and paraphrases from text written in the Portuguese language, by employing supervised machine learning techniques. We employ lexical, syntactic and semantic features, and analyze the impact of using semantic-based approaches in the performance of the system. We then try to take advantage of the bi-dialect nature of ASSIN to compensate its limited size. With the same aim, we explore modeling the task of recognizing textual entailment and paraphrases as a binary classification problem by considering the bidirectional nature of paraphrases as entailment relationships. Addressing the task as a multi-class classification problem, we achieve results in line with the winner of the ASSIN Challenge. In addition, we conclude that semantic-based approaches are promising in this task, and that combining data from European and Brazilian Portuguese is less straightforward than it may initially seem. The binary classification modeling of the problem does not seem to bring advantages to the original multi-class model, despite the outstanding results obtained by the binary classifier for recognizing textual entailments.
Governmental institutions are employing artificial intelligence techniques to deal with their specific problems and exploit their huge amounts of both structured and unstructured information. In particular, natural language processing and machine learning techniques are being used to process citizen feedback. In this paper, we report on the use of such techniques for analyzing and classifying complaints, in the context of the Portuguese Economic and Food Safety Authority. Grounded in its operational process, we address three different classification problems: target economic activity, implied infraction severity level, and institutional competence. We show promising results obtained using feature-based approaches and traditional classifiers, with accuracy scores above 70%, and analyze the shortcomings of our current results and avenues for further improvement, taking into account the intended use of our classifiers in helping human officers to cope with thousands of yearly complaints.
To overcome the lack of annotated resources in less-resourced languages, recent approaches have been proposed to perform unsupervised language adaptation. In this paper, we explore three recent proposals: Adversarial Training, Sentence Encoder Alignment and Shared-Private Architecture. We highlight the differences of these approaches in terms of unlabeled data requirements and capability to overcome additional domain shift in the data. A comparative analysis in two different tasks is conducted, namely on Sentiment Classification and Natural Language Inference. We show that adversarial training methods are more suitable when the source and target language datasets contain other variations in content besides the language shift. Otherwise, sentence encoder alignment methods are very effective and can yield scores on the target language that are close to the source language scores.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.