Diego Garat scite author profile

Diego Garat

5 Publications

50 Citation Statements Received

27 Citation Statements Given

Affiliations

University of the Republic

Publications

Is This a Joke? Detecting Humor in Spanish Tweets

Castro, Cubero, Garat et al. 2016

While humor has historically been studied from psychological, cognitive and linguistic standpoints, its study from a computational perspective is an area yet to be explored in Computational Linguistics. Some previous work exists, but a characterization of humor that allows its automatic recognition and generation is far from being specified. In this work we build a crowdsourced corpus of labeled tweets, annotated according to their humor value, letting the annotators subjectively decide which are humorous. A humor classifier for Spanish tweets is assembled based on supervised learning, reaching a precision of 84% and a recall of 69%. Comment: Preprint version, without referra
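The abstract reports precision and recall but no combined score. As a rough illustration only, the sketch below computes the F1 score (the harmonic mean of precision and recall) from the two rounded figures above. The function name `f1_score` is a hypothetical helper, and the resulting value is derived here from rounded numbers; it is not a figure reported in the abstract.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (hypothetical helper)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded values taken from the abstract: precision 84%, recall 69%.
f1 = f1_score(0.84, 0.69)
print(f"{f1:.3f}")  # 0.758
```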

A Crowd-Annotated Spanish Corpus for Humor Analysis

Castro, Chiruzzo, Rosá et al. 2018

Computational Humor involves several tasks, such as humor recognition, humor generation, and humor scoring, for which it is useful to have human-curated data. In this work we present a corpus of 27,000 tweets written in Spanish and crowd-annotated by their humor value and funniness score, with about four annotations per tweet, tagged by 1,300 people over the Internet. It is equally divided between tweets coming from humorous and non-humorous accounts. The inter-annotator agreement, measured by Krippendorff's alpha, is 0.5710. The dataset is available for general use and can serve as a basis for humor detection and as a first step to tackle subjectivity.

Automatic Curation of Court Documents: Anonymizing Personal Data

Garat, Wonsever 2022

Information

In order to provide open access to data of public interest, it is often necessary to perform several data curation processes. In some cases, such as biological databases, curation involves quality control to ensure reliable experimental support for biological sequence data. In others, such as medical records or judicial files, publication must not interfere with the right to privacy of the persons involved. There are also interventions in the published data with the aim of generating metadata that enable a better experience of querying and navigation. In all cases, the curation process constitutes a bottleneck that slows down general access to the data, so it is of great interest to have automatic or semi-automatic curation processes. In this paper, we present a solution aimed at the automatic curation of our National Jurisprudence Database, with special focus on the anonymization of personal information. The anonymization process aims to hide the names of the participants involved in a lawsuit without losing the meaning of the narrative of facts. To achieve this goal, we need not only to recognize person names but also to resolve co-references, so that all mentions of the same person receive the same label. Our corpus has significant differences in the spelling of person names, so it was clear from the beginning that pre-existing tools would not be able to reach a good performance. The challenge was to find a good way of injecting specialized knowledge about the syntax of person names while taking advantage of the existing capabilities of pre-trained tools. We fine-tuned an NER analyzer and built a clustering algorithm to resolve co-references between named entities. We present our first results, which are promising for both tasks: we obtained a micro-F1 of 90.21% in the NER task (up from 39.99% before retraining the same analyzer on our corpus) and an ARI score of 95.95% in clustering for co-reference resolution.
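The labeling step described in the abstract (group mentions of the same person, then give each group one label) can be sketched as follows. This is a minimal illustration under loud assumptions, not the authors' method: it assumes the person-name mentions have already been extracted, and it swaps in a greedy grouping based on `difflib` string similarity for the paper's NER analyzer and clustering algorithm. The function names, the 0.8 threshold, the `PERSON_n` label format, and the sample names are all hypothetical.

```python
from difflib import SequenceMatcher


def similar(a: str, b: str, threshold: float = 0.8) -> bool:
    """Case-insensitive fuzzy match between two name mentions (assumed threshold)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def cluster_mentions(mentions):
    """Greedy grouping: a mention joins the first cluster holding a similar mention."""
    clusters = []
    for mention in mentions:
        for cluster in clusters:
            if any(similar(mention, other) for other in cluster):
                cluster.append(mention)
                break
        else:
            # No existing cluster matched, so this mention starts a new one.
            clusters.append([mention])
    return clusters


def anonymize(text, mentions):
    """Replace every mention with the label of its cluster."""
    clusters = cluster_mentions(mentions)
    label_of = {
        mention: f"PERSON_{index}"
        for index, cluster in enumerate(clusters, start=1)
        for mention in cluster
    }
    # Replace longer mentions first so a short mention never clobbers a longer one.
    for mention in sorted(label_of, key=len, reverse=True):
        text = text.replace(mention, label_of[mention])
    return text


# Invented example: two spellings of one name, plus a second person.
mentions = ["Juan Pérez", "JUAN PEREZ", "María Gómez"]
print(anonymize("Juan Pérez sued María Gómez. JUAN PEREZ appealed.", mentions))
# PERSON_1 sued PERSON_2. PERSON_1 appealed.
```

Both spellings of the first name map to the same label, so the narrative stays readable after the names are hidden, which is the property the abstract asks of the anonymization.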

A Crowd-Annotated Spanish Corpus for Humor Analysis

Castro, Chiruzzo, Rosá et al. 2017

Preprint

A constraint parser for contextual rules

Garat, Wonsever

In this paper we describe a constraint analyser for contextual rules. Contextual rules constitute a rule-based formalism that allows rewriting of sequences of terminals and/or non-terminals, taking into account their context. The formalism also allows referring to portions of text by means of exclusion zones, that is, patterns that are specified only by a maximum length and a set of disallowed categories. The constraint approach proves particularly useful for this type of rule, as decisions can be taken under the hypothesis that the excluded categories do not exist. If these categories are eventually deduced, all other categories inferred from their non-existence are automatically eliminated. The parser has been implemented using Constraint Handling Rules. Some results with a set of rules oriented to the segmentation of texts into propositions are shown.

