It has never been easier to access content. Instead, we face an ever-increasing overload that undermines our ability to identify high-quality content relevant to the user. Automatic summarization techniques have been developed to distil content down to its key points, shortening the time required to grasp the essence of a document and judge its relevance. Summarization is not a deterministic task: it depends heavily on the writing style of the person creating the summary. In this work we present a method that, given a set of human-created summaries for a corpus, establishes which automatic extractive summarization technique best preserves the style of the human summary writer. To validate our approach, we use a corpus of 1,000 Science Daily articles with corresponding human-written summaries and benchmark three extractive summarization techniques (BERT-based, keyword-scoring-based, and a Luhn summarizer), indicating the best style-preserving method and discussing the results.
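A minimal sketch of the benchmarking loop this abstract describes, assuming the corpus is a list of (article, human summary) pairs and each candidate summarizer is a callable from text to summary. The similarity metric (a ROUGE-1-style unigram F1) and the lead-3 baseline are illustrative assumptions; the paper does not specify either here.

```python
# Sketch: score each candidate extractive summarizer by how closely its output
# matches the human-written summaries. The metric and the toy summarizer are
# assumptions, not the paper's actual systems.
import re
from collections import Counter
from typing import Callable, Dict, List, Tuple


def unigram_f1(candidate: str, reference: str) -> float:
    """ROUGE-1-style F1 overlap between a candidate and a reference summary."""
    cand = Counter(re.findall(r"\w+", candidate.lower()))
    ref = Counter(re.findall(r"\w+", reference.lower()))
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def benchmark(
    corpus: List[Tuple[str, str]],
    summarizers: Dict[str, Callable[[str], str]],
) -> Dict[str, float]:
    """Average similarity of each summarizer's output to the human summaries."""
    scores = {}
    for name, summarize in summarizers.items():
        per_doc = [unigram_f1(summarize(article), human) for article, human in corpus]
        scores[name] = sum(per_doc) / len(per_doc)
    return scores


# Toy usage: a naive lead-3 baseline stands in for the real BERT-based,
# keyword-scoring, and Luhn systems.
lead3 = lambda text: " ".join(text.split(". ")[:3])
print(benchmark([("First. Second. Third. Fourth.", "First. Second.")], {"lead-3": lead3}))
```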
We describe the creation of HurtLex, a multilingual lexicon of hate words. The starting point is the Italian hate lexicon developed by the linguist Tullio De Mauro, organized in 17 categories. It has been expanded through links to available synset-based computational lexical resources such as MultiWordNet and BabelNet, and evolved into a multilingual resource through semi-automatic translation and expert annotation. A twofold evaluation of HurtLex as a resource for hate speech detection in social media is provided: a qualitative evaluation against an Italian annotated Twitter corpus of hate against immigrants, and an extrinsic evaluation in the context of the AMI@IberEval2018 shared task, where the resource was exploited to extract domain-specific lexicon-based features for the supervised classification of misogyny in English and Spanish tweets.
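A minimal sketch of the kind of lexicon-based feature extraction this abstract mentions, assuming HurtLex is loaded as a mapping from each of its 17 categories to a set of lower-cased entries. The loading format, category names, and per-category counting are assumptions for illustration, not the released resource's file layout or the shared-task system.

```python
# Sketch: turn a tweet into one count per HurtLex category, i.e. how many of
# its tokens appear in that category's word list. These features would then
# feed a supervised classifier. The toy lexicon below is invented.
import re
from typing import Dict, List, Set


def hurtlex_features(tweet: str, lexicon: Dict[str, Set[str]]) -> List[int]:
    """One integer per category: number of tweet tokens found in that category."""
    tokens = re.findall(r"\w+", tweet.lower())
    return [sum(tok in words for tok in tokens) for _, words in sorted(lexicon.items())]


# Toy example with two invented categories; the real HurtLex has 17.
toy_lexicon = {"animals": {"pig"}, "moral_defects": {"liar"}}
print(hurtlex_features("You pig, you liar", toy_lexicon))  # -> [1, 1]
```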