Giovanni Bonisoli scite author profile

Giovanni Bonisoli

3 Publications

8 Citation Statements Received

63 Citation Statements Given


Affiliations

University of Modena and Reggio Emilia, Ferrari (Italy)

Publications


Supervised and Unsupervised Categorization of an Imbalanced Italian Crime News Dataset

Rollo

Bonisoli

2022


The automatic categorization of crime news is useful for producing statistics on the types of crime occurring in a given area. This task can be treated as a text categorization problem. Several studies have shown that the use of word embeddings improves outcomes in many Natural Language Processing (NLP) tasks, including text categorization. This paper explores the use of word embeddings for categorizing Italian crime news. The approach compares different document pre-processing steps, Word2Vec models, and methods for obtaining word embeddings, including the extraction of bigrams and keyphrases. Supervised and unsupervised Machine Learning categorization algorithms are then applied and compared. In addition, the class imbalance of the input dataset is addressed with the Synthetic Minority Oversampling Technique (SMOTE), which oversamples the elements of the minority classes. Experiments conducted on an Italian dataset of 17,500 crime news articles collected from 2011 to 2021 show very promising results. Supervised categorization proves better than unsupervised categorization, exceeding 80% in both precision and recall and reaching an accuracy of 0.86. Lemmatization, bigrams, and keyphrase extraction, by contrast, make little difference. Finally, the model is available on GitHub together with the code used to extract word embeddings, so the approach can be replicated on other corpora, in Italian or in other languages.
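The SMOTE step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code: it is a simplified, standard-library version of the general technique, and the function name `smote`, the toy vectors, and the parameters `n_new`, `k`, and `seed` are assumptions made for illustration. Each synthetic point is placed on the segment between a minority-class sample and one of its nearest minority-class neighbours.

```python
import math
import random


def smote(minority, n_new, k=2, seed=0):
    """Return n_new synthetic points for a minority class.

    Each point is a random interpolation between a minority sample and one
    of its k nearest neighbours from the same class (simplified SMOTE).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority-class neighbours of the chosen sample, excluding itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Made-up 2-D "document vectors" for an under-represented crime category.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority, n_new=6, k=2)
```

Because every synthetic point lies between two existing minority samples, the new points stay inside the region the minority class already occupies. In practice a maintained implementation, such as the one in the imbalanced-learn library, would normally be used instead of a hand-written loop.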


Using Word Embeddings for Italian Crime News Categorization

Bonisoli

Rollo

2021


Several studies have shown that the use of embeddings improves outcomes in many Natural Language Processing (NLP) tasks, including text categorization. This paper focuses on how word embeddings can be used on newspaper articles about crimes. The goal is to categorize the news articles by the type of crime they report. We compare different Word2Vec models and methods for obtaining word embeddings. We then apply both supervised and unsupervised Machine Learning categorization algorithms. Experiments on an Italian dataset of 15,361 crime news articles show very promising results.
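As a rough illustration of embedding-based categorization, the sketch below averages word vectors into a document vector and assigns the category whose centroid is closest by cosine similarity. Everything in it is an assumption made for illustration: the abstract does not say how document vectors are built or which classifiers are used, the three-dimensional vectors are invented rather than taken from a Word2Vec model, and nearest-centroid matching stands in for the unspecified supervised algorithms.

```python
import math

# Hypothetical 3-dimensional word vectors; a trained Word2Vec model would supply real ones.
EMBEDDINGS = {
    "furto": [0.9, 0.1, 0.0],    # theft
    "ladro": [0.8, 0.2, 0.1],    # thief
    "droga": [0.1, 0.9, 0.1],    # drugs
    "spaccio": [0.0, 0.8, 0.2],  # drug dealing
}


def document_vector(tokens):
    """Average the vectors of known tokens; return None if no token is known."""
    vectors = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def classify(tokens, centroids):
    """Return the label of the centroid most similar to the document vector."""
    vec = document_vector(tokens)
    if vec is None:
        return None
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# One centroid per category, built here from a few seed words (illustrative only).
centroids = {
    "theft": document_vector(["furto", "ladro"]),
    "drug dealing": document_vector(["droga", "spaccio"]),
}

label = classify(["il", "ladro", "ha", "commesso", "un", "furto"], centroids)
```

Tokens missing from the embedding table, such as "il" or "commesso" here, are skipped, so a document made up only of unknown words yields no vector and no label.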


DICE: a Dataset of Italian Crime Event news

Bonisoli

Buono

et al. 2023

