Proceedings of the 3rd Workshop on Computational Approaches to Historical Language Change 2022
DOI: 10.18653/v1/2022.lchange-1.1

A Multilingual Benchmark to Capture Olfactory Situations over Time

Abstract: We present a benchmark in six European languages containing manually annotated information about olfactory situations and events, following a FrameNet-like approach. The document selection covers ten domains of interest to cultural historians in the olfactory domain and includes texts published between 1620 and 1920, allowing a diachronic analysis of smell descriptions. With this work, we aim to foster the development of olfactory information extraction approaches as well as the analysis of changes in smell des…

Cited by 6 publications (6 citation statements) · References 7 publications

“…Cultural Heritage, such as the study [of] literature, is a knowledge-intensive domain. Datasets that focus on literature typically require expert or near-expert annotators (e.g., graduate students and their advisors) [14,15], or at least, more substantial efforts beyond simple crowdsourcing. This often results in small, imbalanced datasets [16].…”
Section: Motivation (mentioning)
confidence: 99%
“…As regards the development of structured resources to investigate the evolution of sensory language, Menini et al (2022a) present a multilingual taxonomy for olfactory-related terms, which was created semi-automatically, with the goal to describe the evolution of odours and smell sources' descriptions. Furthermore, in Menini et al (2022b), the authors present a multilingual benchmark, manually annotated with smell-related information, to support the development of olfactory information extraction systems.…”
Section: Related Work (mentioning)
confidence: 99%
“…The need for an ontology that can enable us to be consistent in the annotation of olfactory information across studies was reported by Tonelli and Menini et al (2021) [89] and [90]. On the basis of this ontology, the multilingual Odeuropa benchmark dataset was released [91]. The Odeuropa benchmark dataset is multilingual and consists of historical texts.…”
Section: Computational Linguistics: Extracting Emotions and Smells (mentioning)
confidence: 99%
“…For smell extraction, we utilized the English part of the Odeuropa benchmark [91,106] to create a machine learning-based text classification model that predicts whether a sentence is smell-related or not. Although the Odeuropa is annotated at token level, we converted the sentences that contain any smell event annotation to be smell-related and remaining sentences as not-smell-related.…”
Section: Implementation Details (mentioning)
confidence: 99%
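The last statement describes collapsing token-level annotations into binary sentence labels: any sentence that contains at least one smell-event annotation is labeled smell-related, and every other sentence is labeled not smell-related. The following is a minimal Python sketch of that conversion under stated assumptions. It assumes a hypothetical BIO-style format of (token, tag) pairs in which "O" marks unannotated tokens. The tag names (e.g. `B-Smell_Source`) and the helpers `is_smell_related` and `to_sentence_labels` are illustrative only; they are not the Odeuropa release format or the citing authors' code.

```python
from typing import List, Tuple

# Hypothetical token-level format (an assumption, not the Odeuropa file format):
# each sentence is a list of (token, tag) pairs, with "O" for unannotated tokens.
Sentence = List[Tuple[str, str]]


def is_smell_related(sentence: Sentence) -> bool:
    """Return True if any token in the sentence carries a smell annotation."""
    return any(tag != "O" for _, tag in sentence)


def to_sentence_labels(sentences: List[Sentence]) -> List[Tuple[str, int]]:
    """Collapse token-level annotations into (sentence_text, label) pairs.

    label 1 = smell-related, 0 = not smell-related.
    """
    dataset = []
    for sentence in sentences:
        text = " ".join(token for token, _ in sentence)
        dataset.append((text, int(is_smell_related(sentence))))
    return dataset


if __name__ == "__main__":
    # Toy, hand-written example sentences (not taken from the benchmark).
    corpus = [
        [("The", "O"), ("roses", "B-Smell_Source"),
         ("smelled", "B-Smell_Word"), ("sweet", "B-Quality")],
        [("He", "O"), ("walked", "O"), ("home", "O")],
    ]
    for text, label in to_sentence_labels(corpus):
        print(label, text)
```

The resulting (text, label) pairs can be passed to any sentence classifier. Note that this conversion discards span information, so frame elements such as the smell source cannot be recovered from the sentence labels alone.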