Proceedings of the Conference Recent Advances in Natural Language Processing - Deep Learning for Natural Language Processing Me 2021
DOI: 10.26615/978-954-452-072-4_108

TREMoLo-Tweets: a Multi-Label Corpus of French Tweets for Language Register Characterization

Abstract: The casual, neutral, and formal language registers are highly perceptible in discourse productions. However, they are still poorly studied in Natural Language Processing (NLP), especially outside English, and for new textual types like tweets. To stimulate research, this paper introduces a large corpus of 228,505 French tweets (6M words) annotated in language registers. Labels are provided by a multi-label CamemBERT classifier trained and checked on a manually annotated subset of the corpus, while the tweets ar…

Citations: cited by 1 publication (5 citation statements)
References: 12 publications
“…Furthermore, due to the larger number of annotators and output labels, as well as low IAA, the individual degrees of belonging for specific labels in our dataset become significantly small. As a result, the thresholding approach (>0.5) employed in [70] for converting degrees of belonging to one-hot encoded vectors is inadequate for emotions in our case. To address this, we adopt an alternative approach by selecting the top N emotion labels with the highest degrees of belonging.…”
Section: Post-processing Annotations (citation type: mentioning)
confidence: 97%
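
The contrast drawn in this citation statement, between thresholding degrees of belonging at 0.5 and keeping the top N labels, can be sketched as follows. This is a minimal illustration under stated assumptions, not the citing authors' implementation: the function names `threshold_labels` and `top_n_labels`, the example degrees, and the choice of N = 2 are all hypothetical.

```python
import numpy as np

def threshold_labels(degrees, threshold=0.5):
    """Mark every label whose degree of belonging exceeds the threshold."""
    degrees = np.asarray(degrees, dtype=float)
    return (degrees > threshold).astype(int)

def top_n_labels(degrees, n=2):
    """Mark the n labels with the highest degrees of belonging."""
    degrees = np.asarray(degrees, dtype=float)
    # Stable sort on negated values: highest degrees first, ties keep input order.
    selected = np.argsort(-degrees, kind="stable")[:n]
    encoded = np.zeros(degrees.shape[0], dtype=int)
    encoded[selected] = 1
    return encoded

# Hypothetical degrees for five labels: they sum to 1, but none exceeds 0.5.
degrees = [0.30, 0.25, 0.20, 0.15, 0.10]
print(threshold_labels(degrees))   # [0 0 0 0 0] -> no label survives
print(top_n_labels(degrees, n=2))  # [1 1 0 0 0] -> the two strongest labels
```

With many labels sharing a total mass of 1, individual degrees rarely pass 0.5, so thresholding can return an empty label set, whereas top-N selection always returns exactly N labels.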
“…For example, two annotators assigning rank 1 to a text is not equivalent to one assigning rank 1 and the other assigning rank 2 or 3. This problem is addressed in [70] by introducing an approach that transforms ranked annotations from multiple annotators into normalized degrees of class membership. These values range from 0 to 1 and sum up to 1, where higher values correspond to a more pronounced presence of emotion.…”
Section: Post-processing Annotations (citation type: mentioning)
confidence: 99%
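
The idea of turning ranked annotations from several annotators into normalized degrees of class membership can be sketched as below. The exact transformation used in [70] is not given in this statement, so the linear rank weighting here (rank 1 earns the most weight) is an assumption chosen only for illustration; the function name `ranks_to_degrees`, the `max_rank` parameter, and the example labels are likewise hypothetical.

```python
def ranks_to_degrees(annotations, labels, max_rank=3):
    """Convert per-annotator rankings into degrees that sum to 1.

    annotations: list of dicts mapping label -> rank (1 = most prominent).
    Assumed weighting: a label ranked r earns (max_rank - r + 1) points.
    """
    scores = {label: 0.0 for label in labels}
    for ranking in annotations:
        for label, rank in ranking.items():
            if not 1 <= rank <= max_rank:
                raise ValueError(f"rank {rank} outside 1..{max_rank}")
            scores[label] += max_rank - rank + 1
    total = sum(scores.values())
    if total == 0:
        return scores  # no annotations: every degree stays at 0
    return {label: score / total for label, score in scores.items()}

labels = ["joy", "anger", "sadness"]

# Both annotators rank "joy" first.
agree = ranks_to_degrees(
    [{"joy": 1, "anger": 2}, {"joy": 1, "sadness": 2}], labels
)
# One annotator ranks "joy" first, the other ranks it third.
mixed = ranks_to_degrees(
    [{"joy": 1, "anger": 2}, {"sadness": 1, "anger": 2, "joy": 3}], labels
)
print(agree)
print(mixed)
```

Under this assumed weighting, "joy" receives a higher degree when both annotators rank it first than when one ranks it first and the other third, which reflects the distinction the statement describes.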