Proceedings of the 28th International Conference on Computational Linguistics 2020
DOI: 10.18653/v1/2020.coling-main.79

Cross-lingual Annotation Projection in Legal Texts

Abstract: We study annotation projection in text classification problems where source documents are published in multiple languages and may not be an exact translation of one another. In particular, we focus on the detection of unfair clauses in privacy policies and terms of service. We present the first English-German parallel asymmetric corpus for the task at hand. We study and compare several language-agnostic sentence-level projection methods. Our results indicate that a combination of word embeddings and dynamic ti…

Cited by 11 publications (11 citation statements)
References 31 publications
“…We employ the same methodology proposed in Galassi et al (2020), by extending the study to two additional languages, namely Italian and Polish. The general idea of the methodology is to transfer the annotations of a given English document onto the same document, given in a different language.…”
Section: Methods (mentioning)
confidence: 99%
“…From a machine learning point of view, the question is whether we shall necessarily train independent models for each and every language. This paper only partially answers the question, by extending previous work (Galassi et al, 2020) towards three different languages and by making a novel multilingual corpus available to the community. Future research will consider training and comparing the classifiers.…”
Section: Introduction (mentioning)
confidence: 99%
“…They experimented with monolingual SVM classifiers and their combination as a multilingual ensemble. More recently, Galassi et al (2020) transferred sentence-level gold labels from annotated English to non-annotated German sentences, for the task of identifying unfair clauses in Terms of Service (2.7k sentences) and Privacy Policy documents (1.8k). They experimented with similarity-based methods aligning the English sentences to machine-translations of the German sentences.…”
Section: Related Work (mentioning)
confidence: 99%
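The similarity-based label transfer described in the statement above can be illustrated with a minimal sketch. It is not the method of Galassi et al. (2020): the token-overlap (Jaccard) similarity, the 0.5 threshold, the function names, and the "unfair"/"fair" labels are all assumptions made for illustration. Each machine-translated target sentence receives the label of its most similar annotated source sentence, or no label if nothing is similar enough.

```python
def jaccard(a, b):
    """Token-overlap similarity in [0, 1]; a toy stand-in for a real sentence similarity."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def project_labels(annotated_source, translated_targets, threshold=0.5):
    """Copy to each target sentence the label of its best-matching source sentence.

    annotated_source: list of (sentence, label) pairs in the source language.
    translated_targets: target sentences already machine-translated into the
        source language (the translation step itself is out of scope here).
    threshold: hypothetical cut-off below which no label is projected.
    """
    projected = []
    for target in translated_targets:
        best_label, best_score = None, 0.0
        for sentence, label in annotated_source:
            score = jaccard(sentence, target)
            if score > best_score:
                best_label, best_score = label, score
        projected.append(best_label if best_score >= threshold else None)
    return projected


# Hypothetical toy data, not taken from the corpus.
source = [
    ("we may terminate your account at any time", "unfair"),
    ("you can contact us by email", "fair"),
]
targets = [
    "we may terminate your account at any time without notice",
    "the weather is nice",
]
labels = project_labels(source, targets)
```

Returning `None` for weak matches reflects the asymmetric setting mentioned in the abstract, where a target sentence may have no counterpart in the source document.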
“…The similarity between two embeddings can be computed using any similarity function that operates on high-dimensional numerical vectors. We use the Bray-Curtis similarity [3] since it has led to satisfactory results in previous works [17], but other measures, such as cosine similarity [20], may be valid alternatives. A possible alternative to the use of sentence embeddings combined with a similarity measure may be the use of neural architectures specifically trained to perform this task, such as cross-encoders [29].…”
Section: Language Module (mentioning)
confidence: 99%
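To make the two measures named in the statement above concrete, here is a minimal sketch using the standard definitions: Bray-Curtis similarity taken as one minus the Bray-Curtis dissimilarity, sum|u_i - v_i| / sum|u_i + v_i|, and cosine similarity as the normalized dot product. The function names and toy vectors are assumptions for illustration; the citing paper does not show its implementation.

```python
import math


def bray_curtis_similarity(u, v):
    """Return 1 minus the Bray-Curtis dissimilarity of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(abs(a + b) for a, b in zip(u, v))
    if denominator == 0:
        raise ValueError("undefined when sum |u_i + v_i| is zero")
    return 1.0 - numerator / denominator


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length, non-zero vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Toy vectors standing in for sentence embeddings (hypothetical values).
u = [1.0, 2.0, 3.0]
v = [2.0, 4.0, 6.0]  # same direction as u, twice the magnitude

bc = bray_curtis_similarity(u, v)
cos = cosine_similarity(u, v)
```

With these toy vectors, cosine similarity ignores the difference in magnitude while Bray-Curtis similarity penalizes it, which is one practical difference to weigh when choosing between the two.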