2021
DOI: 10.48550/arxiv.2109.00904
Preprint

MultiEURLEX -- A multi-lingual and multi-label legal document classification dataset for zero-shot cross-lingual transfer

Abstract: We introduce MULTI-EURLEX, a new multilingual dataset for topic classification of legal documents. The dataset comprises 65k European Union (EU) laws, officially translated into 23 languages and annotated with multiple labels from the EUROVOC taxonomy. We highlight the effect of temporal concept drift and the importance of chronological, instead of random, splits. We use the dataset as a testbed for zero-shot cross-lingual transfer, where we exploit annotated training documents in one language (source) to classify do…
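To make the two evaluation ideas in the abstract concrete, here is a minimal Python sketch. It assumes a hypothetical in-memory record type (`LawDocument` with `doc_id`, `published`, `texts`, `labels`), not the dataset's actual schema or loader, and the cutoff dates in the usage lines are made up rather than taken from the paper. `chronological_split` assigns laws to train/dev/test by publication date, so evaluation documents are always newer than training documents; this is what exposes temporal concept drift that a random split would hide. `zero_shot_views` keeps only source-language text for training and builds one test set per target language.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class LawDocument:
    # Hypothetical record layout for illustration; not the dataset's real schema.
    doc_id: str
    published: date
    texts: dict[str, str]   # language code -> text of that language version
    labels: frozenset[str]  # set of EUROVOC concept IDs (multi-label target)


def chronological_split(docs, dev_start, test_start):
    """Assign documents to train/dev/test by publication date.

    Every dev document is published on or after dev_start and every test
    document on or after test_start, so evaluation always uses laws that are
    newer than anything seen in training (unlike a random split).
    """
    if not dev_start < test_start:
        raise ValueError("dev_start must precede test_start")
    train, dev, test = [], [], []
    for doc in sorted(docs, key=lambda d: d.published):
        if doc.published < dev_start:
            train.append(doc)
        elif doc.published < test_start:
            dev.append(doc)
        else:
            test.append(doc)
    return train, dev, test


def zero_shot_views(train, test, source, targets):
    """Build a source-only training set and one test set per target language."""
    train_view = [(d.texts[source], d.labels) for d in train if source in d.texts]
    test_views = {
        lang: [(d.texts[lang], d.labels) for d in test if lang in d.texts]
        for lang in targets
    }
    return train_view, test_views


# Illustrative usage with made-up documents and cutoff dates.
docs = [
    LawDocument("a", date(2005, 3, 1), {"en": "law a en", "de": "law a de"}, frozenset({"c1"})),
    LawDocument("b", date(2011, 6, 1), {"en": "law b en", "de": "law b de"}, frozenset({"c2"})),
    LawDocument("c", date(2014, 9, 1), {"en": "law c en", "de": "law c de"}, frozenset({"c1", "c3"})),
]
train, dev, test = chronological_split(docs, date(2010, 1, 1), date(2012, 1, 1))
train_view, test_views = zero_shot_views(train, test, source="en", targets=["de"])
```

The date intervals are half-open, so a document dated exactly on a cutoff lands in the later split; how the paper's own splits treat boundary dates is not stated here.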

Cited by 4 publications
(5 citation statements)
References 28 publications
“…Link ECHR [78] 11,000 116 US Law [79] 7,800 328 EU Law [80] 65,000 492 Contracts [81] 80,000 62 Contracts [82] 9,414 3 Harvard Law case [77] 52,800 86 CaseHOLD [77] CaseLaw-BERT [77] Harvard Law case […”
Section: ) Court View Generation Datasetsmentioning
confidence: 99%
“…Link ECHR [63] 11,000 116 US Law [64] 7,800 328 EU Law [65] 65,000 492 Contracts [66] 80,000 62 Contracts [67] 9,414 3 Harvard Law case [62] 52,800 86 CaseHOLD [62] CaseLaw-BERT [62] Harvard Law case […”
Section: Charge Prediction Datasets; mentioning; confidence: 99%
“…While there can be up to hundreds of entries in a docket, the writers are required to whittle the long list down to typically a dozen or fewer of the documents most essential for understanding the case, which constitutes the source documents in Multi-LexSum.⁷ Table 6 describes the rubric for whether to include a document based on its type.…”
Section: B Multi-LexSum Summary Writing and Reviewing Guidelines B1 R...; mentioning; confidence: 99%
“…NLP has been applied to a variety of legal document types, including patents [52], legal provisions and contracts [38,49,54], legislative bills [34], and court documents [21,42,62]. The NLP tasks studied in this work range from document/sentence classification [5,7,54] to information extraction [4,25], question answering [30,32,49,63], and, most relevant to our work, automatic summarization [21,29,34,52]. As found in other specialized domains of language, legal NLP systems often benefit from starting from a large language model pre-trained on legal text [6,57,62].…”
Section: Introduction; mentioning; confidence: 99%