Proceedings of the Conference Recent Advances in Natural Language Processing - Deep Learning for Natural Language Processing M 2021
DOI: 10.26615/978-954-452-072-4_012

PyEuroVoc: A Tool for Multilingual Legal Document Classification with EuroVoc Descriptors

Abstract: EuroVoc is a multilingual thesaurus that was built for organizing the legislative documentary of the European Union institutions. It contains thousands of categories at different levels of specificity, and its descriptors are targeted by legal texts in almost thirty languages. In this work, we propose a unified framework for EuroVoc classification on 22 languages by fine-tuning modern Transformer-based pretrained language models. We extensively study the performance of our trained models and show that they signi…

Citations: cited by 6 publications (4 citation statements)
References: 13 publications (8 reference statements)
“…Although not directly comparable due to the use of different datasets/annotation schemas, the performance of our models is in line with current state-of-the-art approaches to legal text classification (Chalkidis et al., 2019; Avram et al., 2021), which however consider the full text of legal acts.…”
Section: Evaluation and Error Analysis (supporting)
confidence: 59%
“…In the legal domain, text classification has an established tradition, both in the monolingual (Šarić et al., 2014; Papaloukas et al., 2021) and in the multi-lingual setting (Steinberger et al., 2006, 2012; Chalkidis et al., 2019; Avram et al., 2021; Chalkidis et al., 2021). Moreover, the large availability of legal data, produced by national and supranational public institutions, set the stage for the development of domain-adapted models (Chalkidis et al., 2020; Douka et al., 2021; Masala et al., 2021; Licari and Comandè, 2022).…”
Section: Related Work (mentioning)
confidence: 99%
“…In terms of the former, KB-BERT has now been utilised by medical researchers seeking to develop new lifestyle treatments for diabetes patients; in attempts to automatically identify the presence of implants (i.e. pacemakers or stents) in heart patients prior to MRI scans; and for the classification of legal documents (Dwibedi et al., 2022; Jerdhaf et al., 2020; Avram et al., 2021). In terms of the latter, the lab's models have been put to work in automating and streamlining the information handling processes of various public authorities, including local councils, the Swedish Tax Agency (Skatteverket), the Swedish Courts (Domstolsverket) and most recently, the support function of State administration (Statens servicecenter).…”
Section: The Value Of Collections-based Models In Practice (mentioning)
confidence: 99%
“…and includes 6,883 concepts. In practice, this thesaurus is used, in particular, for indexing documents in the document management systems of European institutions, as well as for classifying legal documents [Caled et al., 2019; Avram et al., 2021].…”
Section: Ontological Resources for Regional Governance Tasks (unclassified)