2021
DOI: 10.48550/arxiv.2110.00976
Preprint

LexGLUE: A Benchmark Dataset for Legal Language Understanding in English

Abstract: Law, interpretations of law, legal arguments, agreements, etc. are typically expressed in writing, leading to the production of vast corpora of legal text. Their analysis, which is at the center of legal practice, becomes increasingly elaborate as these collections grow in size. Natural language understanding (NLU) technologies can be a valuable tool to support legal practitioners in these endeavors. Their usefulness, however, largely depends on whether current state-of-the-art models can generalize across various…

Cited by 3 publications (3 citation statements)
References 30 publications
“…The GLUE benchmark, encompassing a variety of natural language understanding tasks, served as a comprehensive evaluation framework, driving significant advancements in model development [17], [31]. SuperGLUE extended the original GLUE tasks with more challenging benchmarks, pushing the limits of what models could achieve in terms of understanding and reasoning [32], [33]. The HaluEval benchmark specifically targeted the evaluation of hallucinations in LLMs, providing a large-scale framework for assessing the accuracy and reliability of model outputs [7].…”
Section: B. Benchmarking and Evaluation Metrics (mentioning)
confidence: 99%
“…BBC News (Greene and Cunningham, 2006) and 20 NewsGroups (Lang, 1995) comprise collections of public news articles on various topics. LEDGAR (Tuggener et al., 2020) is a corpus of legal provisions in contracts and is part of the LexGLUE (Chalkidis et al., 2021) benchmark for evaluating capabilities on legal text. arXiv is a digital archive that stores scholarly articles from a wide range of fields, such as mathematics, computer science, and physics.…”
Section: Dataset (mentioning)
confidence: 99%
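
For readers who want to inspect the LEDGAR task inside LexGLUE directly, the sketch below loads it through the Hugging Face datasets library. The `lex_glue` dataset id, the `ledgar` configuration name, and the `text`/`label` field names follow the LexGLUE repository's published conventions; treat them as assumptions rather than a verified API.

```python
# Minimal sketch: loading the LEDGAR subset of LexGLUE via the Hugging Face
# datasets hub. Assumes the benchmark is published under the "lex_glue" id
# with a "ledgar" configuration exposing "text" and "label" fields.
from datasets import load_dataset

ledgar = load_dataset("lex_glue", "ledgar")  # splits: train / validation / test

sample = ledgar["train"][0]
label_names = ledgar["train"].features["label"].names
print(sample["text"][:200])          # the contract provision text
print(label_names[sample["label"]])  # its clause-type label
```

Each LEDGAR example is a single contract provision paired with one of its clause-type labels, which makes it a straightforward single-label classification probe for legal-domain models.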
“…Even though judiciary systems produce, consume and use massive volumes of textual information [9], they lack technological solutions to increase their efficiency. Moreover, legal documents are known to be complex and lengthy and to use specialized vocabulary [1], which raises the technical challenge of developing NLP systems in that domain.…”
Section: Introduction (mentioning)
confidence: 99%