An Information Retrieval Pipeline for Legislative Documents from the Brazilian Chamber of Deputies

Souza, Ellen; Vitório, Douglas; Moriyama, Gyovana; Santos, Luiz Carlos; Martins, Lucas; Souza, Mariana Barbosa de; Fonseca, Márcio Alves da; Félix, Nádia; Carvalho, André Castro; Albuquerque, Hidelberg Oliveira; Oliveira, Adriano L. I.

doi:10.3233/faia210326

Cited by 9 publications

(19 citation statements)

References 16 publications

(20 reference statements)


“…Souza et al [11,28] investigated IR methods and presented a pipeline for the retrieval of legislative documents within the context of the Brazilian Chamber of Deputies. Evaluating the use of three variants of the BM25 algorithm, along with different pre-processing techniques, they built the IR model currently employed by the Chamber to retrieve bills and other queries relevant to a parliamentarian's request.…”

Section: Legal Information Retrieval (mentioning)

confidence: 99%

“…Nowadays, the IR model used by Conle to automatically retrieve relevant documents is based on BM25L [38] and a combination of pre-processing techniques: punctuation, accentuation, and stopwords removal + Stemming, with the Savoy algorithm [39], + unigram and bigram; as presented by [11]. BM25L ranks the documents by estimating their relevance to a query.…”

Section: The Scenario of the Brazilian Chamber of Deputies (mentioning)

confidence: 99%
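
The configuration quoted above (accent, punctuation, and stopword removal, stemming, unigrams plus bigrams, then BM25L ranking) can be illustrated with a minimal sketch. It is not the Chamber's implementation: the stopword list, the parameter values `k1`, `b`, and `delta`, and the function names `preprocess` and `bm25l_scores` are assumptions made for illustration, and the Savoy stemmer is replaced by an identity placeholder.

```python
import math
import re
import unicodedata
from collections import Counter

# Tiny illustrative stopword list (assumption; not the list used by the Chamber).
STOPWORDS = {"a", "o", "as", "os", "de", "da", "do", "e", "em", "para", "que"}


def preprocess(text, stem=lambda w: w):
    """Lowercase, strip accents and punctuation, drop stopwords, stem, add bigrams.

    `stem` is an identity placeholder; the Savoy stemmer is NOT implemented here.
    """
    text = unicodedata.normalize("NFKD", text.lower())
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    words = [stem(w) for w in re.findall(r"[a-z0-9]+", text) if w not in STOPWORDS]
    bigrams = [f"{a}_{b}" for a, b in zip(words, words[1:])]
    return words + bigrams


def bm25l_scores(query, docs, k1=1.5, b=0.75, delta=0.5):
    """Score each document in a non-empty list against the query with BM25L.

    Parameter defaults are illustrative assumptions, not values from the source.
    """
    tokenized = [preprocess(d) for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n_docs or 1.0
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    query_terms = set(preprocess(query))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        norm = 1 - b + b * len(tokens) / avgdl  # length normalization
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue  # BM25L contributes nothing for absent terms
            idf = math.log((n_docs + 1) / (df[term] + 0.5))
            c = tf[term] / norm  # length-normalized term frequency
            score += idf * (k1 + 1) * (c + delta) / (k1 + c + delta)
        scores.append(score)
    return scores


docs = [
    "Projeto de lei sobre educação básica",
    "Requerimento sobre saúde pública",
    "Lei de proteção ambiental",
]
scores = bm25l_scores("educação básica", docs)
```

Sorting the documents by descending score gives the ranking; here only the first document shares terms with the query, so it is the only one with a non-zero score.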

“…Actual consultations could not be used due to their confidentiality, as they contain private information about the parliamentarians. This made it impossible for the datasets used by [11,22,28] to be made available.…”

Section: The Scenario of the Brazilian Chamber of Deputies (mentioning)

confidence: 99%

“…Besides constructing a corpus that can be utilized to evaluate other IR techniques, we initially employed this feedback data to measure the performance of the model used by Conle [11] in a real-case scenario. To perform this evaluation, we considered both somewhat relevant and very relevant documents as the correct set for each query.…”

Section: Evaluation of the Brazilian Chamber of Deputies' IR Model (mentioning)

confidence: 99%
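
The evaluation setup quoted above, where both "somewhat relevant" and "very relevant" documents count as correct, can be sketched as a binarization of graded feedback followed by a standard metric. The citing paper's metrics are not stated in the excerpt, so precision and recall at `k` are an assumption here, as are the function names, the document identifiers, and the "irrelevant" label.

```python
# Labels treated as correct, as described in the quoted statement.
RELEVANT_LABELS = {"somewhat relevant", "very relevant"}


def correct_set(feedback):
    """Return the ids of documents whose feedback label counts as correct."""
    return {doc_id for doc_id, label in feedback.items() if label in RELEVANT_LABELS}


def precision_recall_at_k(ranked_ids, feedback, k):
    """Precision and recall over the top-k ranked documents (hypothetical metric choice)."""
    relevant = correct_set(feedback)
    top_k = ranked_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical feedback for one query; "irrelevant" is an assumed label name.
feedback = {
    "d1": "very relevant",
    "d2": "irrelevant",
    "d3": "somewhat relevant",
    "d4": "very relevant",
}
ranked = ["d1", "d2", "d3", "d5"]
precision, recall = precision_recall_at_k(ranked, feedback, k=3)
```

With this toy input, two of the top three results are in the correct set of three documents, so both values are 2/3.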


Building a Relevance Feedback Corpus for Legal Information Retrieval in the Real-Case Scenario of the Brazilian Chamber of Deputies

Vitório, Souza, Martins, et al. 2023. Preprint (self-citation).


The proper functioning of judicial and legislative institutions requires the efficient retrieval of legal documents from extensive datasets. Legal Information Retrieval focuses on investigating how to efficiently handle these datasets, enabling the retrieval of pertinent information from them. Relevance Feedback, an important aspect of Information Retrieval systems, utilizes the relevance information provided by the user to enhance document retrieval for a specific request. However, there is a lack of available corpora containing this information, particularly for the legislative scenario. Thus, this paper presents Ulysses-RFCorpus, a Relevance Feedback corpus for legislative information retrieval, built in the real-case scenario of the Brazilian Chamber of Deputies. To the best of our knowledge, this corpus is the first publicly available of its kind for the Brazilian Portuguese language. It is also the only corpus that contains feedback information for legislative documents, as the other corpora found in the literature primarily focus on judicial texts. We also used the corpus to evaluate the performance of the Brazilian Chamber of Deputies' Information Retrieval system. Thereby, we highlighted the model's strong performance and emphasized the dataset's significance in the field of Legal Information Retrieval.


“…Information Retrieval techniques are at the core of many legal research platforms, such as Lexis+ and Westlaw Edge, and have several critical application scenarios, such as legislative document retrieval [82], [83] and case retrieval [84], [85]. Depending on the application, the queries can be short (e.g., simple keywords), of medium length (e.g., Boolean or natural language queries) or long (e.g., whole documents).…”

Section: A. An Overview of Major Legal NLP Tasks (mentioning)

confidence: 99%

On the Effectiveness of Pre-Trained Language Models for Legal Natural Language Processing: An Empirical Study

Song, Gao, et al. 2022. IEEE Access.


We present the first comprehensive empirical evaluation of pre-trained language models (PLMs) for legal natural language processing (NLP) in order to examine their effectiveness in this domain. Our study covers eight representative and challenging legal datasets, ranging from 900 to 57K samples, across five NLP tasks: binary classification, multi-label classification, multiple choice question answering, summarization and information retrieval. We first run unsupervised, classical machine learning and/or non-PLM based deep learning methods on these datasets, and show that baseline systems' performance can be 4%∼35% lower than that of PLM-based methods. Next, we compare general-domain PLMs and those specifically pre-trained for the legal domain, and find that domain-specific PLMs demonstrate 1%∼5% higher performance than general-domain models, but only when the datasets are extremely close to the pretraining corpora. Finally, we evaluate six general-domain state-of-the-art systems, and show that they have limited generalizability to legal data, with performance gains from 0.1% to 1.2% over other PLM-based methods. Our experiments suggest that both general-domain and domain-specific PLM-based methods generally achieve better results than simpler methods on most tasks, with the exception of the retrieval task, where the best-performing baseline outperformed all PLM-based methods by at least 5%. Our findings can help legal NLP practitioners choose the appropriate methods for different tasks, and also shed light on potential future directions for legal NLP research.


Ulysses-RFSQ: A Novel Method to Improve Legal Information Retrieval Based on Relevance Feedback

Vitório, Souza, Martins, et al. 2022. Lecture Notes in Computer Science.
