Topic modeling is widely used across domains to extract the latent topics underlying large corpora, including judicial texts. In judicial texts, topics tend to be made by and for domain experts and remain unintelligible to laymen. Similarly to [1], we work with housing law court decisions in French, which mix abstract legal terminology with real-life situations described in common language, and we apply topic models to identify the different situations that can lead a tenant to take their landlord to court. In quantitative evaluation, LDA and BERTopic deliver the best results, but a closer manual analysis reveals that BERTopic, the embedding-based approach, is much better at producing and even uncovering topics that describe a tenant’s real-life issues and situations.
The Régie du Logement du Québec (RDL) is a tribunal with exclusive jurisdiction over matters regarding rental leases. Within the framework of the ACT (Autonomy Through Cyberjustice Technologies) project, we processed an original collection of court decisions in French and performed a thorough analysis to reveal biases that may influence prediction experiments. To illustrate the importance of prior data analysis, we studied a multilabel classification task that consists of predicting the types of verdict. Our best model, based on the FlauBERT language model, achieves micro-averaged F1 scores of 93.7% and 84.9% in Landlord v. Tenant and Tenant v. Landlord cases respectively. However, with the support of our in-depth analysis, we emphasize that these results should be kept in perspective and that some metrics may not be suitable for evaluating systems in sensitive domains such as housing law.
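For readers unfamiliar with the metric, the following is a minimal sketch of how a micro-averaged F1 score can be computed for a multilabel task. It pools true positives, false positives, and false negatives over all decisions and labels before computing a single score. The function name `micro_f1` and the verdict labels in the example are hypothetical and chosen for illustration only; they are not taken from the RDL dataset or from the authors' evaluation code.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1 for multilabel predictions.

    y_true and y_pred are sequences of label collections, one per document.
    Counts are pooled over all documents and labels before computing F1.
    """
    tp = fp = fn = 0
    for true_labels, pred_labels in zip(y_true, y_pred):
        true_set, pred_set = set(true_labels), set(pred_labels)
        tp += len(true_set & pred_set)   # labels correctly predicted
        fp += len(pred_set - true_set)   # labels predicted but not in the gold set
        fn += len(true_set - pred_set)   # gold labels that were missed
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Hypothetical verdict labels, for illustration only.
gold = [{"lease_termination", "rent_arrears"}, {"rent_arrears"}, {"damages"}]
predicted = [{"rent_arrears"}, {"rent_arrears", "damages"}, {"damages"}]
score = micro_f1(gold, predicted)
```

Because counts are pooled, frequent labels dominate a micro average, which is one reason such a score can look high on an imbalanced label distribution.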
Legal text summarization is generally formalized as an extractive summarization task applied to court decisions: the most relevant sentences are identified and returned as a gist meant to be read by legal experts. However, such summaries are not suitable for laymen seeking intelligible legal information. Within the scope of JusticeBot, a question-answering system in French that provides information about housing law, we intend to generate summaries of court decisions that are, on the one hand, conditioned on a question-answer-decision triplet, and on the other hand, intelligible to ordinary citizens not familiar with legal documents. So far, our best model, a further pre-trained BARThez, achieves an average ROUGE-1 score of 37.7, and an in-depth manual evaluation of the summaries reveals that there is still room for improvement.
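As a rough illustration of what a ROUGE-1 score measures, the sketch below computes unigram overlap between a reference summary and a generated one. It assumes lowercasing, whitespace tokenization, and the F-measure variant; the abstract does not state which ROUGE variant or preprocessing was used, and standard ROUGE tooling typically applies further steps such as stemming. The function name `rouge1_f` and the example sentences are hypothetical.

```python
from collections import Counter


def rouge1_f(reference, candidate):
    """ROUGE-1 F-measure from clipped unigram overlap (simplified sketch)."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Counter intersection keeps the minimum count per token (clipping).
    overlap = sum((ref_counts & cand_counts).values())
    ref_total = sum(ref_counts.values())
    cand_total = sum(cand_counts.values())
    if overlap == 0 or ref_total == 0 or cand_total == 0:
        return 0.0
    recall = overlap / ref_total
    precision = overlap / cand_total
    return 2 * precision * recall / (precision + recall)


# Hypothetical sentences, for illustration only.
reference_summary = "the tenant must pay the rent"
generated_summary = "the tenant pays the rent"
score = rouge1_f(reference_summary, generated_summary)
```

This function returns a value between 0 and 1; ROUGE scores are commonly reported multiplied by 100, which is presumably the scale of the 37.7 figure above.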