In this work, we examine variations of the BERT model on the statute law retrieval task of the COLIEE competition. These include leveraging BERT's contextual word embeddings, fine-tuning the model, combining it with TF-IDF vectorization, enriching the statutes with external knowledge, and data augmentation. Our ensemble of Sentence-BERT with two different TF-IDF representations and document enrichment achieves the best F2 score on this task. It is followed by a fine-tuned LEGAL-BERT with TF-IDF and data augmentation, and then by our third approach, which is based on BERTScore. Our results show that there are significant differences between the chosen BERT approaches, and we discuss several design decisions in the context of statute law retrieval.
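Because the approaches above are ranked by F2 score, a minimal sketch of that metric may help. It uses the standard F-beta definition with beta = 2, which weights recall more heavily than precision. The function names, the article identifiers in the usage note, and the macro-averaging over queries are illustrative assumptions, not the competition's official evaluation script.

```python
def f_beta(retrieved, relevant, beta=2.0):
    """F-beta for one query; beta=2 gives the F2 score (recall-weighted)."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    if true_positives == 0:
        # No correct article retrieved: precision and recall are both zero.
        return 0.0
    precision = true_positives / len(retrieved)
    recall = true_positives / len(relevant)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def macro_f2(runs):
    """Average per-query F2 over (retrieved, relevant) pairs.

    Assumption: scores are macro-averaged over queries.
    """
    scores = [f_beta(retrieved, relevant) for retrieved, relevant in runs]
    return sum(scores) / len(scores) if scores else 0.0
```

With hypothetical article identifiers, `f_beta(["art_1", "art_2"], ["art_1"])` has full recall and half precision, while `f_beta(["art_1"], ["art_1", "art_2"])` has full precision and half recall; the first scores higher, which is why over-retrieving slightly tends to be penalized less than missing a relevant statute.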
Textual entailment classification is one of the hardest tasks for the Natural Language Processing community. Working on entailment with legal statutes is particularly difficult, for example because of differing abstraction levels, specialized terminology, and the domain knowledge required to solve the task. In the course of the COLIEE competition, we develop three approaches to classify entailment. The first approach combines Sentence-BERT embeddings with a graph neural network, while the second approach uses the domain-specific model LEGAL-BERT, further trained on the competition’s retrieval task and fine-tuned for entailment classification. The third approach involves embedding syntactic parse trees with the KERMIT encoder and using them with a BERT model. In this work, we discuss the potential of the latter technique and why, among all our submissions, the LEGAL-BERT runs may have outperformed the graph-based approach.