The automated identification of national implementing measures (NIMs) of European directives using text similarity techniques has shown promising preliminary results. Previous work has proposed unsupervised lexical and semantic similarity techniques based on vector space models, latent semantic analysis and topic models. However, these techniques were evaluated only on a small multilingual corpus of directives and NIMs. In this paper, we use word and paragraph embedding models, learned by shallow neural networks from a multilingual legal corpus of European directives and national legislation (from Ireland, Luxembourg and Italy), to develop unsupervised semantic similarity systems that identify transpositions. We evaluate these models and compare their results with those of the previous unsupervised methods on a multilingual test corpus of 43 directives and their corresponding NIMs. We also develop supervised machine learning models to identify transpositions and compare their performance across different feature sets.
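To make the embedding-based similarity idea concrete, here is a minimal sketch, assuming each directive article and each NIM provision is represented by the average of its word vectors and compared by cosine similarity. This is a common unsupervised baseline, not necessarily the exact system described above. The toy vectors, the function names (`average_vector`, `cosine`, `candidate_transpositions`) and the 0.5 threshold are all hypothetical; a real system would learn the vectors from the legal corpus.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def average_vector(tokens: Sequence[str], vectors: Dict[str, Vector]) -> Vector:
    """Represent a text as the mean of the vectors of its in-vocabulary words."""
    dim = len(next(iter(vectors.values())))  # assumes a non-empty vocabulary
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def candidate_transpositions(
    article: str,
    provisions: Dict[str, str],
    vectors: Dict[str, Vector],
    threshold: float = 0.5,  # hypothetical cut-off, would need tuning
) -> List[Tuple[str, float]]:
    """Score every NIM provision against one directive article; keep those above threshold."""
    article_vec = average_vector(article.lower().split(), vectors)
    scored = [
        (pid, cosine(article_vec, average_vector(text.lower().split(), vectors)))
        for pid, text in provisions.items()
    ]
    return sorted((s for s in scored if s[1] >= threshold), key=lambda s: s[1], reverse=True)


# Toy 3-dimensional vectors standing in for embeddings learned from a legal corpus.
toy_vectors = {
    "member": [1.0, 0.0, 0.0],
    "states": [0.9, 0.1, 0.0],
    "shall": [0.5, 0.5, 0.0],
    "ensure": [0.4, 0.6, 0.0],
    "fishing": [0.0, 0.0, 1.0],
    "vessels": [0.0, 0.1, 0.9],
}
result = candidate_transpositions(
    "Member States shall ensure",
    {"s1": "The Minister shall ensure", "s2": "Fishing vessels"},
    toy_vectors,
)
print(result)  # only "s1" passes the threshold
```

Out-of-vocabulary words ("the", "minister") are simply skipped when averaging; paragraph-embedding models avoid this averaging step by learning one vector per provision directly.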
This paper presents our approach to the information retrieval and textual entailment tasks of the COLIEE 2017 competition. We address the information retrieval task with two methods: partial string matching and topic clustering. For the textual entailment task, we propose a Long Short-Term Memory (LSTM)-Convolutional Neural Network (CNN) model that uses pre-trained Google News word vectors. We evaluated our approach for both tasks on the COLIEE 2017 dataset. The results show that the topic clustering method outperformed the partial string matching method in the information retrieval task, and that the performance of the LSTM-CNN model was competitive with other textual entailment systems.
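The abstract does not specify how partial string matching is scored, so the following is only a minimal sketch under one assumption: a query is compared against every equally long character window of an article using Python's `difflib.SequenceMatcher`, and the best window score ranks the article (similar in spirit to the "partial ratio" used by fuzzy-matching libraries). The function names, the sample article texts and `top_k` are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple


def partial_ratio(query: str, text: str) -> float:
    """Best similarity between the shorter string and any equally long window of the longer one."""
    short, long_ = (query, text) if len(query) <= len(text) else (text, query)
    if not short:
        return 0.0
    best = 0.0
    for start in range(len(long_) - len(short) + 1):
        window = long_[start:start + len(short)]
        best = max(best, SequenceMatcher(None, short, window).ratio())
        if best == 1.0:  # an exact substring match cannot be beaten
            break
    return best


def retrieve(query: str, articles: Dict[str, str], top_k: int = 1) -> List[Tuple[str, float]]:
    """Rank articles by their best partial match against the query (case-insensitive)."""
    scores = [(aid, partial_ratio(query.lower(), text.lower())) for aid, text in articles.items()]
    scores.sort(key=lambda s: s[1], reverse=True)
    return scores[:top_k]


# Hypothetical article texts, for illustration only.
articles = {
    "Article 3": "A minor may not perform a juridical act without consent of the statutory agent.",
    "Article 5": "A person who has no domicile is deemed to reside at the place of residence.",
}
result = retrieve("juridical act without consent", articles)
print(result)
```

Sliding the full query over the article favours articles that contain a near-verbatim fragment of the query, which is the intuition behind partial matching; a topic clustering method instead groups articles by shared themes and can match paraphrases that share few literal characters.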
This paper presents a unifying text similarity measure (USM) for the automated identification of national implementations of European Union (EU) directives. The proposed model retrieves the transposed provisions of national law at a fine-grained level for each article of the directive. USM incorporates methods for matching common words, common sequences of words, and approximate string matching. It was used to identify transpositions in a multilingual corpus of four directives and their corresponding national implementing measures (NIMs) in three different languages: English, French and Italian. We further used a corpus of four additional directives and their corresponding NIMs in English for a more thorough test of the USM approach. We evaluated the model by comparing our results with a gold standard consisting of official correlation tables (where available) or correspondences manually identified by domain experts. Our results indicate that USM was able to identify transpositions with average F-score values of 0.808, 0.736 and 0.708 for French, Italian and English Directive-NIM pairs, respectively, in the multilingual corpus. A comparison with state-of-the-art text similarity methods shows that USM achieves a higher F-score and recall on both corpora.
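The abstract names the three ingredients of USM but not how each is computed or combined. The sketch below is therefore hypothetical and is not the paper's formula: it assumes Jaccard overlap for common words, a word-level longest common subsequence for common sequences, `difflib.SequenceMatcher.ratio()` for approximate string matching, and equal weights. It also shows how an F-score can be computed against a gold standard of (article, provision) pairs such as a correlation table; the pair identifiers used are invented.

```python
from difflib import SequenceMatcher
from typing import Sequence, Set, Tuple


def common_words(a: Sequence[str], b: Sequence[str]) -> float:
    """Jaccard overlap of the two word sets."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def common_sequence(a: Sequence[str], b: Sequence[str]) -> float:
    """Longest common subsequence of words, normalised by the longer text."""
    if not a or not b:
        return 0.0
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1] / max(len(a), len(b))


def approximate_match(a: str, b: str) -> float:
    """Character-level similarity ratio in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


def usm_like_score(article: str, provision: str, weights=(1 / 3, 1 / 3, 1 / 3)) -> float:
    """Weighted combination of the three hypothetical component scores."""
    a, p = article.lower(), provision.lower()
    ta, tp = a.split(), p.split()
    parts = (common_words(ta, tp), common_sequence(ta, tp), approximate_match(a, p))
    return sum(w * s for w, s in zip(weights, parts))


def f_score(predicted: Set[Tuple[str, str]], gold: Set[Tuple[str, str]]) -> float:
    """Harmonic mean of precision and recall over (article, provision) pairs."""
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(predicted), tp / len(gold)
    return 2 * precision * recall / (precision + recall)


same = usm_like_score("Member States shall ensure", "Member States shall ensure")
different = usm_like_score("Member States shall ensure", "Fishing vessels are registered")
f = f_score({("Art1", "s1"), ("Art2", "s3")}, {("Art1", "s1"), ("Art2", "s2")})
print(round(same, 3), round(different, 3), f)
```

In practice the weights and the decision threshold on the combined score would be chosen empirically; a provision would be reported as a transposition of an article when its score exceeds that threshold.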