2020
DOI: 10.5817/mujlt2020-1-5

Document Similarity of Czech Supreme Court Decisions

Abstract: Retrieval of court decisions dealing with a similar legal matter is a prevalent task performed by lawyers, as it is part of reviewing relevant decision-making practice. Despite the natural language processing methods that are currently available, this legal research is still mostly done through Boolean searches or by contextual retrieval. In this study, it is experimentally verified whether the doc2vec method, together with cosine similarity, can automatically retrieve the Czech Supreme Court decisions deal…

Citations: cited by 9 publications (6 citation statements)
References: 16 publications
“…The first of the methods is the doc2vec model for semantically similar documents retrieval. The algorithm was used in standard settings and the model was trained for the whole dataset of the Czech Supreme Court decisions as described in [5]. The model provides for vector representations of court decisions and the similarity is computed as a cosine similarity measure between two vector representations.…”
Section: Semantic Similarity - Doc2vec
Citation type: mentioning
confidence: 99%
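
The similarity step described in this statement can be illustrated with a minimal sketch. It assumes that document vectors have already been produced by some embedding model such as doc2vec; the three-dimensional vectors and the names `decision_vectors` and `most_similar` are hypothetical and chosen only for illustration, not taken from the cited papers.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def most_similar(query_id, vectors, top_n=3):
    """Rank all other documents by cosine similarity to the query document."""
    query = vectors[query_id]
    scores = [
        (doc_id, cosine_similarity(query, vec))
        for doc_id, vec in vectors.items()
        if doc_id != query_id
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]


# Hypothetical toy vectors standing in for learned document embeddings.
decision_vectors = {
    "decision_a": [1.0, 0.0, 1.0],
    "decision_b": [2.0, 0.0, 2.0],  # same direction as decision_a
    "decision_c": [0.0, 1.0, 0.0],  # orthogonal to decision_a
}

ranking = most_similar("decision_a", decision_vectors, top_n=2)
```

Because cosine similarity depends only on direction, `decision_b` ranks first even though its vector is twice as long as that of `decision_a`.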
“…We again applied them to the dataset of the Supreme Court decisions and used the automatic coherence score metric to select the number of topics that the model should retrieve. The best models were the 30-topic LDA model and the 20-topic NMF model as described in [13]. I used the three most probable topics assigned by both models to court decisions and the relevance of the topics to the legal issues in presented decisions was evaluated by legal experts.…”
Section: Topic Modelling - LDA and NMF
Citation type: mentioning
confidence: 99%
“…When training the model, we treated the 591 judgements as eviction cases and the 1182 judgements as non-eviction cases. 7 Of course this is a sub-optimal class distinction, as potentially many of the 1182 judgements may, in fact, be eviction cases. Consequently, from all cases that were classified as non-eviction cases, we only retained those which were (when included in the test set during the three-fold cross-validation procedure) assigned the non-eviction label with over 99% confidence (using Platt Scaling; 13).…”
Section: Data
Citation type: mentioning
confidence: 99%
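
The filtering step in this statement (keeping only cases labelled non-eviction with over 99% confidence) can be sketched as follows. This is an illustration under stated assumptions, not the cited authors' code: it uses the standard Platt form P = 1 / (1 + exp(A·f + B)) for the positive (eviction) class, and the parameters `a` and `b`, the classifier scores, and the case identifiers are all hypothetical. In practice A and B would be fitted on held-out data.

```python
import math


def platt_probability(score, a, b):
    """Platt scaling: map a raw classifier score to P(positive class)."""
    return 1.0 / (1.0 + math.exp(a * score + b))


def retain_confident_negatives(scores, a, b, threshold=0.99):
    """Keep the ids whose non-eviction probability exceeds the threshold."""
    retained = []
    for case_id, score in scores.items():
        p_eviction = platt_probability(score, a, b)
        p_non_eviction = 1.0 - p_eviction
        if p_non_eviction > threshold:
            retained.append(case_id)
    return retained


# Hypothetical Platt parameters and raw decision scores
# (negative score = classifier leans towards non-eviction).
A, B = -2.0, 0.0
raw_scores = {
    "case_1": -4.0,
    "case_2": -1.0,
    "case_3": -3.0,
    "case_4": 0.5,
}

retained_cases = retain_confident_negatives(raw_scores, A, B)
```

With these toy values only `case_1` and `case_3` pass the 99% threshold; `case_2` is labelled non-eviction but with too little confidence to be retained.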
“…30 Because precedents play an important role in legal argumentation, several studies have proposed Doc2Vec as a methodology to identify and measure case similarity. 31…”
Section: Document Clustering With Word
Citation type: mentioning
confidence: 99%