2020
DOI: 10.1007/s41109-020-00321-y

Improving topic modeling through homophily for legal documents

Abstract: Topic modeling that can automatically assign topics to legal documents is very important in the domain of computational law. The relevance of the modeled topics strongly depends on the legal context they are used in. On the other hand, references to laws and prior cases are key elements for judges to rule on a case. Taken together, these references form a network, whose structure can be analysed with network analysis. However, the content of the referenced documents may not be always accessed. Even in that cas…

Cited by 9 publications (4 citation statements)
References 39 publications
“…The length of the shortest element sentence is 30-40 words, and the length of the longest element sentence will reach more than 300 words. The traditional model mostly uses fixed parameters as the vector dimension and fills the vector with 0 for the short sentence, which cannot effectively capture the characteristic representation of sentences with different lengths [5]. In order to weaken the negative impact of the length difference of different sentences on the effect of the model, the multihead self-attention mechanism (MAT) based on the mask method is further integrated into the BERT-CNN model.…”
Section: Introduction
confidence: 99%
“…In this matrix, the measure scale goes from [-1, 1], with 1 denoting the strongest connotation. The work of [26] defines topic variety as the proportion of unique words across all themes. The scale goes from [0, 1], with 0 denoting superfluous topics and 1 denoting topics with more variety.…”
Section: Discussion
confidence: 99%
“…The designed model reduces the computational complexity but the performance of long documents classification was not improved. Improving topic modeling analysis method was introduced in [16] for legal case documents classification. But it failed to use a multilayer network model for legal case documents categorization.…”
Section: Related Work
confidence: 99%