Proceedings of the 28th International Conference on Computational Linguistics 2020
DOI: 10.18653/v1/2020.coling-main.545

Aspect-based Document Similarity for Research Papers

Abstract: Traditional document similarity measures provide a coarse-grained distinction between similar and dissimilar documents. Typically, they do not consider in what aspects two documents are similar. This limits the granularity of applications like recommender systems that rely on document similarity. In this paper, we extend similarity with aspect information by performing a pairwise document classification task. We evaluate our aspect-based document similarity approach for research papers. Paper citations indicat…

Cited by 21 publications
(42 citation statements)
References 33 publications
“…Section titles exist in particular in long documents like scientific papers. They usually imply the section content and describe the common topic for its sub-sentences (Ostendorff et al, 2020). In our work, we propose to utilize the corresponding section title as an additional HiStruct information when encoding its sub-sentences.…”
Section: Hierarchical Structure Information
Citation type: mentioning
confidence: 99%
“…Despite this wide applicability of the problem, and a range of proposed approaches, evaluation of these retrieval/similarity methods remains a problem. A broad set of approaches evaluate systems against citations or combine citation information with other incidental information such as section headers as a means to determine citation intents [36]. Incidental sources of information such as section headers tend to represent approximate signals of citation intent [19,Sec 4], which may be useful for building systems.…”
Section: Query Title
Citation type: mentioning
confidence: 99%
“…Domain-specific models build on top of Transformers typically outperform their baselines for related tasks [12]. For example, SciBERT [2] was pre-trained on scientific documents and typically outperforms BERT for scientific NLP tasks, such as determining document similarity [22].…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…Intermediate Pre-Trained. SciBERT [2] optimizes the MLM for 1.14M randomly selected papers from Semantic Scholar 22 . BioClinicalBERT [1] specializes on 2M notes in the MIMIC-III database [13], a collection of disidentified clinical data.…”
Section: A2 Model Details
Citation type: mentioning
confidence: 99%