2022
DOI: 10.48550/arxiv.2203.14541
Preprint
Specialized Document Embeddings for Aspect-based Similarity of Research Papers

Abstract: Document embeddings and similarity measures underpin content-based recommender systems, whereby a document is commonly represented as a single generic embedding. However, similarity computed on single vector representations provides only one perspective on document similarity and ignores which aspects make two documents alike. To address this limitation, aspect-based similarity measures have been developed using document segmentation or pairwise multi-class document classification. While segmentation harms the…

Cited by 1 publication (1 citation statement)
References 44 publications (111 reference statements)
“…Recent media bias studies have progressed from manually generated linguistic features [37,38] to state-of-the-art NLP models yielding internal word representations through unsupervised or supervised training on massive text corpora. The Transformer architecture [48] has shown superior performance in several downstream tasks, such as text classification [26][27][28], plagiarism detection [50,51], word sense disambiguation [52], and fake news detection in the health domain [49]. However, the use of neural language models such as BERT [8] and RoBERTa [24] in the media bias domain is still incipient [41,42].…”
Section: Transformer-based Detection Approaches
confidence: 99%