Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (2022)
DOI: 10.18653/v1/2022.naacl-main.331

Multi-Vector Models with Textual Guidance for Fine-Grained Scientific Document Similarity

Abstract: We present a new scientific document similarity model based on matching fine-grained aspects of texts. To train our model, we exploit a naturally-occurring source of supervision: sentences in the full-text of papers that cite multiple papers together (co-citations). Such co-citations not only reflect close paper relatedness, but also provide textual descriptions of how the co-cited papers are related. This novel form of textual supervision is used for learning to match aspects across papers. We develop multi-vector…
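As a rough illustration of the multi-vector idea in the abstract, the sketch below represents each paper as one vector per sentence and scores a pair by late interaction: every query sentence is matched against its most similar candidate sentence. This is a minimal sketch, not the authors' released implementation; the encoder checkpoint, mean pooling, and max-similarity aggregation are assumptions for illustration (the paper learns its aspect matching from co-citation sentences).

```python
# Hedged sketch of multi-vector (per-sentence) document similarity.
# Assumption: any Hugging Face sentence encoder works; "allenai/specter"
# is one choice, not necessarily the paper's exact model.
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

MODEL = "allenai/specter"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
encoder = AutoModel.from_pretrained(MODEL)

def sentence_vectors(sentences: list[str]) -> torch.Tensor:
    """Encode each sentence to one vector via mean pooling over tokens."""
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**batch).last_hidden_state          # (n_sent, seq, dim)
    mask = batch["attention_mask"].unsqueeze(-1).float()     # (n_sent, seq, 1)
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)      # (n_sent, dim)

def aspect_similarity(query_sents: list[str], cand_sents: list[str]) -> float:
    """Late-interaction score: for each query sentence, take the best-matching
    candidate sentence (cosine similarity), then average over query sentences."""
    q = F.normalize(sentence_vectors(query_sents), dim=-1)
    c = F.normalize(sentence_vectors(cand_sents), dim=-1)
    sim = q @ c.T                                            # (n_q, n_c) cosines
    return sim.max(dim=1).values.mean().item()
```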

Cited by 9 publications (6 citation statements) · References 46 publications
“…The most basic idea behind calculating REDi is the fact that papers cite papers strongly related to their own research. There is a study by Mysore et al. (2022) that is based on the same idea. In this study, the degree of relatedness strength is replaced by the distance between the fields.…”
Section: Methods
Confidence: 99%
“…In the scientific domain, contrastive learning of cross-document links (e.g. citations) has led to improved document-level representations (Cohan et al., 2020; Ostendorff et al., 2022b; Mysore et al., 2022). These representations can be indexed and consumed later by lightweight downstream models without additional fine-tuning.…”
Section: Introduction
Confidence: 99%
“…Further, we use this benchmark to investigate and improve the generalization ability of document representation models. Following recent work (Cohan et al., 2020; Ostendorff et al., 2022b; Mysore et al., 2022), we pre-fine-tune a transformer model originally trained on citation triplets to produce high-quality representations for downstream tasks. We hypothesize that condensing all relevant information of a document into a single vector might not be expressive enough for generalizing across a wide range of tasks.…”
Section: Introduction
Confidence: 99%
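The citation-triplet objective referenced in the last two statements can be pictured as a standard triplet margin loss over document embeddings: an anchor paper is pulled toward a paper it cites and pushed away from an uncited one. The sketch below is a hedged reading of that setup, not the cited papers' training code; the margin value and distance function are illustrative choices.

```python
# Hedged sketch of contrastive pre-fine-tuning on citation triplets.
# Inputs: one embedding vector per paper, shape (batch, dim); names illustrative.
import torch
import torch.nn.functional as F

def citation_triplet_loss(anchor: torch.Tensor,
                          positive: torch.Tensor,   # papers cited by the anchor
                          negative: torch.Tensor,   # papers not cited by the anchor
                          margin: float = 1.0) -> torch.Tensor:
    d_pos = F.pairwise_distance(anchor, positive)   # anchor vs. cited paper
    d_neg = F.pairwise_distance(anchor, negative)   # anchor vs. uncited paper
    return F.relu(d_pos - d_neg + margin).mean()    # hinge loss over the margin
```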