Proceedings of the Second Workshop on Scholarly Document Processing 2021
DOI: 10.18653/v1/2021.sdp-1.2

Unsupervised Document Expansion for Information Retrieval with Stochastic Text Generation

Abstract: One of the challenges in information retrieval (IR) is the vocabulary mismatch problem, which happens when the terms between queries and documents are lexically different but semantically similar. While recent work has proposed to expand the queries or documents by enriching their representations with additional relevant terms to address this challenge, they usually require a large volume of query-document pairs to train an expansion model. In this paper, we propose an Unsupervised Document Expansion with Gene…
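As a concrete illustration of the vocabulary mismatch problem the abstract describes, the sketch below scores a query against a document with Okapi BM25 before and after appending expansion terms. The corpus, query, and "generated" terms are invented for this example; the appended terms stand in for whatever an expansion model would actually produce, and this is not the paper's method, only the retrieval-side effect it targets.

```python
import math
from collections import Counter

def bm25_score(query_terms, doc_terms, corpus, k1=1.5, b=0.75):
    """Okapi BM25 score of one tokenized document against a query,
    with document frequencies taken over `corpus`."""
    n_docs = len(corpus)
    avgdl = sum(len(d) for d in corpus) / n_docs
    tf = Counter(doc_terms)
    score = 0.0
    for term in query_terms:
        df = sum(1 for d in corpus if term in d)
        idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1)
        f = tf[term]
        denom = f + k1 * (1 - b + b * len(doc_terms) / avgdl)
        score += idf * f * (k1 + 1) / denom
    return score

# A query and a document that are semantically related but share no terms.
corpus = [["automobile", "maintenance", "guide"], ["cooking", "recipes"]]
query = ["car", "repair"]
print(bm25_score(query, corpus[0], corpus))  # 0.0 — vocabulary mismatch

# Append terms a (hypothetical) expansion model might generate for this doc.
expanded = corpus[0] + ["car", "repair"]
expanded_corpus = [expanded, corpus[1]]
print(bm25_score(query, expanded, expanded_corpus))  # > 0: query now matches
```

The point is that a purely lexical scorer like BM25 gives the unexpanded document a score of exactly zero here, which is why expansion (rather than retraining the retriever) can close the gap.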


Cited by 4 publications (4 citation statements)
References 29 publications
“…We compare LoGE with: The Base contains only the retrieval by BM25 scoring on documents preprocessed with the basic filters; The Pegasus4IR model, for which we have adapted Pegasus (Zhang et al., 2020), is for IR. We generate a rewritten text by applying the most widely used pretrained Pegasus model in document summarization; and UDEG (Jeong et al., 2021) is the state-of-the-art model for abstractive generation of document extensions for ad hoc search. It is the main competitive model with a similar approach to unsupervised document extension. …”
Section: Methods (mentioning)
confidence: 99%
“…UDEG (Jeong et al., 2021) is the state-of-the-art model for abstractive generation of document extensions for ad hoc search. It is the main competitive model with a similar approach to unsupervised document extension.…”
Section: Methods (mentioning)
confidence: 99%
“…Besides interpolation, Wei and Zou (2019) and Ma (2019) proposed perturbation over words, and Lee et al. (2021b) proposed perturbation over word embeddings. Jeong et al. (2021) and Gao et al. (2021) perturbed text embeddings to generate diverse sentences and to augment positive sentence pairs in unsupervised learning. In contrast, we address dense retrieval, perturbing document representations with dropout (Srivastava et al., 2014) in a supervised setting with labeled documents.…”
Section: Related Work (mentioning)
confidence: 99%
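The dropout-style perturbation of representations mentioned in the statement above can be sketched minimally as follows. The embedding values and dropout rate are arbitrary, and this stand-in operates on a plain Python list rather than any cited model's learned representation; it only shows how dropout yields multiple stochastic "views" of one vector.

```python
import random

def dropout_perturb(embedding, p=0.1, seed=None):
    """Inverted dropout over an embedding: zero each dimension with
    probability p and rescale survivors by 1/(1-p), producing a
    stochastic view of the same representation."""
    rng = random.Random(seed)
    return [0.0 if rng.random() < p else x / (1 - p) for x in embedding]

embedding = [0.2, -0.5, 0.7, 0.1, -0.3]
# Three stochastic views of one representation (fixed seeds for
# reproducibility); each differs in which dimensions were dropped.
views = [dropout_perturb(embedding, p=0.4, seed=s) for s in range(3)]
```

Each surviving coordinate is rescaled by 1/(1-p) so the expected value of every dimension matches the original, which is the standard inverted-dropout convention.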