2012
DOI: 10.1145/2422256.2422277
Modeling and solving term mismatch for full-text retrieval

Abstract: Even though modern retrieval systems typically use a multitude of features to rank documents, the backbone of search ranking is usually a standard tf.idf retrieval model. This thesis addresses a limitation of these fundamental retrieval models: the term mismatch problem, which occurs when query terms fail to appear in the documents that are relevant to the query. Term mismatch is a long-standing problem in information retrieval. However, it was not well understood how often term mismatch happens …
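To make the term mismatch problem concrete, here is a minimal, hypothetical tf.idf sketch (illustrative only, not code from the thesis): a document that is relevant but uses the synonym "car" instead of the query term "automobile" receives no credit for that term and is ranked low.

```python
import math
from collections import Counter

# Toy corpus: doc 1 is relevant to the query but says "car" instead of
# "automobile", so the query term never appears in it (term mismatch).
docs = [
    "the automobile industry saw record sales".split(),
    "electric car sales broke records this year".split(),
    "quarterly report on regional housing prices".split(),
]
query = "automobile sales".split()

N = len(docs)
df = Counter(term for doc in docs for term in set(doc))  # document frequencies

def tfidf_score(query, doc):
    """Sum of tf.idf weights over query terms appearing in the document."""
    tf = Counter(doc)
    score = 0.0
    for term in query:
        if tf[term] == 0:              # mismatched term contributes nothing
            continue
        score += tf[term] * math.log(N / df[term])
    return score

for i, doc in enumerate(docs):
    print(i, round(tfidf_score(query, doc), 3))
# 0 1.504   1 0.405   2 0.0 -- the relevant "car" document scores poorly
# because "automobile" never matches; this is the mismatch the thesis models.
```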

Cited by 9 publications (4 citation statements). References 81 publications.
“…To deal with vocabulary differences, several approaches have been proposed, including text normalization, query reformulation, search results clustering, and automatic query expansion. The interested reader may refer to [60] for a more in-depth treatment. Automatic query expansion expands the original query with the aim to produce a query that is more likely to retrieve relevant documents.…”
Section: Related Work (citation type: mentioning)
confidence: 99%
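As a toy illustration of automatic query expansion (a hedged sketch of pseudo-relevance feedback, not the specific methods surveyed in [60]): the original query is expanded with frequent terms drawn from the top-ranked documents, which raises the chance that relevant documents sharing those terms are retrieved.

```python
from collections import Counter

def expand_query(query_terms, ranked_docs, k_docs=3, n_terms=5):
    """Pseudo-relevance feedback: assume the top-ranked documents are
    relevant and add their most frequent non-query terms to the query."""
    feedback = Counter()
    for doc in ranked_docs[:k_docs]:
        feedback.update(t for t in doc if t not in query_terms)
    expansion = [term for term, _ in feedback.most_common(n_terms)]
    return list(query_terms) + expansion

# e.g. expand_query(["automobile", "sales"], ranked_docs) might append
# terms such as "car" or "vehicle", reducing mismatch for the reformulated query.
```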
“…In a two-stage QA system, the retriever retrieves a list of passages from a large database, then the reader provides the final answer, the accuracy of which is not only decided by the reader itself but also by the performance of the retriever. Traditional retrievers are efficient, with an inverted index, but face difficulties (e.g., term mismatch [6]) in matching queries and passages, e.g., Term Frequency-Inverse Document Frequency (TF-IDF) and Best Match 25 (BM25). Recently, based on Pre-trained Language Models (PLMs), the dual-encoder has been widely used to learn the relations between queries and passages.…”
Section: Introduction (citation type: mentioning)
confidence: 99%
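For reference, a minimal Okapi BM25 scoring sketch with the usual k1 and b parameters (an illustration of the ranking function named in the excerpt, not the implementation used in the cited systems):

```python
import math
from collections import Counter

def bm25_score(query, doc, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of one tokenized document for a bag-of-words query."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N          # average document length
    tf = Counter(doc)
    score = 0.0
    for term in set(query):
        n = sum(1 for d in docs if term in d)      # document frequency
        idf = math.log((N - n + 0.5) / (n + 0.5) + 1)
        freq = tf[term]                            # 0 for mismatched terms
        denom = freq + k1 * (1 - b + b * len(doc) / avgdl)
        score += idf * freq * (k1 + 1) / denom
    return score
```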
“…Learning a similarity metric has gained much research interest, however due to limited availability of labeled data and complex structures in variable length sentences, the STS task becomes a hard problem. The performance of IR system is sub-optimal due to significant term mismatch in similar texts (Zhao, 2012), limited annotated data and complex structures in variable length sentences. We address the challenges in a real-world industrial application.…”
Section: Introduction (citation type: mentioning)
confidence: 99%