Set-based model

Pôssas, Bruno; Ziviani, Nívio; Meira, Wagner; Ribeiro-Neto, Berthier

doi:10.1145/564376.564417

Cited by 18 publications

(6 citation statements)

References 30 publications


“…The retrieval process is handled by the Set-Based model. As mentioned, the notion of termsets, related to query terms, contributes highly to complexity reduction, due to processing significantly lower data volume (Possas et al (2002), Pôssas et al (2005)). It is important to notice that the model implements association rule mining algorithms (Agrawal and Srikant (1994)), to combine two frequent termsets in a different element of each set, thus creating a new one, which if frequent, will be considered a termset of the model.…”

Section: Graphs and Set-based Model (mentioning)

confidence: 99%

“…On the other hand, on large values, the percentage model tends to perform as the complete graph-based extension of the set-based model (GSB) (Kalogeropoulos et al (2020)). To further amplify our case, taking into consideration that the MAP metric could be misleading, we counted the number of queries that our proposed models outperformed the set-based model (Possas et al (2002)). That is expressed by the difference in average precision for each query between the set-based model and the rest, as it is depicted in figure 8.…”

Section: Models Performance On Ranking (mentioning)

confidence: 99%

“…The main focus of information retrieval is the creation of effective and efficient models that comply with the user's information-seeking needs, which are usually expressed in the form of unstructured queries. In this study, following the approach of Kalogeropoulos et al (2020), an effort is made to alleviate the connection among terms in the textual graphs that arise in their proposed extension of the Set-Based model (Possas et al (2002), Pôssas et al (2005)) by implementing appropriate text partitions termed windows, and combining appropriately the resulted scheme with state-of-the-art approaches.…”

Section: Introduction (mentioning)

confidence: 99%


On the Graph-Based Extension of the Set Based Model: A New Approach on Graph Representation and Model Ensemble

Kalogeropoulos, Skamnelos, Makris

2024

Preprint


The purpose of this paper is to present a set of algorithmic improvements upon an extension of the Set-Based Model (Kalogeropoulos et al (2020)) that focuses on the dependence among document terms employing graph representations. A graph-based approach to document representation can positively affect the information retrieval process due to the ability of graphs to capture the syntactic notion and, in some cases, the semantic relationship among document terms. The aforementioned model generates complete graphs; thus, each document term will be interdependent with the rest. Consequently, an interdependent segment or multiple parts of a document called a window is defined, in which the graph creation algorithms are applied. The proposed methods aim to approximate the window size by exploiting the document length. Moreover, an attempt to create multiple windows is made, considering the relationship between a sentence and a paragraph, which is reflected in the semantic importance of nodes and edges. An attempt to tackle the stop-word detection problem on bridge nodes is made by implementing algorithmic schemes that use core decomposition (Seidman (1983a)) to identify the importance of such nodes in a sample of the corpus collection. Finally, a simple reranking scheme and an ensemble voting technique are implemented to enhance model performance on queries where the proposed approach lacks performance. The experimental analysis made on multiple document collections exhibits performance improvements and in some queries, our approach outperforms even state-of-the-art models such as BM25 (Robertson et al (2004)) and ColBERT (Khattab and Zaharia (2020)).


“…What the BWI can provide is a further reduction of the search space of Apriori without incurring in large memory consumption. In IR, FI mining has been applied to indexing and retrieving documents; in this application, documents and queries can be represented as term sets where a term set is an itemset and items are terms ( Pôssas, Ziviani, Meira, & Ribeiro-Neto, 2002 ), ( Pôssas, Ziviani, Meira, & Ribeiro-Neto, 2005 ). Basically, all these approaches are based on counting frequent term sets and prescribe the selection of the most frequent words, the join of these frequent words with themselves, the selection of the most frequent word pairs, and so on.…”

Section: Related Work (mentioning)

confidence: 99%

Utilising a statistical inequality for efficiently finding term sets

Melucci

2016

Information Processing & Management
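The level-wise counting described in the citation statement above (select frequent terms, join them with themselves, keep the frequent pairs, and so on) can be sketched roughly as follows. This is a minimal illustration in the spirit of Apriori, not the implementation used in any of the cited papers. The function name `frequent_termsets`, the `min_support` parameter, and the toy documents are assumptions made for the example, and support is taken here to mean the number of documents containing every term of a termset.

```python
from itertools import combinations


def frequent_termsets(docs, min_support):
    """Return {termset: document frequency} for every termset that occurs
    in at least `min_support` documents (hypothetical helper, Apriori-style)."""
    doc_sets = [set(d) for d in docs]

    def support(termset):
        # Number of documents that contain every term of the termset.
        return sum(1 for d in doc_sets if termset <= d)

    # Level 1: frequent single terms.
    vocabulary = set().union(*doc_sets)
    current = {}
    for term in vocabulary:
        ts = frozenset([term])
        s = support(ts)
        if s >= min_support:
            current[ts] = s

    result = dict(current)
    k = 2
    while current:
        # Join step: merge two frequent (k-1)-termsets that differ in one term.
        candidates = set()
        for a, b in combinations(current, 2):
            union = a | b
            # Prune step: every (k-1)-subset of a candidate must be frequent.
            if len(union) == k and all(
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                candidates.add(union)

        # Count step: keep only the candidates that are themselves frequent.
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        result.update(current)
        k += 1
    return result


# Toy collection: each document is a list of terms (assumed data).
docs = [
    ["a", "b", "c"],
    ["a", "b"],
    ["a", "c"],
    ["b", "c"],
    ["a", "b", "c"],
]
termsets = frequent_termsets(docs, min_support=3)
for ts, freq in sorted(termsets.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(ts), freq)
```

With this toy data every single term and every pair is frequent, while the three-term set appears in only two documents and is dropped. The sketch rescans all documents for each candidate, which is the cost that the work cited above aims to reduce.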


“…However, this method will greatly increase the complexity of the model, making it intractable in practice. Pôssas et al [11] followed a similar direction using term sets. Term sets group terms that co-occur frequently in documents.…”

Section: Related Work (mentioning)

confidence: 99%

Relating dependent indexes using dempster-shafer theory

Shi, Nie, Cao

2008

Proceedings of the 17th ACM Conference on Information and Knowledge Management


Traditional information retrieval (IR) approaches assume that the indexing terms are independent, which is not true in reality. Although some previous studies have tried to consider term relationships, strong simplifications had to be made at the very basic indexing step, namely, dependent terms are assigned independent counts or probabilities. In this study, we propose to consider dependencies between terms using Dempster-Shafer theory of evidence. An occurrence of a string in a document is considered to represent the set of all the terms implied in it. Probability is assigned to such a set of terms instead of individual terms. During query evaluation phase, a part of the probability of a set can be transferred to those of the query that are related, allowing us to integrate language-dependent relations in IR. This approach has been tested on several Chinese IR collections. Our experimental results show that our model can outperform the existing state-of-the-art approaches. The proposed method can be used as a general way to consider different types of relationship between terms and for other languages.
