Merging the Results of Approximate Match Operations

Guha, Sudipto; Koudas, Nick; Marathe, A.; Srivastava, Divesh

doi:10.1016/b978-012088469-8.50057-7

Cited by 44 publications (26 citation statements)

References 14 publications

“…The superscript is a bit vector indicating the membership of tuples seen so far for that state. For example, $s^{10}_{1,2} = \langle t_1, \neg t_2 \rangle$ and $s^{01}_{1,2} = \langle \neg t_1, t_2 \rangle$. Now, assume $n$ is the number of all tuples, consider an example satisfying the following two conditions:…”

Section: Complexity Analysis on U-Topk and U-kRanks (mentioning, confidence: 99%)

Semantics and evaluation of top-k queries in probabilistic databases

Zhang; Chomicki. Distributed and Parallel Databases, 2009

We formulate three intuitive semantic properties for top-k queries in probabilistic databases, and propose Global-Topk query semantics which satisfies all of them. We provide a dynamic programming algorithm to evaluate top-k queries under Global-Topk semantics in simple probabilistic relations. For general probabilistic relations, we show a polynomial reduction to the simple case. Our analysis shows that the complexity of query evaluation is linear in k and at most quadratic in database size.
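To make the bit-vector state notation and the dynamic-programming idea concrete, the following Python sketch works under stated assumptions: a simple probabilistic relation of independent tuples t_1, ..., t_n, listed in descending score order without ties, where `probs[i]` is the existence probability of t_(i+1). For each tuple it computes the probability of appearing among the top k tuples of a possible world, keeping a table `q[j]` of the probability that exactly j of the tuples seen so far are present (O(nk) time), and a brute-force pass over all state bit vectors cross-checks the result. Reading Global-Topk as "return the k tuples with the highest such probability" is an assumption made here, all function names are hypothetical, and the sketch neither reproduces Zhang and Chomicki's algorithm nor covers their reduction for general probabilistic relations.

```python
from itertools import product


def state_probability(bits, probs):
    """Probability of the state s^{bits}: bit i is 1 iff tuple t_{i+1} is present.

    Assumes independent tuples (a simple probabilistic relation)."""
    p = 1.0
    for present, p_i in zip(bits, probs):
        p *= p_i if present else 1.0 - p_i
    return p


def topk_probabilities(probs, k):
    """Pr[t_i is among the top-k tuples of a possible world], in O(n * k).

    `probs` lists existence probabilities in descending score order (no ties).
    q[j] = Pr[exactly j of the tuples seen so far are present] for j < k;
    q[k] accumulates Pr[at least k are present]."""
    q = [1.0] + [0.0] * k
    result = []
    for p in probs:
        # t_i makes the top-k iff it is present and fewer than k higher-scored tuples are
        result.append(p * sum(q[:k]))
        nxt = [0.0] * (k + 1)
        for j, mass in enumerate(q):
            nxt[j] += mass * (1.0 - p)          # t_i absent: count unchanged
            nxt[min(j + 1, k)] += mass * p      # t_i present: count grows, capped at k
        q = nxt
    return result


def topk_probabilities_bruteforce(probs, k):
    """Same quantity, by enumerating all 2^n state bit vectors (for checking only)."""
    result = [0.0] * len(probs)
    for bits in product((0, 1), repeat=len(probs)):
        w = state_probability(bits, probs)
        present = [i for i, b in enumerate(bits) if b]  # already in descending score order
        for i in present[:k]:
            result[i] += w
    return result


def global_topk(probs, k):
    """Indices of the k tuples with the highest top-k probability (assumed reading of Global-Topk)."""
    scores = topk_probabilities(probs, k)
    return sorted(range(len(probs)), key=lambda i: scores[i], reverse=True)[:k]


if __name__ == "__main__":
    probs = [0.3, 0.9, 0.6]  # hypothetical t_1, t_2, t_3
    print(state_probability((1, 0), probs[:2]))  # Pr[s^{10}_{1,2}] = 0.3 * (1 - 0.9)
    print(topk_probabilities(probs, 1))
    print(global_topk(probs, 1))
```

Capping the count at k keeps the table at k + 1 entries per step, which is where the linear dependence on k comes from in this sketch.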

“…The use of supervised (training-based) approaches or learners aims at automating the process of entity matching to reduce the required manual effort. Training-based approaches, e.g., Naïve Bayes [49], logistic regression [46], Support Vector Machine (SVM) [11,43,49] or decision trees [63,29,49,53,54,56] have so far been used for some subtasks, e.g., determining suitable parameterizations for matchers or adjusting combination functions parameters (weights for matchers, offsets). However, training-based approaches require suitable training data and providing such data typically involves manual effort.…”

Section: Combination of Matchers (mentioning, confidence: 99%)

“…A single match approach typically performs very differently for different domains and match problems. For example, it has been shown that there is no universally best string similarity measure [29,50]. Instead it is often beneficial and necessary to combine several methods for improved matching quality, e.g., to consider the similarity of several attributes or to take into account relationships between entities.…”

Section: Introduction (mentioning, confidence: 99%)
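The second passage argues for combining several matchers, for example similarities over several attributes. A minimal Python sketch of one common scheme, a weighted average of per-attribute similarity scores compared against a threshold, is shown below. The similarity functions (token Jaccard and normalized Levenshtein), the attribute names, the weights 0.6 and 0.4, and the threshold 0.7 are all hypothetical; a training-based approach of the kind the first passage lists would instead learn such weights or thresholds from labeled pairs.

```python
def token_jaccard(a, b):
    """Jaccard similarity of the lower-cased whitespace token sets."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def edit_similarity(a, b):
    """1 - Levenshtein distance / length of the longer string (case-insensitive)."""
    a, b = a.lower(), b.lower()
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - prev[-1] / longest


# Hypothetical matcher configuration: (attribute, similarity function, weight).
MATCHERS = [("title", token_jaccard, 0.6), ("venue", edit_similarity, 0.4)]


def combined_similarity(r1, r2, matchers=MATCHERS):
    """Weighted average of the per-attribute similarities."""
    total = sum(w for _, _, w in matchers)
    return sum(w * sim(r1[attr], r2[attr]) for attr, sim, w in matchers) / total


def is_match(r1, r2, matchers=MATCHERS, threshold=0.7):
    """Declare a match when the combined score reaches the (hypothetical) threshold."""
    return combined_similarity(r1, r2, matchers) >= threshold
```

Because each matcher is just an (attribute, function, weight) triple, trying a different combination is a configuration change rather than new scoring code.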

Frameworks for entity matching: A comparison

Köpcke; Rahm. Data & Knowledge Engineering, 2010

Entity matching is a crucial and difficult task for data integration. Entity matching frameworks provide several methods and their combination to effectively solve different match tasks. In this paper, we comparatively analyze 11 proposed frameworks for entity matching. Our study considers both frameworks which do or do not utilize training data to semi-automatically find an entity matching strategy to solve a given match task. Moreover, we consider support for blocking and the combination of different match algorithms. We further study how the different frameworks have been evaluated. The study aims at exploring the current state of the art in research prototypes of entity matching frameworks and their evaluations. The proposed criteria should be helpful to identify promising framework approaches and enable categorizing and comparatively assessing additional entity matching frameworks and their evaluations.
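The abstract mentions support for blocking. As a minimal illustration, assuming simple key-based (standard) blocking rather than any particular framework's method, the Python sketch below groups records by a blocking key and emits only within-block pairs as candidates for the more expensive matchers. The key function (first three letters of the last name token) and the sample records are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def blocking_key(record):
    """Hypothetical blocking key: first three letters of the last token of `name`."""
    return record["name"].split()[-1][:3].lower()


def candidate_pairs(records, key=blocking_key):
    """Index pairs (i, j), i < j, that share a blocking key; only these get compared."""
    blocks = defaultdict(list)
    for idx, rec in enumerate(records):
        blocks[key(rec)].append(idx)
    pairs = []
    for members in blocks.values():
        pairs.extend(combinations(members, 2))  # members are in ascending index order
    return sorted(pairs)


if __name__ == "__main__":
    people = [{"name": name} for name in
              ["Nick Koudas", "N. Koudas", "Divesh Srivastava", "D. Srivastava", "Erhard Rahm"]]
    n = len(people)
    print(f"{len(candidate_pairs(people))} candidate pairs instead of {n * (n - 1) // 2}")
```

On the five sample records this yields 2 candidate pairs instead of all 10; the price is that true matches whose keys differ (for example, a misspelled last name) are never compared.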

“…(2) Random accessing and ranking supports mainly random access over the dataset until the answers have been retrieved. [10] uses foot-rule distance to measure the two rankings and model the rank problem as the minimum cost perfect matching problem, whereas [5] proposes to translate the top-k query into a range query in database. (3) Pre-materialization and rank indices organizes the tuples in a special way, then applies similarity match for the answer of ranked query.…”

Section: Related Work (mentioning, confidence: 99%)
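The quoted passage mentions measuring rankings with the footrule distance and casting rank aggregation as a minimum cost perfect matching problem. A minimal Python sketch of that reduction follows, assuming full rankings over the same items. The footrule distance is F(σ, τ) = Σ_i |σ(i) - τ(i)|; placing item i at position p costs Σ_r |r(i) - p| over the input rankings r, and an aggregate ranking is a perfect matching of items to positions whose total cost equals its summed footrule distance to the inputs. Exhaustive search over permutations stands in here for a real matching solver and is only usable for a handful of items; in practice a Hungarian-style solver (for example SciPy's `scipy.optimize.linear_sum_assignment`) would be applied to the same cost matrix. The function names are hypothetical.

```python
from itertools import permutations


def footrule(r1, r2):
    """Spearman footrule: sum over items of |position in r1 - position in r2|."""
    pos2 = {item: i for i, item in enumerate(r2)}
    return sum(abs(i - pos2[item]) for i, item in enumerate(r1))


def placement_costs(rankings):
    """cost[item][p] = total footrule displacement if `item` is placed at position p."""
    positions = [{item: i for i, item in enumerate(r)} for r in rankings]
    n = len(rankings[0])
    return {item: [sum(abs(pos[item] - p) for pos in positions) for p in range(n)]
            for item in rankings[0]}


def footrule_optimal(rankings):
    """Aggregate full rankings of the same items by minimizing the summed footrule distance.

    Choosing one position per item is a min-cost perfect matching on `placement_costs`;
    exhaustive search over permutations stands in for a matching solver (tiny n only)."""
    cost = placement_costs(rankings)
    best, best_cost = None, float("inf")
    for perm in permutations(rankings[0]):
        c = sum(cost[item][p] for p, item in enumerate(perm))
        if c < best_cost:
            best, best_cost = list(perm), c
    return best, best_cost
```

Swapping the permutation loop for a polynomial-time assignment solver changes only `footrule_optimal`; the cost matrix stays the same.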

Efficient Processing of Ranked Queries with Sweeping Selection

Wen; Ester. Knowledge Discovery in Databases: PKDD 2005

Existing methods for top-k ranked queries employ techniques including sorting, updating thresholds and materializing views. In this paper, we propose two novel index-based techniques for top-k ranked queries: (1) indexing the layered skyline, and (2) indexing microclusters of objects into a grid structure. We also develop efficient algorithms for ranked queries by locating the answer points during the sweeping of the line/hyperplane of the score function over the indexed objects. Both methods can be easily plugged into typical multi-dimensional database indexes. The comprehensive experiments not only demonstrate that our methods outperform the existing ones, but also illustrate that the application of a data mining technique (microclustering) is a useful and effective solution for database query processing.
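A property that makes skyline layers relevant to top-k queries is that, for a weighted-sum score with strictly positive weights, every top-k answer lies within the first k skyline layers, so later layers never need to be scored. The naive in-memory Python sketch below peels off layers and scores only those k layers; it uses no index and no line/hyperplane sweep, so it is not Wen and Ester's method, and the function names, sample points and weights are hypothetical.

```python
def dominates(a, b):
    """a dominates b: at least as large in every dimension and larger in one (larger is better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def skyline_layers(points, max_layers=None):
    """Peel off skyline layers: layer 1 is the skyline, layer 2 the skyline of the rest, ..."""
    remaining = list(points)
    layers = []
    while remaining and (max_layers is None or len(layers) < max_layers):
        layer = [p for p in remaining if not any(dominates(q, p) for q in remaining)]
        layers.append(layer)
        remaining = [p for p in remaining if p not in layer]
    return layers


def topk_weighted_sum(points, weights, k):
    """Top-k by a weighted sum with strictly positive weights, scanning only k layers.

    A point in layer j is dominated by a chain of j - 1 points, each scoring strictly
    higher, so no point beyond layer k can be in the top-k."""
    candidates = [p for layer in skyline_layers(points, max_layers=k) for p in layer]

    def score(p):
        return sum(w * x for w, x in zip(weights, p))

    return sorted(candidates, key=score, reverse=True)[:k]
```

The layer computation here is quadratic per layer; an index over precomputed layers, as the abstract describes, would avoid recomputing them for every query.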
