Approximate String Membership Checking: A Multiple Filter, Optimization-Based Approach

Sun, C. P.; Naughton, Jeffrey F.; Barman, Siddharth

doi:10.1109/icde.2012.68

Cited by 3 publications (2 citation statements)

References 24 publications (47 reference statements)

“…Another related problem, Approximate string match, refers to the problem of matching a string or sub-string to a given pattern in a text. There have also been a lot of studies on this problem [24] [27], and Navarro gives a detailed analysis on the existing approaches in his survey [14]. The selectivity estimation of similarity search and similarity join in [3][4][5] [28].…”

Section: Related Work (mentioning)

confidence: 99%

GFilter: A General Gram Filter for String Similarity Search

Zheng, Wang, et al. 2015. IEEE Trans. Knowl. Data Eng.

Numerous applications such as data integration, protein detection, and article copy detection share a similar core problem: given a string as the query, how to efficiently find all the similar answers from a large-scale string collection. Many existing methods adopt a prefix-filter-based framework to solve this problem, and a number of recent works aim to use advanced filters to improve the overall search performance. In this paper we propose a gram-based framework to achieve near-maximum filter performance. The main idea is to judiciously choose high-quality grams as the prefix of the query according to their estimated ability to filter candidates. As this selection process is proved to be an NP-hard problem, we give a cost model to measure the filter ability of grams and develop efficient heuristic algorithms to find high-quality grams. Extensive experiments on real datasets demonstrate the superiority of the proposed framework in comparison with state-of-the-art approaches.

“…Related to his work on XML as well as Information Extraction, Naughton and his students worked on various problems in searching and combining textual data in databases. This includes work on combining keyword search results with forms [144,149], approximate string membership [157,160], and debugging of "why not" provenance in keyword search over databases [182].…”

Section: Text Search In Databases (2009-2015) (mentioning)

confidence: 99%

Naughton's Wisconsin Bibliography: A Brief Guide

Hellerstein, 2016. Preprint

Smurf

2018

We argue that more attention should be devoted to developing self-service string matching (SM) solutions, which lay users can easily use. We show that Falcon, a self-service entity matching (EM) solution, can be applied to SM and is more accurate than current self-service SM solutions. However, Falcon often asks lay users to label many string pairs (e.g., 770-1050 in our experiments). This is expensive, can significantly compound labeling mistakes, and takes a long time. We developed Smurf, a self-service SM solution that reduces the labeling effort by 43-76%, yet achieves comparable F1 accuracy. The key to making Smurf possible is a novel solution to efficiently execute a random forest (that Smurf learns via active learning with the lay user) over two sets of strings. This solution uses RDBMS-style plan optimization to reuse computations across the trees in the forest. As such, Smurf significantly advances self-service SM and raises interesting future directions for self-service EM and scalable random forest execution over structured data.
