2012 SC Companion: High Performance Computing, Networking Storage and Analysis 2012
DOI: 10.1109/sc.companion.2012.149

Understanding Cloud Data Using Approximate String Matching and Edit Distance

Abstract: For health and human services, fraud detection and other security services, identity resolution is a core requirement for understanding big data in the cloud. Due to the lack of a globally unique identifier and captured typographic differences for the same identity, identity resolution has high spatial and temporal complexities. We propose a filter and verify method to substantially increase the speed of approximate string matching using edit distance. This method has been found to be almost 80 times faster (1…

Citations: cited by 6 publications (9 citation statements)
References: 17 publications
“…Filters can make the system more efficient by removing unnecessary comparisons. One of the most common methods is length filtering, where the difference in the length of the two strings s and t must not be greater than k [7]. The algorithm of length filtering can be seen in Table 2.…”
Section: B Filter and Verify Methods (mentioning)
confidence: 99%
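
The length filter described in this statement can be sketched as follows. This is a minimal illustration, not the cited paper's Table 2: the function names `passes_length_filter`, `levenshtein`, and `filter_and_verify` are hypothetical, and plain Levenshtein distance is assumed as the verification step.

```python
def passes_length_filter(s: str, t: str, k: int) -> bool:
    # If the lengths differ by more than k, at least k + 1 insertions or
    # deletions are needed, so the edit distance cannot be <= k.
    return abs(len(s) - len(t)) <= k


def levenshtein(s: str, t: str) -> int:
    # Standard dynamic-programming edit distance, keeping two rows in memory.
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        curr = [i]
        for j, ct in enumerate(t, 1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def filter_and_verify(query: str, candidates: list[str], k: int) -> list[str]:
    # Filter step: discard candidates cheaply by length.
    # Verify step: run the full edit-distance computation on the survivors.
    matches = []
    for candidate in candidates:
        if not passes_length_filter(query, candidate, k):
            continue
        if levenshtein(query, candidate) <= k:
            matches.append(candidate)
    return matches
```

For example, with `k = 1` the candidate `"smithson"` is rejected against `"smith"` by length alone, so the quadratic distance computation never runs for it.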
“…Damerau-Levenshtein Distance algorithm is a development of the Levenshtein Distance algorithm. Damerau extended Levenshtein distance to also detect transposition errors and treat them as one edit operation [7]. Therefore Damerau-Levenshtein calculates the minimum insertion, deletion, substitution, and transposition operations to convert one word into another.…”
Section: Issn 2355-0082 (mentioning)
confidence: 99%
“…The processes used on the Levenshtein distance algorithm are insertion, deletion, substitution. While, in the damerau Levenshtein distance algorithm, the operation used is almost the same as Levenshtein distance, but with the addition of the transposition operation between two characters [5]. Damerau Levenshtein does not distinguish between these four operations.…”
Section: Damerau Levenhstein Algorithm (mentioning)
confidence: 99%
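
The four operations named in these statements can be illustrated with a short sketch. Note the assumption: this is the restricted variant usually called optimal string alignment, which counts an adjacent transposition as one edit but never edits the same substring twice. It is not necessarily the exact formulation used in the citing papers, and the name `osa_distance` is hypothetical.

```python
def osa_distance(s: str, t: str) -> int:
    # d[i][j] holds the distance between the first i characters of s
    # and the first j characters of t.
    m, n = len(s), len(t)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
            # Transposition of two adjacent characters counts as one edit.
            if (i > 1 and j > 1
                    and s[i - 1] == t[j - 2]
                    and s[i - 2] == t[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[m][n]
```

Under this sketch `osa_distance("jonh", "john")` is 1, whereas plain Levenshtein distance would charge 2 for the same typo (two substitutions).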
“…An enhanced bit counting method [17] can be used to count the 1's in that only iterates once for each 1 in , such that = (1 ′ ). If is greater than 2 , the edit distance cannot be less than or equal to [18]. The complexity of the Bitwise Signature Filter is ( ) and for a dataset with strings it is ( ).…”
Section: Figure 4: Bitwise Signature and Hash Code (mentioning)
confidence: 99%
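
A bit-signature filter of the kind this statement describes might look like the sketch below. Several things here are assumptions rather than details from the source: the signature is taken to be one bit per character (hashed into 32 buckets), the two signatures are compared with XOR, the threshold is taken to be twice the edit-distance bound k, and the bit counter is the well-known loop that clears the lowest set bit on each pass. The names `signature`, `count_ones`, and `may_match` are hypothetical.

```python
def signature(s: str) -> int:
    # Assumed signature: set one bit per distinct character, hashed into
    # 32 buckets. Collisions weaken the filter but never cause false rejections.
    sig = 0
    for ch in s.lower():
        sig |= 1 << (ord(ch) % 32)
    return sig


def count_ones(x: int) -> int:
    # x & (x - 1) clears the lowest set bit, so the loop body runs
    # exactly once per 1 bit in x.
    count = 0
    while x:
        x &= x - 1
        count += 1
    return count


def may_match(s: str, t: str, k: int) -> bool:
    # One insertion or deletion changes at most one signature bit and one
    # substitution changes at most two, so k edits flip at most 2k bits.
    # More than 2k differing bits therefore rules out edit distance <= k.
    differing_bits = count_ones(signature(s) ^ signature(t))
    return differing_bits <= 2 * k
```

A pair that passes `may_match` still has to be verified with a full edit-distance computation; the filter only rules pairs out.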