2023
DOI: 10.1007/s00500-023-08687-8

GPU-based similarity metrics computation and machine learning approaches for string similarity evaluation in large datasets

Abstract: The digital era brings, on the one hand, massive amounts of available data and, on the other hand, the need for parallel computing architectures for efficient data processing. String similarity evaluation is a processing task applied to large data volumes, commonly performed by various applications such as search engines, biomedical data analysis, and even software tools for defending against viruses, spyware, or spam. String similarities are also used in the music industry for matching playlist records with repertory …

Citations: cited by 3 publications (1 citation statement)
References: 35 publications
“…This strategic choice enhances the algorithm's overall performance on parallel architectures as the recent article published by (Groth et al, 2023) in the knowledge and information systems journal has proved to be 40% higher throughput than the CPU-only approach when dealing with string manipulation operations. Our approach offers a considerable speedup, especially when dealing with a large data set of patterns as mentioned in the 2023 study published Application of soft computing (Baloi et al, 2023) which used GPU-accelerated pattern matching to find similarity metrics. Two algorithms have emerged as the most widely utilized for multiple string matching on GPUs -the Aho-Corasick (AC) and Rabin-Karp algorithms.…”
Section: Introduction (mentioning)
confidence: 99%