Redesigning the string hash table, burst trie, and BST to exploit cache

Askitis, Nikolas; Zobel, Justin

doi:10.1145/1671970.1921704

Cited by 9 publications (3 citation statements)

References 91 publications


“…The best replacement word based on contextual information among the candidate words that passes the heuristics is selected using a 4-gram language model (LM) trained on the title strings from Wiki-Clickstream. The language model is generated using KenLM [24],⁴ which is based on modified Kneser-Ney smoothing and provides fast model construction and querying. For example, an LM-based replacement for the target string "live queen" is "live together", while a random replacement gives "live teufelshorner"; for the target string "web server", a random replacement yields "web castelvetere" and an LM-based replacement gives "web content".…”

Section: Methods (mentioning, confidence: 99%)

“…QAC implementation strategies vary based on how the partial query P is matched against the target strings [32]. A common approach is to use a trie [3,4,25,27] to retrieve candidates that have P as a prefix; or inverted index-based approaches [10,11,23] that offer completions independent of the ordering of the words in the partial query. The functionality of a QAC system can be extended beyond character level matches by including contextual cues [11] or synonyms [12,28].…”

Section: Introduction (mentioning, confidence: 99%)
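The trie-based approach mentioned in the quote above retrieves every stored string that has the partial query P as a prefix: walk down the trie along the characters of P, then enumerate the subtree below that node. A minimal Python sketch, assuming a plain character-level trie with no scores or ranking; the class and method names (`Trie`, `complete`) are hypothetical and not taken from any of the cited QAC systems.

```python
from typing import Dict, List, Optional


class TrieNode:
    """One node per character; `terminal` marks the end of a stored string."""

    __slots__ = ("children", "terminal")

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.terminal = False


class Trie:
    """Hypothetical sketch of prefix-based candidate retrieval for QAC."""

    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, s: str) -> None:
        node = self.root
        for ch in s:
            node = node.children.setdefault(ch, TrieNode())
        node.terminal = True

    def complete(self, prefix: str) -> List[str]:
        """Return every stored string starting with `prefix`, in lexicographic order."""
        node: Optional[TrieNode] = self.root
        for ch in prefix:  # walk down to the node that represents P
            node = node.children.get(ch)
            if node is None:
                return []  # no stored string has P as a prefix
        results: List[str] = []
        stack = [(node, prefix)]
        while stack:  # preorder depth-first traversal of the subtree
            cur, text = stack.pop()
            if cur.terminal:
                results.append(text)
            # push children in reverse order so the smallest is popped first
            for ch in sorted(cur.children, reverse=True):
                stack.append((cur.children[ch], text + ch))
        return results


words = ["web server", "web content", "weather", "live queen"]
trie = Trie()
for w in words:
    trie.insert(w)
print(trie.complete("web"))  # ['web content', 'web server']
```

Returning candidates in lexicographic order is only a convenience for the sketch; a real QAC system would rank them, for example by query frequency.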


Generation of Synthetic Query Auto Completion Logs

Krishnan, Moffat, Zobel, et al. 2020

Lecture Notes in Computer Science (self-citation)


Privacy concerns can prohibit research access to large-scale commercial query logs. Here we focus on generation of a synthetic log from a publicly available dataset, suitable for evaluation of query auto completion (QAC) systems. The synthetic log contains plausible string sequences reflecting how users enter their queries in a QAC interface. Properties that would influence experimental outcomes are compared between a synthetic log and a real QAC log through a set of side-by-side experiments, which confirm the applicability of the generated log for benchmarking the performance of QAC methods.


“…The use of pointers in dynamic string data structures is the fundamental cause of cache-inefficiency, as they can lead to random memory accesses (Askitis and Zobel, 2011). Existing data structures are computationally efficient, but use a large number of pointers to manage strings.…”

Section: About This Work (mentioning, confidence: 99%)
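The quoted point, that pointers in dynamic string structures cause random memory accesses, is what the cited paper targets by, as we understand it, replacing each chained hash slot's linked list of nodes with one contiguous array of length-prefixed strings that is scanned sequentially. A minimal Python sketch of that layout, assuming a 1-byte length prefix (so keys of at most 255 bytes), a fixed number of slots, and `zlib.crc32` standing in for whatever hash function the paper uses; the class name `ArrayHashTable` is hypothetical. Python itself will not show the cache effects; the sketch only illustrates the pointer-free layout.

```python
import zlib
from typing import List


class ArrayHashTable:
    """Hypothetical sketch: each slot is one contiguous bytearray of
    [length byte][key bytes] records instead of a linked list of nodes."""

    def __init__(self, num_slots: int = 1024) -> None:
        self.slots: List[bytearray] = [bytearray() for _ in range(num_slots)]

    def _slot(self, key: bytes) -> bytearray:
        # crc32 is an assumed stand-in hash; any string hash would do here
        return self.slots[zlib.crc32(key) % len(self.slots)]

    @staticmethod
    def _find(buf: bytearray, key: bytes) -> int:
        """Scan one slot sequentially; return the record offset or -1."""
        pos = 0
        while pos < len(buf):
            n = buf[pos]  # 1-byte length prefix
            if n == len(key) and buf[pos + 1 : pos + 1 + n] == key:
                return pos
            pos += 1 + n  # jump to the next record; no pointer is followed
        return -1

    def insert(self, key: str) -> bool:
        """Add `key` if absent; return True if it was newly inserted."""
        data = key.encode("utf-8")
        if len(data) > 255:
            raise ValueError("this sketch supports keys of at most 255 bytes")
        buf = self._slot(data)
        if self._find(buf, data) >= 0:
            return False
        buf.append(len(data))  # write the length prefix
        buf.extend(data)       # then the key bytes, at the end of the slot's array
        return True

    def contains(self, key: str) -> bool:
        data = key.encode("utf-8")
        return self._find(self._slot(data), data) >= 0


table = ArrayHashTable(num_slots=8)
for w in ["web server", "web content", "live queen", "web server"]:
    table.insert(w)  # the duplicate "web server" is not stored twice
print(table.contains("web content"), table.contains("web"))  # True False
```

The trade-off in this layout is that inserting into a slot may reallocate and copy its whole array (here left to `bytearray`), in exchange for searches that read one contiguous block instead of following a chain of pointers.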

An analysis on the performance of hash table-based dictionary implementations with different data usage models

Thenmozhi, Srimathi, 2017

IJHPCN


Abstract: The efficiency of in-memory computing applications depends on the mechanism chosen to store and retrieve strings. Trees and tries are the abstract data types (ADTs) that offer better efficiency for ordered dictionaries; the hash table is one of several ADTs that provide an efficient implementation of an unordered dictionary. The performance of a data structure depends on the hardware capabilities of the computing device, such as RAM size, cache size, and even the speed of the physical storage media. Hence, an application running on real or virtualised hardware will certainly have restricted access to memory, and hashing is heavily used in such applications for speed. In this work, the performance of six hash table-based dictionary ADT implementations is analysed under different data usage models. The six popular implementations, Khash, Uthash, GoogleDenseHash, TommyHashtable, TommyHashdyn and TommyHashlin, are tested under different hardware and software configurations.
