Query Processing in Highly-Loaded Search Engines

Broccolo, Daniele; Macdonald, Craig; Orlando, Salvatore; Ounis, Iadh; Perego, Raffaele; Silvestri, Fabrizio; Tonellotto, Nicola

doi:10.1007/978-3-319-02432-5_9

Cited by 4 publications

(3 citation statements)

References 7 publications

“…A variety of work has explored the efficiency of sharded search systems, covering topics including: reducing the communications and merging costs when large numbers of shards are searched [13]; load balancing in mirrored systems [23,38]; query shedding under high load to improve overall throughput [10]; and query pruning to improve efficiency [59]. Other work focuses on addressing the load imbalances that arise when non-random shards are used, including the development of techniques for strategic assignment of index postings to shards, and strategic replication of frequently-accessed elements [41,42].…”

Section: Distributed Retrieval (mentioning)

confidence: 99%

Efficient distributed selective search

Kim, Callan, Culpepper et al. 2016

Inf Retrieval J

Simulation and analysis have shown that selective search can reduce the cost of large-scale distributed information retrieval. By partitioning the collection into small topical shards, and then using a resource ranking algorithm to choose a subset of shards to search for each query, fewer postings are evaluated. In this paper we extend the study of selective search into new areas using a fine-grained simulation, examining the difference in efficiency when term-based and sample-based resource selection algorithms are used; measuring the effect of two policies for assigning index shards to machines; and exploring the benefits of index-spreading and mirroring as the number of deployed machines is varied. Results obtained for two large datasets and four large query logs confirm that selective search is significantly more efficient than conventional distributed search architectures and can handle higher query rates. Furthermore, we demonstrate that selective search can be tuned to avoid bottlenecks, and thus maximize usage of the underlying computer hardware.
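
The selective search idea described in this abstract (rank the topical shards for a query, then search only the top few) can be illustrated with a toy sketch. This is not the authors' simulator or their resource selection algorithm: the function names `rank_shards` and `selective_search`, the sample shards, and the raw term-count scoring are all assumptions made for illustration.

```python
from collections import Counter

def rank_shards(query_terms, shard_term_counts):
    """Toy term-based resource ranking: score a shard by how often
    the query terms occur in it (hypothetical scoring, not from the paper)."""
    scores = {
        shard_id: sum(counts.get(term, 0) for term in query_terms)
        for shard_id, counts in shard_term_counts.items()
    }
    # Highest score first; ties broken by shard id so the order is deterministic.
    return sorted(scores, key=lambda shard_id: (-scores[shard_id], shard_id))

def selective_search(query_terms, shards, shard_term_counts, k=1):
    """Search only the k best-ranked shards instead of all of them."""
    selected = rank_shards(query_terms, shard_term_counts)[:k]
    hits = []
    for shard_id in selected:
        for doc_id, text in shards[shard_id].items():
            tokens = text.split()
            score = sum(tokens.count(term) for term in query_terms)
            if score > 0:
                hits.append((score, doc_id))
    hits.sort(key=lambda hit: (-hit[0], hit[1]))
    return selected, [doc_id for _, doc_id in hits]

# Hypothetical two-shard collection, partitioned by topic.
SHARDS = {
    "sports": {"d1": "football match score", "d2": "tennis match"},
    "cooking": {"d3": "pasta recipe", "d4": "bread recipe oven"},
}
# Per-shard term statistics used by the resource ranker.
SHARD_TERM_COUNTS = {
    shard_id: Counter(" ".join(docs.values()).split())
    for shard_id, docs in SHARDS.items()
}

selected, results = selective_search(["recipe"], SHARDS, SHARD_TERM_COUNTS, k=1)
print(selected, results)
```

With `k=1` only the best-ranked shard is searched, so the postings in every other shard are never touched; this is where the cost reduction claimed in the abstract would come from.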

“…For example, consideration of the spatial and temporal variance in energy prices that RESQ exploits may lead to increased cost savings for cache eviction algorithms (e.g., [16] and [7]). Similarly, adaptive selection of underlying site retrieval strategies (e.g., based on query efficiency prediction [5]) may also help RESQ process the queued query volume at a local site within particularly low- or high-price time durations. While the design of RESQ does not necessarily preclude the adoption of such existing approaches, a fresh look at the problems in the context of rank- and energy-awareness for distributed search systems may yield useful insights.…”

Section: Results (mentioning)

confidence: 99%

Rank-energy selective query forwarding for distributed search systems

Teymorian, Maloof 2013

Proceedings of the 22nd ACM International Conference on Information & Knowledge Management

Scaling high-quality, cost-efficient query evaluation is critical to search system performance. Although partial indexes reduce query processing times, result quality may be jeopardized due to exclusion of relevant non-local documents. Selectively forwarding queries between geographically distributed search sites may help. The basic idea of query forwarding is that after a local site receives a query, it determines non-local sites to forward the query to and returns an aggregation of the local and non-local results. Nevertheless, electricity costs remain substantial sources of operating expenses. We present a hybrid rank-energy query forwarding model termed "RESQ." The novel contribution is to simultaneously consider both ranking quality and spatially-temporally varying energy prices when making forwarding decisions. Experiments with a large-scale query log, publicly available electricity price data, and real search site locations demonstrate that query forwarding under RESQ achieves the result scalability of partial indexes with the cost savings of energy-aware approaches (e.g., an 87% ranking guarantee with a 46% savings in energy costs).

“…Daniele et al [15] use query efficiency predictors to feed a load-sensitive selective pruning framework and they also demonstrate that a multiple feature predictor using DAAT is more accurate than a single feature one. In [16], authors use predictors to introduce a novel dropping strategy for maintaining the response times under a specified threshold.…”

Section: Query Efficiency Predictors (mentioning)

confidence: 99%

Query Scheduling Techniques and Power-Latency Trade-off Model for Large-Scale Search Engines

Freire

2015

SIGIR Forum

Acknowledgments

This path would not have been possible without the help of my supervisor, Fidel Cacheda, who gave me the opportunity of diving again into the Information Retrieval field, after an unforgettable start in the Information Retrieval Lab. Thank you for being such a great motivating person, for guiding and advising me in the best way and, above all, for always letting me choose the next step.

All my gratitude to all the colleagues who have shared great moments with me in the Telematic Engineering Lab. Thank you for making our office a warm place to spend lots of hours and for building such a great friendship even outside our blind walls.

I would like to acknowledge Iadh Ounis and the Information Retrieval Group from the University of Glasgow. They gave me the opportunity of staying three months in 2012 in one of the leading groups in IR, where I came into contact with an impeccable way of working. My special gratitude goes to Craig Macdonald: thank you for becoming my best teacher and reference during all my PhD. I can't ever forget some of your encouraging words that became my motto: Don't think "is this sufficient?" but "how can we do better". All this work would not have been possible without your valuable help. A special gratitude also to Silvia Lorenzo Freire, for sharing with me her huge knowledge and offering me her useful help. I would especially like to thank all the members of the High [...]

I cannot forget Roi Blanco, who was discreetly present at every milestone of my career. Thank you also for opening up my next opportunity for learning; I will make the most of it.

A great deal of gratitude is due to all the people who have walked with me during this period. Your encouraging and warm-hearted words were the best incentive to finish this work. Those who kept me out of this thesis, sharing awesome moments and messages, have also done a good job.

Abstract

Web search engines have to deal with a rapid increase of information, demanded by high incoming query traffic. This situation has driven companies to build geographically distributed data centres housing thousands of computers, consuming enormous amounts of electricity and requiring a huge surrounding infrastructure. At this scale, even minor efficiency improvements result in large financial savings.

This thesis represents a novel contribution to the state of the art in query scheduling and power consumption, by assisting large-scale data centres to build more efficient search engines.

On the one hand, this thesis proposes new scheduling techniques to decrease the response time of queries, by estimating the server that will be idle soonest.

On the other hand, this thesis defines a simple mathematical model that establishes a threshold between the power and latency of a search engine. Using historical and current data, the model estimates the incoming query traffic and automatically increases/decreases the necessary number of active machines in the system. We achieve high energy savings during the whole day, without degrading the latency.

Our experiments have attested th...
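
The scheduling idea mentioned in this abstract, sending each query to the server estimated to become idle soonest, can be sketched as follows. This is only an illustration under simplifying assumptions, not the thesis's technique: the names `pick_server` and `schedule` are hypothetical, the predicted processing times are assumed to be given, and all queries are assumed to arrive at the same instant so that no backlog drains between arrivals.

```python
def pick_server(backlogs):
    """Return the index of the server with the least queued work,
    i.e. the one estimated to become idle soonest (ties go to the lower index)."""
    return min(range(len(backlogs)), key=lambda i: (backlogs[i], i))

def schedule(predicted_times, n_servers):
    """Assign each query, in arrival order, to the soonest-idle server.

    predicted_times: assumed per-query processing time estimates.
    Returns the chosen server per query and the final backlog per server.
    """
    backlogs = [0.0] * n_servers
    assignment = []
    for predicted in predicted_times:
        server = pick_server(backlogs)
        assignment.append(server)
        backlogs[server] += predicted  # the server stays busy for this much longer
    return assignment, backlogs

assignment, backlogs = schedule([5, 1, 1, 1], n_servers=2)
print(assignment, backlogs)
```

In this example the single expensive query occupies server 0, so the three cheap queries are all routed to server 1; a round-robin policy would instead queue one of them behind the expensive query.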
