This work performs a thorough characterization and analysis of the open-source Lucene search library. The article describes in detail the architecture, functionality, and micro-architectural behavior of the search engine, and investigates prominent online document search research issues. In particular, we study how intra-server index partitioning affects response time and throughput, explore the potential use of low-power servers for document search, and examine the sources of performance degradation and the causes of tail latencies. Some of our main conclusions are the following: (a) intra-server index partitioning can reduce tail latencies, but with diminishing benefits as incoming query traffic increases; (b) low-power servers, given enough partitioning, can provide the same average and tail response times as conventional high-performance servers; (c) index search is a CPU-intensive, cache-friendly application; and (d) C-states are the main culprits for performance degradation in document search.

Search services are required to provide tight QoS guarantees, such as tail latencies below 500 ms [2], even at peak traffic loads. Previous work has aimed at improving the latency, efficiency, and cost of operation of search services. In the work of Meisner et al. [27], full-system power management is evaluated for a web search workload. To improve energy efficiency, Lo et al. [20] proposed running each server just fast enough to satisfy global latency requirements, whereas Vamanan et al. [33] proposed exploiting time slack by slowing down individual sub-queries. The possibility of using mobile cores for web search, for improved cost and energy efficiency, is studied in the work of Reddi et al. [30]. Ren et al. [31] examined how web search can benefit from heterogeneous cores, whereas Haque et al. [10] and Jeon et al. [15] looked at adaptive parallelism for improving response times. Work stealing for meeting web search target latencies is proposed by Li et al. [17]. Hsu et al. [14] proposed a turbo boost framework that increases CPU voltage and frequency at fine-grained time intervals to reduce the latency of computationally heavy search queries. Other work has collocated search applications with other types of workloads to increase data center utilization [25, 26, 35].

This article presents a thorough top-down characterization of an open-source search engine to improve the overall understanding of search engines. In particular, this work presents a characterization of the Lucene-based Nutch web search benchmark [8] on real hardware, providing insights into the application-level and micro-architectural behavior of this benchmark. This workload is based on the popular Lucene document search engine. Previous characterization efforts of this benchmark focused only on query stream characterization [34] and micro-architectural characterization [8]. Another work conducted with the Nutch benchmark [9] evaluated the performance of intra-server index partitioning and slower cores. However, that work used a small index...
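
To make the partitioning study concrete, the following is a minimal sketch, not the paper's actual harness, of how intra-server index partitioning can be expressed directly with Lucene's API: several on-disk index partitions are wrapped in a MultiReader and searched through an IndexSearcher backed by a thread pool, so a single query is scored across the partitions in parallel within one server. The partition count, index paths, field name, and query term below are illustrative assumptions.

    import java.nio.file.Paths;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.MultiReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.store.FSDirectory;

    public class PartitionedSearch {
        public static void main(String[] args) throws Exception {
            int partitions = 4; // assumed partition count
            IndexReader[] readers = new IndexReader[partitions];
            for (int i = 0; i < partitions; i++) {
                // Each partition is an independent on-disk Lucene index
                // (paths are hypothetical).
                readers[i] = DirectoryReader.open(
                        FSDirectory.open(Paths.get("/data/index-part-" + i)));
            }

            // One logical index over all partitions; the executor lets the
            // searcher score the partitions concurrently on separate threads,
            // which is what shortens per-query latency.
            MultiReader multi = new MultiReader(readers);
            ExecutorService pool = Executors.newFixedThreadPool(partitions);
            IndexSearcher searcher = new IndexSearcher(multi, pool);

            // "content" is an assumed field name; the query term is illustrative.
            TopDocs top = searcher.search(
                    new TermQuery(new Term("content", "lucene")), 10);
            System.out.println("total hits: " + top.totalHits);

            pool.shutdown();
            multi.close(); // also closes the per-partition readers
        }
    }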