Performance analysis of several back-end database architectures

Hagmann, Robert B.; Ferrari, Domenico

doi:10.1145/5236.5242

Cited by 32 publications

(8 citation statements)

References 16 publications


“…There is also a large volume of work on architectures for distributed and parallel database systems including research on performance, e.g., see Bell and Grimson [1992], DeWitt and Gray [1992], DeWitt et al [1986], Hagmann and Ferrari [1986], Mackert and Lohman [1986], and Stonebraker et al [1983]. Although the fields of information retrieval and databases are similar, there are several distinctions which make studying the performance of IR systems unique.…”

Section: Related Work (mentioning)

confidence: 99%

Evaluating the performance of distributed architectures for information retrieval using a variety of workloads

Cahoon

McKinley

Lu

2000

ACM Trans. Inf. Syst.


The information explosion across the Internet and elsewhere offers access to an increasing number of document collections. In order for users to effectively access these collections, information retrieval (IR) systems must provide coordinated, concurrent, and distributed access. In this article, we explore how to achieve scalable performance in a distributed system for collection sizes ranging from 1GB to 128GB. We implement a fully functional distributed IR system based on a multithreaded version of the Inquery unified IR system. To explore the design space more fully, we also implement and validate a flexible simulation model. We measure performance as a function of system parameters such as client command rate, number of document collections, terms per query, query term frequency, number of answers returned, and command mixture. Our results show that it is important to model both query and document commands because the heterogeneity of commands significantly impacts performance. Based on our results, we recommend simple changes to the prototype and evaluate the changes using the simulator. Because of the significant resource demands of information retrieval, it is not difficult to generate workloads that overwhelm system resources regardless of the architecture. However under some realistic workloads, we demonstrate system organizations for which response time gracefully degrades as the workload increases and performance scales with the number of processors. This scalable architecture includes a surprisingly small number of brokers through which a large number of clients and servers communicate.

INTRODUCTION

The increasing numbers of large, unstructured text collections require full-text information retrieval (IR) systems in order for users to access them effectively. Current systems typically only allow users to connect to a single database either locally or perhaps on another machine. A distributed IR system should be able to provide multiple users with concurrent, efficient access to multiple text collections located on disparate sites. Since the documents in unstructured text collections are independent, IR systems are ideal applications to distribute across a network of workstations. However, the high resource demands of IR systems limit their performance, especially as the number of users, as well as the size and number of text collections, increases. Distributed computing offers a solution to these problems.

Only recently have people published work on distributed architectures for information retrieval. The Very Large Collection track in the TREC conferences promotes the development of distributed and shared memory architectures for IR [Hawking and Thistlewaite 1997; Hawking et al. 1998]. Several researchers created distributed IR systems and demonstrated the feasibility of distributed architectures for information retrieval [Harman et al. 1991; Macleod et al. 1987]. However, it is not clear from these initial implementations how the systems will perform in practice, since, unlike the case for database syst...



“…Among the first to study query processing specifically in a client-server environment were Hagmann and Ferrari [HF86]. They investigated different ways to split the functionality of a DBMS (e.g., query parsing, optimization, and execution) between client and server machines.…”

Section: Response Time Experiments (mentioning)

confidence: 99%

Performance tradeoffs for client-server query processing

Franklin

Jónsson

Kossmann

1996

Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data - SIGMOD '96


The construction of high-performance database systems that combine the best aspects of the relational and object-oriented approaches requires the design of client-server architectures that can fully exploit client and server resources in a flexible manner. The two predominant paradigms for client-server query execution are data-shipping and query-shipping. We first define these policies in terms of the restrictions they place on operator site selection during query optimization. We then investigate the performance tradeoffs between them for bulk query processing. While each strategy has advantages, neither one on its own is efficient across a wide range of circumstances. We describe and evaluate a more flexible policy called hybrid-shipping, which can execute queries at clients, servers, or any combination of the two. Hybrid-shipping is shown to at least match the best of the two "pure" policies, and in some situations, to perform better than both. The implementation of hybrid-shipping raises a number of difficult problems for query optimization. We describe an initial investigation into the use of a 2-step query optimization strategy as a way of addressing these issues.


“…In a client-server architecture, there is a question of where to optimize a query [Hagmann and Ferrari 1986] and where to keep the statistics for cache investment-at the client or at the server? Since we wanted to change the SHORE server code as little as possible, we decided to run an instance of the optimizer and also carry out cache investment at every client.…”

Section: Software and Hardware (mentioning)

confidence: 99%

Cache investment

Kossmann

Franklin

Drasch

et al. 2000

ACM Trans. Database Syst.


Emerging distributed query-processing systems support flexible execution strategies in which each query can be run using a combination of data shipping and query shipping. As in any distributed environment, these systems can obtain tremendous performance and availability benefits by employing dynamic data caching. When flexible execution and dynamic caching are combined, however, a circular dependency arises: Caching occurs as a by-product of query operator placement, but query operator placement decisions are based on (cached) data location. The practical impact of this dependency is that query optimization decisions that appear valid on a per-query basis can actually cause suboptimal performance for all queries in the long run.

To address this problem, we developed Cache Investment, a novel approach for integrating query optimization and data placement that looks beyond the performance of a single query. Cache Investment sometimes intentionally generates a "suboptimal" plan for a particular query in the interest of effecting a better data placement for subsequent queries. Cache Investment can be integrated into a distributed database system without changing the internals of the query optimizer. In this paper, we propose Cache Investment mechanisms and policies and analyze their performance. The analysis uses results from both an implementation on the SHORE storage manager and a detailed simulation model. Our results show that Cache Investment can significantly improve the overall performance of a system and demonstrate the trade-offs among various alternative policies.

