2020
DOI: 10.1007/s10796-020-09995-2

Cache-Based Multi-Query Optimization for Data-Intensive Scalable Computing Frameworks

Abstract: In modern large-scale distributed systems, analytics jobs submitted by various users often share similar work, for example scanning and processing the same subset of data. Instead of optimizing jobs independently, which may result in redundant and wasteful processing, multi-query optimization techniques can be employed to save a considerable amount of cluster resources. In this work, we introduce a novel method combining in-memory cache primitives and multi-query optimization, to improve the efficiency of data…

Cited by 16 publications (10 citation statements)
References 43 publications

“…Thus, common subexpressions are evaluated once. This approach was subsequently extended to include query result caches, materialized/cached views, intermediate query results, and query rewriting, which have been extensively studied for relational database systems [11], [14], [15], [28]-[31], [34], [35] and streaming processing systems [19]-[21]. Group processing algorithms have proven to be effective in multiple applications involving high-load conditions [7], [8], [15], [19]-[21], [27]-[31], [34]-[37], [52], [53].…”
Section: A Farthest Neighbor Search Algorithms (mentioning)
confidence: 99%
“…These multi-query optimization techniques later expanded to involve query rewriting, query result caches, materialized views, and intermediate query results for relational database systems [28, 29, 30, 31, 32, 33, 34, 35, 36] and streaming processing systems [37, 38, 39]. Many applications involving high-load conditions have proven that batch processing algorithms can significantly reduce the query processing time for multiple simultaneous queries [19, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43]. Furthermore, multi-query optimization techniques have received significant attention in spatial databases.…”
Section: Related Work (mentioning)
confidence: 99%
“…efficient data models, data processing pipelines and architectures to integrate standard and big data sources (Jovanovic et al 2020) as well as to improve resource utilization and aggregate performance in shared environments (Michiardi et al 2020); predictive analytics to forecast product demand in the fashion industry (Gardino et al 2020) and techniques to deal with the lack of annotated data for sensor-based human activity recognition (Prabono et al 2020); text data processing to assess the performance of text storage systems through a generic benchmark (Truicȃ et al 2020) and innovative solutions to deal with specific use cases such as the legal domain (Bordino et al 2020); novel approaches for mining social media to support intelligent transportation systems (Vallejos et al 2020) and digging deep the IoT scenario (Ustek-Spilda et al 2020); solutions to deal with privacy issues in distance learning systems (Preuveneers et al 2020).…”
Section: Special Issue Content (mentioning)
confidence: 99%
“…To gain a more efficient resource utilization and better aggregate performance in shared environments, where queries are concurrently submitted by multiple users, Multi-Query Optimization (MQO) techniques are adopted in paper (Michiardi et al 2020). The proposed system extends the SparkSQL Catalyst optimizer to provide a general approach to MQO for distributed computing frameworks that support a relational API.…”
Section: Efficient Data Models Data Processing Pipelines and Archite… (mentioning)
confidence: 99%
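
The idea running through the citation statements above, that work shared by several queries is evaluated once and then served from a cache, can be illustrated with a minimal Python sketch. It is not the system of Michiardi et al. and does not use the SparkSQL Catalyst API; the names `SharedScanCache`, `scan_country`, `run_queries` and the sample rows are hypothetical and exist only for illustration.

```python
from typing import Callable, Dict, Hashable, List, Tuple

Row = Dict[str, object]

# Hypothetical in-memory table standing in for a shared input dataset.
ROWS: List[Row] = [
    {"user": "a", "country": "FR", "amount": 10},
    {"user": "b", "country": "IT", "amount": 25},
    {"user": "c", "country": "FR", "amount": 5},
]


class SharedScanCache:
    """Keeps the result of a shared subexpression so it is computed only once."""

    def __init__(self) -> None:
        self._cache: Dict[Hashable, List[Row]] = {}
        self.evaluations = 0  # how many times a subexpression was actually computed

    def get(self, key: Hashable, compute: Callable[[], List[Row]]) -> List[Row]:
        if key not in self._cache:
            self.evaluations += 1
            self._cache[key] = compute()
        return self._cache[key]


def scan_country(rows: List[Row], country: str) -> List[Row]:
    """The common subexpression: scan and filter the same subset of data."""
    return [r for r in rows if r["country"] == country]


def run_queries(rows: List[Row], cache: SharedScanCache) -> Tuple[int, List[str]]:
    """Two queries that share the same filtered scan."""
    key = ("scan", "country", "FR")

    def shared() -> List[Row]:
        return scan_country(rows, "FR")

    # Query 1: total amount for the shared subset (computes and caches the scan).
    total = sum(int(r["amount"]) for r in cache.get(key, shared))
    # Query 2: users in the shared subset (served from the cache).
    users = sorted(str(r["user"]) for r in cache.get(key, shared))
    return total, users


if __name__ == "__main__":
    cache = SharedScanCache()
    print(run_queries(ROWS, cache), cache.evaluations)
```

Both queries use the same cache key, so the filtered scan runs only for the first one. A real optimizer would detect the shared subexpression in the query plans rather than rely on a hand-written key.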