2018 IEEE 24th International Conference on Parallel and Distributed Systems (ICPADS) 2018
DOI: 10.1109/padsw.2018.8644891

Partitioning and Bucketing Techniques to Speed up Query Processing in Spark-SQL

Cited by 8 publications
(3 citation statements)
References 5 publications
“…A technique established on frequent itemset mining was proposed in [16] towards partition, bucket, and sort tables (PBSTs) into a big DW with the more common predicate attributes in the workload. This method considered the quantity of the relation attributes, data skew, and the physical features of the cluster nodes.…”
Section: State Of the Art Of Horizontal Fragmentation Methods (mentioning)
confidence: 99%
“…We select the TPC-DS [59], TPC-H [12], and three programs from HiBench [31] as representative programs to evaluate LOCAT, as shown in Table 1. TPC-DS, containing 104 queries, has been widely used in Spark SQL systems for research and development of optimization techniques [18,32,47]. It models complex decision support functions to provide highly comparable, controlled, and repeatable tasks in evaluating the performance of Spark SQL systems [7].…”
Section: Representative Programs (mentioning)
confidence: 99%
“…In [28] the authors propose a technique based on frequent itemsets mining, to Partition, Bucket and Sort the Tables of a big data warehouse with the most frequent predicate attributes in the queries: they apply data mining algorithms on a queries workload to determine the most frequent predicate attributes, and use a hash-partitioning technique without making any assumptions about the filters used in the query predicates (i.e. in the Where clause of a SQL query).…”
Section: Related Work (mentioning)
confidence: 99%
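
The citation statements above describe mining a query workload for the most frequent predicate attributes and then hash-partitioning on them. The following is a minimal sketch of that idea, not the paper's algorithm: it counts attribute sets by naive enumeration rather than a real frequent itemset miner such as Apriori or FP-Growth, and the workload, the attribute names, `frequent_attribute_sets`, and `bucket_of` are all hypothetical names introduced for illustration.

```python
import zlib
from collections import Counter
from itertools import combinations


def frequent_attribute_sets(workload, min_support, max_size=2):
    """Return attribute sets that appear in at least min_support queries.

    workload is a list of sets, one per query, holding the attributes
    used in that query's predicates. Naive enumeration, for illustration.
    """
    counts = Counter()
    for attrs in workload:
        ordered = sorted(set(attrs))
        for size in range(1, max_size + 1):
            for combo in combinations(ordered, size):
                counts[combo] += 1
    return {combo: n for combo, n in counts.items() if n >= min_support}


def bucket_of(value, num_buckets):
    """Assign a value to a bucket with a deterministic hash (CRC32 here,
    an assumption; the hash function used in the paper is not stated)."""
    return zlib.crc32(str(value).encode("utf-8")) % num_buckets


# Hypothetical workload: predicate attributes extracted from four queries.
workload = [
    {"d_year", "s_store_id"},
    {"d_year", "i_category"},
    {"d_year", "s_store_id", "i_category"},
    {"c_customer_id"},
]

frequent = frequent_attribute_sets(workload, min_support=2)
# Rank by descending support, breaking ties by attribute names.
ranked = sorted(frequent.items(), key=lambda kv: (-kv[1], kv[0]))
```

In this sketch the highest-ranked attribute set would be the candidate key for partitioning or bucketing, and `bucket_of` shows how a row's value for that key could be mapped to one of a fixed number of buckets.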