2017
DOI: 10.1016/j.is.2016.09.007

Skewed distributions in semi-stream joins: How much can caching help?

Abstract: Semi-stream join algorithms join a fast data stream with a disk-based relation. This is important, for example, in real-time data warehousing, where a stream of transactions is joined with master data before loading it into a data warehouse. In many important scenarios, the stream input has a skewed distribution, which makes certain performance optimisations possible. We propose two such optimisation techniques: 1) a caching technique for frequently used master data, and 2) a technique for selective load sheddin…

Cited by 5 publications (8 citation statements)
References 26 publications
“…1) Requirements/challenges for real-time stream processing for real-time DWH Following requirements and challenges for implementation of real-time stream processing for real-time DWH were identified after exploring various studies [4], [14]- [18], [20]- [31], [34]- [36], [39]- [41], [43]- [50], [52], [59], [63]- [70], [78]:…”
Section: Assessment Of Rq3: Which Approaches/tools Have Been Repor… (mentioning)
confidence: 99%
“…Some real-world trajectory datasets have been adopted by [56] and [58]: a fleet of trucks, a city buses for experimental evaluation of proposed methodologies whereas, synthetic datasets generated using the benchmark data generator were also used during evaluation by [56]. Semi-stream join algorithms developed by [20], [23], [43], [69], [70] were tested by using both synthetic and real-life datasets. They also analyzed memory and time requirements.…”
Section: Assessment Of Rq4: What Evidence Have Been Reported While… (mentioning)
confidence: 99%
“…Content may change prior to final publication. [6], [7], [8], [9], [10], [11], [12], [13], [14], [15], [16] Hadoop no Similarity [17], [18], [19] Spark no Similarity [20], [21], [22], [23], [24], [25], [26], [27] N/A yes Equi [28] N/A yes Similarity [29] Spark yes Equi DSim-Join Spark yes Similarity…”
Section: Related Work (mentioning)
confidence: 99%
“…Naeem, Nguyen, and Weber [23] proposed multi-way semi-stream join methods. Naeem et al. [24] presented a technique for load shedding in semi-stream join processing. Naeem, Weber, and Lutteroth [25] proposed a semi-stream join method for the many-to-many equi-join.…”
Section: Related Work (mentioning)
confidence: 99%