2014 IEEE 30th International Conference on Data Engineering
DOI: 10.1109/icde.2014.6816701

PHiDJ: Parallel similarity self-join for high-dimensional vector data with MapReduce

Cited by 33 publications (33 citation statements)
References 21 publications
“…For the cross aggregate ranking problem studied in this paper, there is no predicate defining part of the Cartesian product to be evaluated. As a result, query processing techniques for equi-join [31], theta-join [32], similarity join [33], [34] are not applicable to our query processing problem.…”
Section: MapReduce Variants and Enhancements (mentioning)
confidence: 99%
“…The method uses the MapReduce shuffle mechanism to deliver tuples with the same join key to the same reducer. Studies on join processing in MapReduce are largely concerned with optimizing the query performance with respect to different join query types, such as equi-join [31], theta-join [32], similarity join [33], [34], k-NN join [35] and top-k join [28].…”
Section: MapReduce Variants and Enhancements (mentioning)
confidence: 99%
“…To deal with the similarity joins problem efficiently on large-scale data set, many researchers try to propose the parallel algorithms based on MapReduce framework [10] for different data types, such as set data [11]-[15], vector data [9], [16]-[22], and spatial data [23]. Lin et al. [11] firstly proposed brute force and indexed approaches to implement pairwise document similarity comparisons with MapReduce, but many duplicated comparisons existed in such approaches.…”
Section: Similarity Join Query In Distributed Processing Model (mentioning)
confidence: 99%
“…The StatStream system [25] specializes in discovering correlations using a grid structure, but it incurs prohibitive communication cost in a distributed environment. Recently, partitioning-based approaches have attracted attention for distributed batch data processing [7], [19], [22]. However, such approaches are data-dependent and need an a priori data pre-scanning step to estimate the data distribution.…”
Section: Related Work (mentioning)
confidence: 99%
“…This has led to the development of many distributed, fault-tolerant, and realtime computation systems [2], [3], [13], [24]. Analogous to the trend observed in map-reduce systems (e.g., Apache Hadoop); where efficiently performing complex joins using map-reduce was a challenging problem [7], [19], [22], using distributed real-time computation engines for efficiently and continuously mining meaningful information from time-series is becoming challenging, as we will see later.…”
Section: Introduction (mentioning)
confidence: 99%