Proceedings of the 37th International ACM SIGIR Conference on Research & Development in Information Retrieval, 2014
DOI: 10.1145/2600428.2609624

Load balancing for partition-based similarity search

Abstract: All pairs similarity search, used in many data mining and information retrieval applications, is a time-consuming process. Although a partition-based approach accelerates this process by simplifying parallelism management and avoiding unnecessary I/O and comparison, it is still challenging to balance the computation load among parallel machines with a distributed architecture. This is mainly due to the variation in partition sizes and irregular dissimilarity relationship in large datasets. This paper presents …

citations
Cited by 8 publications
(6 citation statements)
references
References 22 publications
“…Traditional NNG construction methods could not scale to sets of object this large. Given the growing popularity of cloud computing, some of the traditional NNS methods were ported to cloud programming frameworks developed for dealing with big data (e.g., Hadoop, Spark) [1,2,14,18,31,38,39,43,46]. Most of the solutions use the MapReduce [20] framework and can be split into two categories.…”
Section: Related Work (mentioning)
confidence: 99%
“…The second category of MapReduce methods use a mapper-only scheme, with no reducers [1,2,43]. They partition the set of objects into subsets (blocks) and use serial APSS methods to find pairwise similarities of objects in block pairs.…”
Section: Related Work (mentioning)
confidence: 99%
“…The second category of MapReduce methods use a mapper-only scheme, with no reducers [1,2,22]. They partition the set of objects into subsets (blocks) and use serial APSS methods to find pairwise similarities of objects in block pairs.…”
Section: Related Work (mentioning)
confidence: 99%
“…However, these methods suffer from high communication costs which make them inefficient for large datasets [2]. Partition based MapReduce methods [1,2,22] address this problem via block data decomposition, using serial APSS methods on MapReduce nodes to compute pairwise similarities between objects in block pairs. These methods could further benefit from multi-core parallel APSS solutions, which are not prevalent in the literature.…”
Section: Introduction (mentioning)
confidence: 99%
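
The block decomposition described in the citation statements above (partition the objects into blocks, then run a serial APSS method on each block pair) can be illustrated with a minimal Python sketch. This is not the code of the cited paper or of any citing paper: the function names, the choice of cosine similarity, and the fixed threshold are all assumptions made for illustration. The loop over block pairs runs serially here; in a mapper-only scheme, each block pair would presumably be an independent task.

```python
from itertools import combinations_with_replacement
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two dense vectors (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def make_blocks(n_objects, block_size):
    """Split object ids 0..n-1 into contiguous blocks (hypothetical partitioner)."""
    ids = list(range(n_objects))
    return [ids[i:i + block_size] for i in range(0, n_objects, block_size)]


def compare_blocks(vectors, block_a, block_b, threshold):
    """Serial APSS on one block pair: brute-force comparison of the two blocks."""
    hits = []
    for x in block_a:
        for y in block_b:
            # x < y reports each unordered pair once and skips self-pairs.
            if x < y:
                sim = cosine(vectors[x], vectors[y])
                if sim >= threshold:
                    hits.append((x, y, sim))
    return hits


def all_pairs_similarity(vectors, block_size, threshold):
    """Enumerate every block pair (i <= j) and merge the per-pair results."""
    blocks = make_blocks(len(vectors), block_size)
    results = []
    for i, j in combinations_with_replacement(range(len(blocks)), 2):
        # Each (i, j) is independent of the others, so it could run as its own task.
        results.extend(compare_blocks(vectors, blocks[i], blocks[j], threshold))
    return sorted(results)
```

Because each block pair is self-contained, the cost of a task depends on the sizes of its two blocks, which is one way to see why uneven partition sizes lead to the load imbalance the abstract mentions.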