Supporting KDD Applications by the k-Nearest Neighbor Join

Böhm, Christian; Krebs, Florian

doi:10.1007/978-3-540-45227-0_50

Cited by 25 publications

(24 citation statements)

References 17 publications

“…A clustering algorithm based on closest pairs has been proposed in [13]. In [2,3] the authors study applications of the k-NN join operation to knowledge discovery, which is a direct extension of the k-Semi-Closest-Pair query. More specifically, the authors discuss the application of k-NN join to clustering, classification and sampling tasks in data mining operations, and they illustrate how these tasks can be performed more efficiently.…”

Section: Given Two Spatial Datasets D (mentioning)

confidence: 99%
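
To make the operation named in the statement above concrete: a k-NN join pairs every point of one dataset with its k nearest neighbors in another. The following is a minimal brute-force sketch of that definition only, not the index-based method of the cited paper. The function name `knn_join`, the tuple representation of points, and the use of Euclidean distance are assumptions made for illustration.

```python
import heapq
import math


def knn_join(r_points, s_points, k):
    """Brute-force k-NN join (illustrative sketch, O(|R| * |S|) distance computations).

    For each point r in r_points, collect the k points of s_points that are
    closest to r under Euclidean distance. Returns a dict mapping the index
    of r in r_points to its list of neighbors, nearest first.
    """
    result = {}
    for i, r in enumerate(r_points):
        # nsmallest avoids fully sorting s_points for every r.
        result[i] = heapq.nsmallest(k, s_points, key=lambda s: math.dist(r, s))
    return result


if __name__ == "__main__":
    R = [(0, 0), (10, 10)]
    S = [(1, 0), (0, 2), (9, 9), (20, 20)]
    for idx, neighbors in knn_join(R, S, k=2).items():
        print(R[idx], "->", neighbors)
```

Calling `knn_join(R, R, k)` gives a self-join in which each point's nearest neighbor is itself at distance zero, so a caller would need to skip that entry.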

Processing Distance Join Queries with Constraints

Papadopoulos

Νανόπουλος

Manolopoulos

2005

The Computer Journal

Distance-join queries are used in many modern applications, such as spatial databases, spatiotemporal databases, and data mining. One of the most common distance-join queries is the closest-pair query. Given two datasets D_A and D_B, the closest-pair query (CPQ) retrieves the pair (a, b), where a ∈ D_A and b ∈ D_B, having the smallest distance among all pairs of objects. An extension to this problem is to generate the k closest pairs of objects (k-CPQ). In several cases spatial constraints are applied, and the retrieved object pairs must also satisfy these constraints. Although applying spatial constraints seems a natural way to focus the search, they have only recently been studied for the CPQ problem, with the restriction that D_A = D_B. In this work, we focus on constrained closest-pair queries (CCPQ) between two distinct datasets D_A and D_B, where objects from D_A must be enclosed by a spatial region R. Several algorithms are presented and evaluated using real-life and synthetic datasets. Among them, a heap-based method enhanced with batch capabilities outperforms the other approaches, as demonstrated by an extensive performance evaluation.
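
The query definitions in this abstract can be illustrated with a short brute-force sketch. It is not one of the algorithms evaluated in the paper, which the abstract does not describe in detail. It only restates the CCPQ definition in code, under the assumptions that points are 2-D tuples, the region R is an axis-aligned rectangle given by two corner points, and distance is Euclidean. The function name `constrained_k_closest_pairs` is hypothetical.

```python
import heapq
import math


def constrained_k_closest_pairs(d_a, d_b, region, k=1):
    """Brute-force constrained k-closest-pair query (illustrative sketch).

    d_a, d_b : lists of 2-D points (tuples).
    region   : ((xmin, ymin), (xmax, ymax)), an assumed rectangular form of R.
    Returns up to k tuples (distance, a, b), smallest distance first, where
    a comes from d_a and lies inside the region, and b comes from d_b.
    """
    (xmin, ymin), (xmax, ymax) = region
    # Apply the spatial constraint to D_A only, as in the CCPQ definition.
    inside = [a for a in d_a if xmin <= a[0] <= xmax and ymin <= a[1] <= ymax]
    # Enumerate all candidate pairs lazily; the heap keeps only the k best.
    pairs = ((math.dist(a, b), a, b) for a in inside for b in d_b)
    return heapq.nsmallest(k, pairs)


if __name__ == "__main__":
    D_A = [(0, 0), (5, 5), (9, 9)]
    D_B = [(0, 0.1), (5, 7), (9, 9.5)]
    R = ((4, 4), (10, 10))
    print(constrained_k_closest_pairs(D_A, D_B, R, k=2))
```

With `k=1` this is the plain CCPQ. Without the region filter, the pair involving (0, 0) would win in this sample data, which shows how the constraint changes the answer.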

“…A related problem, called AkNN, which reports the kNN for each data point, is directly used in the Jarvis-Patrick clustering algorithm [16]. AkNN is also used in a number of other clustering algorithms, including the k-means and the k-medoid clustering algorithms [4].…”

Section: Introduction (mentioning)

confidence: 99%

“…In many applications that use ANN, especially large scientific applications, the datasets are growing rapidly and often the ANN computation is one of the main computational bottlenecks. Recognizing this problem, there has been a lot of interest in the database community in developing efficient external ANN algorithms [4,5,9,13,32]. All of these methods build R*-tree indices [3] on one or both datasets, and evaluate the ANN by traversing the index.…”

Section: Introduction (mentioning)

confidence: 99%

Efficient Evaluation of All-Nearest-Neighbor Queries

Chen

Patel

2007

2007 IEEE 23rd International Conference on Data Engineering

“…A clustering algorithm based on closest pairs has been proposed in [12]. In [2,3] the authors study applications of the k-NN join operation to knowledge discovery, which is a direct extension of the k-semi-closest-pair query. More specifically, the authors discuss the application of k-NN join to clustering, classification and sampling tasks in data mining operations, and they illustrate how these tasks can be performed more efficiently.…”

Section: Introduction (mentioning)

confidence: 99%

Closest Pair Queries with Spatial Constraints

Papadopoulos

Νανόπουλος

Manolopoulos

2005

Advances in Informatics

Abstract. Given two datasets D_A and D_B, the closest-pair query (CPQ) retrieves the pair (a, b), where a ∈ D_A and b ∈ D_B, having the smallest distance among all pairs of objects. An extension to this problem is to generate the k closest pairs of objects (k-CPQ). In several cases spatial constraints are applied, and the retrieved object pairs must also satisfy these constraints. Although applying spatial constraints seems a natural way to focus the search, they have only recently been studied for the CPQ problem, with the restriction that D_A = D_B. In this work we focus on constrained closest-pair queries (CCPQ) between two distinct datasets D_A and D_B, where objects from D_A must be enclosed by a spatial region R. A new algorithm is proposed and compared with a modified closest-pair algorithm. The experimental results demonstrate that the proposed approach is superior with respect to CPU and I/O costs.
