Proceedings of the 2019 International Conference on Management of Data
DOI: 10.1145/3299869.3319892

A Scalable Index for Top-k Subtree Similarity Queries

Abstract: Given a query tree Q, the top-k subtree similarity query retrieves the k subtrees in a large document tree T that are closest to Q in terms of tree edit distance. The classical solution scans the entire document, which is slow. The state-of-the-art approach precomputes an index to reduce the query time. However, the index is large (quadratic in the document size), building the index is expensive, updates are not supported, and data-specific tuning is required. We present a scalable solution for the top-k subtre…
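To make the query definition concrete, here is a minimal sketch. It is not the paper's index and not its method. It assumes unit-cost ordered tree edit distance, trees encoded as `(label, children)` tuples, and hypothetical helper names (`ted`, `label_lower_bound`, `topk_subtrees`). Like the classical solution it considers every subtree of T (one per node), but it ranks subtrees by a label-bag lower bound, max(|Q|, |S|) minus the size of the multiset intersection of their labels, and computes the exact distance only while that bound can still beat the current k-th result. The memoized forest recursion used for the exact distance is only practical for tiny trees; dedicated algorithms such as Zhang-Shasha or APTED would take its place in practice.

```python
from collections import Counter
from functools import lru_cache
import heapq

# Assumption for this sketch: a tree is (label, children), children a tuple of trees.

@lru_cache(maxsize=None)
def size(tree):
    """Number of nodes in the tree."""
    return 1 + sum(size(c) for c in tree[1])

@lru_cache(maxsize=None)
def forest_dist(f1, f2):
    """Unit-cost edit distance between two ordered forests (tuples of trees)."""
    if not f1:
        return sum(size(t) for t in f2)  # insert every remaining node of f2
    if not f2:
        return sum(size(t) for t in f1)  # delete every remaining node of f1
    (l1, c1), rest1 = f1[-1], f1[:-1]
    (l2, c2), rest2 = f2[-1], f2[:-1]
    return min(
        forest_dist(rest1 + c1, f2) + 1,  # delete the rightmost root of f1
        forest_dist(f1, rest2 + c2) + 1,  # insert the rightmost root of f2
        # map the two rightmost roots onto each other (rename if labels differ)
        forest_dist(c1, c2) + forest_dist(rest1, rest2) + (l1 != l2),
    )

def ted(t1, t2):
    return forest_dist((t1,), (t2,))

def labels(tree):
    """Multiset of node labels."""
    bag = Counter([tree[0]])
    for c in tree[1]:
        bag += labels(c)
    return bag

def subtrees(tree):
    """Yield every subtree of the document (one per node) in preorder."""
    yield tree
    for c in tree[1]:
        yield from subtrees(c)

def label_lower_bound(q, s):
    # Unit-cost TED >= max(|Q|, |S|) - |labels(Q) ∩ labels(S)| (multiset intersection).
    common = sum((labels(q) & labels(s)).values())
    return max(size(q), size(s)) - common

def topk_subtrees(q, doc, k):
    """Return (distance, preorder index) of the k subtrees of doc closest to q."""
    candidates = sorted((label_lower_bound(q, s), i, s) for i, s in enumerate(subtrees(doc)))
    heap = []  # max-heap via negated distance: (-distance, index, subtree)
    for lb, i, s in candidates:
        if len(heap) == k and lb >= -heap[0][0]:
            break  # no remaining candidate can beat the current k-th best
        d = ted(q, s)
        if len(heap) < k:
            heapq.heappush(heap, (-d, i, s))
        elif d < -heap[0][0]:
            heapq.heapreplace(heap, (-d, i, s))
    return sorted((-nd, i) for nd, i, _ in heap)

doc = ("a", (("b", (("c", ()), ("d", ()))), ("b", (("c", ()),)), ("e", ())))
query = ("b", (("c", ()), ("d", ())))
print(topk_subtrees(query, doc, 2))  # [(0, 1), (1, 4)]
```

The exact match b(c, d) at preorder position 1 comes first, then b(c) at position 4, which needs one insertion; the scan stops before computing any further edit distances because every remaining lower bound is at least 2.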

Cited by 9 publications (4 citation statements). References: 35 publications.
“…The kNN-Join is not commutative, i.e., the order of the join partners matters. An efficient technique that leverages an inverted list on tokens that are partitioned into size stripes is the Cone algorithm [42], which is crafted for label sets in the context of top-k subtree similarity queries. To increase the limited scope of the original algorithm, we adapted it to leverage ScanCount.…”
Section: Sparse Vector-based NN Methods (mentioning, confidence: 99%)
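The statement names its techniques without detail, so the following is only a hypothetical illustration. It is neither the Cone algorithm [42] nor the cited authors' adaptation. It assumes label collections treated as sets of distinct tokens, a distance d(Q, S) = max(|Q|, |S|) - |Q ∩ S|, and hypothetical names `build_striped_index` and `knn_scancount`. ScanCount computes, for every indexed set, how many query tokens it shares by scanning the posting lists of the query's tokens and incrementing one counter per set. Grouping the posting lists into size stripes adds a cheap bound: every set of size s has d >= |(|Q| - s)|, so stripes can be visited nearest-size first and the scan can stop once that bound reaches the current k-th best distance.

```python
from collections import defaultdict
import heapq

def build_striped_index(sets):
    """Inverted index: token -> {set size -> [ids of sets of that size containing the token]}."""
    index = defaultdict(lambda: defaultdict(list))
    sizes = defaultdict(list)  # size stripe -> ids of all sets with that size
    for sid, s in enumerate(sets):
        for tok in s:
            index[tok][len(s)].append(sid)
        sizes[len(s)].append(sid)
    return index, sizes

def knn_scancount(query, index, sizes, k):
    """Return (distance, id) of the k nearest sets under d = max(|Q|, |S|) - |Q ∩ S| (assumed)."""
    q = len(query)
    # |q - size| lower-bounds d for every set in a stripe: visit the closest stripes first.
    stripes = sorted(sizes, key=lambda size: abs(q - size))
    heap = []  # min-heap of (-distance, -id): heap[0] is the worst of the current k
    for size in stripes:
        if len(heap) == k and abs(q - size) >= -heap[0][0]:
            break  # no set in this or any later stripe can improve the result
        counts = defaultdict(int)  # ScanCount: overlap counter per set id in this stripe
        for tok in query:
            for sid in index.get(tok, {}).get(size, ()):
                counts[sid] += 1
        for sid in sizes[size]:  # sets sharing no token still get a (large) distance
            d = max(q, size) - counts.get(sid, 0)
            entry = (-d, -sid)
            if len(heap) < k:
                heapq.heappush(heap, entry)
            elif entry > heap[0]:
                heapq.heapreplace(heap, entry)
    return sorted((-nd, -nsid) for nd, nsid in heap)

sets = [{"a", "b", "c"}, {"a", "b"}, {"a", "b", "c", "d"}, {"x", "y", "z"}, {"a"}]
index, sizes = build_striped_index(sets)
print(knn_scancount({"a", "b", "c"}, index, sizes, 3))  # [(0, 0), (1, 1), (1, 2)]
```

In this toy run the stripe of size-1 sets is never scanned: its bound of 2 already exceeds the third-best distance of 1.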
“…The kNN-Join is not commutative, i.e., the order of the join partners matters. An efficient technique that leverages an inverted list on tokens that are partitioned into size stripes is the Cone algorithm [35], which is crafted for label sets in the context of top-𝑘 subtree similarity queries. To increase the limited scope of the original algorithm, we adapted it to leverage ScanCount.…”
Section: String Similarity Joins (mentioning, confidence: 99%)
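Why the order of the join partners matters can be seen on a toy example. The sketch assumes the usual kNN-Join definition (each element of the left input is paired with its k nearest elements of the right input), a 1-D absolute-difference distance chosen purely for illustration, and a hypothetical helper `knn_join`.

```python
def knn_join(R, S, k, dist):
    """Pair every r in R with its k nearest partners in S (ties broken by value)."""
    pairs = set()
    for r in R:
        nearest = sorted(S, key=lambda x: (dist(r, x), x))[:k]
        pairs.update((r, s) for s in nearest)
    return pairs

def abs_dist(x, y):
    # Toy distance for illustration only; the argument holds for any distance.
    return abs(x - y)

R, S = [1], [2, 3, 10]
left = knn_join(R, S, 1, abs_dist)
right = knn_join(S, R, 1, abs_dist)
print(sorted(left))                         # [(1, 2)]
print(sorted((b, a) for a, b in right))     # [(1, 2), (1, 3), (1, 10)]
```

Joining R with S yields one pair, while joining S with R yields three, because every element of S must receive a nearest partner from R.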
“…DBLP [6] stores bibliographic data in XML format and includes, among others, authors, titles, and venues of computer science publications. Due to its availability and intuitiveness, the DBLP dataset has been used in many works for experimental purposes, e.g., as a collection of sets [44, 45], as a collection of trees [37, 38, 46], as a large hierarchical document [34, 40], and as a coauthor network graph [42, 49]. In this section, we show the impact of differences in the data preparation process that converts raw DBLP XML data into the desired input format.…”
Section: A Link Is Not Enough (mentioning, confidence: 99%)
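A small, hedged illustration of how data preparation changes the input: the record below is a made-up DBLP-style snippet, and `to_tree` is a hypothetical converter, not the procedure of any cited work. Two plausible choices, keeping only element tags or also turning each element's text into a leaf, already yield trees of different sizes from the same XML. Attributes such as `key` are dropped here, which is yet another preparation choice.

```python
import xml.etree.ElementTree as ET

# Hypothetical DBLP-style record, invented for illustration only.
RECORD = """<article key="journals/x/Example19">
  <author>A. Author</author><author>B. Author</author>
  <title>Some Title</title><year>2019</year>
</article>"""

def to_tree(elem, with_text):
    """Convert an XML element into a (label, children) tree.

    with_text=False keeps only element tags; with_text=True also adds the
    element's stripped text (if any) as a leaf in front of its child elements.
    """
    children = [to_tree(c, with_text) for c in elem]
    text = (elem.text or "").strip()
    if with_text and text:
        children.insert(0, (text, ()))
    return (elem.tag, tuple(children))

def size(tree):
    return 1 + sum(size(c) for c in tree[1])

root = ET.fromstring(RECORD)
print(size(to_tree(root, with_text=False)))  # 5: article, 2 x author, title, year
print(size(to_tree(root, with_text=True)))   # 9: each of the 4 leaf elements gains a text node
```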