This paper introduces RankSQL, a system that provides a systematic and principled framework for efficiently evaluating ranking (top-k) queries in relational database systems (RDBMS) by extending relational algebra and query optimization. Previously, top-k query processing has been studied in the middleware scenario, or in RDBMS in a "piecemeal" fashion, i.e., focusing on specific operators or sitting outside the core of the query engine. In contrast, we aim to support ranking as a first-class database construct. Our key insight is that the ranking relationship can be viewed as another logical property of data, parallel to the "membership" property of the relational data model. While membership is inherently supported in RDBMS, comparable support for ranking is clearly lacking. We address the fundamental integration of ranking into RDBMS in a way similar to how membership, i.e., Boolean filtering, is supported. We extend relational algebra by proposing a rank-relational model that captures the ranking property and by introducing new and extended operators that support ranking as a first-class construct. Enabled by the extended algebra, we present a pipelined and incremental execution model for ranking query plans (which cannot be expressed traditionally), based on a fundamental ranking principle. To optimize top-k queries, we propose a dimensional enumeration algorithm that explores the extended plan space by enumerating plans along two dual dimensions: ranking and membership. We also propose a sampling-based method for estimating the cardinality of rank-aware operators, which is used to cost plans. Our experiments show the validity of our framework and the accuracy of the proposed estimation model.

Example 1: Consider a user, Amy, who wants to plan a trip to Chicago.
She wants to stay in a hotel, have lunch in an Italian restaurant (condition c1: r.cuisine=Italian), and walk to a museum after lunch; the hotel and the restaurant together should cost less than $100 (c2: h.price+r.price<100); and the museum and the restaurant should be in the same area (c3: r.area=m.area). Further, to rank the qualified results, she specifies several ranking criteria, or "predicates": for a low hotel price, p1: cheap(h.price); for a short distance between the hotel and the restaurant, p2: close(h.addr, r.addr); and for a match between her interests and the museum's collections, p3: related(m.collection, "dinosaur"). These ranking predicates return numeric scores, and the overall scoring function sums their values. The query is shown below in PostgreSQL syntax.

    SELECT *
    FROM Hotel h, Restaurant r, Museum m
    WHERE c1 AND c2 AND c3
    ORDER BY p1 + p2 + p3
    LIMIT k

With current relational query processing capabilities, the only way to execute the previous query is to: (1) consume all the records of the three inputs; (2) join the three inputs and materialize the whole join result; (3) evaluate the three predicates p1, p2, and p3 for each valid join result; (4) sort the join results on p1 + p2 + p3;
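To make the cost of this traditional plan concrete, here is a minimal Python sketch of steps (1) to (4) for Example 1, followed by the LIMIT k cut-off. The tiny tables, the numeric `addr` positions, and the bodies of `cheap`, `close`, and `related` are hypothetical stand-ins (the example does not say how the ranking predicates compute their scores); each is assumed to return a value in [0, 1].

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Tuple


@dataclass(frozen=True)
class Hotel:
    name: str
    price: float
    addr: float  # hypothetical: a position on a line, so distance is |a - b|


@dataclass(frozen=True)
class Restaurant:
    name: str
    cuisine: str
    price: float
    addr: float
    area: str


@dataclass(frozen=True)
class Museum:
    name: str
    area: str
    collection: str


# Hypothetical ranking predicates; each returns a score in [0, 1].
def cheap(price: float) -> float:
    return max(0.0, 1.0 - price / 200.0)


def close(a: float, b: float) -> float:
    return 1.0 / (1.0 + abs(a - b))


def related(collection: str, keyword: str) -> float:
    return 1.0 if keyword in collection else 0.0


def naive_top_k(hotels, restaurants, museums, k: int) -> List[Tuple[float, str, str, str]]:
    results = []
    # Steps (1)-(2): read every input and materialize all valid join results.
    for h, r, m in product(hotels, restaurants, museums):
        if r.cuisine == "Italian" and h.price + r.price < 100 and r.area == m.area:  # c1, c2, c3
            # Step (3): evaluate p1, p2, p3 on every valid join result.
            score = cheap(h.price) + close(h.addr, r.addr) + related(m.collection, "dinosaur")
            results.append((score, h.name, r.name, m.name))
    # Step (4): sort on p1 + p2 + p3; LIMIT k then keeps only the first k.
    results.sort(key=lambda row: row[0], reverse=True)
    return results[:k]


hotels = [Hotel("H1", 60.0, 0.0), Hotel("H2", 40.0, 5.0), Hotel("H3", 90.0, 1.0)]
restaurants = [
    Restaurant("R1", "Italian", 30.0, 1.0, "Loop"),
    Restaurant("R2", "Thai", 10.0, 5.0, "Loop"),
    Restaurant("R3", "Italian", 50.0, 6.0, "Hyde Park"),
]
museums = [Museum("M1", "Loop", "dinosaur fossils"), Museum("M2", "Hyde Park", "modern art")]

top = naive_top_k(hotels, restaurants, museums, k=2)
for score, h, r, m in top:
    print(f"{score:.2f} {h} {r} {m}")  # prints "2.20 H1 R1 M1", then "2.00 H2 R1 M1"
```

Every valid join result is scored and sorted even though only k of them are returned; that full materialization is the work a rank-aware plan aims to avoid.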
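The pipelined, incremental execution mentioned in the abstract can be illustrated with a single rank operator that evaluates one predicate at a time. The Python sketch below is a swapped-in illustration under stated assumptions, not RankSQL's actual operator: it assumes every predicate score lies in [0, 1], that the scoring function is a sum (as in Example 1), and that each input row carries an upper bound on its final score in which every not-yet-evaluated predicate counts at its maximum. Under one reading of the ranking principle, a buffered row can then be emitted once its bound is at least the bound of the next unread input row, since no later row can overtake it. `scan`, `rank_op`, the predicates `cheap` and `near`, and the sample rows are all hypothetical.

```python
import heapq
from itertools import count, islice
from typing import Any, Callable, Dict, Iterable, Iterator, Tuple

Row = Dict[str, Any]
Ranked = Tuple[float, Row]  # (upper bound on the final score, row)
MAX_SCORE = 1.0  # assumption: every predicate score lies in [0, MAX_SCORE]


def scan(rows: Iterable[Row], num_preds: int) -> Iterator[Ranked]:
    # No predicate evaluated yet: every row gets the same optimistic bound.
    for row in rows:
        yield num_preds * MAX_SCORE, row


def rank_op(inp: Iterable[Ranked], pred: Callable[[Row], float]) -> Iterator[Ranked]:
    """Evaluate one more predicate and re-emit rows in descending bound order.

    Assumes `inp` arrives sorted by descending bound and the score is a sum.
    """
    it = iter(inp)
    pending = next(it, None)  # lookahead: next input row, pred not yet evaluated
    heap: list = []  # max-heap via negated bounds
    tie = count()  # tiebreaker so heapq never compares two dicts
    while heap or pending is not None:
        # Every unread row's bound is <= pending's bound, and evaluating pred can
        # only lower a bound, so the buffered best row is safe to emit.
        if heap and (pending is None or -heap[0][0] >= pending[0]):
            neg_bound, _, row = heapq.heappop(heap)
            yield -neg_bound, row
        else:
            bound, row = pending
            # Replace pred's optimistic MAX_SCORE with its actual score.
            heapq.heappush(heap, (-(bound - MAX_SCORE + pred(row)), next(tie), row))
            pending = next(it, None)


calls = {"cheap": 0, "near": 0}


def cheap(row: Row) -> float:  # hypothetical predicate
    calls["cheap"] += 1
    return max(0.0, 1.0 - row["price"] / 200.0)


def near(row: Row) -> float:  # hypothetical predicate
    calls["near"] += 1
    return 1.0 / (1.0 + row["dist"])


rows = [
    {"id": "A", "price": 40, "dist": 3.0},
    {"id": "B", "price": 180, "dist": 0.0},
    {"id": "C", "price": 20, "dist": 0.0},
    {"id": "D", "price": 100, "dist": 1.0},
]

plan = rank_op(rank_op(scan(rows, num_preds=2), cheap), near)
top1 = list(islice(plan, 1))
print([r["id"] for _, r in top1], calls)  # prints ['C'] {'cheap': 4, 'near': 1}
```

Because the operators are generators, asking for only the top 1 evaluates `near` on a single row. The first operator still evaluates `cheap` on every row, since a plain scan gives it no ordering to exploit; a sorted or indexed access path would let it stop early as well.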