Ruby Y. Tahboub scite author profile

Abstract: The SQL group-by operator plays an important role in summarizing and aggregating large datasets in a data analytics stack. While the standard group-by operator, which is based on equality, is useful in several applications, allowing similarity-aware grouping provides a more realistic view of real-world data that could lead to better insights. The Similarity SQL-based Group-By operator (SGB, for short) extends the semantics of the standard SQL Group-by by grouping data with similar but not necessarily equal values. While existing similarity-based grouping operators efficiently materialize this approximate semantics, they primarily focus on one-dimensional attributes and treat multi-dimensional attributes independently. As a result, correlated attributes, such as those in spatial data, are processed independently, and hence, groups in the multi-dimensional space are not detected properly. To address this problem, we introduce two new SGB operators for multi-dimensional data. The first operator is the clique (or distance-to-all) SGB, where all the tuples in a group are within some distance from each other. The second operator is the distance-to-any SGB, where a tuple belongs to a group if the tuple is within some distance from any other tuple in the group. Since a tuple may satisfy the membership criterion of multiple groups, we introduce three different semantics to deal with such a case: (i) eliminate the tuple, (ii) put the tuple in any one group, and (iii) create a new group for this tuple. We implement and test the new SGB operators and their algorithms inside PostgreSQL. The overhead introduced by these operators proves to be minimal and the execution times are comparable to those of the standard Group-by. The experimental study, based on TPC-H and social check-in data, demonstrates that the proposed algorithms can achieve up to three orders of magnitude enhancement in performance over baseline methods developed to solve the same problem.
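The distance-to-any semantics described in the abstract can be illustrated with a minimal sketch. This is not the authors' PostgreSQL algorithm: it is a naive quadratic version that treats groups as connected components, and the function name `sgb_any`, the Euclidean metric, and the inclusive threshold `eps` are all assumptions made for illustration.

```python
from math import dist  # Euclidean distance, Python 3.8+


def sgb_any(points, eps):
    """Hypothetical helper: group points under distance-to-any semantics.

    Two points end up in the same group if they are linked by a chain of
    points in which each consecutive pair is at most eps apart.
    """
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Naive O(n^2) pairwise check; assumed for clarity, not efficiency.
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if dist(points[i], points[j]) <= eps:
                parent[find(i)] = find(j)

    groups = {}
    for i, p in enumerate(points):
        groups.setdefault(find(i), []).append(p)
    return list(groups.values())


groups = sgb_any([(0, 0), (1, 0), (2, 0), (10, 10)], eps=1.0)
```

In this example, (0, 0) and (2, 0) land in the same group even though they are farther than `eps` apart, because (1, 0) is within `eps` of both. Under the clique (distance-to-all) semantics, those two points could not share a group.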


Performance evaluation of broadcast and global combine operations in all-port wormhole-routed OTIS-Mesh interconnection networks

Mahafzah

Tahboub

2010

Cluster Comput


OTIS (Optical Transpose Interconnection System) optoelectronic architecture is an attractive high-speed interconnection network. As a continuation of the research work performed on OTIS, this paper investigates broadcast and global combine communication operations on the promising all-port wormhole-routed OTIS-Mesh using the Extended Dominating Node (EDN) approach, referred to as EDN-OTIS-Mesh. The performance of broadcast and global combine operations is evaluated, both analytically and by simulation, in terms of the number of communication steps, latency, and latency improvement. A comparative study is conducted among three interconnection network architectures: the single-port wormhole-routed OTIS-Mesh, all-port wormhole-routed OTIS-Mesh, and all-port wormhole-routed EDN-OTIS-Mesh. The obtained analytical and simulation results show that the broadcast and global combine operations on all-port EDN-OTIS-Mesh significantly outperform the single-port and all-port OTIS-Mesh.


On supporting compilation in spatial query engines

Tahboub

Rompf

2016


Architecting a Query Compiler for Spatial Workloads

Tahboub

Rompf

2020


Modern location-based applications rely extensively on the efficient processing of spatial data and queries. Spatial query engines are commonly engineered as an extension to a relational database or a cluster-computing framework. Large parts of the spatial processing runtime are spent on evaluating spatial predicates and traversing spatial indexing structures. Typical high-level implementations of these spatial structures incur significant interpretive overhead, which increases latency and lowers throughput. A promising idea to improve the performance of spatial workloads is to leverage native code generation techniques that have become popular in relational query engines. However, architecting a spatial query compiler is challenging since spatial processing has fundamentally different execution characteristics from relational workloads in terms of data dimensionality, indexing structures, and predicate evaluation. In this paper, we discuss the underlying reasons why standard query compilation techniques are not fully effective when applied to spatial workloads, and we demonstrate how a particular style of query compilation based on techniques borrowed from partial evaluation and generative programming manages to avoid most of these difficulties by extending the scope of custom code generation into the data structures layer. We extend the LB2 main-memory query compiler, a relational engine developed in this style, with spatial data types, predicates, indexing structures, and operators. We show that the spatial extension matches the performance of specialized library code and outperforms relational and map-reduce extensions.



How to Architect a Query Compiler, Revisited

Similarity Group-by Operators for Multi-Dimensional Relational Data
