Abstract-The SQL group-by operator plays an important role in summarizing and aggregating large datasets in a data analytics stack. While the standard group-by operator, which is based on equality, is useful in several applications, allowing similarity aware grouping provides a more realistic view on real-world data that could lead to better insights. The Similarity SQL-based Group-By operator (SGB, for short) extends the semantics of the standard SQL Group-by by grouping data with similar but not necessarily equal values. While existing similarity-based grouping operators efficiently materialize this approximate semantics, they primarily focus on one-dimensional attributes and treat multi-dimensional attributes independently. However, correlated attributes, such as in spatial data, are processed independently, and hence, groups in the multi-dimensional space are not detected properly. To address this problem, we introduce two new SGB operators for multi-dimensional data. The first operator is the clique (or distance-to-all) SGB, where all the tuples in a group are within some distance from each other. The second operator is the distance-to-any SGB, where a tuple belongs to a group if the tuple is within some distance from any other tuple in the group. Since a tuple may satisfy the membership criterion of multiple groups, we introduce three different semantics to deal with such a case: (i) eliminate the tuple, (ii) put the tuple in any one group, and (iii) create a new group for this tuple. We implement and test the new SGB operators and their algorithms inside PostgreSQL. The overhead introduced by these operators proves to be minimal and the execution times are comparable to those of the standard Group-by. The experimental study, based on TPC-H and a social check-in data, demonstrates that the proposed algorithms can achieve up to three orders of magnitude enhancement in performance over baseline methods developed to solve the same problem.
OTIS (Optical Transpose Interconnection System) optoelectronic architecture is an attractive high-speed interconnection network. As a continuation for the research work performed on OTIS, this paper investigates broadcast and global combine communication operations on the promising all-port wormhole-routed OTIS-Mesh using the Extended Dominating Node (EDN) approach, referred to as EDN-OTIS-Mesh. The performance of broadcast and global combine operations is evaluated, both analytically and by simulation, in terms of the number of communication steps, latency, and latency improvement. A comparative study is conducted among three interconnection networks' architectures: the single-port wormhole-routed OTIS-Mesh, allport wormhole-routed OTIS-Mesh, and all-port wormholerouted EDN-OTIS-Mesh. The obtained analytical and simulation results show that the broadcast and global combine operations on all-port EDN-OTIS-Mesh significantly outperform the single-port and all-port OTIS-Mesh.
Modern location-based applications rely extensively on the ecient processing of spatial data and queries. Spatial query engines are commonly engineered as an extension to a relational database or a cluster-computing framework. Large parts of the spatial processing runtime is spent on evaluating spatial predicates and traversing spatial indexing structures. Typical high-level implementations of these spatial structures incur signicant interpretive overhead, which increases latency and lowers throughput. A promising idea to improve the performance of spatial workloads is to leverage native code generation techniques that have become popular in relational query engines. However, architecting a spatial query compiler is challenging since spatial processing has fundamentally dierent execution characteristics from relational workloads in terms of data dimensionality, indexing structures, and predicate evaluation.In this paper, we discuss the underlying reasons why standard query compilation techniques are not fully eective when applied to spatial workloads, and we demonstrate how a particular style of query compilation based on techniques borrowed from partial evaluation and generative programming manages to avoid most of these diculties by extending the scope of custom code generation into the data structures layer. We extend the LB2 main-memory query compiler, a relational engine developed in this style, with spatial data types, predicates, indexing structures, and operators. We show that the spatial extension matches the performance of specialized library code and outperforms relational and map-reduce extensions.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations鈥揷itations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright 漏 2024 scite LLC. All rights reserved.
Made with 馃挋 for researchers
Part of the Research Solutions Family.