Recent works showed that implementations of quicksort using vector CPU instructions can outperform the non‐vectorized algorithms in widespread use. However, these implementations are typically single‐threaded, implemented for a particular instruction set, and restricted to a small set of key types. We lift these three restrictions: our proposed vqsort algorithm integrates into the state‐of‐the‐art parallel sorter ips4o$$ ip{s}^4o $$, with a geometric mean speedup of 1.59. The same implementation works on seven instruction sets (including SVE and RISC‐V V) across four platforms. It also supports floating‐point and 16–128 bit integer keys. To the best of our knowledge, this is the fastest sort for large arrays of non‐tuple keys on CPUs, up to 20 times as fast as the sorting algorithms implemented in standard libraries. This article focuses on the practical engineering aspects enabling the speed and portability, which we have not yet seen demonstrated for a quicksort implementation. Furthermore, we introduce compact and transpose‐free sorting networks for in‐register sorting of small arrays, and a vector‐friendly pivot sampling strategy that is robust against adversarial input.
Constrained optimization problems arise frequently in classical machine learning. There exist frameworks addressing constrained optimization, for instance, CVXPY and GENO. However, in contrast to deep learning frameworks, GPU support is limited. Here, we extend the GENO framework to also solve constrained optimization problems on the GPU. The framework allows the user to specify constrained optimization problems in an easy-to-read modeling language. A solver is then automatically generated from this specification. When run on the GPU, the solver outperforms state-of-the-art approaches like CVXPY combined with a GPU-accelerated solver such as cuOSQP or SCS by a few orders of magnitude.
Computational problems ranging from artificial intelligence to physics require efficient computations of large tensor expressions. These tensor expressions can often be represented in Einstein notation. To evaluate tensor expressions in Einstein notation, that is, for the actual Einstein summation, usually external libraries are used. Surprisingly, Einstein summation operations on tensors fit well with fundamental SQL constructs. We show that by applying only four mapping rules and a simple decomposition scheme using common table expressions, large tensor expressions in Einstein notation can be translated to portable and efficient SQL code. The ability to execute large Einstein summation queries opens up new possibilities to process data within SQL. We demonstrate the power of Einstein summation queries on four use cases, namely querying triplestore data, solving Boolean satisfiability problems, performing inference in graphical models, and simulating quantum circuits. The performance of Einstein summation queries, however, depends on the query engine implemented in the database system. Therefore, supporting efficient Einstein summation computations in database systems presents new research challenges for the design and implementation of query engines.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2025 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.