Soft vector processors can augment and extend the capability of embedded hard vector processors in FPGA-based SoCs such as the Xilinx Zynq. We develop a compiler framework and an auto-tuning runtime that optimizes and executes data-parallel computation either on the scalar ARM processor, the embedded NEON engine or the Vectorblox MXP soft vector processor as appropriate. We consider computational conditions such as pre cision, vector length, chunk size, 10 requirements under which soft vector processing can outperform scalar cores and hard vector blocks. Across a range of data-parallel benchmarks, we show that the MXP soft vector processor can outperform the NEON engine by up to 3.95 x while saving 9% dynamic power (0.1 W absolute). Our compilation and runtime framework is also able to outperform the gcc NEON vectorizer under certain conditions by explicit generation of NEON intrinsics and performance tuning of the auto-generated data-parallel code.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2025 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.