State-of-the-art mobile systems-on-chip (SoCs) include heterogeneity in various forms for accelerated and energy-efficient execution of a diverse range of applications. Modern SoCs include programmable cores, such as the CPU and GPU, with very different functionality. They also integrate performance-heterogeneous cores that share the same instruction-set architecture but differ in power-performance characteristics, as in ARM big.LITTLE. In this paper, we first explore and establish the combined benefits of functional heterogeneity and performance heterogeneity in improving the power-performance behavior of data-parallel applications. Next, given an application specified in OpenCL, we present a static partitioning strategy that executes the application kernel across CPU and GPU cores, together with voltage-frequency settings for the individual cores, so as to obtain the best power-performance trade-off. We achieve over 19% runtime improvement by exploiting the functional and performance heterogeneities concurrently. In addition, an energy saving of 36% is achieved by using appropriate voltage-frequency settings without significantly degrading the runtime improvement from concurrent execution.
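To make the partitioning idea concrete, the following sketch (not the paper's actual algorithm) balances an OpenCL NDRange between the CPU and GPU using profiled single-device runtimes, and then selects a voltage-frequency setting that minimizes energy within a runtime slack. The PROFILE table, the 15% slack, and the constant-power assumption are illustrative placeholders, not values from the paper.

```python
# Sketch of the static CPU+GPU partitioning idea described above.
# The profiled times and the voltage-frequency (V-F) table are
# illustrative placeholders, not measurements from the paper.

# Time (ms) to run the full OpenCL NDRange on each device alone,
# obtained by offline profiling at each V-F setting.
# Format: {(cpu_freq_MHz, gpu_freq_MHz): (t_cpu_ms, t_gpu_ms, power_W)}
PROFILE = {
    (2000, 600): (120.0, 80.0, 5.2),   # hypothetical numbers
    (1400, 600): (165.0, 80.0, 4.1),
    (2000, 420): (120.0, 110.0, 4.3),
    (1400, 420): (165.0, 110.0, 3.2),
}

def balanced_split(t_cpu, t_gpu):
    """Fraction of work items assigned to the CPU so that both
    devices finish at (approximately) the same time."""
    return t_gpu / (t_cpu + t_gpu)

def evaluate(setting):
    t_cpu, t_gpu, power = PROFILE[setting]
    alpha = balanced_split(t_cpu, t_gpu)
    runtime = max(alpha * t_cpu, (1.0 - alpha) * t_gpu)
    energy = power * runtime / 1000.0  # Joules, power assumed constant
    return alpha, runtime, energy

# Pick the V-F setting with the lowest energy whose runtime stays
# within 15% of the fastest concurrent execution.
results = {s: evaluate(s) for s in PROFILE}
best_runtime = min(r for _, r, _ in results.values())
feasible = {s: r for s, r in results.items() if r[1] <= 1.15 * best_runtime}
chosen = min(feasible, key=lambda s: feasible[s][2])
alpha, runtime, energy = feasible[chosen]
print(f"V-F {chosen}: CPU share {alpha:.2f}, runtime {runtime:.1f} ms, "
      f"energy {energy:.2f} J")
```

With these placeholder numbers the selected setting runs the CPU at a reduced frequency, trading a small runtime increase for a noticeable energy reduction, which mirrors the trade-off the abstract describes.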
The ever-increasing demand from mobile Machine Learning (ML) applications calls for ever more powerful on-chip computing resources. Mobile devices are empowered with Heterogeneous Multi-Processor Systems on Chips (HMPSoCs) to process ML workloads such as Convolutional Neural Network (CNN) inference. HMPSoCs house several different types of ML-capable components on-die, such as the CPU, GPU, and accelerators. These components can each perform inference independently, but with very different power-performance characteristics. In this article, we provide a quantitative evaluation of the inference capabilities of the different components of HMPSoCs. We also present insights into their respective power-performance behaviour. Finally, we explore the performance limit of HMPSoCs by synergistically engaging all the components concurrently.
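The kind of per-component comparison described above can be summarized as in the sketch below. The component names and the latency and power figures are placeholder assumptions rather than the article's measurements, and the concurrent-throughput figure is an idealized bound that ignores shared memory bandwidth and thermal limits.

```python
# Illustrative per-component comparison in the spirit of the evaluation
# described above. The latency and power figures are invented placeholders.

# component: (latency_ms_per_inference, average_power_W)
MEASURED = {
    "big CPU cluster":    (85.0, 3.5),
    "LITTLE CPU cluster": (240.0, 1.1),
    "GPU":                (45.0, 4.0),
    "NPU accelerator":    (12.0, 2.0),
}

for name, (latency_ms, power_w) in MEASURED.items():
    throughput_ips = 1000.0 / latency_ms   # inferences per second
    energy_mj = power_w * latency_ms       # energy per inference (mJ)
    print(f"{name:20s} {throughput_ips:6.1f} inf/s  {energy_mj:7.1f} mJ/inf")

# Upper bound on throughput if all components run inferences concurrently
# and independently (ignores shared-resource contention).
combined = sum(1000.0 / lat for lat, _ in MEASURED.values())
print(f"Idealized concurrent throughput: {combined:.1f} inf/s")
```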
IoT edge intelligence requires Convolutional Neural Network (CNN) inference to take place on the edge devices themselves. The ARM big.LITTLE architecture is at the heart of prevalent commercial edge devices. It comprises single-ISA heterogeneous cores grouped into multiple homogeneous clusters that enable power and performance trade-offs. All cores are expected to be employed simultaneously during inference to attain maximal throughput. However, the high communication overhead involved in parallelizing the computation of convolution kernels across clusters is detrimental to throughput. We present an alternative framework, called Pipe-it, that employs a pipelined design to split convolutional layers across clusters while limiting the parallelization of their respective kernels to the assigned cluster. We develop a performance-prediction model that uses only the convolutional-layer descriptors to predict the execution time of each layer individually on all permitted core configurations (type and count). Pipe-it then exploits these predictions to create a balanced pipeline using an efficient design-space exploration algorithm. On average, Pipe-it delivers 39% higher throughput than the highest antecedent throughput.
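A much-simplified sketch of the pipeline-balancing step follows. It restricts the design-space exploration to a two-stage, contiguous-layer split (one stage per cluster) and substitutes invented per-layer times for the output of the performance-prediction model; the real framework explores many more core configurations.

```python
# Simplified sketch of pipeline balancing across two clusters: split the
# layer sequence into two contiguous stages so that the slower (bottleneck)
# stage is as fast as possible. The per-layer times below stand in for the
# predictions of the performance model and are invented for illustration.

# Predicted execution time (ms) of each convolutional layer on each cluster.
pred_big    = [4.0, 9.0, 12.0, 12.0, 8.0, 5.0]     # big cluster, all cores
pred_little = [9.0, 21.0, 28.0, 27.0, 18.0, 11.0]  # LITTLE cluster, all cores

def best_two_stage_split(t_first, t_second):
    """Return (cut, bottleneck) where layers [0, cut) run on the first
    cluster and layers [cut, n) on the second; the bottleneck is the max
    stage time, whose inverse bounds pipeline throughput."""
    n = len(t_first)
    best = (None, float("inf"))
    for cut in range(n + 1):
        stage1 = sum(t_first[:cut])
        stage2 = sum(t_second[cut:])
        bottleneck = max(stage1, stage2)
        if bottleneck < best[1]:
            best = (cut, bottleneck)
    return best

# Try both stage orders (big-then-LITTLE and LITTLE-then-big).
for order, (a, b) in {"big->LITTLE": (pred_big, pred_little),
                      "LITTLE->big": (pred_little, pred_big)}.items():
    cut, bottleneck = best_two_stage_split(a, b)
    print(f"{order}: cut after layer {cut}, bottleneck {bottleneck:.1f} ms, "
          f"throughput ~ {1000.0 / bottleneck:.1f} images/s")
```

Because each cluster only ever executes the layers assigned to it, the kernels of a layer never span clusters, which is exactly the property the abstract credits with avoiding the cross-cluster communication overhead.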