Ehsan Yousefzadeh-Asl-Miandoab scite author profile

Ehsan Yousefzadeh-Asl-Miandoab

5Publications

3Citation Statements Received

0Citation Statements Given

How they've been cited

How they cite others

131

Affiliations

IT University of Copenhagen, Sharif University of Technology

Publications

Order By: Most citations

Nura

Darabi

Mahani

Baxishi

et al. 2022

Proc. ACM Meas. Anal. Comput. Syst.

View full text Add to dashboard Cite

Multi-application execution in Graphics Processing Units (GPUs), a promising way to utilize GPU resources, is still challenging. Some pieces of prior work (e.g., spatial multitasking) have limited opportunity to improve resource utilization, while other works, e.g., simultaneous multi-kernel, provide fine-grained resource sharing at the price of unfair execution. This paper proposes a new multi-application paradigm for GPUs, called NURA, that provides high potential to improve resource utilization and ensures fairness and Quality-of-Service (QoS). The key idea is that each streaming multiprocessor (SM) executes Cooperative Thread Arrays (CTAs) belong to only one application (similar to the spatial multi-tasking) and shares its unused resources with the SMs running other applications demanding more resources. NURA handles resource sharing process mainly using a software approach to provide simplicity, low hardware cost, and flexibility. We also perform some hardware modifications as an architectural support for our software-based proposal. We conservatively analyze the hardware cost of our proposal, and observe less than 1.07% area overhead with respect to the whole GPU die. Our experimental results over various mixes of GPU workloads show that NURA improves GPU system throughput by 26% compared to state-of-the-art spatial multi-tasking, on average, while meeting the QoS target. In terms of fairness, NURA has almost similar results to spatial multitasking, while it outperforms simultaneous multi-kernel by an average of 76%.

show abstract

OSM: Off-Chip Shared Memory for GPUs

Darabi

Yousefzadeh-Asl-Miandoab

Akbarzadeh

et al. 2022

IEEE Trans. Parallel Distrib. Syst.

View full text Add to dashboard Cite

Profiling and Monitoring Deep Learning Training Tasks

Yousefzadeh-Asl-Miandoab

Robroek

Tözün

2023

View full text Add to dashboard Cite

The embarrassingly parallel nature of deep learning training tasks makes CPU-GPU co-processors the primary commodity hardware for them. The computing and memory requirements of these tasks, however, do not always align well with the available GPU resources. It is, therefore, important to monitor and profile the behavior of training tasks on co-processors to understand better the requirements of different use cases. In this paper, our goal is to shed more light on the variety of tools for profiling and monitoring deep learning training tasks on server-grade NVIDIA GPUs. In addition to surveying the main characteristics of the tools, we analyze the functional limitations and overheads of each tool by using a both light and heavy training scenario. Our results show that monitoring tools like nvidia-smi and dcgm can be integrated with resource managers for online decision making thanks to their low overheads. On the other hand, one has to be careful about the set of metrics to correctly reason about the GPU utilization. When it comes to profiling, each tool has its time to shine; a framework-based or systemwide GPU profiler can first detect the frequent kernels or bottlenecks, and then, a lower-level GPU profiler can focus on particular kernels at the micro-architectural-level.

show abstract

An Analysis of Collocation on GPUs for Deep Learning Training

Robroek¹,

Yousefzadeh-Asl-Miandoab²,

Tözün³

2022

Preprint

View full text Add to dashboard Cite

Data Management and Visualization for Benchmarking Deep Learning Training Systems

Robroek

Duane

Yousefzadeh-Asl-Miandoab

et al. 2023

View full text Add to dashboard Cite

Evaluating hardware for deep learning is challenging. The models can take days or more to run, the datasets are generally larger than what fits into memory, and the models are sensitive to interference. Scaling this up to a large amount of experiments and keeping track of both software and hardware metrics thus poses real difficulties as these problems are exacerbated by sheer experimental data volume. This paper explores some of the data management and exploration difficulties when working on machine learning systems research. We introduce our solution in the form of an open-source framework built on top of a machine learning lifecycle platform. Additionally, we introduce a web environment for visualizing and exploring experimental data.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

customersupport@researchsolutions.com

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.