High performance computing (HPC) systems use checkpoint-restart to tolerate failures. Typically, applications store their states in checkpoints on a parallel file system (PFS). As applications scale up, checkpoint-restart incurs high overheads due to contention for PFS resources. These high overheads force large-scale applications to reduce checkpoint frequency, which means more compute time is lost in the event of a failure. We alleviate this problem with mcrEngine, a scalable checkpoint-restart system that aggregates checkpoints from multiple application processes, using knowledge of data semantics available through widely used I/O libraries such as HDF5 and netCDF, and compresses them. Our novel scheme improves the compressibility of checkpoints by up to 115% over simple concatenation and compression. Our evaluation with large-scale application checkpoints shows that mcrEngine reduces checkpointing overhead by up to 87% and restart overhead by up to 62% over a baseline with no aggregation or compression.
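The following is a minimal sketch of the data-aware aggregation idea, not mcrEngine's actual implementation: it assumes per-rank HDF5 checkpoint files with identical dataset layouts, and compares naive byte concatenation against interleaving same-named variables across ranks before compression. The file pattern `ckpt_rank*.h5` and the function names are hypothetical.

```python
# Sketch of semantics-aware checkpoint aggregation vs. naive concatenation.
# Assumes per-rank HDF5 checkpoints exposing the same dataset names.
import glob
import zlib

import h5py
import numpy as np

def naive_concat_compress(paths):
    """Baseline: concatenate raw checkpoint bytes, then compress."""
    blob = b"".join(open(p, "rb").read() for p in paths)
    return zlib.compress(blob)

def semantic_merge_compress(paths):
    """Group same-named datasets across ranks before compressing, so
    similar data (e.g., every rank's 'temperature') sits together."""
    files = [h5py.File(p, "r") for p in paths]
    names = sorted(files[0].keys())  # assumed identical layout per rank
    buf = bytearray()
    for name in names:
        for f in files:  # interleave by variable, not by rank
            buf += np.asarray(f[name]).tobytes()
    for f in files:
        f.close()
    return zlib.compress(bytes(buf))

if __name__ == "__main__":
    ckpts = sorted(glob.glob("ckpt_rank*.h5"))
    print("naive   :", len(naive_concat_compress(ckpts)), "bytes")
    print("semantic:", len(semantic_merge_compress(ckpts)), "bytes")
```

The intuition is that a generic compressor finds more redundancy when semantically similar arrays are adjacent in its input window than when whole per-rank files are simply appended.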
The era of extremely heterogeneous supercomputing brings with it the devil of increased performance variation and reduced reproducibility. There is a lack of understanding in the HPC community of how the simultaneous consideration of network traffic, power limits, concurrency tuning, and interference from other jobs impacts application performance. In this paper, we design a methodology that allows both HPC users and system administrators to understand the trade-off space between optimal and reproducible performance. We present a first-of-its-kind dataset that simultaneously varies multiple system- and user-level parameters on a production cluster, and introduce a new metric, called the desirability score, which enables comparison across different system configurations. We develop a novel, model-agnostic machine learning methodology based on graph signal theory for comparing the influence of parameters on application predictability, and, using a new visualization technique, make practical suggestions for best practices in multi-objective HPC environments.
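The paper's exact desirability-score formula is not reproduced here; the sketch below is an assumed illustrative variant that rewards configurations that are both fast (low median runtime) and reproducible (low run-to-run spread), normalized against the best and worst observed configurations. The weight `alpha` and the example power-cap settings are hypothetical.

```python
# Hypothetical desirability-style score: higher is better, in [0, 1].
# NOT the paper's metric; an assumed blend of speed and reproducibility.
import numpy as np

def desirability(runtimes, alpha=0.5):
    """runtimes: dict mapping configuration name -> array of repeated runs.
    alpha: assumed weight between speed (median) and spread (CoV)."""
    medians = {c: np.median(r) for c, r in runtimes.items()}
    spreads = {c: np.std(r) / np.mean(r) for c, r in runtimes.items()}
    t_lo, t_hi = min(medians.values()), max(medians.values())
    s_lo, s_hi = min(spreads.values()), max(spreads.values())

    def norm(x, lo, hi):  # 1.0 = best observed, 0.0 = worst observed
        return 1.0 if hi == lo else (hi - x) / (hi - lo)

    return {c: alpha * norm(medians[c], t_lo, t_hi)
               + (1 - alpha) * norm(spreads[c], s_lo, s_hi)
            for c in runtimes}

# Example: a fast-but-variable setting vs. a slower-but-stable one.
print(desirability({
    "powercap_115W": np.array([10.1, 10.2, 10.1, 10.3]),
    "powercap_70W":  np.array([12.8, 14.9, 12.1, 15.5]),
}))
```

Any metric of this shape lets a single scalar rank configurations along the optimal-versus-reproducible trade-off the abstract describes.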
CCS Concepts: • Computing methodologies → Parallel computing methodologies; Machine learning; Model development and analysis; • General and reference → Performance.