Given the cost of HPC clusters, making the best use of them is crucial to improve infrastructure ROI. Likewise, reducing failed HPC jobs and the related waste in terms of user wait times is crucial to improve HPC user productivity (aka human ROI). While most efforts (e.g., debugging HPC programs) explore technical aspects to improve the ROI of HPC clusters, we hypothesize that non-technical (human) aspects are worth exploring to make non-trivial ROI gains; specifically, understanding non-technical aspects and how they contribute to the failure of HPC jobs. In this regard, we conducted a case study in the context of the Beocat cluster at Kansas State University. The purpose of the study was to learn the reasons why users terminate jobs and to quantify the wasted computation in such jobs in terms of system utilization and user wait time. The data from the case study helped identify interesting and actionable reasons why users terminate HPC jobs. It also helped confirm that user-terminated jobs may be associated with a non-trivial amount of wasted computation, which, if reduced, can help improve the ROI of HPC clusters.
Motivation

Given the cost of creating and operating high-performance computing (HPC) clusters, making the best use of the clusters is crucial for infrastructure ROI; e.g., the creation of a level 3 XSEDE [4] cluster like Beocat at Kansas State University (described in Section 2.2) can easily cost more than 2 million US dollars. Beyond merely keeping processors busy and memory/storage occupied, this is about the usefulness of the computations performed on clusters, i.e., ensuring that the results of computations are not wasted due to being incomplete, incorrect, or irrelevant. This latter goal is often pursued by exploring techniques to reduce HPC job failures stemming from hardware and/or software failures. In particular, there has been considerable interest in the HPC community in improving infrastructure ROI by identifying and tackling hurdles rooted in technical aspects of HPC. For example, in 2017, DOE published a technical report focused on the needs and ways to specify, test/verify, and debug massively parallel programs [10]. There have been empirical studies to characterize and understand job failures by considering 1) various non-human factors such as spatial and temporal dependences between failures, power quality, temperature, and radiation [8], and 2) different statistics such as mean time between failures, mean time to repair [15], and submission inter-arrival time [19]. Ahrens et al. studied the use of HPC for data-intensive science in the US DOE and identified various challenges: support for monitoring the progress of computations and steering them in real time, use of novel and apt data abstractions and representations, and leveraging couplings between experiments [5]. Faulk et al. attempted to define and measure HPC productivity in terms of the science accomplished and the artifacts involved [9].

While improving infrastructure ROI is important, we conjecture the community should also focus on improving human ROI (aka HPC user productivity).