Optimizing XML processing for grid applications using an emulation framework

2012 IEEE 10th International Symposium on Parallel and Distributed Processing With Applications

2012

Self Cite

It is important to design and develop scientific middleware libraries that harness the opportunities presented by the emerging multi-core processors available in grid and cloud environments. Scientific middleware libraries that do not adhere or adapt to multi-core programming paradigms can suffer from severe performance limitations when executing on these processors. In this paper, we focus on the utilization of a critical shared resource on chip multiprocessors (CMPs), the L2 cache. The way in which an application schedules and assigns processing work to each thread determines the access pattern of the shared L2 cache, which may either amplify or diminish the effects of memory latency on a multi-core processor. Therefore, when processing scientific datasets such as HDF5, it is essential to conduct a fine-grained analysis of cache utilization in order to make informed processing and scheduling decisions in multi-threaded programs. In this paper, using the TAU toolkit for performance feedback from dual- and quad-core machines, we analyze and recommend methods for effectively scheduling threads on multi-core nodes to improve the performance of scientific applications that process HDF5 data. We discuss the benefits that can be achieved by using L2 Cache-Affinity and L2 Balanced-Set based scheduling algorithms for improving L2 cache performance and, in turn, the overall execution time.
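The abstract names L2 Balanced-Set scheduling without spelling out the algorithm. The following is a minimal sketch, assuming "balanced set" means distributing threads across the groups of cores that share an L2 cache so that each group's combined working-set size is as even as possible. The function name, the per-thread working-set estimates, and the greedy largest-first heuristic are illustrative assumptions, not the paper's method.

```python
from typing import Dict, List


def balanced_set_schedule(working_sets: Dict[str, int],
                          num_l2: int,
                          cores_per_l2: int) -> List[List[str]]:
    """Greedy balanced placement of threads onto shared-L2 groups.

    working_sets maps a (hypothetical) thread name to its estimated
    working-set size in KiB. Each L2 group can host at most
    cores_per_l2 threads (one per core sharing that cache).
    """
    if len(working_sets) > num_l2 * cores_per_l2:
        raise ValueError("more threads than cores")
    groups: List[List[str]] = [[] for _ in range(num_l2)]
    load = [0] * num_l2  # summed working-set size per shared L2
    # Largest working sets first, so the biggest ones land in different L2s.
    for name, size in sorted(working_sets.items(), key=lambda kv: (-kv[1], kv[0])):
        # Only groups with an idle core are eligible; ties go to the lower id.
        open_groups = [g for g in range(num_l2) if len(groups[g]) < cores_per_l2]
        target = min(open_groups, key=lambda g: (load[g], g))
        groups[target].append(name)
        load[target] += size
    return groups


# Quad-core node with two L2 caches, each shared by two cores.
print(balanced_set_schedule({"t0": 900, "t1": 700, "t2": 400, "t3": 100}, 2, 2))
# [['t0', 't3'], ['t1', 't2']]
```

With these hypothetical sizes the two L2 caches receive 1000 KiB and 1100 KiB of estimated working set, whereas pairing threads in submission order (t0, t1 | t2, t3) would put 1600 KiB on one cache and 500 KiB on the other.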


Section: Research Challenges Addressed In This Paper (mentioning)


L2 Cache Performance Analysis and Optimizations for Processing HDF5 Data on Multi-core Nodes

2012 IEEE 10th International Symposium on Parallel and Distributed Processing With Applications

2012

Self Cite


“…Our framework, McGrid, provides feedback at the micro-architectural level on the cache performance of a marked region of code [14]. McGrid is a configurable framework that runs on top of SESC [22], which is a cycle-accurate architectural simulator tailored for multi-core architectural settings.…”

Section: Research Challenges Addressed In This Paper (mentioning)


“…Such detailed configuration settings allow an in-depth study of how programming paradigms affect the cache and memory access patterns. In earlier work [14], we focused on analysing the number of CPU cycles taken by each core to process XML datasets. The results of that work are useful for application cases in which XML processing is CPU bound.…”

Section: Research Challenges Addressed In This Paper (mentioning)

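The snippets above describe McGrid reporting cache behaviour for a marked region of code and studying how programming paradigms change cache access patterns. McGrid and SESC are not reproduced here; the toy model below is only a sketch of that kind of measurement, assuming a single set-associative cache with LRU replacement. It counts hits and misses for two traversal orders of the same row-major array. All function names and cache parameters are hypothetical.

```python
from collections import OrderedDict
from typing import Iterable, List, Tuple


def simulate_lru_cache(addresses: Iterable[int], size_bytes: int,
                       line_bytes: int, ways: int) -> Tuple[int, int]:
    """Return (hits, misses) for a byte-address trace on a toy
    set-associative LRU cache (not a model of SESC or McGrid)."""
    num_sets = size_bytes // (line_bytes * ways)
    sets = [OrderedDict() for _ in range(num_sets)]
    hits = misses = 0
    for addr in addresses:
        line = addr // line_bytes
        s = sets[line % num_sets]
        if line in s:
            hits += 1
            s.move_to_end(line)          # mark as most recently used
        else:
            misses += 1
            if len(s) >= ways:
                s.popitem(last=False)    # evict the least recently used line
            s[line] = None
    return hits, misses


def row_order(n: int, elem: int = 8) -> List[int]:
    # Visit an n x n row-major array of elem-byte values row by row.
    return [(i * n + j) * elem for i in range(n) for j in range(n)]


def column_order(n: int, elem: int = 8) -> List[int]:
    # Visit the same array column by column (large strides).
    return [(i * n + j) * elem for j in range(n) for i in range(n)]


# 4 KiB cache, 64-byte lines, 4-way; 64 x 64 array of 8-byte values.
print(simulate_lru_cache(row_order(64), 4096, 64, 4))     # (3584, 512)
print(simulate_lru_cache(column_order(64), 4096, 64, 4))  # (0, 4096)
```

Both traces touch exactly the same data, yet the column-order walk maps each column onto only two cache sets and misses on every access, which is the kind of access-pattern effect that per-region cache feedback is meant to expose.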

Cache Performance Optimization for Processing XML-Based Application Data on Multi-core Processors

2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing

2010

Self Cite

There is a critical need to develop new programming paradigms for grid middleware tools and applications to harness the opportunities presented by emerging multi-core processors. Grid middleware and application implementations that do not adapt to these programming paradigms can suffer severe performance degradation when executing on emerging processors. In this paper we focus on the utilization of the L2 cache, which is a critical shared resource on Chip Multiprocessors. The access pattern of the shared L2 cache, which depends on how the application schedules and assigns processing work to each thread, can either enhance or undermine the ability to hide memory latency on a multi-core processor. None of the current grid simulators and emulators provides the feedback and fine-grained performance data that are essential for a detailed analysis. In this paper, using the feedback from an emulation framework, we present a performance analysis and provide recommendations on how processing threads can be scheduled on multi-core nodes to enhance the performance of a class of grid applications that require processing of large-scale XML data. In particular, we discuss the gains associated with the adaptations we have made to the Cache-Affinity and Balanced-Set scheduling algorithms to improve L2 cache performance, and hence the overall application execution time.
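The abstract refers to adaptations of Cache-Affinity scheduling without giving their details. As a hedged illustration of the general cache-affinity idea on chips with shared L2 caches (not the authors' adapted algorithm), the sketch below places each ready thread on an idle core under the same L2 cache it last ran on, so it can reuse data still resident there. The function name, the `last_l2` history, and the fallback rule are hypothetical.

```python
from typing import Dict, List


def affinity_assign(ready: List[str],
                    last_l2: Dict[str, int],
                    free_cores: Dict[int, List[int]]) -> Dict[str, int]:
    """Map each ready thread to an idle core id, preferring a core that
    shares the L2 cache the thread last ran on (warm-cache reuse).

    free_cores maps an L2 group id to the ids of its idle cores.
    Threads with no history, or whose group has no idle core, fall back
    to the group with the most idle cores. Threads left over when no
    core is free are not assigned in this round.
    """
    pool = {g: list(cores) for g, cores in free_cores.items()}
    placement: Dict[str, int] = {}
    deferred: List[str] = []
    # First pass: honour L2 affinity wherever a matching core is idle.
    for t in ready:
        g = last_l2.get(t)
        if g is not None and pool.get(g):
            placement[t] = pool[g].pop(0)
        else:
            deferred.append(t)
    # Second pass: spread the remaining threads over the emptiest L2 groups.
    for t in deferred:
        g = max(pool, key=lambda k: (len(pool[k]), -k), default=None)
        if g is None or not pool[g]:
            break
        placement[t] = pool[g].pop(0)
    return placement


# Two L2 caches: cores 0-1 share L2 #0, cores 2-3 share L2 #1.
print(affinity_assign(["a", "b", "c", "d"],
                      {"a": 1, "b": 1, "c": 1},
                      {0: [0, 1], 1: [2, 3]}))
# {'a': 2, 'b': 3, 'c': 0, 'd': 1}
```

Threads `a` and `b` keep their warm L2 #1; `c` also prefers L2 #1 but finds it full, so it and the history-less `d` fall back to L2 #0. A real scheduler would pin the chosen cores with an OS affinity call; this sketch only computes the mapping.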


Analysis of Cache Performance for Processing XML-Based Application Data on Multi-core Processors

2008 IEEE Fourth International Conference on eScience

2008