The application of such a classifier is divided into two phases: the fitting-phase, during which the internal parameters (or model) of the multivariate classifier are adjusted so that it can statistically distinguish signal and background data-points; and the application-phase, during which the fitted classifier is applied to new data-points with unknown labels. The model complexity plays an important role during the fitting-phase and can be controlled by the hyper-parameters of the model. If the model is too simple (too complex), it will be under-fitted (over-fitted) and perform poorly on test data-points with unknown labels.

Stochastic gradient-boosted decision trees [8] are widely employed in high energy physics for multivariate classification and regression tasks. The implementation presented in this paper was developed for the Belle II experiment [2], which is located at the SuperKEKB collider in Tsukuba, Japan. Multivariate classification is used extensively in the Belle II Analysis Software Framework (BASF2) [16], for instance during the reconstruction of particle tracks, as part of particle identification algorithms, and to suppress background processes in physics analyses. Often a large number of classifiers must be fitted: for hyper-parameter optimization, for different background scenarios, to obtain improved estimates of the importance of individual features, or to create networks of classifiers that feed into one another. Therefore, Belle II required a default multivariate classification algorithm that is fast during fitting and application, robust enough to be trained in an automated environment, reliably usable by non-experts, and that preferably generates an interpretable model and exhibits good out-of-the-box performance.

FastBDT satisfies those requirements and is the default multivariate classification algorithm in BASF2.
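The two phases described above can be illustrated with a short sketch. It uses scikit-learn's `GradientBoostingClassifier` (mentioned later as one of the frameworks supported by BASF2) rather than FastBDT itself, and the toy "signal"/"background" data and all hyper-parameter values are chosen purely for illustration:

```python
# Illustration of the fitting-phase and the application-phase of a
# stochastic gradient-boosted decision tree classifier.
# Uses scikit-learn as a stand-in; this is NOT FastBDT's own API.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Toy "signal" and "background": two Gaussian blobs in 4 features.
signal = rng.normal(loc=+0.5, size=(1000, 4))
background = rng.normal(loc=-0.5, size=(1000, 4))
X = np.vstack([signal, background])
y = np.array([1] * 1000 + [0] * 1000)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Fitting-phase: the hyper-parameters (number of trees, tree depth,
# shrinkage, sub-sampling fraction) control the model complexity and
# therefore the balance between under- and over-fitting.
clf = GradientBoostingClassifier(
    n_estimators=100, max_depth=3, learning_rate=0.1, subsample=0.5)
clf.fit(X_train, y_train)

# Application-phase: the fitted model is applied to data-points whose
# labels are treated as unknown, yielding a signal probability each.
proba = clf.predict_proba(X_test)[:, 1]
accuracy = clf.score(X_test, y_test)
```

A model with `max_depth` far too large would fit the training sample almost perfectly but generalize poorly to `X_test`, which is the over-fitting failure mode described above.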
In addition, BASF2 supports other popular multivariate analysis frameworks such as TMVA [9], scikit-learn (SKLearn) [19], XGBoost [5] and TensorFlow [1].

Abstract

Stochastic gradient-boosted decision trees are widely employed for multivariate classification and regression tasks. This paper presents a speed-optimized and cache-friendly implementation for multivariate classification called FastBDT. The concepts used to optimize the execution time are discussed in detail. The key ideas include: an equal-frequency binning of the input data, which allows replacing expensive floating-point operations with integer operations while at the same time increasing the quality of the classification; and a cache-friendly linear access pattern to the input data, in contrast to the random access pattern exhibited by usual implementations. FastBDT provides interfaces to C/C++, Python and TMVA. It is extensively used in the field of high energy physics (HEP) by the Belle II experiment.
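The equal-frequency binning mentioned in the abstract can be sketched as follows. This is a minimal NumPy illustration of the general technique, not FastBDT's actual C++ implementation; the function name and the bin count are chosen for illustration:

```python
# Equal-frequency binning: map floating-point feature values to small
# integer bin indices using quantile boundaries, so that subsequent
# tree-building needs only integer comparisons instead of expensive
# floating-point operations.
import numpy as np

def equal_frequency_binning(values, n_bins=16):
    """Return integer bin indices and the quantile boundaries.

    Boundaries are placed at quantiles of the data, so each bin
    receives approximately the same number of data-points.
    """
    quantiles = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]
    boundaries = np.quantile(values, quantiles)
    # searchsorted assigns each value an integer bin in [0, n_bins - 1].
    indices = np.searchsorted(boundaries, values, side="right")
    return indices, boundaries

rng = np.random.default_rng(0)
feature = rng.exponential(size=10_000)  # strongly skewed toy feature

bins, boundaries = equal_frequency_binning(feature, n_bins=8)
counts = np.bincount(bins, minlength=8)
```

Because the boundaries follow the quantiles of the data, even a strongly skewed feature distribution yields bins with nearly identical populations, which is what makes the binning robust against outliers.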