SUMMARY
Basic block vectorization consists in exploiting instruction-level parallelism inside basic blocks in order to generate SIMD instructions and thus speed up data processing. It is not always beneficial, however, because the vectorized program may actually be slower than the original one. It would therefore be useful to predict beforehand whether vectorization will produce any speedup. This paper proposes to do so by expressing vectorization profitability as a classification problem and predicting it with a machine learning technique called support vector machine (SVM). It considers three compilers (icc, gcc, and LLVM) and a benchmark suite of 151 loops, unrolled with factors ranging from 1 to 20. The paper further proposes a technique that combines the results of two SVMs to reach 99% accuracy for all three compilers. Moreover, by correctly predicting unprofitable vectorizations, the technique presented in this paper provides speedups of up to 2.16 times, 2.47 times, and 3.83 times for icc, gcc, and LLVM, respectively (9%, 18%, and 56% on average). It also lowers the probability of the compiler generating a slower program with vectorization turned on from more than 25% (for the compilers alone) to less than 1%.
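As a rough illustration of the approach summarized above, the sketch below casts vectorization profitability as a binary classification problem solved with an SVM, using scikit-learn. The feature names (unroll factor, trip count, unit-stride flag, arithmetic intensity) and the toy data are assumptions for illustration only; the paper's actual feature set is not listed in this summary.

```python
# Minimal sketch (not the paper's implementation): predict whether vectorizing
# a loop is profitable, modeled as binary classification with an SVM.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# One row per (loop, unroll factor) pair; columns are illustrative features:
# unroll factor, trip count, unit-stride accesses (0/1), arithmetic intensity.
X = np.array([
    [4, 16, 1, 0.75],
    [8, 16, 0, 0.40],
    [1, 128, 1, 1.10],
    [20, 8, 0, 0.25],
])
# Label: 1 if the vectorized version was faster than the scalar one, else 0.
y = np.array([1, 0, 1, 0])

# Standardize features, then fit an RBF-kernel SVM classifier.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
model.fit(X, y)

# Predict profitability for an unseen loop configuration.
print(model.predict([[2, 64, 1, 0.90]]))
```

In the same spirit, the paper's combination of two SVMs could be realized by training one classifier per decision (e.g., one tuned for precision on unprofitable cases) and only vectorizing when both agree; the sketch above shows a single classifier for brevity.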
SUMMARY
Modern optimizing compilers tend to be conservative and often fail to vectorize programs that would have benefited from it. In this paper, we propose a way to predict the relevant command-line options of the compiler so that it chooses the most profitable vectorization strategy. Machine learning has proven to be a relevant approach for this problem: fed with features that describe the program presented to the compiler, a machine learning model is trained to predict an appropriate optimization strategy. Related work relies on control and data flow graphs as software features. In this article, we consider tensor contraction programs, which are useful in various scientific simulations, especially in chemistry. Depending on how they access memory, different tensor contraction kernels may yield very different performance figures; however, they exhibit identical control and data flow graphs, putting them completely out of reach of the related work. We therefore propose an original set of software features that capture the important properties of tensor contraction kernels. Considering the Intel Merom processor architecture with the Intel Compiler, we model the problem as a classification problem and solve it using a support vector machine. Our technique predicts the best-suited vectorization options of the compiler with a cross-validation accuracy of 93.4%, leading to speedups of up to 3 times compared to the default behavior of the Intel Compiler. The article ends with a qualitative, visualization-based discussion of how well the proposed software features perform. All our measurements are made available for the sake of reproducibility.
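The option-selection problem of this second summary can likewise be sketched as a multi-class SVM: given features of a tensor contraction kernel, predict which compiler vectorization strategy to request. The features, option labels, and data below are hypothetical placeholders; the paper's actual feature set and Intel Compiler options are not given in this summary.

```python
# Minimal sketch (illustrative only): choose among vectorization strategies
# for tensor contraction kernels via multi-class SVM classification.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# One row per kernel; columns are placeholder memory-access features, e.g.
# innermost-loop strides of the three tensors and the reduction dimension size.
X = np.array([
    [1, 1, 1, 64],
    [1, 64, 1, 64],
    [64, 1, 64, 16],
    [64, 64, 64, 16],
])
# Label: index of the best-performing strategy observed for that kernel
# (e.g., 0 = default auto-vectorization, 1 = forced vectorization, 2 = disabled).
y = np.array([1, 1, 0, 2])

# One-vs-rest multi-class SVM over standardized features.
clf = make_pipeline(StandardScaler(),
                    SVC(kernel="rbf", decision_function_shape="ovr"))
clf.fit(X, y)

# Predict the strategy for an unseen kernel's feature vector.
print(clf.predict([[1, 1, 64, 32]]))
```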