International audienceRecently, some high-performance IEEE 754 single precision floating-point software has been designed, which aims at best exploiting some features (integer arithmetic, parallelism) of the STMicroelectronics ST200 Very Long Instruction Word (VLIW) processor. We review here the techniques and software tools used or developed for this design and its implementation, and how they allowed very high instruction-level parallelism (ILP) exposure. Those key points include a hierarchical description of function evaluation algorithms, the exploitation of the standard encoding of floating-point data, the automatic generation of fast and accurate polynomial evaluation schemes, and some compiler optimizations
Abstract-Graphics and signal processing applications often require that sines and cosines be evaluated at a same floatingpoint argument, and in such cases a very fast computation of the pair of values is desirable. This paper studies how 32-bit VLIW integer architectures can be exploited in order to perform this task accurately for IEEE single precision. We describe software implementations for sinf, cosf, and sincosf over [−π/4, π/4] that have a proven 1-ulp accuracy and whose latency on STMicroelectronics' ST231 VLIW integer processor is 19, 18, and 19 cycles, respectively. Such performances are obtained by introducing a novel algorithm for simultaneous sine and cosine that combines univariate and bivariate polynomial evaluation schemes.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.