This paper addresses the execution cost of arithmetic operations, with a focus on fuzzy arithmetic. We show that an appropriate representation format for fuzzy intervals halves the number of operations and reduces memory requirements by a factor of 2 to 8 compared to conventional solutions. We also demonstrate the benefits of exploiting several features of today's accelerators (GPUs): static rounding, memory usage, instruction-level parallelism (ILP), and thread-level parallelism (TLP). We then describe a library of fuzzy arithmetic operations written in CUDA and C++. We evaluate the library against traditional approaches using compute-bound and memory-bound benchmarks on Nvidia GPUs, and observe speedups of 2× to 20×.