We propose a reproducible variant of the unblocked LU factorization for graphics processing units (GPUs). For this purpose, we build upon Level-1/2 BLAS kernels that deliver correctly rounded and reproducible results for the dot (inner) product, vector scaling, and the matrix-vector product. In addition, we outline a strategy to enhance the accuracy of the triangular solve via iterative refinement. Following a bottom-up approach, we finally construct a reproducible unblocked implementation of the LU factorization for GPUs, which accommodates partial pivoting for stability and can eventually be integrated into a high-performance and stable algorithm for the (blocked) LU factorization.

The IEEE 754 standard, created in 1985 and revised in 2008, has led to a considerable enhancement in the reliability of numerical computations by rigorously specifying the properties of floating-point arithmetic. This standard is now adopted by most processors, leading to much better portability of numerical applications.

Exascale computing (10^18 operations per second) is likely to be reached within a decade. For the type of systems delivering such a performance rate, obtaining accurate and reproducible^1 results in floating-point arithmetic will represent two considerable challenges [9,28]. Reproducibility is also an important and useful property when debugging and checking the correctness of codes, as well as for legal issues.

Email addresses: riakymch@kth.se (Roman Iakymchuk), stef.graillat@lip6.fr (Stef Graillat), david.defour@univ-perp.fr (David Defour), quintana@uji.es (Enrique S. Quintana-Ortí)

^1 By accuracy, we mean the relative error between the exact result and the computed result. We define reproducibility as the ability to obtain a bit-wise identical floating-point result from multiple runs of the code on the same input data.
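To give a flavor of the floating-point machinery behind accurate summation kernels, the sketch below shows Knuth's TwoSum error-free transformation and a compensated sum built on it. This is only an illustrative host-side sketch, not the paper's GPU kernels: a compensated sum improves accuracy but does not by itself guarantee bit-wise reproducibility under reordering, which the correctly rounded kernels referred to above achieve by other means.

```python
def two_sum(a, b):
    """Knuth's TwoSum error-free transformation:
    returns (s, e) with s = fl(a + b) and s + e == a + b exactly."""
    s = a + b
    a_round = s - b
    b_round = s - a_round
    e = (a - a_round) + (b - b_round)
    return s, e

def compensated_sum(values):
    """Sum a sequence while carrying the rounding errors in a
    separate term (Ogita-Rump Sum2-style compensated summation)."""
    s, err = 0.0, 0.0
    for v in values:
        s, e = two_sum(s, v)
        err += e          # accumulate the exact rounding errors
    return s + err
```

For instance, `sum([1.0, 1e100, 1.0, -1e100])` returns 0.0 in binary64 arithmetic, whereas `compensated_sum` recovers the exact answer 2.0.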
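The accuracy-enhancement strategy for the triangular solve mentioned above can be pictured with a plain iterative-refinement loop: solve, form the residual, solve for the correction, and update. The sketch below (hypothetical helper names, sequential Python) uses ordinary arithmetic for the residual; in the approach described here, that residual would instead be computed with a correctly rounded matrix-vector product.

```python
def forward_subst(L, b):
    """Solve L x = b by forward substitution, L lower triangular."""
    n = len(b)
    x = [0.0] * n
    for i in range(n):
        s = b[i]
        for j in range(i):
            s -= L[i][j] * x[j]
        x[i] = s / L[i][i]
    return x

def trsv_refined(L, b, iters=2):
    """Triangular solve followed by a few iterative-refinement steps:
    r = b - L x, solve L d = r, x <- x + d."""
    x = forward_subst(L, b)
    for _ in range(iters):
        # residual; reproducible variants would use a correctly rounded mat-vec here
        r = [b[i] - sum(L[i][j] * x[j] for j in range(i + 1))
             for i in range(len(b))]
        d = forward_subst(L, r)
        x = [xi + di for xi, di in zip(x, d)]
    return x
```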
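For reference, the unblocked LU factorization with partial pivoting that the bottom-up construction targets can be sketched as the classical right-looking algorithm below. This is a plain, non-reproducible textbook version in sequential Python; in the approach described above, the column scaling and trailing update would be carried out by the reproducible Level-1/2 BLAS kernels on the GPU.

```python
def lu_unblocked(A):
    """In-place unblocked right-looking LU with partial pivoting.
    On return, A holds L (unit lower, below the diagonal) and U;
    the returned list p records the row permutation, so P A = L U."""
    n = len(A)
    p = list(range(n))
    for k in range(n):
        # partial pivoting: row with the largest |A[i][k]| for i >= k
        piv = max(range(k, n), key=lambda i: abs(A[i][k]))
        if A[piv][k] == 0.0:
            raise ZeroDivisionError("matrix is singular")
        if piv != k:
            A[k], A[piv] = A[piv], A[k]
            p[k], p[piv] = p[piv], p[k]
        for i in range(k + 1, n):
            # scale the subdiagonal entry (a vector-scaling kernel)
            A[i][k] /= A[k][k]
            # rank-1 update of the trailing submatrix
            for j in range(k + 1, n):
                A[i][j] -= A[i][k] * A[k][j]
    return p
```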