With their inherent ability to exploit parallelism, GPUs have become a popular platform for data-intensive scientific computing applications. This trend is expected to continue as the computational requirements of scientific applications reach the petascale and even exascale range. As the minimum feature size of transistors shrinks with improving process technology, GPUs are becoming more vulnerable to transient faults caused by events such as power fluctuations and alpha-particle strikes. Methods are therefore needed that guarantee correct computation even when such faults occur. In this paper, we develop and analyze three fault-tolerant schemes, FC-O, PC-C and PC-CS, for the block Householder QR algorithm that can handle faults in the streaming processor (SP) cores of a GPU. We also present a transient fault injection mechanism for NVIDIA GPUs that can inject faults of varying durations. We show that two of our schemes, PC-C and PC-CS, provide good error coverage with relatively low overhead and scale reasonably well into the petascale and exascale range.