We study a Hoare Logic to reason about parallel programs executed on graphics processing units (GPUs), called GPU kernels. During the execution of GPU kernels, multiple threads execute in lockstep, that is, execute the same instruction simultaneously. When the control branches, the two branches are executed sequentially, but during the execution of each branch only those threads that take it are enabled; after the control converges, all the threads are enabled and again execute in lockstep. In this article, we first consider a semantics in which all threads execute in lockstep (this semantics simplifies the actual execution model of GPUs) and adapt Hoare Logic to this setting by augmenting the usual Hoare triples with an additional component representing the set of enabled threads. It is determined that the soundness and relative completeness of the logic do not hold for all programs; a difficulty arises from the fact that one thread can invalidate the loop termination condition of another thread through shared memory. We overcome this difficulty by identifying an appropriate class of programs for which the soundness and relative completeness hold. Additionally, we discuss thread interleaving, which is present in the actual execution of GPUs but not in the lockstep semantics mentioned above. We show that if a program is race free, then the lockstep and interleaving semantics produce the same result. This implies that our logic is sound and relatively complete for race-free programs, even if the thread interleaving is taken into account.