To prevent information leakage during program execution, modern software cryptographic implementations target constant-time function, where the number of instructions executed remains the same when program inputs change. However, the underlying microarchitecture behaves differently when processing different data inputs, impacting the execution time of the same instructions. These differences in execution time can covertly leak confidential information through a timing channel. Given the recent reports of covert channels present on commercial microprocessors, a number of microarchitectural features on CPUs have been reexamined from a timing leakage perspective. Unfortunately, a similar microarchitectural evaluation of the potential attack surfaces on GPUs has not been adequately performed. Several prior work has considered a timing channel based on the behavior of a GPU's coalescing unit. In this article, we identify a second finer-grained microarchitectural timing channel, related to the banking structure of the GPU's Shared Memory. By considering the timing channel caused by Shared Memory bank conflicts, we have developed a differential timing attack that can compromise table-based cryptographic algorithms. We implement our timing attack on an Nvidia Kepler K40 GPU and successfully recover the complete 128-bit encryption key of an Advanced Encryption Standard (AES) GPU implementation using 900,000 timing samples. We also evaluate the scalability of our attack method by attacking an implementation of the AES encryption algorithm that fully occupies the compute resources of the GPU. We extend our timing analysis onto other Nvidia architectures: Maxwell, Pascal, Volta, and Turing GPUs. We also discuss countermeasures and experiment with a novel multi-key implementation, evaluating its resistance to our side-channel timing attack and its associated performance overhead. CCS Concepts: • Security and privacy → Hardware security implementation; Side-channel analysis and countermeasures; • Computer systems organization → Single instruction, multiple data;