We present our recent code modernizations of the ab initio molecular dynamics program CPMD (www.cpmd.org), with a special focus on the ultra-soft pseudopotential (USPP) code path. Following the internal instrumentation of CPMD, all time-critical routines have been revised to maximize computational throughput and minimize communication overhead. Throughout the program, missing hybrid MPI+OpenMP parallelization has been added to optimize scaling. For communication-intensive routines, such as the multiple distributed 3-d FFTs of the electronic states and the distributed matrix-matrix multiplications related to the β-projectors of the pseudopotentials, this MPI+OpenMP parallelization now overlaps computation and communication. The necessary partitioning of the workload is optimized by an auto-tuning algorithm. In addition, the largest global MPI Allreduce operation has been replaced by highly tuned, node-local parallelized operations using MPI shared-memory windows to avoid inter-node communication. A batched algorithm for the multiple 3-d FFTs improves the throughput of the MPI Alltoall communication and thus the scalability of the implementation, both for the USPP and for the frequently used norm-conserving pseudopotential code path. The enhanced performance and scalability are demonstrated on a mid-sized benchmark system of 256 water molecules and on further water systems ranging from 32 to 2048 molecules.