New computed tomography (CT) algorithms are commonly developed in high-level programming languages, such as Python or MATLAB, while low-level languages are used to support their computation-intensive operations. In the past decade, graphics processing units (GPUs) have become the de facto standard for large parallel computations in areas such as computational imaging, image processing, and machine learning. Our fast-and-flexible CT reconstruction software, the ASTRA Toolbox, therefore already implements tomographic projectors, i.e., the core computational operations modeling the X-ray physics, in NVIDIA CUDA (Compute Unified Device Architecture), a low-level platform for computation on GPUs. However, the Python-C++ language barrier prevents high-level Python users from modifying these low-level projectors, and, as a consequence, research into new tomographic algorithms is more complex and time-consuming than necessary. With the ASTRA KernelKit, we lift tomographic projectors to Python and leverage CuPy, a numerical library similar to NumPy and SciPy that exposes CUDA to Python, to obtain fine-grained control over their efficiency and implementation. In this article, we introduce our software and illustrate its importance for high-performance and data-driven applications using examples from deep learning, real-time X-ray CT, and kernel tuning.