The U-Net architecture is a state-of-the-art neural network for semantic image segmentation that is widely used in biomedical research. It is based on an encoder-decoder framework, and even its vanilla version achieves high segmentation quality. Due to its large parameter space, however, it incurs high computational costs on both CPUs and GPUs. In a research setting, inference time is relevant but not crucial for the results; in mobile, clinical applications, however, a light and fast variant would allow deep-learning-assisted, objective diagnosis at the point of care. In this work, we propose an optimized, tiny-weight U-Net for an inexpensive hardware accelerator. We first pared down the U-Net architecture to reduce its computational complexity and increase runtime performance while keeping accuracy high. Using an open biomedical dataset for high-speed videoendoscopy (BAGLS), we show that we can dramatically reduce the parameter count and computations by over 99.8% while keeping the segmentation performance at 95% of our baseline. Using a custom upscaling routine, we further successfully deployed our optimized U-Net to an EdgeTPU hardware accelerator to gain cost-effective speed improvements on conventional computers and to showcase the applicability of EdgeTPUs for biomedical image processing of large images on portable devices. Combining the optimized architecture and the EdgeTPU, we gain a speedup of more than 79× compared to our initial baseline while maintaining high accuracy. This combination can provide immediate results to the clinician, especially in computationally constrained environments, and thus an objective diagnosis at the point of care.
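As a rough sanity check of how a >99.8% parameter reduction is plausible, the sketch below counts the weights of a textbook five-level U-Net (3×3 double-convolution blocks, 2×2 transposed-convolution upsampling, skip connections, single input/output channel) and compares the classic base width of 64 against a hypothetical tiny base width of 2. These widths and the layer layout are illustrative assumptions, not the paper's actual optimized architecture.

```python
def conv_params(c_in, c_out, k=3):
    """Weights plus biases of a single k x k convolution layer."""
    return k * k * c_in * c_out + c_out

def unet_params(base, depth=5, in_ch=1, out_ch=1):
    """Parameter count of a textbook U-Net with the given base width (assumed layout)."""
    widths = [base * 2 ** i for i in range(depth)]
    total, prev = 0, in_ch
    for w in widths:                                   # encoder: double-conv blocks
        total += conv_params(prev, w) + conv_params(w, w)
        prev = w
    for w in reversed(widths[:-1]):                    # decoder with skip connections
        total += conv_params(prev, w, k=2)             # 2x2 transposed conv upsampling
        total += conv_params(2 * w, w) + conv_params(w, w)  # concat doubles input channels
        prev = w
    return total + conv_params(prev, out_ch, k=1)      # final 1x1 segmentation conv

baseline = unet_params(64)  # roughly 31M parameters, as in the original U-Net
tiny = unet_params(2)       # hypothetical tiny-weight variant
print(f"baseline: {baseline:,}  tiny: {tiny:,}  "
      f"reduction: {1 - tiny / baseline:.4%}")
```

Because parameter count grows roughly quadratically with channel width, shrinking the base width by a factor of 32 already yields a reduction beyond 99.8%, consistent with the abstract's claim.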