Abstract: Ultrasound imaging is a reference medical diagnostic technique thanks to its blend of versatility, effectiveness and moderate cost. The core computation of all ultrasound imaging methods is based on simple formulae, except for those required to calculate delays with high precision and throughput. Unfortunately, advanced 3D systems require the calculation or storage of billions of such delay values per frame, which is a challenge. In 2D systems, this requirement can be four orders of magnitude lower, but efficient computation remains crucial for low-power, battery-operated implementations that can be used in rescue scenarios. In this paper we explore two smart designs of the delay generation function. To quantify their hardware cost, we implement them on an FPGA and study their footprint and performance. We evaluate how these architectures scale to different ultrasound applications, from a low-power 2D system to a next-generation 3D machine. Using numerical approximations, we demonstrate the ability to generate delay values with sufficient throughput to support 10,000-channel 3D imaging at up to 30 fps while using 63% of a Virtex 7 FPGA and requiring 24 MB of external memory accessed at a bandwidth of about 32 GB/s. Alternatively, with similar FPGA utilization, we show an exact calculation method that reaches 24 fps on 1225-channel 3D imaging and requires no external memory at all. Both designs can be scaled down to use a negligible amount of resources for 2D imaging in low-power applications, and can also support ultrafast 2D imaging at hundreds of frames per second.
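To make the delay computation and its scale concrete, the following is a minimal sketch and not the architecture evaluated in this paper. It assumes the common two-way time-of-flight model, in which the delay for a focal point and a receive element is the transmit path length plus the return path length, divided by the speed of sound and expressed in sampling periods. The function names, the speed of sound (1540 m/s), the sampling frequency (32 MHz) and the focal-point count are all illustrative assumptions that do not come from the text above.

```python
import math

# Assumed illustrative constants (not taken from the paper).
SPEED_OF_SOUND_M_S = 1540.0   # typical textbook value for soft tissue
SAMPLING_FREQ_HZ = 32e6       # hypothetical receive sampling frequency


def two_way_delay_samples(origin, focal_point, element,
                          c=SPEED_OF_SOUND_M_S, fs=SAMPLING_FREQ_HZ):
    """Exact two-way delay, rounded to the nearest sample index.

    origin      -- (x, y, z) of the transmit reference point, in metres
    focal_point -- (x, y, z) of the point being reconstructed, in metres
    element     -- (x, y, z) of the receiving transducer element, in metres
    """
    tx_path = math.dist(origin, focal_point)    # transmit distance
    rx_path = math.dist(focal_point, element)   # echo return distance
    return round((tx_path + rx_path) / c * fs)


def delays_per_frame(num_elements, num_focal_points):
    """One delay value is needed per (element, focal point) pair."""
    return num_elements * num_focal_points


if __name__ == "__main__":
    # A point 15.4 mm deep, seen by an element located at the origin.
    print(two_way_delay_samples((0, 0, 0), (0, 0, 0.0154), (0, 0, 0)))
    # 10,000 channels and a hypothetical volume of one million focal points.
    print(delays_per_frame(10_000, 1_000_000))
```

Under these assumptions, the per-frame count grows as the product of channel count and focal points, which is why each delay involves square roots that must be either evaluated at very high throughput or replaced by stored or approximated values.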