A universal circuit (UC) can be programmed to simulate any circuit up to a given size n by specifying its program inputs. It provides elegant solutions in various application scenarios, e.g., for private function evaluation (PFE) and for improving the flexibility of attribute-based encryption schemes. The asymptotic lower bound for the size of a UC is (n log n), and Valiant (STOC'76) provided two theoretical constructions, the so-called 2-way and 4-way UCs (i.e., recursive constructions with 2 and 4 substructures), with asymptotic sizes ∼ 5n log 2 n and ∼ 4.75n log 2 n, respectively. In this article, we present and extend our results published in (Kiss and Schneider EUROCRYPT'16) and (Günther et al. ASIACRYPT'17). We validate the practicality of Valiant's UCs by realizing the 2-way and 4-way UCs in our modular open-source implementation. We also provide an example implementation for PFE using these sizeoptimized UCs. We propose a 2/4-hybrid approach that combines the 2-way and the 4-way UCs in order to minimize the size of the resulting UC. We realize that the bottleneck in universal circuit generation and programming becomes the memory consumption of the program since the whole structure of size O(n log n) is handled by the algorithms in memory. In this work, we overcome this by designing novel scalable algorithms for the UC generation and programming. Both algorithms use only O(n) memory at any point in time. We prove the practicality of our scalable design with a scalable proof-of-concept implementation for generating Valiant's 4-way UC. We note that this can be extended to work with optimized building blocks analogously. Moreover, we substantially improve the size of our UCs by including and implementing the recent optimization of Zhao et al. (ASIACRYPT'19) that reduces the asymptotic size of the 4-way UC to ∼ 4.5n log 2 n. Furthermore, we include their optimization in the implementation of our 2/4-hybrid UC which yields the smallest UC construction known so far.