Polar coding gives rise to the first explicit family of codes that provably achieve capacity for a wide range of channels with efficient encoding and decoding. But how fast can polar coding approach capacity as a function of the code length? In finite-length analysis, the scaling between code length and the gap to capacity is usually measured in terms of the scaling exponent $\mu$. Roughly, the code length required to achieve a gap $\delta$ to capacity at a fixed error probability grows as $1/\delta^{\mu}$. It is well known that the optimal scaling exponent, achieved by random binary codes, is $\mu = 2$. It is also well known that the scaling exponent of conventional polar codes on the binary erasure channel (BEC) is $\mu = 3.627$, which falls far short of the optimal value. On the other hand, it was recently shown that polar codes derived from $\ell \times \ell$ binary polarization kernels approach the optimal scaling exponent $\mu = 2$ on the BEC as $\ell \to \infty$, with high probability over a random choice of the kernel.

Herein, we focus on explicit constructions of $\ell \times \ell$ binary kernels with small scaling exponent for $\ell \le 64$. In particular, we exhibit a sequence of binary linear codes that approaches capacity on the BEC with quasi-linear complexity and scaling exponent $\mu < 3$. To the best of our knowledge, no such sequence of codes was previously known to exist. The principal challenges in establishing our results are twofold: how to construct such kernels, and how to evaluate their scaling exponent.

In a single polarization step, an $\ell \times \ell$ kernel $K_\ell$ transforms an underlying BEC into $\ell$ bit-channels $W_1, W_2, \ldots, W_\ell$. The erasure probabilities of $W_1, W_2, \ldots, W_\ell$, known as the polarization behavior of $K_\ell$, determine the resulting scaling exponent $\mu(K_\ell)$. We first introduce a class of self-dual binary kernels and prove that their polarization behavior satisfies a strong symmetry property. This reduces the problem of constructing $K_\ell$ to that of producing a certain nested chain of only $\ell/2$ self-orthogonal codes.
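For small kernels, the polarization behavior can be computed by brute force. The sketch below assumes the convention $x = uK$ over GF(2) with successive-cancellation order $u_1, u_2, \ldots$, and uses the standard fact that $W_i$ is erased exactly when row $i$ of the kernel, restricted to the unerased coordinates, lies in the GF(2) span of the restricted rows $i+1, \ldots, \ell$. It counts, for each bit-channel, how many erasure patterns of each weight $w$ erase it, so that $P_i(z) = \sum_w A_{i,w} z^w (1-z)^{\ell-w}$. The function names (`in_span`, `polarization_counts`, `erasure_probs`) are hypothetical, the code uses 0-based indices (index `i` stands for $W_{i+1}$), and enumerating all $2^\ell$ patterns is only feasible for small $\ell$, which is why kernels as large as $K_{32}$ and $K_{64}$ call for more efficient methods such as the trellises described here.

```python
from itertools import combinations


def in_span(v, rows):
    """Return True if bitmask v lies in the GF(2) span of the bitmasks in rows."""
    pivots = {}  # leading-bit position -> basis vector whose top bit is that position
    for r in rows:
        while r:
            top = r.bit_length() - 1
            if top in pivots:
                r ^= pivots[top]  # clear the leading bit and keep reducing
            else:
                pivots[top] = r  # new independent basis vector
                break
    while v:
        top = v.bit_length() - 1
        if top not in pivots:
            return False
        v ^= pivots[top]
    return True  # v reduced to zero (the zero vector is always in the span)


def polarization_counts(K):
    """A[i][w] = number of erasure patterns with exactly w erased coordinates of
    x = uK that erase bit-channel W_{i+1} (u_1..u_i known, later inputs unknown)."""
    l = len(K)
    rows = [sum(bit << j for j, bit in enumerate(row)) for row in K]
    full = (1 << l) - 1
    A = [[0] * (l + 1) for _ in range(l)]
    for w in range(l + 1):
        for erased in combinations(range(l), w):
            observed = full
            for j in erased:
                observed &= ~(1 << j)  # zero out erased columns
            restricted = [r & observed for r in rows]
            for i in range(l):
                # u_{i+1} is ambiguous iff its row (on observed columns) lies in
                # the span of the rows of the still-unknown later inputs.
                if in_span(restricted[i], restricted[i + 1:]):
                    A[i][w] += 1
    return A


def erasure_probs(A, z):
    """Evaluate P_i(z) = sum_w A[i][w] z^w (1 - z)^(l - w) for every bit-channel."""
    l = len(A)
    return [sum(a * z**w * (1 - z) ** (l - w) for w, a in enumerate(Ai)) for Ai in A]


# Arikan's 2x2 kernel: expect P_1(z) = 2z - z^2 and P_2(z) = z^2.
K2 = [[1, 0], [1, 1]]
A2 = polarization_counts(K2)
print(A2)                      # [[0, 2, 1], [0, 0, 1]]
print(erasure_probs(A2, 0.5))  # [0.75, 0.25]
```

For Arıkan's $2 \times 2$ kernel this reproduces the familiar $P_1(z) = 2z - z^2$ and $P_2(z) = z^2$. A useful sanity check for any invertible kernel is that the erasure probabilities sum to $\ell z$, since a polarization step preserves total capacity.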
We use nested cyclic codes, whose minimum distance is as large as possible subject to the orthogonality constraint, to construct the kernels $K_{32}$ and $K_{64}$. To evaluate the polarization behavior of $K_{32}$ and $K_{64}$, we propose two alternative trellis representations, which may be of independent interest. Using the resulting trellises, we show that $\mu(K_{32}) = 3.122$ and explicitly compute over half of the polarization-behavior coefficients for $K_{64}$, beyond which the complexity becomes prohibitive. To complete the computation, we introduce a Monte Carlo interpolation method, which produces the estimate $\mu(K_{64}) \approx 2.87$. We augment this estimate with a rigorous proof that $\mu(K_{64}) < 2.97$.
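The details of the Monte Carlo interpolation method are not given here, so the following is only a loosely related illustration, not the authors' method. Write $A_{i,w}$ for the number of erasure patterns of weight $w$ that erase bit-channel $W_{i+1}$, so that its erasure probability is $\sum_w A_{i,w} z^w (1-z)^{\ell-w}$. The sketch estimates a single $A_{i,w}$ by sampling uniformly random weight-$w$ erasure patterns and scaling the hit fraction by $\binom{\ell}{w}$; it omits any interpolation step. The names `in_span` and `estimate_count` are hypothetical, and the erasure test assumes the convention $x = uK$ with successive-cancellation decoding.

```python
import random
from math import comb


def in_span(v, rows):
    """Return True if bitmask v lies in the GF(2) span of the bitmasks in rows."""
    pivots = {}  # leading-bit position -> basis vector with that leading bit
    for r in rows:
        while r:
            top = r.bit_length() - 1
            if top in pivots:
                r ^= pivots[top]
            else:
                pivots[top] = r
                break
    while v:
        top = v.bit_length() - 1
        if top not in pivots:
            return False
        v ^= pivots[top]
    return True


def estimate_count(K, i, w, samples=10_000, seed=0):
    """Monte Carlo estimate of A_{i,w}: the number of weight-w erasure patterns
    that erase bit-channel W_{i+1} (0-based i), via uniform sampling."""
    l = len(K)
    rows = [sum(bit << j for j, bit in enumerate(row)) for row in K]
    rng = random.Random(seed)  # fixed seed for reproducibility
    full = (1 << l) - 1
    hits = 0
    for _ in range(samples):
        observed = full
        for j in rng.sample(range(l), w):  # w distinct erased coordinates
            observed &= ~(1 << j)
        restricted = [r & observed for r in rows]
        # Erased iff row i (on observed columns) is in the span of later rows.
        if in_span(restricted[i], restricted[i + 1:]):
            hits += 1
    return comb(l, w) * hits / samples  # scale hit fraction to a pattern count


K4 = [[1, 0, 0, 0], [1, 1, 0, 0], [1, 0, 1, 0], [1, 1, 1, 1]]
print(estimate_count(K4, i=1, w=2))
```

Estimating the counts weight by weight, rather than sampling channel outputs at one erasure probability $z$, keeps each estimate usable for every $z$ at once. The standard error of each estimate shrinks like $1/\sqrt{\text{samples}}$, so coefficients with small hit fractions need many more samples to pin down.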