Reed-Muller (RM) codes and polar codes are generated by the same matrix $G_m = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}^{\otimes m}$ but use different subsets of rows. RM codes simply select the rows with the largest weights. Polar codes instead select the rows with the largest conditional mutual information, proceeding from top to bottom in $G_m$; while this is a more elaborate and channel-dependent rule, the top-to-bottom ordering has the advantage of making the conditional mutual information polarize, which directly gives a capacity-achieving code on any binary memoryless symmetric channel (BMSC). RM codes have yet to be proved to have this property.

In this paper, we reconnect RM codes to polarization theory. It is shown that when proceeding in the RM code ordering, i.e., not top to bottom but from the lightest to the heaviest rows in $G_m$, the conditional mutual information again polarizes. We further demonstrate that it does so faster than for polar codes. This implies that $G_m$ contains another code, different from the polar code and called here the twin code, that is provably capacity-achieving on any BMSC. This proves a necessary condition for RM codes to achieve capacity on BMSCs. It further gives a sufficient condition if the rows with the largest conditional mutual information correspond to the heaviest rows, i.e., if the twin code is the RM code. We show here that the two codes bear similarity to each other and give further evidence that they are likely the same.

E. Abbe is with the Mathematics Institute and the School of Computer and Communication Sciences at EPFL, Switzerland, and the Program in Applied and Computational Mathematics and the Department of Electrical Engineering at Princeton University, USA. M. Ye is with .

1 See [1] for accounts on this conjecture.
2 Recall that a BMS channel is a channel $W : \{0,1\} \to \mathcal{Y}$ such that there is a permutation $\pi$ on the output alphabet $\mathcal{Y}$ satisfying i) $\pi^{-1} = \pi$ and ii) $W(y|1) = W(\pi(y)|0)$ for all $y \in \mathcal{Y}$.

arXiv:1901.11533v1 [cs.IT] 31 Jan 2019

: $A \subseteq [m]$). Next we define another $n$ i.i.d.
Bernoulli-1/2 random variables $X$ … We transmit $X^{(m)}_z$, $z \in \{0,1\}^m$, through $n$ independent copies of a BMS channel $W : \{0,1\} \to \mathcal{Y}$, and we denote the corresponding channel outputs as $Y^{(m,W)}_z$, $z \in \{0,1\}^m$. Let $X^{(m)} := (X^{(m)}_z : z \in \{0,1\}^m)$ and $Y^{(m,W)} := (Y^{(m,W)}_z : z \in \{0,1\}^m)$. Since $W$ is symmetric and ($X^{(m)}_z$ … $= n(1 - I(W))$.
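To make the objects above concrete, the following is a minimal Python sketch, not taken from the paper. It builds $G_m$ as a Kronecker power, lists the row weights, applies the RM rule of keeping the heaviest rows, and evaluates the quantity $n(1 - I(W))$ for one example channel. All function names are hypothetical. The binary symmetric channel is used as an assumed example of a BMS channel, and the tie-breaking rule among rows of equal weight is an arbitrary choice. The sketch does not compute conditional mutual information and does not construct the polar or twin code.

```python
import math

BASE = [[1, 0], [1, 1]]  # the 2x2 kernel whose Kronecker power is G_m


def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]


def kronecker_power(m):
    """Return G_m, the m-fold Kronecker power of BASE (2^m x 2^m, 0/1 entries)."""
    g = [[1]]
    for _ in range(m):
        g = kron(g, BASE)
    return g


def row_weights(m):
    """Hamming weight of each row of G_m, listed from the top row down."""
    return [sum(row) for row in kronecker_power(m)]


def heaviest_rows(m, k):
    """Indices of the k heaviest rows of G_m (the RM-style selection).

    Ties are broken in favour of the smaller row index; this is an
    arbitrary choice made for the sketch, not a rule from the paper.
    """
    weights = row_weights(m)
    order = sorted(range(len(weights)), key=lambda i: (-weights[i], i))
    return sorted(order[:k])


def bsc_mutual_information(p):
    """I(W) = 1 - h2(p) for a binary symmetric channel with crossover
    probability p and uniform input (assumed example of a BMS channel)."""
    if p <= 0.0 or p >= 1.0:
        return 1.0
    h2 = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
    return 1.0 - h2


def n_times_one_minus_capacity(m, p):
    """Evaluate n * (1 - I(W)) with n = 2^m for the example BSC."""
    return (2 ** m) * (1.0 - bsc_mutual_information(p))
```

For $m = 3$, `row_weights(3)` gives `[1, 2, 2, 4, 2, 4, 4, 8]`, so `heaviest_rows(3, 4)` returns `[3, 5, 6, 7]`, the four rows of weight at least 4. Row $i$ has weight $2^{\mathrm{wt}(i)}$, where $\mathrm{wt}(i)$ is the number of ones in the binary expansion of $i$, which is why the lightest-to-heaviest ordering differs from the top-to-bottom one.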