We consider several novel aspects of unique factorization in formal languages. We reprove the familiar fact that the set uf(L) of words having unique factorization into elements of L is regular if L is regular, and from this deduce a quadratic upper and lower bound on the length of the shortest word not in uf(L). We observe that uf(L) need not be context-free if L is context-free. Next, we consider variations on unique factorization. We define a notion of "semi-unique" factorization, where every factorization has the same number of terms, and show that, if L is regular or even finite, the set of words having such a factorization need not be context-free. Finally, we consider additional variations, such as unique factorization "up to permutation" and "up to subset".

Paul Bell, Daniel Reidenbach, and Jeffrey Shallit

Proof. We just use the example of Proposition 5. □

Theorem 20. If L is regular then ufs(L) need not be a CFL.

Proof. We use a variation of the construction in the proof of Theorem 16. Let $L = (ab)^+(ac)^+aa + (ba)^+(ca)^+ + aa + aaa$. Then (using the notation in the proof of Theorem 16), if $w := aa(ab)^r(ac)^s aa(ba)^t(ca)^q aaa \in \mathrm{ufs}(L) \cap R$ with $r, s, t, q \geq 1$, then there are two different factorizations of $w$:

$$w = aa \cdot (ab)^r(ac)^s aa \cdot (ba)^t(ca)^q \cdot aaa = aaa \cdot (ba)^r(ca)^s \cdot (ab)^t(ac)^q aa \cdot aa,$$

which are subset-invariant if and only if $r = t$ and $s = q$. So $\mathrm{ufs}(L) \cap R = \{aa(ab)^r(ac)^s aa(ba)^r(ca)^s aaa : r, s \geq 1\}$, which is not a CFL. □
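The claim in the proof of Theorem 20 can be checked by brute force for small exponents. The following Python sketch is an illustration only, not part of the paper. It assumes that L reads as (ab)^+(ac)^+aa + (ba)^+(ca)^+ + aa + aaa, and that a word is in ufs(L) when it has at least one factorization into elements of L and all of its factorizations use the same set of factors. The helper names `in_L`, `factorizations`, `word`, and `is_subset_invariant` are hypothetical.

```python
import re
from itertools import product

# Assumed reading of L: (ab)^+(ac)^+aa + (ba)^+(ca)^+ + aa + aaa
L_PATTERN = re.compile(r"(?:ab)+(?:ac)+aa|(?:ba)+(?:ca)+|aa|aaa")


def in_L(u):
    """Membership test for the assumed regular language L."""
    return L_PATTERN.fullmatch(u) is not None


def factorizations(w):
    """Return every way to write w as a concatenation of words of L."""
    if not w:
        return [[]]
    found = []
    for i in range(1, len(w) + 1):
        head = w[:i]
        if in_L(head):
            for rest in factorizations(w[i:]):
                found.append([head] + rest)
    return found


def word(r, s, t, q):
    """Build w = aa (ab)^r (ac)^s aa (ba)^t (ca)^q aaa."""
    return "aa" + "ab" * r + "ac" * s + "aa" + "ba" * t + "ca" * q + "aaa"


def is_subset_invariant(w):
    """Assumed definition: w factorizes, and all factorizations
    use the same set of factors."""
    facs = factorizations(w)
    return len(facs) > 0 and len({frozenset(f) for f in facs}) == 1


# Exhaustive check of the "if and only if" claim for exponents 1..3.
agrees = all(
    is_subset_invariant(word(r, s, t, q)) == (r == t and s == q)
    for r, s, t, q in product(range(1, 4), repeat=4)
)
```

Under these assumptions, `factorizations(word(1, 1, 1, 1))` yields the two factorizations aa · abacaa · baca · aaa and aaa · baca · abacaa · aa, which use the same four factors. The recursion is exponential in the worst case, so the sketch is only suitable for short words.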
Acknowledgment

The idea of considering semi-unique factorization was inspired by a talk of Nasir Sohail at the University of Waterloo in April 2014.