Motivated by applications to string processing, we introduce variants of the Lyndon factorization called inverse Lyndon factorizations. Their factors, named inverse Lyndon words, are in a class that strictly contains anti-Lyndon words, that is, Lyndon words with respect to the inverse lexicographic order. The Lyndon factorization of a nonempty word w is unique, but w may have several inverse Lyndon factorizations. We prove that any nonempty word w admits a canonical inverse Lyndon factorization, named ICFL(w), that maintains the main properties of the Lyndon factorization of w: it can be computed in linear time, it is uniquely determined, and it preserves a compatibility property for sorting suffixes. In particular, the compatibility property of ICFL(w) is a consequence of another result: any factor in ICFL(w) is a concatenation of consecutive factors of the Lyndon factorization of w with respect to the inverse lexicographic order.
Introduction

Lyndon words were introduced in [25], as standard lexicographic sequences, and then used in the context of free groups in [6]. A Lyndon word is a word which is strictly smaller than each of its proper cyclic shifts for the lexicographic ordering. A famous theorem concerning Lyndon words asserts that any nonempty word factorizes uniquely as a nonincreasing product of Lyndon words, called its Lyndon factorization. This theorem, which can be recovered from results in [6], provides an example of a factorization of a free monoid, as defined in [32] (see also [4,23]). Moreover, there are several results which give relations between Lyndon words, codes and combinatorics of words [3].

The Lyndon factorization has recently proved to be a useful tool also in string processing algorithms [2,29], with strong potential that has not been completely explored and understood. This is also due to the fact that it can be efficiently computed. Linear-time algorithms for computing this factorization can be found in [11,12], whereas an O(lg n)-time parallel algorithm has been proposed in [1,10]. A connection between the Lyndon factorization and the Lempel-Ziv (LZ) factorization has been given in [18], where it is shown that in general the size of the LZ factorization is larger than the size of the Lyndon factorization, and in any case the size of the Lyndon factorization can never exceed the size of the LZ factorization by more than a factor of 2.

Relations between Lyndon words and the Burrows-Wheeler Transform (BWT) were first discovered in [9,26] and, more recently, in [21]. Variants of the BWT proposed in the previous papers are based on combinatorial results proved in [14] (see [30] for further details and [13] for more recent related results).

A Lyndon word is lexicographically smaller than all of its proper nonempty suffixes.
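
To make the definitions above concrete, the following is a minimal Python sketch, not taken from the paper. It assumes words are Python strings compared with the built-in lexicographic order, and it uses Duval's algorithm, a well-known linear-time method, to compute the Lyndon factorization. The function names `is_lyndon`, `smaller_than_proper_suffixes` and `lyndon_factorization` are illustrative choices.

```python
def is_lyndon(w: str) -> bool:
    """Definition check: w is nonempty and strictly smaller than
    each of its proper cyclic shifts (quadratic time, for illustration)."""
    return len(w) > 0 and all(w < w[i:] + w[:i] for i in range(1, len(w)))


def smaller_than_proper_suffixes(w: str) -> bool:
    """Suffix check: w is nonempty and strictly smaller than
    each of its proper nonempty suffixes."""
    return len(w) > 0 and all(w < w[i:] for i in range(1, len(w)))


def lyndon_factorization(w: str) -> list[str]:
    """Lyndon factorization of w via Duval's algorithm (linear time).

    Returns Lyndon words l_1 >= l_2 >= ... >= l_k whose
    concatenation is w. The empty word yields an empty list.
    """
    factors = []
    n = len(w)
    i = 0
    while i < n:
        # w[i:j] is a power of a Lyndon word of length j - k,
        # possibly followed by a proper prefix of that word.
        j, k = i + 1, i
        while j < n and w[k] <= w[j]:
            if w[k] < w[j]:
                k = i      # the whole of w[i:j+1] is a single Lyndon word
            else:
                k += 1     # keep matching the current period
            j += 1
        # Emit every complete copy of the Lyndon word of length j - k.
        while i <= k:
            factors.append(w[i:i + j - k])
            i += j - k
    return factors


factors = lyndon_factorization("banana")
print(factors)
```

On `banana` the sketch yields the nonincreasing sequence `b`, `an`, `an`, `a`. Each factor passes both the cyclic-shift test of the definition and the suffix test, in line with the characterization just stated: a Lyndon word is smaller than each of its proper nonempty suffixes.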
This explains why the Lyndon factorization has become of particular interest also in suffix sorting.

In [19], Lyndon words are also called prime words and their prefixes are also called preprime words. Interesting properties of Lyndon words are recalled below.

Proposition 2.2 Eac...