We give a new characterization of maximal repetitions (or runs) in strings based on Lyndon words. The characterization leads to a proof of what was known as the "runs" conjecture (Kolpakov & Kucherov (FOCS '99)), which states that the maximum number of runs ρ(n) in a string of length n is less than n. The proof is remarkably simple, considering the numerous endeavors to tackle this problem in the last 15 years, and significantly improves our understanding of how runs can occur in strings. In addition, we obtain an upper bound of 3n for the maximum sum of exponents σ(n) of runs in a string of length n, improving on the best known bound of 4.1n by Crochemore et al. (JDA 2012), as well as other improved bounds on related problems. The characterization also gives rise to a new, conceptually simple linear-time algorithm for computing all the runs in a string. A notable characteristic of our algorithm is that, unlike all existing linear-time algorithms, it does not utilize the Lempel-Ziv factorization of the string. We also establish a relationship between runs and nodes of the Lyndon tree, which gives a simple optimal solution to the 2-Period Query problem that was recently solved by Kociumaka et al. (SODA 2015).

* A preliminary version of this paper has appeared in [1].
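To make the objects in this abstract concrete, the following is a minimal brute-force sketch of what a run is and how the bounds ρ(n) < n and σ(n) < 3n can be spot-checked on short strings. It is not the linear-time, Lyndon-word-based algorithm the abstract refers to. It assumes the usual definition of a run as a triple (i, j, p) where the substring s[i..j] has smallest period p, has length at least 2p, and cannot be extended in either direction while keeping period p. The names `find_runs`, `has_period`, and `check_bounds` are illustrative only, and `check_bounds` tests binary strings only.

```python
from itertools import product


def has_period(s, start, end, q):
    """True if s[start..end] (inclusive) has period q."""
    return all(s[k] == s[k + q] for k in range(start, end - q + 1))


def find_runs(s):
    """Brute-force enumeration of runs as (start, end, period), 0-indexed, inclusive."""
    n = len(s)
    runs = []
    for p in range(1, n // 2 + 1):
        i = 0
        while i + p < n:
            if s[i] != s[i + p]:
                i += 1
                continue
            # Extend the stretch of positions k with s[k] == s[k + p].
            j = i
            while j + p < n and s[j] == s[j + p]:
                j += 1
            start, end = i, j + p - 1  # maximal p-periodic substring
            long_enough = end - start + 1 >= 2 * p
            # Keep it only if p is the smallest period of the substring.
            if long_enough and not any(
                has_period(s, start, end, q) for q in range(1, p)
            ):
                runs.append((start, end, p))
            i = j + 1
    return sorted(runs)


def check_bounds(max_len):
    """Spot-check rho(n) < n and sigma(n) < 3n on all binary strings up to max_len."""
    for n in range(1, max_len + 1):
        for chars in product("ab", repeat=n):
            rs = find_runs("".join(chars))
            exponent_sum = sum((j - i + 1) / p for i, j, p in rs)
            if not (len(rs) < n and exponent_sum < 3 * n):
                return False
    return True


print(find_runs("mississippi"))  # [(1, 7, 3), (2, 3, 1), (5, 6, 1), (8, 9, 1)]
print(check_bounds(8))           # True
```

For "mississippi" the four runs are "ississi" (period 3) and the three squares "ss", "ss", and "pp" (period 1), so the string has 4 runs against a length of 11.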
In this paper, we propose a new dynamic compressed index of $O(w)$ space for a dynamic text $T$, where $w = O(\min(z \log N \log^* M, N))$ is the size of the signature encoding of $T$, $z$ is the size of the Lempel-Ziv77 (LZ77) factorization of $T$, $N$ is the length of $T$, and $M \geq 4N$ is an integer that can be handled in constant time under the word RAM model. Our index supports searching for a pattern $P$ in $T$ in $O(|P| f_A + \log w \log |P| \log^* M (\log N + \log |P| \log^* M) + \mathit{occ} \log N)$ time and insertion/deletion of a substring of length $y$ in $O((y + \log N \log^* M) \log w \log N \log^* M)$ time, where $f_A = O\left(\min\left\{\frac{\log\log M \log\log w}{\log\log\log M}, \sqrt{\frac{\log w}{\log\log w}}\right\}\right)$. Also, we propose a new space-efficient LZ77 factorization algorithm for a given text of length $N$, which runs in $O(N f_A + z \log w \log^3 N (\log^* N)^2)$ time with $O(w)$ working space.
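As a reminder of what the parameter z measures, here is a minimal sketch of a naive LZ77 factorization. It is not the space-efficient algorithm proposed in the abstract and makes no attempt to match its running time. It assumes the variant without self-references, in which each factor is either a single character that has not occurred before or the longest prefix of the remaining text that occurs entirely within the already-factorized part; the paper's exact definition may differ. The function name `lz77_factorize` is illustrative only.

```python
def lz77_factorize(t):
    """Naive LZ77 factorization without self-references (assumed variant)."""
    factors = []
    i, n = 0, len(t)
    while i < n:
        length = 0
        # Grow the factor while t[i:i+length+1] still occurs inside t[:i].
        while i + length < n and t[i:i + length + 1] in t[:i]:
            length += 1
        length = max(length, 1)  # a fresh character forms a factor of length 1
        factors.append(t[i:i + length])
        i += length
    return factors


factors = lz77_factorize("abababab")
print(factors)       # ['a', 'b', 'ab', 'abab']
print(len(factors))  # z = 4
```

Highly repetitive texts yield few factors, which is why an index whose size depends on z rather than only on N can be much smaller than the text.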