In this paper, we propose a novel approach to combine compact directed acyclic word graphs (CDAWGs) and grammar-based compression. This leads us to an efficient self-index, called Linear-size CDAWGs (L-CDAWGs), which can be represented with O(ẽ_T log n) bits of space, allowing for O(log n)-time random and O(1)-time sequential accesses to edge labels, and O(m log σ + occ)-time pattern matching. Here, ẽ_T is the number of all extensions of maximal repeats in T, n and m are respectively the lengths of the text T and a given pattern, σ is the alphabet size, and occ is the number of occurrences of the pattern in T. The repetitiveness measure ẽ_T is known to be much smaller than the text length n for highly repetitive text. For constant alphabets, our L-CDAWGs achieve O(m + occ) pattern matching time with O(e^r_T log n) bits of space, which improves the pattern matching time of Belazzougui et al.'s run-length BWT-CDAWGs by a factor of log log n, with the same space complexity. Here, e^r_T is the number of right extensions of maximal repeats in T. As a byproduct, our result gives a way of constructing a straight-line program (SLP) of size O(ẽ_T) for a given text T in O(n + ẽ_T log σ) time.
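To make the notion of a straight-line program concrete, the following is a minimal sketch of a generic SLP with random access by descent through stored lengths. It is not the L-CDAWG construction described above: the class `SLP`, its methods, and the helper `fibonacci_slp` are hypothetical names chosen for illustration, and access here costs time proportional to the grammar height rather than the O(log n) bound stated for L-CDAWG edge labels.

```python
class SLP:
    """Illustrative straight-line program: each variable derives exactly one string."""

    def __init__(self):
        self.rules = []    # rules[v] is a single character or a pair (left, right)
        self.lengths = []  # lengths[v] is the length of the string derived from v

    def terminal(self, ch):
        """Add a variable deriving the single character ch; return its id."""
        self.rules.append(ch)
        self.lengths.append(1)
        return len(self.rules) - 1

    def concat(self, left, right):
        """Add a variable deriving expand(left) + expand(right); return its id."""
        self.rules.append((left, right))
        self.lengths.append(self.lengths[left] + self.lengths[right])
        return len(self.rules) - 1

    def access(self, v, i):
        """Return character i of the string derived from v without expanding it."""
        if not 0 <= i < self.lengths[v]:
            raise IndexError(i)
        while isinstance(self.rules[v], tuple):
            left, right = self.rules[v]
            if i < self.lengths[left]:
                v = left
            else:
                i -= self.lengths[left]
                v = right
        return self.rules[v]

    def expand(self, v):
        """Return the full string derived from v (linear in its length)."""
        out, stack = [], [v]
        while stack:
            rule = self.rules[stack.pop()]
            if isinstance(rule, tuple):
                stack.append(rule[1])  # right child is processed after the left
                stack.append(rule[0])
            else:
                out.append(rule)
        return "".join(out)


def fibonacci_slp(k):
    """Build an SLP for the k-th Fibonacci string (k >= 1): a, ab, aba, abaab, ..."""
    slp = SLP()
    prev, cur = slp.terminal("b"), slp.terminal("a")
    for _ in range(k - 1):
        prev, cur = cur, slp.concat(cur, prev)
    return slp, cur
```

For a highly repetitive text such as a Fibonacci string, the number of rules grows linearly in k while the derived string grows exponentially, which is the kind of gap between grammar size and text length that the abstract refers to.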
A substring u of a string T is called a minimal unique substring (MUS) of T if u occurs exactly once in T and any proper substring of u occurs at least twice in T. A string w is called a minimal absent word (MAW) of T if w does not occur in T and any proper substring of w occurs in T. In this paper, we study the problems of computing MUSs and MAWs in a sliding window over a given string T. We first show how the set of MUSs can change in a sliding window over T, and present an O(n log σ)-time and O(d)-space algorithm to compute MUSs in a sliding window of width d over T, where σ is the maximum number of distinct characters in every window. We then give tight upper and lower bounds on the maximum number of changes in the set of MAWs in a sliding window over T. Our bounds improve on the previous results in [Crochemore et al., 2017].
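The two definitions can be checked directly on small inputs. The sketch below is a brute-force illustration only, not the sliding-window algorithm of the paper: it recomputes everything from scratch for one string, and the function names and the explicit `alphabet` argument are assumptions made for this example.

```python
def occurrences(text, u):
    """Count occurrences of u in text, including overlapping ones."""
    return sum(1 for i in range(len(text) - len(u) + 1) if text.startswith(u, i))


def minimal_unique_substrings(text):
    """Return the set of MUSs of text by exhaustive search."""
    n = len(text)
    result = set()
    for i in range(n):
        for j in range(i + 1, n + 1):
            u = text[i:j]
            if occurrences(text, u) != 1:
                continue
            # Every proper substring of u lies inside u[1:] or u[:-1], and
            # shrinking a string never lowers its occurrence count, so it is
            # enough to check these two. For len(u) == 1 both are empty and
            # the empty string occurs len(text) + 1 times.
            if occurrences(text, u[1:]) >= 2 and occurrences(text, u[:-1]) >= 2:
                result.add(u)
    return result


def minimal_absent_words(text, alphabet):
    """Return the set of MAWs of text over the given alphabet by exhaustive search."""
    n = len(text)
    substrings = {text[i:j] for i in range(n) for j in range(i + 1, n + 1)}
    # A single character is a MAW exactly when it does not occur in text.
    result = {c for c in alphabet if c not in text}
    for x in substrings:
        for b in alphabet:
            w = x + b
            # w[:-1] == x occurs by construction; w is a MAW when w is absent
            # and its longest proper suffix w[1:] occurs.
            if w not in text and w[1:] in text:
                result.add(w)
    return result
```

For the string abaab over the alphabet {a, b}, this yields the MUSs ba and aa, and the MAWs bb, aaa, bab and aaba. Applying the same functions to each window T[i..i+d-1] in turn gives a naive baseline against which an incremental sliding-window method can be compared.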