Abstract: Key retrieval is very important in various applications. A trie and a DAWG are data structures for key retrieval. The double array is one method of constructing a trie, and it is both fast and compact. In this paper, a data structure of DAWG by the double array using BASE and CHECK is compared with that of DAWG by the double array using CHECK and NEXT, and their retrieval speed and space usage are analyzed theoretically. When DAWG and DFA are constructed by the double array, it turns out to be important to treat the indexes of the CHECK and NEXT arrays as edge numbers.
Index Terms: Automaton, DAWG, double array, triple array.
I. INTRODUCTION
Key retrieval is used in various applications [1]. A trie is a data structure that retrieves keys and merges common prefixes of keys [2]. A Directed Acyclic Word Graph (DAWG) is a data structure that reduces the number of trie states [3]; it merges common parts of keys. Because the trie and DAWG are kinds of Deterministic Finite Automaton (DFA), they can traditionally be represented by a matrix form (transition table) or by a linked list.
The triple array is a data structure for constructing a DFA [4]. It uses three one-dimensional arrays, called BASE, CHECK, and NEXT, to compress the matrix form. It is fast because a key of length k can be retrieved in O(k) time. The double array [5] compresses the triple array further: it deletes the NEXT array from the triple array and consists only of BASE and CHECK. The double array is used in various applications and fields because of its high speed [6], [7]. However, because each state in the original double array can have only one parent state, the original double array can construct a trie but cannot construct a DAWG or a DFA. Moreover, the compact double array was proposed as a method to compress the double array [8]; it reduces space usage by storing traversed characters in CHECK. A method to construct DAWG using the features of the compact double array was proposed in [9]. This method is faster and uses less space than other methods such as the matrix form, the linked list, and TST [10]. Furthermore, a method to construct DFA with CHECK and NEXT, obtained by deleting the BASE array from the triple array, was proposed in [11].
In this paper, a data structure of DAWG by the double array using BASE and CHECK (BC DAWG) is compared with that of DAWG by the double array using CHECK and NEXT (CN DAWG), and their retrieval speed and space usage are analyzed theoretically. A construction algorithm for CN DAWG is also proposed. Moreover, features of DAWG by the double array are discussed.
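To make the BASE/CHECK transition concrete, the following Python sketch stores a plain trie (not a DAWG) in a double array. It assumes the common formulation in which the transition from state s on character code c goes to index t = BASE[s] + c and is valid only if CHECK[t] = s; in the usual triple-array formulation the next state would instead be read from NEXT[t], and the double array drops NEXT by using t itself as the next state. The character encoding (`char_code`), the breadth-first first-fit placement, and the function names are hypothetical choices for illustration, not the construction used in [5] or in this paper.

```python
from collections import deque
from typing import Dict, List, Set, Tuple


def char_code(ch: str) -> int:
    # Assumed encoding for this sketch only: 'a' -> 1, 'b' -> 2, ...
    return ord(ch) - ord("a") + 1


def build_double_array(keys: List[str]) -> Tuple[List[int], List[int], Set[int]]:
    """Place a trie over `keys` into BASE/CHECK arrays (greedy first-fit)."""
    # Step 1: ordinary trie; children[s] maps a character code to a child id.
    children: Dict[int, Dict[int, int]] = {0: {}}
    final: Set[int] = set()
    for key in keys:
        s = 0
        for ch in key:
            c = char_code(ch)
            if c not in children[s]:
                new_id = len(children)
                children[new_id] = {}
                children[s][c] = new_id
            s = children[s][c]
        final.add(s)

    # Step 2: assign double-array indexes breadth-first; the root gets index 1.
    base: List[int] = [0, 0]
    check: List[int] = [0, 0]
    used: Set[int] = {1}
    index: Dict[int, int] = {0: 1}  # trie id -> double-array index
    terminals: Set[int] = set()
    queue = deque([0])
    while queue:
        s = queue.popleft()
        i = index[s]
        if s in final:
            terminals.add(i)
        codes = sorted(children[s])
        if not codes:
            continue  # leaf: BASE stays 0
        b = 1
        while any(b + c in used for c in codes):
            b += 1  # first-fit search for a BASE value whose slots are all free
        base[i] = b
        for c in codes:
            t = b + c
            used.add(t)
            while len(check) <= t:  # grow both arrays on demand
                base.append(0)
                check.append(0)
            check[t] = i  # CHECK records the parent index
            child = children[s][c]
            index[child] = t
            queue.append(child)
    return base, check, terminals


def da_contains(base: List[int], check: List[int], terminals: Set[int], key: str) -> bool:
    """Retrieve `key` with one BASE/CHECK transition per character, i.e. O(k)."""
    s = 1  # retrieval starts at the root index
    for ch in key:
        t = base[s] + char_code(ch)
        if not 0 < t < len(check) or check[t] != s:
            return False  # CHECK mismatch: no edge labelled ch leaves s
        s = t
    return s in terminals
```

Because each index holds a single CHECK entry, every state in this layout has exactly one parent, which is why the original double array handles a trie but not a DAWG, where a state may be reached from several parents.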
II. DOUBLE ARRAY AND DAWG
A. Trie
A trie is a tree structure used for key retrieval in the field of natural language processing, and it is a kind of DFA. Fig. 1 shows an example of a trie for the key set K = {"aaa", "aba", "bbc", "cbc", "cc"}. Double circles denote terminal states. The trie merges common prefixes of keys. Retrieval always starts from the root state (for example, state number 1 in Fig. 1) and traverses states one character of the key at a time....
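The prefix merging and root-first traversal described above can be illustrated with a minimal Python sketch that stores the trie as a transition table keyed by state number. State 1 is the root, as in the text; the remaining states are numbered in insertion order, which is an assumption of this sketch and need not match the numbering in Fig. 1. The names `build_trie` and `retrieve` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Table = Dict[int, Dict[str, int]]


def build_trie(keys: List[str]) -> Tuple[Table, Set[int]]:
    """Build a trie as a transition table; state 1 is the root."""
    trans: Table = {1: {}}  # trans[s][ch] is the child of state s on ch
    terminals: Set[int] = set()
    next_state = 2  # hypothetical numbering: states in insertion order
    for key in keys:
        s = 1
        for ch in key:
            if ch not in trans[s]:  # new edge only if this prefix is unseen
                trans[s][ch] = next_state
                trans[next_state] = {}
                next_state += 1
            s = trans[s][ch]  # shared prefixes reuse existing states
        terminals.add(s)  # the state reached by the whole key is terminal
    return trans, terminals


def retrieve(trans: Table, terminals: Set[int], key: str) -> bool:
    """Traverse from the root one character at a time."""
    s = 1
    for ch in key:
        if ch not in trans[s]:
            return False  # no transition on ch: key is absent
        s = trans[s][ch]
    return s in terminals  # a prefix of a key is not itself a key
```

For K, this sketch creates 13 states: the root plus 12 others, whereas 14 non-root states would be needed if no prefixes were shared (the keys contain 14 characters in total). Five of the states are terminal.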