We define two relations that capture the essential structure of the problem of computing LALR(1) look-ahead sets, and present an efficient algorithm to compute the sets in time linear in the size of the relations. In particular, for a PASCAL grammar, our algorithm performs less than 20% of the set unions performed by a popular-compiler (YACC).
Two relations that capture the essential structure of the problem of computing LALR(1) look-ahead sets are defined, and an efficient algorithm is presented to compute the sets in time linear in the size of the relations. In particular, for a PASCAL grammar, the algorithm performs fewer than 15 percent of the set unions performed by the popular compiler-compiler YACC.When a grammar is not LALR(1), the relations, represented explicitly, provide for printing useroriented error messages that specifically indicate how the look-ahead problem arose. In addition, certain loops in the digraphs induced by these relations indicate that the grammar is not LR(k) for any k.Finally, an oft-discovered and used but incorrect look-ahead set algorithm is similarly based on two other relations defined for the fwst time here. The formal presentation of this algorithm should help prevent its rediscovery.
LR parsers can be made to run 6 to 10 times as fast as the best table-interpretive LR parsers. The resulting parse time is negligible compared to the time required by the remainder of a typical compiler containing the parser.
A parsing speed of 1/2 million lines per minute on a computer similar to a VAX 11/780 was achieved, up from an interpretive speed of 40,000 lines per minute. A speed of 240,000 lines per minute on an Intel 80286 was achieved, up from an interpretive speed of 37,000 lines per minute.
The improvement is obtained by translating the parser's finite state control into assembly language. States become code memory addresses. The current input symbol resides in a register and a quick sequence of register-constant comparisons determines the next state, which is merely jumped to. The parser's push-down stack is implemented directly on a hardware stack. The stack contains code memory addresses rather than the traditional state numbers.
The strongly-connected components of the directed graph induced by the parser's terminal and nonterminal transitions are examined to determine a typically small subset of the states that require parse-time stack-overflow-check code when hardware does not provide the check automatically.
The increase in speed is at the expense of space: a factor of 2 to 4 increase in total table size can be expected, depending upon whether syntactic error recovery is required.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.