Parsing Expression Grammar (PEG) encodes a recursive-descent parser with limited backtracking. The parser has many useful properties, and converting a PEG to an executable parser is rather straightforward. Unfortunately, PEG is not well understood as a language definition tool. It is thus of practical interest to construct PEGs for languages specified in some familiar way, such as Backus-Naur Form (BNF). Medeiros attacked the problem in an elegant way, by noticing that PEG and BNF can be formally defined in very similar ways. Some of his results were extended in a previous paper by this author. We continue here with further extensions.
Parsing Expression Grammar (PEG) encodes a recursive-descent parser with limited backtracking. The parser has many useful properties and, with the use of memoization, works in linear time. In appearance, PEG is almost identical to a grammar in Extended Backus-Naur Form (EBNF), but it usually defines a different language. However, in some cases only minor typographical changes are needed to convert an EBNF grammar into its PEG parser. As recently shown by Medeiros, this is true, in particular, for LL(1) grammars. But it is also true for many non-LL(1) grammars, which is interesting because the backtracking of PEG is often a convenient way to circumvent precisely the LL(1) restriction. We formulate a number of conditions for an EBNF grammar to become its own PEG parser, and arrive at a condition that we call LL(1p), meaning that a top-down parser can choose its next action by looking at the input within the reach of one parsing procedure (rather than by looking at the next letter). An extension to LL(kp) for k > 1 seems possible.
Address for correspondence: Ceremonimästarvägen 10, SE-181 40 Lidingö, Sweden
R.R. Redziejowski / From EBNF to PEG
... context-free grammars. The construction for LL(1) grammars is very simple: just replace the unordered choice "|" by the ordered choice "/". Unfortunately, this is not of much use when we employ PEG just in order to circumvent the LL(1) restriction. But it turns out that this property is not limited to LL(1) grammars. As an example, take the following EBNF grammar:
This grammar is not LL(1), and not even LL(k) for any k: both Hex and Bin may start with any number of zeros and ones. A classical top-down parser constructed from this grammar cannot choose between the two alternatives of Literal by looking at any predefined number of characters ahead. But, treated as PEG with "|" denoting the ordered choice, this grammar recognizes exactly the language defined by its EBNF interpretation.
Presented with input such as "1010B", Literal calls Hex, which consumes "1010", fails to recognize "X", and backtracks. Literal then proceeds to try Bin and succeeds.
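The backtracking step described above can be sketched in a few lines of Python. The grammar itself is not preserved in this text, so the rules below are an assumption chosen only to agree with the described behaviour: Literal = Hex | Bin, Hex = [0-9a-f]+ "X", Bin = [01]+ "B". The function names (`hex_literal`, `bin_literal`, `literal`, `recognize`) are hypothetical and serve only as illustration. Each parsing procedure returns the position after the text it consumed, or `None` on failure.

```python
import re
from typing import Optional

# Assumed rules (not taken from the source text):
#   Literal = Hex | Bin
#   Hex     = [0-9a-f]+ "X"
#   Bin     = [01]+ "B"

HEX_DIGITS = re.compile(r"[0-9a-f]+")
BIN_DIGITS = re.compile(r"[01]+")


def hex_literal(text: str, pos: int) -> Optional[int]:
    """Hex = [0-9a-f]+ "X". Return the end position, or None on failure."""
    m = HEX_DIGITS.match(text, pos)
    if m is None:
        return None
    end = m.end()
    return end + 1 if text.startswith("X", end) else None


def bin_literal(text: str, pos: int) -> Optional[int]:
    """Bin = [01]+ "B". Return the end position, or None on failure."""
    m = BIN_DIGITS.match(text, pos)
    if m is None:
        return None
    end = m.end()
    return end + 1 if text.startswith("B", end) else None


def literal(text: str, pos: int = 0):
    """Ordered choice: try Hex first; if it fails, restart at pos with Bin."""
    for name, rule in (("Hex", hex_literal), ("Bin", bin_literal)):
        end = rule(text, pos)  # every alternative starts again from pos
        if end is not None:
            return name, end   # first success wins; later ones are not tried
    return None


def recognize(text: str) -> Optional[str]:
    """Return the name of the matching alternative if all input is consumed."""
    result = literal(text)
    if result is None or result[1] != len(text):
        return None
    return result[0]


print(recognize("1010B"))  # Hex consumes "1010", fails on "X"; Bin succeeds
print(recognize("1010X"))  # Hex succeeds directly
```

Because every alternative is retried from the same starting position, the choice made by `literal` does not depend on a fixed number of lookahead characters, which is the point the example makes about LL(k).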