About
- Playground parsing algorithm is based on PwZ: Parsing with zippers.
- There are extensions to the orginal algorithm, for example, Kleene star, Lookaheads etc.
- There are may be bugs
Notation
Status | Note | Notation | Expression type | |
---|---|---|---|---|
π’ | concatenation | (space) | Seq | |
π’ | unordered choice | | | Alt | |
π’ | token | "x" | Tok | |
π’ | 1 | symbol | x | - |
π’ | Kleene star | * | Rep | |
π‘ | 2 | Kleene plus | + | |
π‘ | 3 | optional | ? | |
π΄ | 4 | quantifiers | {n,m} | |
π’ | 5 | any character | . | Any |
π‘ | 6 | range | [a-z] | |
π’ | 6 | set of characters | [abc] | |
π’ | 7 | negation of set | [^abc] | - |
π’ | 6 | escape sequences | "\n" | |
π‘ | 8 | lexeme | lex(x) | Lex |
π‘ | 9 | ignored | ign(x) | Ign |
π‘ | 10 | positive lookahead | & | Pla |
π‘ | negative lookahead | ! | Nla | |
π’ | 11 | end of file | !. | Eof |
π‘ | 12 | ordered choice | / | |
π΄ | 13 | intersection | ||
π΄ | 14 | negation | ||
π΄ | codepoints | \u{hhhh} | ||
π΄ | character classes | \w, \d |
- Symbol expressed as a property of Node (Expression)
- Kleene plus implemented as
A+ = A A*
. TODO: implement asRep
with min, max - Optional implemented as
A? = "" | A
. TODO: implement asRep
with min, max - Quantifiers β. TODO: implement as
Rep
with min, max - Matches any character of length 1. It doesnβt match EoF ("")
- For now implemented as
Tok
. TODO: support multiple ranges, like[a-zA-Z_]
- Implemented as property (
invert
) of theTok
Lexeme
makes parser scannerless.- I have doubt about notation. For now I use
lex(x)
. Other options:- Separate section?
- At rule level?
lex: S -> "a";
,S:lex -> "a";
- At symbol level?
S -> lex("a")
,S -> lex:"a"
- I have doubt about notation. For now I use
- Ignored symbols consume input and output empty strings. Useful when tree compaction used (all empty nodes are removed).
- For now implemented as separate Expression type, but could be as well implemented as property of Node (Expression).
- I have doubt about notation. For now I use
ign(x)
. Maybe~x
?
- Lookaheads similar to PEG operators
- TODO: There are probably bugs in implementation (especially when lookaheads are recursive)
- Allows to express some context-sensitive grammars, like
- End of file matched only if token is empty string (which can appear only in the end of file
str[pos] || ""
). The difference fromTok
is that after matching it doesnβt stop traversing, instead it continues as if it was emptySeq
- Ordered choice operator from PEG simulated with negative lookahead:
A / B = A | !A B
- I think I need to use backtracking in order to implement effective ordered choice
- Intersection. Original Brzozowski paper had this, but when Might proposed PwD he omitted it. It could be trivially implemented in PwD
- In PwZ it is a bit more tricky. Should work the same as
Alt
except it matches only if all branches match - Other formalism that proposed to use intersection is conjuctive grammar
- Allows to express some context-sensitive grammars, like
- In PwZ it is a bit more tricky. Should work the same as
- Negation makes sense for matching (negation of character, complement of a language, negative lookahead), but not for parsing. Which tree you suppose to return?