Skip to content

About

  • Playground parsing algorithm is based on PwZ.
  • There are extensions to the orginal algorithm, for example, Kleene star, Lookaheads etc.
  • There are may be bugs

Notation

StatusNoteNotationExpression type
🟒concatenation (space)Seq
🟒unordered choice|Alt
🟒token"x"Tok
🟒1symbolx-
🟒Kleene star*Rep
🟑2Kleene plus+
🟑3optional?
πŸ”΄4quantifiers{n,m}
🟒5any character.Any
🟑6range[a-z]
🟒6set of characters[abc]
🟒7negation of set[^abc]-
🟒6escape sequences"\n"
🟑8lexemelex(x)Lex
🟑9ignoredign(x)Ign
🟑10positive lookahead&Pla
🟑negative lookahead!Nla
🟒11end of file!.Eof
🟑12ordered choice/
πŸ”΄13intersection
πŸ”΄14negation
πŸ”΄codepoints\u{hhhh}
πŸ”΄character classes\w, \d
  1. Symbol expressed as a property of Node (Expression)
  2. Kleene plus implemented as A+ = A A*. TODO: implement as Rep with min, max
  3. Optional implemented as A? = "" | A. TODO: implement as Rep with min, max
  4. Quantifiers. TODO: implement as Rep with min, max
  5. Matches any character of length 1. It doesn’t match EoF ("")
  6. For now implemented as Tok. TODO: support multiple ranges, like [a-zA-Z_]
  7. Implemented as property (invert) of the Tok
  8. Lexeme makes parser scanerless.
    • I have doubt about notation. For now I use lex(x). Other options:
      • Separate section?
      • At rule level? lex: S -> "a";, S:lex -> "a";
      • At symbol level? S -> lex("a"), S -> lex:"a"
  9. Ignored symbols consume input and output empty strings. Useful when tree compaction used (all empty nodes are removed).
    • For now implemented as separate Expression type, but could be as well implemented as property of Node (Expression).
    • I have doubt about notation. For now I use ign(x). Maybe ~x?
  10. Lookaheads similar to PEG operators
    • TODO: There are probably bugs in implementation (especially when lookaheads are recursive)
    • Allows to express some context-sensitive grammars, like anbncna^nb^nc^n
  11. End of file matched only if token is empty string (which can appear only in the end of file str[pos] || ""). The difference from Tok is that after matching it doesn’t stop traversing, instead it continues as if it was empty Seq
  12. Ordered choice operator from PEG simulated with negative lookahead: A / B = A | !A B
    • I think I need to use backtracking in order to implement effective ordered choice
  13. Intersection. Original Brzozowski paper had this, but when Might proposed PwD he omitted it. It could be trivially implemented in PwD
    • In PwZ it is a bit more tricky. Should work the same as Alt except it matches only if all branches match
    • Other formalism that proposed to use intersection is conjuctive grammar
    • Allows to express some context-sensitive grammars, like anbncna^nb^nc^n
  14. Negation makes sense for matching (negation of character, complement of a language, negative lookahead), but not for parsing. Which tree you suppose to return?