Skip to content

Parsing with derivatives

Parsing with Derivatives, 2011 by Matthew Might, David Darais, Daniel Spiewak

Related:

Brzozowski derivatives for context free languages

The idea was proposed by Matthew Might et al in 2010.

They extend the idea of Brzozowki derivative from regular to context-free languages. Then they create a set of functions that can be composed to represent grammars - the same way as in parser combinators. They rely on laziness and fixed point (essentially cycle detection) to make sure that the algorithm won’t do infinite recursion. Then they transform the recognizer into a parser and introduce the idea of derivative for the parser. And the last step is performance optimization - memoization is supposed to improve performance, but the problem is that on every parse it produces new grammar which grows. I guess, exponentially. To solve the problem with growing grammar they added a compaction step after each parse cycle (Brzozowki had the same idea).

What is good:

  • small code size
  • supports CFG including left recursive grammars
  • can handle ambiguous grammar - returns all possible parse trees
  • acceptable performance for such small code

What is bad:

  • no error reporting - it either produces all possible parse trees or none
  • not so trivial to implement if the language doesn’t provide some features

The initial intention was to “kill yacc” e.g. provide a simpler parsing algorithm (at least for learning purposes). It is indeed simple, but without fixing the problems listed above I don’t think it will “take over the world”.

Original idea

Idea of extending Kleene algebra with general recursion (Kleene star is an iteration) was investigated by Dexter Kozen and Hans Leiß in 1990-s. See Towards Kleene Algebra with Recursion.

They also ntocied that regular expressions with general recursion correspond to context free languages.