Skip to content

Parser combinators

In 1985 Wadler proposed to represent parsers as functions and to construct bigger parsers by composing functions. The key idea is to start from parsers for atomic languages:

-- empty language
fail xs       = []            -- Dx(∅)   = ∅
-- trivial language - empty string
empty v xs    = [(v, xs)]     -- Dϵ(P)   = P
-- singleton language
lit 'a' "abc" = [('a', "bc")] -- Da(abc) = bc
lit 'b' "abc" = []            -- Db(abc) = ∅

Then atomic parsers can be composed with union (alt), concatenation (seq), and Kleene star (rep) to create parsers for “bigger” languages.

It is a bit harder to see correspondence of derivatives to those parsers:

alt (lit 'a') (lit 'b') "abc" = [('a', "bc")]
seq (lit 'a') (lit 'b') "abc" = [("ab", "c")]
rep (lit 'a') "aac"           = [("aa", "c"), ("a", "ac"), ("", "aac")]

The problem here is that the derivative corresponds to the recognizer. Parsers are a bit trickier, than recognizers.

Ruzzo in 1979 showed that any recognizer can be transformed to parser with the overhead of O(log n) at most.

Wadler shows the importance of lazy evaluation in this approach. It can’t handle left recursive grammars.