Scannerless parser

A scannerless parser (also known as a lexerless parser) is a parser that does not use a separate scanning (or lexical analysis) phase to tokenize the input before parsing it. Instead, it performs tokenization and syntactic analysis in a single step. Traditional parsing is a two-step process: first, a scanner (or lexer) breaks the input into tokens, and then a parser analyzes the token sequence according to a set of grammar rules.
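To make the contrast concrete, here is a minimal sketch (in Python, with hypothetical names and a toy grammar of sums and differences of integers, not taken from any particular tool) of the traditional two-phase pipeline: a regex-based scanner produces a flat token list, and a separate parser consumes it. A scannerless treatment of the same toy language is sketched after the examples list below.

```python
# Hypothetical sketch of the traditional two-phase pipeline:
# phase 1 (scanner) turns characters into tokens, phase 2 (parser) consumes tokens.
import re

TOKEN_RE = re.compile(r"\s*(?:(?P<NUMBER>\d+)|(?P<OP>[+\-]))")

def tokenize(text):
    """Phase 1: break the input string into a flat list of (kind, value) tokens."""
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected character at position {pos}")
        kind = m.lastgroup
        tokens.append((kind, m.group(kind)))
        pos = m.end()
    return tokens

def parse(tokens):
    """Phase 2: analyze the token sequence (here: sums/differences of numbers)."""
    pos = 0

    def expect(kind):
        nonlocal pos
        if pos >= len(tokens) or tokens[pos][0] != kind:
            raise SyntaxError(f"expected {kind} at token {pos}")
        value = tokens[pos][1]
        pos += 1
        return value

    result = int(expect("NUMBER"))
    while pos < len(tokens):
        op = expect("OP")
        rhs = int(expect("NUMBER"))
        result = result + rhs if op == "+" else result - rhs
    return result

print(parse(tokenize("1 + 2 - 3")))  # prints 0
```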

Examples

Here are a few examples of scannerless parsers:

  1. Generalized LR (GLR) Parsers: GLR extends LR parsing to handle nondeterministic and ambiguous grammars. Its scannerless variant (SGLR), described in the Visser paper quoted below, parses complete character-level grammars directly.

  2. SDF (Syntax Definition Formalism): A formalism for defining the syntax of languages, often associated with the Meta-Environment, an environment for generating programming environments. SDF promotes scannerless parsing by allowing lexical and context-free syntax to be integrated into a single specification.

  3. Rascal: It’s a meta-programming language that uses SDF for syntax definition and promotes scannerless parsing.

  4. PEG (Parsing Expression Grammar): While PEGs can be used with lexers, they are also inherently capable of scannerless parsing; a character-level sketch in this style appears after this list.

  5. ANTLR: While ANTLR is primarily known as a parser generator that produces a separate lexer and parser, it can be configured for scannerless parsing if needed.

  6. Elkhound: It is a parser generator that is based on the GLR parsing algorithm and can be used for scannerless parsing.
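As noted in item 4, here is a minimal scannerless sketch in a PEG-like recursive-descent style, again in Python with hypothetical names, for the same toy language as above. There is no token stream: the "lexical" rules (whitespace skipping, digit runs) are simply more grammar rules that operate directly on characters.

```python
# Hypothetical scannerless sketch: lexical structure is expressed as
# character-level grammar rules; tokenization and parsing happen in one pass.
class ScannerlessParser:
    def __init__(self, text):
        self.text = text
        self.pos = 0

    # --- character-level "lexical" rules, part of the grammar itself ---
    def ws(self):
        """Skip whitespace; called wherever the grammar allows layout."""
        while self.pos < len(self.text) and self.text[self.pos].isspace():
            self.pos += 1

    def number(self):
        """number <- [0-9]+ , matched directly on characters."""
        self.ws()
        start = self.pos
        while self.pos < len(self.text) and self.text[self.pos].isdigit():
            self.pos += 1
        if self.pos == start:
            raise SyntaxError(f"expected digit at position {self.pos}")
        return int(self.text[start:self.pos])

    def char(self, expected):
        """Try to consume a single literal character; return True on success."""
        self.ws()
        if self.pos < len(self.text) and self.text[self.pos] == expected:
            self.pos += 1
            return True
        return False

    # --- context-free rule: expr <- number (('+' / '-') number)* ---
    def expr(self):
        value = self.number()
        while True:
            if self.char('+'):
                value += self.number()
            elif self.char('-'):
                value -= self.number()
            else:
                break
        return value

    def parse(self):
        value = self.expr()
        self.ws()
        if self.pos != len(self.text):
            raise SyntaxError(f"unexpected input at position {self.pos}")
        return value

print(ScannerlessParser("1 + 2 - 3").parse())  # prints 0
```

Because lexical recognition lives inside the grammar, decisions such as where whitespace is allowed, or whether a word is a keyword or an identifier, can be made per grammar context rather than globally in a scanner, which is one of the main motivations for scannerless parsing.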

Quotes

Scannerless parsing is a parsing technique that does not use a scanner to divide a string into lexical tokens. Instead lexical analysis is integrated in the context-free analysis of the entire string. It comes up naturally when considering grammars that completely describe the syntax of a language. The term scannerless parsing was coined by Salomon and Cormack (1989, 1995). They use ‘complete character level grammars’ describing the entire syntax of a language down to the character level.

Scannerless generalized-LR parsing, Eelco Visser, 1999