Scannerless parser
A scannerless parser (also called a lexerless parser) is a parser that does not use a separate scanning (lexical analysis) phase to tokenize the input before parsing it. Instead, it performs tokenization and syntactic analysis in a single step. Traditional parsing is a two-step process: first, a scanner (or lexer) breaks the input into tokens, and then a parser analyzes the token sequence according to a set of grammar rules.
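The single-step idea can be sketched with a minimal recursive-descent parser that consumes raw characters directly. This is a hypothetical illustration (the function names `parse_sum`, `parse_number`, and `skip_ws` are invented for this example): there is no token stream, and even whitespace handling is part of the grammar rather than a lexer's job.

```python
# Minimal scannerless parser for sums like "12 + 34 +5".
# It works character by character; no separate lexer produces tokens.

def parse_sum(text, pos=0):
    """sum := number (ws '+' ws number)* ; returns (value, next_pos)."""
    value, pos = parse_number(text, pos)
    while True:
        p = skip_ws(text, pos)
        if p < len(text) and text[p] == '+':
            rhs, pos = parse_number(text, skip_ws(text, p + 1))
            value += rhs
        else:
            return value, pos

def parse_number(text, pos):
    """number := digit+ ; the 'token' is recognized inside the parser."""
    start = pos
    while pos < len(text) and text[pos].isdigit():
        pos += 1
    if pos == start:
        raise SyntaxError(f"expected digit at position {start}")
    return int(text[start:pos]), pos

def skip_ws(text, pos):
    """Whitespace is an ordinary grammar rule, not a lexer concern."""
    while pos < len(text) and text[pos] in ' \t':
        pos += 1
    return pos

print(parse_sum("12 + 34 +5"))  # (51, 10)
```

In a traditional design, `parse_number` and `skip_ws` would live in the lexer; here they are just more grammar rules, which is exactly what "integrating lexical and context-free syntax" means in practice.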
Examples
Here are a few examples of parsers and parsing tools that support scannerless parsing:
- Generalized LR (GLR) parsers: the GLR algorithm can operate directly on characters rather than tokens, and it underlies several scannerless parser generators.
- SDF (Syntax Definition Formalism): a formalism for defining syntax, often associated with the Meta-Environment, an environment for generating programming environments. SDF supports scannerless parsing by integrating lexical and context-free syntax into a single specification.
- Rascal: a meta-programming language that uses SDF for syntax definition and promotes scannerless parsing.
- PEG (Parsing Expression Grammar): while PEGs can be used with lexers, they are inherently capable of scannerless parsing, since parsing expressions operate directly on characters.
- ANTLR: primarily known as a parser generator that produces a separate lexer and parser, but it can be configured for scannerless parsing if needed.
- Elkhound: a parser generator based on the GLR parsing algorithm that can be used for scannerless parsing.
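The PEG entry above can be made concrete with a small sketch of character-level parsing combinators (all names here, such as `keyword` and `token`, are invented for illustration). It shows a consequence of dropping the lexer: keyword reservation, which a lexer normally handles via longest-match rules, must be encoded explicitly in the grammar, here with a follow-character check combined with PEG ordered choice.

```python
# Hypothetical PEG-style combinators operating directly on characters.
# Each parser returns (matched_value, next_pos) on success, or None on failure.

def keyword(s):
    # A keyword must not be followed by an identifier character; this
    # character-level check replaces the lexer's longest-match rule.
    def parse(text, pos):
        end = pos + len(s)
        if text.startswith(s, pos) and (end >= len(text) or not text[end].isalnum()):
            return s, end
        return None
    return parse

def identifier(text, pos):
    # identifier := alnum+ ; recognized in the grammar, not by a lexer.
    start = pos
    while pos < len(text) and text[pos].isalnum():
        pos += 1
    return (text[start:pos], pos) if pos > start else None

def choice(*alts):
    # PEG ordered choice: the first alternative that matches wins.
    def parse(text, pos):
        for alt in alts:
            result = alt(text, pos)
            if result is not None:
                return result
        return None
    return parse

token = choice(keyword("if"), identifier)
print(token("if x", 0))   # ('if', 2)   -- recognized as the keyword
print(token("ifx y", 0))  # ('ifx', 3)  -- an identifier, not the keyword
```

Without the follow-character check in `keyword`, the input `ifx` would wrongly match the keyword `if` followed by a stray `x`; this is the kind of lexical disambiguation that scannerless grammars must state explicitly.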
Quotes
Scannerless parsing is a parsing technique that does not use a scanner to divide a string into lexical tokens. Instead lexical analysis is integrated in the context-free analysis of the entire string. It comes up naturally when considering grammars that completely describe the syntax of a language. The term scannerless parsing was coined by Salomon and Cormack (1989, 1995). They use ‘complete character level grammars’ describing the entire syntax of a language down to the character level.