Parse tree
aka derivation tree, syntax tree, abstract syntax tree, concrete syntax tree.
Formal languages (branch of computer science and mathematics) studies diefferent problems. Two main problems (tasks) are:
- recognition - answers if a given string belongs to the language or not (yes or no). Aka membership problem
- parsing - solves recognition task, but it focuses more on structural analysis (producing a parse tree).
Recogintion vs Parsing
Ruzzo (1979) showed ↗ that any recognizer can be transformed to parser with the overhead of at most.
Recognition not bothered with stuctural anlysis:
- allows to work with with Regular Expressions (which typically given without explanation of how to construct parse tree)
- allows any sound transformation of grammar, like Chomsky Normal Form, Greibach Normal Form, etc.
Parsing more focused on stuctural anlysis:
- sound transformation of grammar changes the structure, so it is better to avoid them
- Ambiguity - there can be more than one way to produce parse tree
- Left recursion - can be problematic for some parsers
Constructing parse tree (formal)
TODO: Most of papers and books give quite hand-wavy explanation of parse tree construction (insert “Draw the rest of owl” meme here). I need to find a formal explanation of the process.
So far I found papers which actually talk about parse trees:
- M.-J. Nederhof and G. Satta. 2004. Tabular Parsing. ↗ In C. Martin-Vide, V. Mitrana, and G. Paun, editors, Formal Languages and Applications, Studies in Fuzziness and Soft Computing 148, pages 529-549.
- Kuramitsu, K. 2015. Fast, Flexible, and Declarative Construction of Abstract Syntax Trees with PEGs ↗. J. Inf. Process., 24, 123-131.
- Yamaguchi, Daisuke, and Kimio Kuramitsu. 2019. CPEG: A Typed Tree Construction from Parsing Expression Grammars with Regex-Like Captures ↗ Proceedings of the 34th ACM/SIGAPP Symposium on Applied Computing.
Examples
Naive mathematical expression
For string 1+1*1+1
there are 5 parse trees and only one is correct ↗.
Adding left associativity
For string 1+1*1+1
there is only 1 parse tree, but incorrect ↗
Adding precedence
For string 1+1*1+1
there are 2 parse trees and only one is correct ↗.
Adding precedence and left associativity
For string 1+1*1+1
there is only 1 parse and it’s correct ↗
It is harder to read (understand) grammar, and tree changed it’s shape drastically.
Removing intermediate nodes
Note: <>
is the special feature which allows to hide nodes, it is rarely supported. Given here for comparison.
One more example with brackets to show how precedence works ↗.