Skip to content

Parse tree

aka derivation tree, syntax tree, abstract syntax tree, concrete syntax tree.

Formal languages (branch of computer science and mathematics) studies diefferent problems. Two main problems (tasks) are:

  • recognition - answers if a given string belongs to the language or not (yes or no). Aka membership problem
  • parsing - solves recognition task, but it focuses more on structural analysis (producing a parse tree).

Recogintion vs Parsing

Ruzzo (1979) showed that any recognizer can be transformed to parser with the overhead of O(logn)O(\log n) at most.

Recognition not bothered with stuctural anlysis:

  • allows to work with with Regular Expressions (which typically given without explanation of how to construct parse tree)
  • allows any sound transformation of grammar, like Chomsky Normal Form, Greibach Normal Form, etc.

Parsing more focused on stuctural anlysis:

  • sound transformation of grammar changes the structure, so it is better to avoid them
  • Ambiguity - there can be more than one way to produce parse tree
  • Left recursion - can be problematic for some parsers

Constructing parse tree (formal)

TODO: Most of papers and books give quite hand-wavy explanation of parse tree construction (insert “Draw the rest of owl” meme here). I need to find a formal explanation of the process.

So far I found papers which actually talk about parse trees:

Examples

Naive mathematical expression

e := e "+" e | e "-" e | e "*" e | e "/" e | num
num := "1"

For string 1+1*1+1 there are 5 parse trees and only one is correct.

Adding left associativity

e := e "+" num | e "-" num | e "*" num | e "/" num | num
num := "1"

For string 1+1*1+1 there is only 1 parse tree, but incorrect

Adding precedence

e := add_sub
mul_div := mul_div "*" mul_div | mul_div "/" mul_div | num
add_sub := add_sub "+" add_sub | add_sub "-" add_sub | mul_div
num := "1"

For string 1+1*1+1 there are 2 parse trees and only one is correct.

Adding precedence and left associativity

e := add_sub
mul_div := mul | div | num
mul := mul_div "*" num
div := mul_div "/" num
add_sub := add | sub | mul_div
add := add_sub "+" mul_div
sub := add_sub "-" mul_div
num := "1"

For string 1+1*1+1 there is only 1 parse and it’s correct

It is harder to read (understand) grammar, and tree changed it’s shape drastically.

Removing intermediate nodes

Note: <> is the special feature which allows to hide nodes, it is rarely supported. Given here for comparison.

e := add_sub
<add_sub> := add | sub | mul_div
add := add_sub <"+"> mul_div
sub := add_sub <"-"> mul_div
<mul_div> := mul | div | num
mul := mul_div <"*"> num
div := mul_div <"/"> num
num := "1"

Final result

One more example with brackets to show how precedence works.