Grammar graph

Grammar can be interpreted as directed graph (tree in a trivial cases). There is always a starting point from each you start traversing and the same node would be a “root” of a parse tree.

Let’s explore examples to understand the concept.

Token

The simplest possible grammar: S -> "a";. Grammar tree for it would look like:

Concatenation

Concatenation of two tokens S -> "a" "b";. In this case we can consider concatentaion as a separate node with label " $\cdot$ " or not. I prefer second version because it’s more compact:

❌ With explicit node	✅ Without explicit node

Unordered choice

S -> "a" | "b". Again we can show unordered choice as separate node with label " $\cup$ " or not.

❌ With explicit node	✅ Without explicit node

For unordered choice order of children doesn’t matter (but for concatentaion it’s important).

Kleene star

S -> "a"*;. Again we can show Kleene star as separate node with label " $\ast$ " or not.

❌ With explicit node	✅ Without explicit node

For all other operations situation is the same.

Named vs nameless nodes

Let’s take a look at this example A -> "a" "b"; S -> A | "c". In this case each node is named. But what about this example: S -> "a" "b" | "c". It is essentially the same grammar so it should have the same structure.

✅ Named nodes	✅ Nameless nodes

Binary vs N-arry operator

Concatentation is a binary operator (per se) so S -> "a" "b" "c"; can be represented as binary-tree (depending on associativity), or we can treat concatenation as N-arry operator.

❌ Binary left associative	❌ Binary right associative	✅ N-arry

General recursion

Classical example for recursion is S -> "" | S "a";.

" $\epsilon$ " is an empty string. Dotted line shows that this is cycle (loop) e.g. this is the same " $S$ " node. It’s drawn twice so it would be easier to understand the structure.

”LCRS” representation

Maybe it would make easier to understans as LCRS-tree, rather than DAG. For example: S -> "a" "b" "c";

As DAG	AS LCRS-tree