Skip to content

Grammar graph

Grammar can be interpreted as directed graph (tree in a trivial cases). There is always a starting point from each you start traversing and the same node would be a “root” of a parse tree.

Let’s explore examples to understand the concept.

Token

The simplest possible grammar: S -> "a";. Grammar tree for it would look like:

n9045749999596724 S n11926066025070647 a n9045749999596724->n11926066025070647

Concatenation

Concatenation of two tokens S -> "a" "b";. In this case we can consider concatentaion as a separate node with label "\cdot" or not. I prefer second version because it’s more compact:

❌ With explicit node✅ Without explicit node
n28598544910441936 S n3259858113927989 n28598544910441936->n3259858113927989 n709969674405277 a n3259858113927989->n709969674405277 n09397409907834131 b n3259858113927989->n09397409907834131
n9275719828898807 S n13858811829854334 a n9275719828898807->n13858811829854334 n0608917576374699 b n9275719828898807->n0608917576374699

Unordered choice

S -> "a" | "b". Again we can show unordered choice as separate node with label "\cup" or not.

❌ With explicit node✅ Without explicit node
n833565744870643 S n8111600604192408 n833565744870643->n8111600604192408 n3375454042790553 a n8111600604192408->n3375454042790553 n34585684919770143 b n8111600604192408->n34585684919770143
n5093862988777245 S n8222578374431715 a n5093862988777245->n8222578374431715 n6689840189905794 b n5093862988777245->n6689840189905794

For unordered choice order of children doesn’t matter (but for concatentaion it’s important).

Kleene star

S -> "a"*;. Again we can show Kleene star as separate node with label "\ast" or not.

❌ With explicit node✅ Without explicit node
n9904676439049169 S n4715160695855187 n9904676439049169->n4715160695855187 n2874105908452813 a n4715160695855187->n2874105908452813
n35560168923548563 S n9095794966974153 a n35560168923548563->n9095794966974153

For all other operations situation is the same.

Named vs nameless nodes

Let’s take a look at this example A -> "a" "b"; S -> A | "c". In this case each node is named. But what about this example: S -> "a" "b" | "c". It is essentially the same grammar so it should have the same structure.

✅ Named nodes✅ Nameless nodes
n0051716649760384925 S n21282337471991797 A n0051716649760384925->n21282337471991797 n15163416584669687 c n0051716649760384925->n15163416584669687 n4717782733655249 a n21282337471991797->n4717782733655249 n17871866509094025 b n21282337471991797->n17871866509094025
n15016587379595614 S n6297624985684844 n15016587379595614->n6297624985684844 n1298827004872951 c n15016587379595614->n1298827004872951 n08399881185030589 a n6297624985684844->n08399881185030589 n05548813618425519 b n6297624985684844->n05548813618425519

Binary vs N-arry operator

Concatentation is a binary operator (per se) so S -> "a" "b" "c"; can be represented as binary-tree (depending on associativity), or we can treat concatenation as N-arry operator.

❌ Binary left associative❌ Binary right associative✅ N-arry
n390678731498983 S n3521704761683109 n390678731498983->n3521704761683109 n5220913264269416 c n390678731498983->n5220913264269416 n1973427049619334 a n3521704761683109->n1973427049619334 n2813292173860835 b n3521704761683109->n2813292173860835
n6577219067953906 S n22070434476516376 a n6577219067953906->n22070434476516376 n6841801245504899 n6577219067953906->n6841801245504899 n8679035926310081 b n6841801245504899->n8679035926310081 n7464930420945375 c n6841801245504899->n7464930420945375
n10459542447635983 S n7910130465015557 a n10459542447635983->n7910130465015557 n8244463487806286 b n10459542447635983->n8244463487806286 n28036556725808714 c n10459542447635983->n28036556725808714

General recursion

Classical example for recursion is S -> "" | S "a";.

n6662298823076602 S n3347014940928845 ϵ n6662298823076602->n3347014940928845 n6765139321907154 n6662298823076602->n6765139321907154 n8561749272233043 S n6662298823076602->n8561749272233043 n6765139321907154->n8561749272233043 n05314451514827656 a n6765139321907154->n05314451514827656

"ϵ\epsilon" is an empty string. Dotted line shows that this is cycle (loop) e.g. this is the same "SS" node. It’s drawn twice so it would be easier to understand the structure.

”LCRS” representation

Maybe it would make easier to understans as LCRS-tree, rather than DAG. For example: S -> "a" "b" "c";

As DAGAS LCRS-tree
n10459542447635983 S n7910130465015557 a n10459542447635983->n7910130465015557 n8244463487806286 b n10459542447635983->n8244463487806286 n28036556725808714 c n10459542447635983->n28036556725808714
n10459542447635983 S n7910130465015557 a n10459542447635983->n7910130465015557 n8244463487806286 b n7910130465015557->n8244463487806286 n28036556725808714 c n8244463487806286->n28036556725808714