Skip to content

Grammar graph

Grammar can be interpreted as directed graph (tree in a trivial cases). There is always a starting point from each you start traversing and the same node would be a “root” of a parse tree.

Let’s explore examples to understand the concept.

Token

The simplest possible grammar: S -> "a";. Grammar tree for it would look like:

n813184298163284 S n421311476429173 a n813184298163284->n421311476429173

Concatenation

Concatenation of two tokens S -> "a" "b";. In this case we can consider concatentaion as a separate node with label "\cdot" or not. I prefer second version because it’s more compact:

❌ With explicit node✅ Without explicit node
n32742138657299513 S n9320802439043545 n32742138657299513->n9320802439043545 n18778765561448374 a n9320802439043545->n18778765561448374 n9043643905096721 b n9320802439043545->n9043643905096721
n11606046068676701 S n6917593460068514 a n11606046068676701->n6917593460068514 n09814866093295804 b n11606046068676701->n09814866093295804

Unordered choice

S -> "a" | "b". Again we can show unordered choice as separate node with label "\cup" or not.

❌ With explicit node✅ Without explicit node
n7583369134004501 S n28838158159426985 n7583369134004501->n28838158159426985 n8194961290447935 a n28838158159426985->n8194961290447935 n013687140619343507 b n28838158159426985->n013687140619343507
n9623589181880099 S n8794472870353884 a n9623589181880099->n8794472870353884 n6843055658389234 b n9623589181880099->n6843055658389234

For unordered choice order of children doesn’t matter (but for concatentaion it’s important).

Kleene star

S -> "a"*;. Again we can show Kleene star as separate node with label "\ast" or not.

❌ With explicit node✅ Without explicit node
n32424227510607073 S n037393178665481974 n32424227510607073->n037393178665481974 n7011277415221888 a n037393178665481974->n7011277415221888
n4963040728435568 S n7055876796214136 a n4963040728435568->n7055876796214136

For all other operations situation is the same.

Named vs nameless nodes

Let’s take a look at this example A -> "a" "b"; S -> A | "c". In this case each node is named. But what about this example: S -> "a" "b" | "c". It is essentially the same grammar so it should have the same structure.

✅ Named nodes✅ Nameless nodes
n3026368644063595 S n2360257265126977 A n3026368644063595->n2360257265126977 n45528685656383283 c n3026368644063595->n45528685656383283 n5303332708394004 a n2360257265126977->n5303332708394004 n4420214963223408 b n2360257265126977->n4420214963223408
n8201362656297186 S n697352896589873 n8201362656297186->n697352896589873 n7534736867723253 c n8201362656297186->n7534736867723253 n7548964844409018 a n697352896589873->n7548964844409018 n23629370956078732 b n697352896589873->n23629370956078732

Binary vs N-arry operator

Concatentation is a binary operator (per se) so S -> "a" "b" "c"; can be represented as binary-tree (depending on associativity), or we can treat concatenation as N-arry operator.

❌ Binary left associative❌ Binary right associative✅ N-arry
n8008515092228263 S n04174413478113537 n8008515092228263->n04174413478113537 n6528821130527998 c n8008515092228263->n6528821130527998 n2251293669400789 a n04174413478113537->n2251293669400789 n14898783458426013 b n04174413478113537->n14898783458426013
n9849323985905296 S n34746746863513 a n9849323985905296->n34746746863513 n4084008056356516 n9849323985905296->n4084008056356516 n4688423559580479 b n4084008056356516->n4688423559580479 n03872821398784421 c n4084008056356516->n03872821398784421
n10526115858635832 S n09316439228723494 a n10526115858635832->n09316439228723494 n5859196758794063 b n10526115858635832->n5859196758794063 n0747713707155695 c n10526115858635832->n0747713707155695

General recursion

Classical example for recursion is S -> "" | S "a";.

n11703089481518747 S n3895187226479764 ϵ n11703089481518747->n3895187226479764 n19799433052563997 n11703089481518747->n19799433052563997 n43746899581915955 S n11703089481518747->n43746899581915955 n19799433052563997->n43746899581915955 n19503110025817239 a n19799433052563997->n19503110025817239

"ϵ\epsilon" is an empty string. Dotted line shows that this is cycle (loop) e.g. this is the same "SS" node. It’s drawn twice so it would be easier to understand the structure.

”LCRS” representation

Maybe it would make easier to understans as LCRS-tree, rather than DAG. For example: S -> "a" "b" "c";

As DAGAS LCRS-tree
n10526115858635832 S n09316439228723494 a n10526115858635832->n09316439228723494 n5859196758794063 b n10526115858635832->n5859196758794063 n0747713707155695 c n10526115858635832->n0747713707155695
n10526115858635832 S n09316439228723494 a n10526115858635832->n09316439228723494 n5859196758794063 b n09316439228723494->n5859196758794063 n0747713707155695 c n5859196758794063->n0747713707155695