Skip to content

Grammar graph

Grammar can be interpreted as directed graph (tree in a trivial cases). There is always a starting point from each you start traversing and the same node would be a “root” of a parse tree.

Let’s explore examples to understand the concept.

Token

The simplest possible grammar: S -> "a";. Grammar tree for it would look like:

n8921194238452024 S n6722614495857564 a n8921194238452024->n6722614495857564

Concatenation

Concatenation of two tokens S -> "a" "b";. In this case we can consider concatentaion as a separate node with label "\cdot" or not. I prefer second version because it’s more compact:

❌ With explicit node✅ Without explicit node
n7591190372099177 S n5085643930235768 n7591190372099177->n5085643930235768 n5660360940094078 a n5085643930235768->n5660360940094078 n6116686858855926 b n5085643930235768->n6116686858855926
n7753889280461777 S n6520488001328095 a n7753889280461777->n6520488001328095 n6377343997124383 b n7753889280461777->n6377343997124383

Unordered choice

S -> "a" | "b". Again we can show unordered choice as separate node with label "\cup" or not.

❌ With explicit node✅ Without explicit node
n978556781311595 S n01635994105147298 n978556781311595->n01635994105147298 n8979602425063267 a n01635994105147298->n8979602425063267 n8291911466882995 b n01635994105147298->n8291911466882995
n9749093414597456 S n785066164099053 a n9749093414597456->n785066164099053 n48621922861316835 b n9749093414597456->n48621922861316835

For unordered choice order of children doesn’t matter (but for concatentaion it’s important).

Kleene star

S -> "a"*;. Again we can show Kleene star as separate node with label "\ast" or not.

❌ With explicit node✅ Without explicit node
n7000061756859932 S n07220175832158904 n7000061756859932->n07220175832158904 n9204291042830439 a n07220175832158904->n9204291042830439
n3417251507116832 S n5475591377563569 a n3417251507116832->n5475591377563569

For all other operations situation is the same.

Named vs nameless nodes

Let’s take a look at this example A -> "a" "b"; S -> A | "c". In this case each node is named. But what about this example: S -> "a" "b" | "c". It is essentially the same grammar so it should have the same structure.

✅ Named nodes✅ Nameless nodes
n9598178299418272 S n03358702664259572 A n9598178299418272->n03358702664259572 n48419835712815273 c n9598178299418272->n48419835712815273 n34651155938538225 a n03358702664259572->n34651155938538225 n6100191797438563 b n03358702664259572->n6100191797438563
n9200274388098262 S n8179342826545957 n9200274388098262->n8179342826545957 n6969727676320634 c n9200274388098262->n6969727676320634 n5852173098966 a n8179342826545957->n5852173098966 n983914855311137 b n8179342826545957->n983914855311137

Binary vs N-arry operator

Concatentation is a binary operator (per se) so S -> "a" "b" "c"; can be represented as binary-tree (depending on associativity), or we can treat concatenation as N-arry operator.

❌ Binary left associative❌ Binary right associative✅ N-arry
n7338257840434075 S n8400245710882595 n7338257840434075->n8400245710882595 n731951148948057 c n7338257840434075->n731951148948057 n2612168793406757 a n8400245710882595->n2612168793406757 n5169664494543562 b n8400245710882595->n5169664494543562
n13338383015278366 S n17450703460407913 a n13338383015278366->n17450703460407913 n643073029325139 n13338383015278366->n643073029325139 n7230769866064406 b n643073029325139->n7230769866064406 n39936042340481714 c n643073029325139->n39936042340481714
n2849584692568534 S n8755318475591227 a n2849584692568534->n8755318475591227 n3038187883360499 b n2849584692568534->n3038187883360499 n843906144850191 c n2849584692568534->n843906144850191

General recursion

Classical example for recursion is S -> "" | S "a";.

n4023620896395641 S n24359841443456043 ϵ n4023620896395641->n24359841443456043 n5051023128229557 n4023620896395641->n5051023128229557 n9088705941669604 S n4023620896395641->n9088705941669604 n5051023128229557->n9088705941669604 n3605846400157917 a n5051023128229557->n3605846400157917

"ϵ\epsilon" is an empty string. Dotted line shows that this is cycle (loop) e.g. this is the same "SS" node. It’s drawn twice so it would be easier to understand the structure.

”LCRS” representation

Maybe it would make easier to understans as LCRS-tree, rather than DAG. For example: S -> "a" "b" "c";

As DAGAS LCRS-tree
n2849584692568534 S n8755318475591227 a n2849584692568534->n8755318475591227 n3038187883360499 b n2849584692568534->n3038187883360499 n843906144850191 c n2849584692568534->n843906144850191
n2849584692568534 S n8755318475591227 a n2849584692568534->n8755318475591227 n3038187883360499 b n8755318475591227->n3038187883360499 n843906144850191 c n3038187883360499->n843906144850191