Skip to content

Grammar graph

Grammar can be interpreted as directed graph (tree in a trivial cases). There is always a starting point from each you start traversing and the same node would be a “root” of a parse tree.

Let’s explore examples to understand the concept.

Token

The simplest possible grammar: S -> "a";. Grammar tree for it would look like:

n6323348069246328 S n8563473828016634 a n6323348069246328->n8563473828016634

Concatenation

Concatenation of two tokens S -> "a" "b";. In this case we can consider concatentaion as a separate node with label "\cdot" or not. I prefer second version because it’s more compact:

❌ With explicit node✅ Without explicit node
n5442639512321714 S n9902588215667831 n5442639512321714->n9902588215667831 n4637618757019757 a n9902588215667831->n4637618757019757 n265550282380016 b n9902588215667831->n265550282380016
n36553348519931794 S n7407211999024093 a n36553348519931794->n7407211999024093 n9933975378990967 b n36553348519931794->n9933975378990967

Unordered choice

S -> "a" | "b". Again we can show unordered choice as separate node with label "\cup" or not.

❌ With explicit node✅ Without explicit node
n7799786619527815 S n14505927812328023 n7799786619527815->n14505927812328023 n571216663212303 a n14505927812328023->n571216663212303 n586957449510296 b n14505927812328023->n586957449510296
n8364278286160858 S n8801520155465694 a n8364278286160858->n8801520155465694 n10886501954671357 b n8364278286160858->n10886501954671357

For unordered choice order of children doesn’t matter (but for concatentaion it’s important).

Kleene star

S -> "a"*;. Again we can show Kleene star as separate node with label "\ast" or not.

❌ With explicit node✅ Without explicit node
n19140148546410352 S n5597958159432606 n19140148546410352->n5597958159432606 n563353813856879 a n5597958159432606->n563353813856879
n2974682386391698 S n7441071443432257 a n2974682386391698->n7441071443432257

For all other operations situation is the same.

Named vs nameless nodes

Let’s take a look at this example A -> "a" "b"; S -> A | "c". In this case each node is named. But what about this example: S -> "a" "b" | "c". It is essentially the same grammar so it should have the same structure.

✅ Named nodes✅ Nameless nodes
n255926113530881 S n2980793264848218 A n255926113530881->n2980793264848218 n852258982768241 c n255926113530881->n852258982768241 n33782097739588735 a n2980793264848218->n33782097739588735 n26796008215704736 b n2980793264848218->n26796008215704736
n1619042117540499 S n27330597561653613 n1619042117540499->n27330597561653613 n6086004784777388 c n1619042117540499->n6086004784777388 n1398410407826174 a n27330597561653613->n1398410407826174 n8054029661658992 b n27330597561653613->n8054029661658992

Binary vs N-arry operator

Concatentation is a binary operator (per se) so S -> "a" "b" "c"; can be represented as binary-tree (depending on associativity), or we can treat concatenation as N-arry operator.

❌ Binary left associative❌ Binary right associative✅ N-arry
n2502240604434758 S n9788763726822238 n2502240604434758->n9788763726822238 n20596474171411616 c n2502240604434758->n20596474171411616 n6710974862697519 a n9788763726822238->n6710974862697519 n29548557645646345 b n9788763726822238->n29548557645646345
n04342246512045156 S n07818604396738293 a n04342246512045156->n07818604396738293 n08918645583685514 n04342246512045156->n08918645583685514 n19298619810276096 b n08918645583685514->n19298619810276096 n8822988696756555 c n08918645583685514->n8822988696756555
n686754259159841 S n8023823671230372 a n686754259159841->n8023823671230372 n15275214735151077 b n686754259159841->n15275214735151077 n8072092317687527 c n686754259159841->n8072092317687527

General recursion

Classical example for recursion is S -> "" | S "a";.

n820474374266768 S n5006716812016372 ϵ n820474374266768->n5006716812016372 n7048608958408185 n820474374266768->n7048608958408185 n27989360924243134 S n820474374266768->n27989360924243134 n7048608958408185->n27989360924243134 n24405149580038232 a n7048608958408185->n24405149580038232

"ϵ\epsilon" is an empty string. Dotted line shows that this is cycle (loop) e.g. this is the same "SS" node. It’s drawn twice so it would be easier to understand the structure.

”LCRS” representation

Maybe it would make easier to understans as LCRS-tree, rather than DAG. For example: S -> "a" "b" "c";

As DAGAS LCRS-tree
n686754259159841 S n8023823671230372 a n686754259159841->n8023823671230372 n15275214735151077 b n686754259159841->n15275214735151077 n8072092317687527 c n686754259159841->n8072092317687527
n686754259159841 S n8023823671230372 a n686754259159841->n8023823671230372 n15275214735151077 b n8023823671230372->n15275214735151077 n8072092317687527 c n15275214735151077->n8072092317687527