Skip to content

Grammar graph

Grammar can be interpreted as directed graph (tree in a trivial cases). There is always a starting point from each you start traversing and the same node would be a “root” of a parse tree.

Let’s explore examples to understand the concept.

Token

The simplest possible grammar: S -> "a";. Grammar tree for it would look like:

n4780519420139364 S n7600004744529527 a n4780519420139364->n7600004744529527

Concatenation

Concatenation of two tokens S -> "a" "b";. In this case we can consider concatentaion as a separate node with label "\cdot" or not. I prefer second version because it’s more compact:

❌ With explicit node✅ Without explicit node
n31777237174119777 S n6426126692421048 n31777237174119777->n6426126692421048 n29079710489530064 a n6426126692421048->n29079710489530064 n8582399354112908 b n6426126692421048->n8582399354112908
n06159579458745701 S n4926247183260324 a n06159579458745701->n4926247183260324 n5329876531220505 b n06159579458745701->n5329876531220505

Unordered choice

S -> "a" | "b". Again we can show unordered choice as separate node with label "\cup" or not.

❌ With explicit node✅ Without explicit node
n32578077491515756 S n6057308439642972 n32578077491515756->n6057308439642972 n9963426302551659 a n6057308439642972->n9963426302551659 n0259291101766419 b n6057308439642972->n0259291101766419
n5035111553707665 S n682444143688719 a n5035111553707665->n682444143688719 n6051656392432236 b n5035111553707665->n6051656392432236

For unordered choice order of children doesn’t matter (but for concatentaion it’s important).

Kleene star

S -> "a"*;. Again we can show Kleene star as separate node with label "\ast" or not.

❌ With explicit node✅ Without explicit node
n12496012730337491 S n9378484015202035 n12496012730337491->n9378484015202035 n013265682816046231 a n9378484015202035->n013265682816046231
n8360828754352065 S n2517410858472915 a n8360828754352065->n2517410858472915

For all other operations situation is the same.

Named vs nameless nodes

Let’s take a look at this example A -> "a" "b"; S -> A | "c". In this case each node is named. But what about this example: S -> "a" "b" | "c". It is essentially the same grammar so it should have the same structure.

✅ Named nodes✅ Nameless nodes
n7937777410049291 S n899932062460578 A n7937777410049291->n899932062460578 n3891736917768729 c n7937777410049291->n3891736917768729 n12992379234595486 a n899932062460578->n12992379234595486 n32178530581242115 b n899932062460578->n32178530581242115
n4005223089921346 S n6372228667049937 n4005223089921346->n6372228667049937 n3772209103056883 c n4005223089921346->n3772209103056883 n1789694330474043 a n6372228667049937->n1789694330474043 n5471779859057893 b n6372228667049937->n5471779859057893

Binary vs N-arry operator

Concatentation is a binary operator (per se) so S -> "a" "b" "c"; can be represented as binary-tree (depending on associativity), or we can treat concatenation as N-arry operator.

❌ Binary left associative❌ Binary right associative✅ N-arry
n2873255805829471 S n6841641552347772 n2873255805829471->n6841641552347772 n31004350863133223 c n2873255805829471->n31004350863133223 n2352752708910506 a n6841641552347772->n2352752708910506 n2147796485295348 b n6841641552347772->n2147796485295348
n13089553031056855 S n4107595488357749 a n13089553031056855->n4107595488357749 n009122058322815318 n13089553031056855->n009122058322815318 n8762220117488859 b n009122058322815318->n8762220117488859 n3276618371212172 c n009122058322815318->n3276618371212172
n2341941691086895 S n09881496994417271 a n2341941691086895->n09881496994417271 n21145501216300921 b n2341941691086895->n21145501216300921 n13787715692542468 c n2341941691086895->n13787715692542468

General recursion

Classical example for recursion is S -> "" | S "a";.

n6901671597984473 S n15881499454410708 ϵ n6901671597984473->n15881499454410708 n3517327538850563 n6901671597984473->n3517327538850563 n633063677073634 S n6901671597984473->n633063677073634 n3517327538850563->n633063677073634 n33736945892188164 a n3517327538850563->n33736945892188164

"ϵ\epsilon" is an empty string. Dotted line shows that this is cycle (loop) e.g. this is the same "SS" node. It’s drawn twice so it would be easier to understand the structure.

”LCRS” representation

Maybe it would make easier to understans as LCRS-tree, rather than DAG. For example: S -> "a" "b" "c";

As DAGAS LCRS-tree
n2341941691086895 S n09881496994417271 a n2341941691086895->n09881496994417271 n21145501216300921 b n2341941691086895->n21145501216300921 n13787715692542468 c n2341941691086895->n13787715692542468
n2341941691086895 S n09881496994417271 a n2341941691086895->n09881496994417271 n21145501216300921 b n09881496994417271->n21145501216300921 n13787715692542468 c n21145501216300921->n13787715692542468