# Notation

Name | Math notation | Plain text | Other names | Other notations |
---|---|---|---|---|

Alphabet | $\Sigma$ | |||

Empty string | $\varepsilon$ | `""` | ||

Empty language | $\emptyset$ | set: $\{\}$, Br: $\phi$ | ||

Trivial language | $\epsilon$ | set: $\{\varepsilon\}$, RE: $\epsilon$, Br: $\lambda$ | ||

Singleton language | $\{a\}$ | RE: $a$ | ||

Symbol | $X$ | `X` | Variable, Non-terminal | |

Token | $a$ | `"a"` | Letter, Terminal | |

Rule | $S \rightarrow \ldots$ | `S -> ...` | Language equation | BNF: `S := ...` , PEG: `S <- ...` |

Concatenation | $XY$ | `X Y` | sequence, cartesian p. | $X \cdot Y$, Might: $X \circ Y$, set: $X \times Y$ |

Unordered choice | $X \vert Y$ | X|Y | union, disjunction | RE: $X+Y$, set: $X \cup Y$ |

Kleene star | $X^\ast$ | `X*` | Kleene closure | Might: $X \star$ |

Kleene plus | `X+` | $X \cdot X^*$, `X{1,}` | ||

Optional | `X?` | $X \vert \epsilon$, `X{0,1}` | ||

Quantifier | `X{n,m}` , `X{n,}` , `X{n}` | |||

Any character | $\Sigma$ | `.` | ||

Intersection | $X \cap Y$ | conjuction | Br: $X \& Y$ | |

Negation | $X^\prime$ | absolute complement | set: $\Sigma^* \setminus X$, $!X$, $L^c$ | |

Positive lookahead | $(\&X) Y$ | `&X Y` | ||

Negative lookahead | $(!X) Y$ | `!X Y` | ||

Range | `[a-z]` | |||

Set of characters | $a \vert b \vert c$ | `[abc]` | set: $\{a, b, c\}$ | |

Negation of set | $\Sigma \setminus (a \vert b)$ | `[^ab]` | ||

Escape sequences | `"\n"` | |||

Ordered choice | $X / Y$ | `X / Y` | ||

End of file | `!.` |

## Examples

There are a lot of different formalisms about languges and they all use slightly different notations, but in the end it’s the same thing.

### Regular Expressions

Regular expressions doesn’t use symbols and rules to define language. There is only one rule - regular expression. Classical regular expressions use only tokens, concatenation, unordered choice and Kleene star. Also in science papers they don’t bother to put quotes around letters - this way notation is more terse.

For example

RE | Language |
---|---|

$\epsilon$ | `{""}` |

$a$ | `{"a"}` |

$ab$ | `{"ab"}` |

$a \vert b$ | `{"a", "b"}` |

$a^*$ | `{"", "a", "aa", ...}` |

$(a \vert b)c$ | `{"ac", "bc"}` |

### Regular Expressions Extended

Same as classical regular expressions, but also adds intersection and negation (which makes it full set of Boolean operations). Those two operations doesn’t add to calculation power (still can produce only regular languages).

See: Janusz A. Brzozowski. 1964. Derivatives of Regular Expressions. J. ACM 11, 4 (Oct. 1964), 481–494. https://doi.org/10.1145/321239.321249

### Modern regular expressions

Modern regular expressions add some more tricks, like: Kleene plus, Optional, Quantifier, Any character, Range, Set of characters, Negation of set, Escape sequences. Depends on the flavor (implementation). Those, again, doesn’t add to computational power, but they are quite convenient.

### Chomsky generative grammar

Chomsky grammar uses: symbols (non-terminals), rules, tokens (terminals), concatenation, unordered choice.

Chomsky | Language |
---|---|

$S \rightarrow \epsilon$ | `{""}` |

$S \rightarrow a$ | `{"a"}` |

$S \rightarrow ab$ | `{"ab"}` |

$S \rightarrow a \vert b$ | `{"a", "b"}` |

$S \rightarrow Sa \vert \epsilon$ | `{"", "a", "aa", ...}` |

$X \rightarrow a \vert b; S \rightarrow X c$ | `{"ac", "bc"}` |

Becuase "$\vert$" is unordered choice those are the same rules:

$S_1 \rightarrow S_1 a \vert \epsilon \\ S_2 \rightarrow \epsilon \vert S_2 a \\ S_3 \rightarrow \epsilon \\ S_3 \rightarrow S_3 a$Depending on which restrictions you put on the rules it can genrate: regular, context-free, context-sensitive, recursive enumerable languages. This formalism is used to define Chomsky hierarchy.

### Backus–Naur form

BNF is variation of Chomsky grammar which can generate only context-free languages. There are a lot of variations which also add: Kleene plus, Optional, Quantifier, Any character, Range, Set of characters, Negation of set, Escape sequences. Which doesn’t add anything to computational power, but makes it more practical. BNF looks like this

### Parsing Expression Grammars

PEG again can be treated as variation of Chomsky grammar, but instead of unoredered choice it uses ordered chocie and also adds positive lookahead, negative lookahead and any character. PEG can recognize some context-sensitive grammars. PEG looks like this:

$A \leftarrow a A b / \epsilon \\ B \leftarrow b B c / \epsilon \\ S \leftarrow \&(A !b) a^\ast B!.$### Parsing with Derivatives

PwD adds to classical regular expressions: rules and symbols (which allows to do general recursion). PwD can handle context-free languages. You can express Kleene closure in three ways:

$S_1 \rightarrow Sa \vert \epsilon \\ S_2 \rightarrow aS \vert \epsilon \\ S_3 \rightarrow a^* \\$You can as well say that they added Kleene star to Chomsky grammar.

### Conjuctive grammar

It adds intersection to Chomsky grammar. It can handle some context-sensitive languages. It looks like this (in original paper they use $\&$ instead of $\cap$):

$A \rightarrow Aa \vert \epsilon \\ C \rightarrow Cc \vert \epsilon \\ X \rightarrow a A b \vert \epsilon \\ Y \rightarrow b B c \vert \epsilon \\ S \rightarrow XC \cap AY$### Boolean grammar

It adds intersection and negationto Chomsky grammar. It can handle some context-sensitive languages.

**Interesting**: in PwD they added general recursion to classical regular expressions to recieve context-free Chomsky grammar. If we add general recursion to REE we will recieve Boolean grammar.

### Regular Expressions with Lookahead

REwLA adds positive and negative lookaheads to classical regular expressions.

**Interesting**: in PwD they added general recursion to classical regular expressions to recieve context-free Chomsky grammar. If we add general recursion to REwLA we will recieve PEG. And ordered choice can be expressed as $A / B = A \vert (!A)B$.

## History

There is an interesting story behind how notation changed over time. See this video

**1951**Stephen Kleene develops regular expressions to describe McCulloch-Pitts (1943) nerve nets (uses`∨`

for choice; considers postfix`∗`

, but decides to make it a binary operator to avoid having empty strings:`x∗y`

means any number of copies of`x`

, followed by`y`

).**1956**Journal publication of Kleene’s technical report: binary`∗`

only.**1958**Copi, Elgot, and Wright formulate REs using`·`

and`∨`

and postfix`∗`

.**1962**Janusz Brzozowki uses binary`+`

for`∨`

and introduces postfix`+`

.**1968**Ken Thompson’s paper “Regular Expression Search Algorithm” uses`|`

.**1973**Thompson creates`grep`

from`ed`

editor for use by Doug McIlroy.**1975**Alfred Aho creates`egrep`

(includes`( )`

,`|`

,`*`

,`+`

,`?`

).**1978**CMU Alphard project uses regular expressions with`*`

,`+`

, and`#`

.**1981**CMU FEG and IDL use regular expressions with`*`

,`+`

, and`?`

.

## Operations concept map

**TODO**: I need to update it and explain abbreviations.