Skip to content


Token - one letter or a sequence of letters which is treated as one item (if there is tokenization step).

Alphabet (Σ\Sigma - sigma) is a finite, non-empty set of tokens. Example: Σ={a,b}\Sigma = \{a, b\}.

String (ww) is a finite sequence of tokens chosen from Σ\Sigma. Example: ababbaababba.

Language (LL) a (possibly infinite) set of strings. Example: L={a,aa,aaa}L = \{a, aa, aaa\}.

String length is a number of tokens in it: aba=3|aba| = 3.

String concatenation: w1w2w_1w_2. Example: abba=abbaab \cdot ba = abba. Alternative notations: w1w2w_1 \cdot w_2. In plain text space can be used: a b

Empty string: ε\varepsilon (epsilon). ε=0|\varepsilon| = 0, w:wε=εw=w\forall w: w \varepsilon = \varepsilon w = w.

Language concatenation: L1L2={w1w2 | w1L1,w2L2}L_1L_2 = \{w_1w_2 \text{ | } w_1 \in L_1, w_2 \in L_2\} - Cartesian product of two sets. Example: {1,2}{a,b,}={1a,2a,1b,2b,}\{1,2\}\cdot\{a,b,\ldots\} = \{1a,2a,1b,2b,\ldots\}.

String exponentiation: wk=wwww^k = ww \ldots w (k times). Example: a3=aaaa^3 = aaa.

Language exponentiation: Lk=LLLL^k = LL \ldots L (k times). LL=L2LL = L^2, Lk=L(LK1)L^k = L(L^{K-1}), L0={ε}L^0 = \{\varepsilon\}. Example: {0,1}32\{0,1\}^{32} - all binary words of length of 32.

String reversal: wRw^R. Example: (aabc)R=cbaa(aabc)^R=cbaa.

Language reversal: LR=wR | wLL^R = {w^R \text{ | } w \in L}. Example: {ab,cd}R={ba,dc}\{ab, cd\}^R = \{ba, dc\}.

Language union: L1L2={w | wL1 or wL2}L_1 \cup L_2 = \{w \text{ | } w \in L_1 \text{ or } w \in L_2\} - set union. Other name: disjunction. Example: {a,b}{b,aa}={a,b,aa}\{a,b\} \cup \{b,aa\} = \{a,b,aa\}.

Language intersection: L1L2={w | wL1 and wL2}L_1 \cap L_2 = \{w \text{ | } w \in L_1 \text{ and } w \in L_2\} - set intersection. Other name: conjunction. Example: {a,b}{b,aa}={b}\{a,b\} \cap \{b,aa\} = \{b\}.

Language difference: L1L2={w | wL1 and w∉L2}L_1 \setminus L_2 = \{w \text{ | } w \in L_1 \text{ and } w \not{\in} L_2\} - set difference. Example: {a,b}{b,d}={a}\{a,b\} \setminus \{b,d\} = \{a\}.

Kleene closure: L=L0L1L2L^\ast = L^0 \cup L^1 \cup L^2 \ldots, L+=L1L2L3L^+ = L^1 \cup L^2 \cup L^3 \ldots. Example: {a}={ε,a,aa,aaa}\{a\}^\ast = \{\varepsilon,a,aa,aaa \ldots\}, {a}+={a,aa,aaa}\{a\}^+ = \{a,aa,aaa \ldots\}

All finite strings (over alphabet Σ\Sigma): Σ\Sigma^\ast, L:LΣ\forall L: L \subseteq \Sigma^\ast. Note: strictly speking I didn’t provide definition how to exponentiate alphabet - just treat it as language with strings of length 1.

“Trivial” language: ϵ={ε}\epsilon = \{\varepsilon\}. L:ϵL=Lϵ=L\forall L: \epsilon L = L \epsilon = L.

Empty language: ={}\emptyset = \{\}. =ϵ\emptyset^\ast = \epsilon, L:L=L=\forall L: \emptyset L = L \emptyset = \emptyset.


  • \forall - for all
  • \in - in a set
  • ∉\not{\in} - not in a set