Parsing

# Definitions

Token - one letter or a sequence of letters which is treated as one item (if there is tokenization step).

Alphabet ($\Sigma$ - sigma) is a finite, non-empty set of tokens. Example: $\Sigma = \{a, b\}$.

String ($w$) is a finite sequence of tokens chosen from $\Sigma$. Example: $ababba$.

Language ($L$) a (possibly infinite) set of strings. Example: $L = \{a, aa, aaa\}$.

String length is a number of tokens in it: $|aba| = 3$.

String concatenation: $w_1w_2$. Example: $ab \cdot ba = abba$. Alternative notations: $w_1 \cdot w_2$. In plain text space can be used: a b

Empty string: $\varepsilon$ (epsilon). $|\varepsilon| = 0$, $\forall w: w \varepsilon = \varepsilon w = w$.

Language concatenation: $L_1L_2 = \{w_1w_2 \text{ | } w_1 \in L_1, w_2 \in L_2\}$ - Cartesian product of two sets. Example: $\{1,2\}\cdot\{a,b,\ldots\} = \{1a,2a,1b,2b,\ldots\}$.

String exponentiation: $w^k = ww \ldots w$ (k times). Example: $a^3 = aaa$.

Language exponentiation: $L^k = LL \ldots L$ (k times). $LL = L^2$, $L^k = L(L^{K-1})$, $L^0 = \{\varepsilon\}$. Example: $\{0,1\}^{32}$ - all binary words of length of 32.

String reversal: $w^R$. Example: $(aabc)^R=cbaa$.

Language reversal: $L^R = {w^R \text{ | } w \in L}$. Example: $\{ab, cd\}^R = \{ba, dc\}$.

Language union: $L_1 \cup L_2 = \{w \text{ | } w \in L_1 \text{ or } w \in L_2\}$ - set union. Other name: disjunction. Example: $\{a,b\} \cup \{b,aa\} = \{a,b,aa\}$.

Language intersection: $L_1 \cap L_2 = \{w \text{ | } w \in L_1 \text{ and } w \in L_2\}$ - set intersection. Other name: conjunction. Example: $\{a,b\} \cap \{b,aa\} = \{b\}$.

Language difference: $L_1 \setminus L_2 = \{w \text{ | } w \in L_1 \text{ and } w \not{\in} L_2\}$ - set difference. Example: $\{a,b\} \setminus \{b,d\} = \{a\}$.

Kleene closure: $L^\ast = L^0 \cup L^1 \cup L^2 \ldots$, $L^+ = L^1 \cup L^2 \cup L^3 \ldots$. Example: $\{a\}^\ast = \{\varepsilon,a,aa,aaa \ldots\}$, $\{a\}^+ = \{a,aa,aaa \ldots\}$

All finite strings (over alphabet $\Sigma$): $\Sigma^\ast$, $\forall L: L \subseteq \Sigma^\ast$. Note: strictly speking I didn’t provide definition how to exponentiate alphabet - just treat it as language with strings of length 1.

“Trivial” language: $\epsilon = \{\varepsilon\}$. $\forall L: \epsilon L = L \epsilon = L$.

Empty language: $\emptyset = \{\}$. $\emptyset^\ast = \epsilon$, $\forall L: \emptyset L = L \emptyset = \emptyset$.

Notations:

• $\forall$ - for all
• $\in$ - in a set
• $\not{\in}$ - not in a set