Token - one letter or a sequence of letters which is treated as one item (if there is tokenization step).
Alphabet (Σ - sigma) is a finite, non-empty set of tokens. Example: Σ={a,b}.
String (w) is a finite sequence of tokens chosen from Σ. Example: ababba.
Language (L) a (possibly infinite) set of strings. Example: L={a,aa,aaa}.
String length is a number of tokens in it: ∣aba∣=3.
String concatenation: w1w2. Example: ab⋅ba=abba. Alternative notations: w1⋅w2. In plain text space can be used: a b
Empty string: ε (epsilon). ∣ε∣=0, ∀w:wε=εw=w.
Language concatenation: L1L2={w1w2 | w1∈L1,w2∈L2} - Cartesian product ↗ of two sets. Example: {1,2}⋅{a,b,…}={1a,2a,1b,2b,…}.
String exponentiation: wk=ww…w (k times). Example: a3=aaa.
Language exponentiation: Lk=LL…L (k times). LL=L2, Lk=L(LK−1), L0={ε}. Example: {0,1}32 - all binary words of length of 32.
String reversal: wR. Example: (aabc)R=cbaa.
Language reversal: LR=wR | w∈L. Example: {ab,cd}R={ba,dc}.
Language union: L1∪L2={w | w∈L1 or w∈L2} - set union ↗. Other name: disjunction. Example: {a,b}∪{b,aa}={a,b,aa}.
Language intersection: L1∩L2={w | w∈L1 and w∈L2} - set intersection ↗. Other name: conjunction. Example: {a,b}∩{b,aa}={b}.
Language difference: L1∖L2={w | w∈L1 and w∈L2} - set difference ↗. Example: {a,b}∖{b,d}={a}.
Kleene closure: L∗=L0∪L1∪L2…, L+=L1∪L2∪L3…. Example: {a}∗={ε,a,aa,aaa…}, {a}+={a,aa,aaa…}
All finite strings (over alphabet Σ): Σ∗, ∀L:L⊆Σ∗. Note: strictly speking I didn’t provide definition how to exponentiate alphabet - just treat it as language with strings of length 1.
“Trivial” language: ϵ={ε}. ∀L:ϵL=Lϵ=L.
Empty language: ∅={}. ∅∗=ϵ, ∀L:∅L=L∅=∅.
Notations:
- ∀ - for all
- ∈ - in a set
- ∈ - not in a set