Skip to content

Regular languages

Often denoted as “REG”.

Regular languages correspond to:

  • Regular expressions (RE) and Regular expressions extended (REE)
  • Deteremenisitc finite automata (DFA) and Non-deteremenisitc finite automata (NFA)
  • Kleene algebra
  • Type 3 Chomsky grammar

Automata

There are a lot of algorithms to construct automata from Regular expressions:

Diagram taken from A taxonomy of finite automata construction algorithms, 1995

Other:

Regular expressions

RE

Regular expression can be defined recursively:

r::=αϵr1r2r1+r2r1(r1)r ::= \alpha | \epsilon | r_1 r_2 | r_1 + r_2 | r_1^* | (r_1)

Let assume that L(r)L(r) is languages defined by regulat expression rr then:

  • L(α)={α}L(\alpha) = \{ \alpha \}, where αΣ\alpha \in \Sigma and Σ\Sigma is alphabet of the language. Examples: L(a) = { "a" }, L(b) = { "b" }, etc.
  • L(ϵ)={ε}L(\epsilon) = \{ \varepsilon \}, where ε\varepsilon is empty string. Example: L(ϵ) = {""}
  • L(r1r2)=L(r1)×L(r2)L(r_1 r_2) = L(r_1) \times L(r_2), where L(r1)×L(r2)L(r_1) \times L(r_2) is cartesian product of sets. Examples: L(ab) = { "a" }×{ "b" } = { "ab" }, L(ba) = { "ba" }, etc.
  • L(r1+r2)=L(r1)L(r2)L(r_1 + r_2) = L(r_1) \cup L(r_2), where L(r1)L(r2)L(r_1) \cup L(r_2) is union of languages (union of sets). Examples: L(a + b) = { "a", "b" }, L(b + a) = { "b", "a" }, etc.
  • L(r1)=L(r1)L(r_1^*) = L(r_1)^*, where L(r1)L(r_1)^* is Kleene closure (star). Example, L(a*) = {"", "a", "aa", "aaa"...}
  • L((r1))=L(r1)L((r_1)) = L(r_1). Parentheses used for grouping
  • For convienience we can also define, L()={}L(\empty) = \{\}

Variation of notation:

REsetsChomskyKleene algebra
concatenationr1r2r_1 r_2r1×r2r_1 \times r_2r1r2r_1 r_2r1r2r_1 \cdot r_2
alternation (1)r1+r2r_1 + r_2r1r2r_1 \cup r_2r_1 | r_2r1+r2r_1 + r_2
empty string (2)ϵ\epsilon or ε\varepsilon{ε}\{ \varepsilon \}ϵ\epsilon or ε\varepsilon11
empty language\empty\empty or {}\{\}00
Kleene closurerr^*rr^*rr^*
  • (1) other name for alternation is unordered choice
  • (2) in older books they may use λ\lambda for empty string

REE

Extended regular expression can be defined recursively:

r::=αϵr1r2r1+r2r1(r1)r1&r2r1r ::= \alpha | \epsilon | r_1 r_2 | r_1 + r_2 | r_1^* | (r_1) | r_1 \& r_2 | r_1'

REE is the same as RE, but with two additonal operations intersection (&\&) and negation (r1r_1').

  • L(r1&r2)=L(r1)L(r2)L(r_1 \& r_2) = L(r_1) \cap L(r_2), where L(r1)L(r2)L(r_1) \cap L(r_2) is intersection of sets.
  • L(r1)=L(r1)L(r_1') = L(r_1)', where L(r1)L(r_1)' is absolute complement of a set.

Variation of notation:

REEsetsLogic
concatenationr1r2r_1 r_2r1×r2r_1 \times r_2
alternationr1+r2r_1 + r_2r1r2r_1 \cup r_2r1r2r_1 \lor r_2
empty stringϵ\epsilon or ε\varepsilon{ε}\{ \varepsilon \}
empty language\empty\empty or {}\{\}\bot
Kleene closurerr^*rr^*
intersectionr1&r2r_1 \& r_2r1r2r_1 \cap r_2r1r2r_1 \land r_2
negationrr'rr' or rcr^c or UrU \setminus r¬r\lnot r
universeΣ\Sigma^*UU\top

Modern syntactic sugar

Modern regular expressions “engines” support several more extensions for convenience (aka syntactic sugar):

  • “Kleene plus” in plain text notation r+ = rrr r^*
  • “Kleene question” in plain text notation r? = r+ϵr + \epsilon
  • Quantifiers
    • r{n} = rrn-times\underbrace{r \ldots r}_{n\text{-times}}
    • r{n,m} = rrn-times(r+ϵ)(r+ϵ)(mn)-times\underbrace{r \ldots r}_{n\text{-times}}\underbrace{(r+\epsilon) \ldots (r+\epsilon)}_{(m-n)\text{-times}}
    • r{n,} = rrn-timesr\underbrace{r \ldots r}_{n\text{-times}}r^*
    • r{,n} = (r+ϵ)(r+ϵ)n-times\underbrace{(r+\epsilon) \ldots (r+\epsilon)}_{n\text{-times}}
  • Any character: . = Σ\Sigma
  • Interval: [a-z] = a++za + \ldots + z
  • Set: [abc] which is the same as a|b|c = a+b+ca + b + c
  • Negation of set: [^abc] = Σ(a+b+c)\Sigma \setminus (a + b + c)
  • Char classes, for example:
    • \d is the same as [0-9] = 0++90 + \ldots + 9
    • \w, \s, \S etc.

RELA

See RE with lookahead

RE with backreferences

Some modern regular expressions “engines” may contain extensions which do not correspond to regular languages - see RE with backreferences.

References

Kleene algebra

A Kleene algebra is an algebraic structure

K=(Σ,+,,,0,1)K = (\Sigma, +, \cdot, {}^*, 0, 1)

satisfying the following equations and equational implications:

a+(b+c)=(a+b)+ca+b=b+aa+0=aa+a=aa(bc)=(ab)c1a=aa1=aa(b+c)=ab+ac(a+b)c=ac+bc0a=0a0=01+aaa1+aaab+axx    abxb+xax    bax\begin{align} a + (b + c) &= (a + b) + c \\ a + b &= b + a \\ a + 0 &= a \\ a + a &= a \\ a(bc) &= (ab)c \\ 1a &= a \\ a1 &= a \\ a(b + c) &= ab + ac \\ (a + b)c &= ac + bc \\ 0a &= 0 \\ a0 &= 0 \\ 1 + aa^* &\leq a^* \\ 1 + a^*a &\leq a^* \\ b + ax \leq x &\implies a^*b \leq x \\ b + xa \leq x &\implies ba^* \leq x \\ \end{align}

where \leq refers to the natural partial order on KK:

ab    a+b=ba \leq b \iff a + b = b
  • Axioms 1-4 say that (Σ,+,0)(\Sigma, +, 0) is an idempotent commutative monoid.
  • Axioms 5-7 say that (Σ,,1)(\Sigma, \cdot, 1) is a monoid.
  • Axioms 1-11 say that (Σ,+,,0,1)(\Sigma, +, \cdot, 0, 1) is an idempotent semiring.

References

Closure properties

TODO

Decidability

TODO

Pumping lemma

TODO