Regular languages
Often denoted as “REG”.
Regular languages correspond to:
- Regular expressions (RE) and Regular expressions extended (REE)
- Deteremenisitc finite automata (DFA) and Non-deteremenisitc finite automata (FA)
- Kleene algebra
- Hence Brzozowski derivative
- Type 3 Chomsky grammar
Automata
There are a lot of algorithms to construct automata from Regular expressions:
Diagram taken from A taxonomy of finite automata construction algorithms, 1995 ↗
- Myhill-Nerode construction, 1957-1959
- MYHILL, J. “Finite automata and the representation of events,” WADD TR-57-624, pp. 112-137, Wright Patterson AFB, Ohio, 1957
- NERODE, A. “Linear automaton transformations,” Proc. AM8 9: 541-544, 1958
- RABIN, M.O AND D. SCOTT. “Finite automata and their decision problems,” IBM J. Res. 3(2): 115-125, 1959
- McNaughton-Yamada-Glushkov construction, 1960-1961
- McNAUGHTON, R. AND H. YAMADA. “Regular expressions and state graphs for automata,” IEEE Trans. on Electronic Computers 9(1): 39-47, 1960.
- GLUSHKOV, V.M. “The abstract theory of automata,” Russian Mathematical Surveys 16: 1-53, 1961
- An optimal parallel algorithm to convert a regular expression into its Glushkov automaton, 1999 ↗
- Characterization of Glushkov automata, 2000 ↗
- Brzozowski construction, 1964
- Thompson’s construction, 1968
- THOMPSON, K. “Regular expression search algorithms,” C. ACM 11(6): 419-422, 1968
- DeRemer’s construction, 1974
- DEREMER, F.L. “Lexical analysis,” in Compiler Construction: an Advanced Course, (F.L. Bauer and J. Eickel, eds.), pp. 109-120, Lecture Notes in Computer Science 21, Springer-Verlag, Berlin, 1974
- Berry-Sethi construction, 1986
- BERRY, G. AND R. SETHI. “From regular expressions to deterministic automata,” Theoretical Computer Science, 48: 117-126, 1986.
- Aho-Sethi-Ullman DFA construction, 1986
- AHO, A.V., R. SETHI, AND J.D. ULLMAN. Compilers: Principles, Techniques, and Tools, Addison-Wesley Publishing Co., Reading, M.A., 1988
- Antimirov construction, 1996
- Follow automata, 2003 ↗
- Tree automata, 2011 ↗
- Counting-set automata, 2020 ↗
- Bit vector automata, 2023 ↗
Other:
- A simple way to construct NFA with fewer states and transitions, 2004 ↗
- Bitwise Data Parallelism in Regular Expression Matching, 2014 ↗
- PaREM: A Novel Approach for Parallel Regular Expression Matching, 2014 ↗
Regular expressions
RE
Regular expression can be defined recursively:
Let assume that is languages defined by regulat expression then:
- , where and is alphabet of the language. Examples:
L(a) = { "a" }
,L(b) = { "b" }
, etc. - , where is empty string. Example:
L(ϵ) = {""}
- , where is cartesian product of sets. Examples:
L(ab) = { "a" }×{ "b" } = { "ab" }
,L(ba) = { "ba" }
, etc. - , where is union of languages (union of sets). Examples:
L(a + b) = { "a", "b" }
,L(b + a) = { "b", "a" }
, etc. - , where is Kleene closure (star). Example,
L(a*) = {"", "a", "aa", "aaa"...}
- . Parentheses used for grouping
- For convienience we can also define,
Variation of notation:
RE | sets | Chomsky | Kleene algebra | |
---|---|---|---|---|
concatenation | ||||
alternation (1) | r_1 | r_2 | |||
empty string (2) | or | or | ||
empty language | or | |||
Kleene closure |
- (1) other name for alternation is unordered choice
- (2) in older books they may use for empty string
REE
Extended regular expression can be defined recursively:
REE is the same as RE, but with two additonal operations intersection () and negation ().
- , where is intersection of sets.
- , where is absolute complement of a set.
Variation of notation:
REE | sets | Logic | |
---|---|---|---|
concatenation | |||
alternation | |||
empty string | or | ||
empty language | or | ||
Kleene closure | |||
intersection | |||
negation | or or | ||
universe |
Modern syntactic sugar
Modern regular expressions “engines” support several more extensions for convenience (aka syntactic sugar):
- “Kleene plus” in plain text notation
r+
= - “Kleene question” in plain text notation
r?
= - Quantifiers
r{n}
=r{n,m}
=r{n,}
=r{,n}
=
- Any character:
.
= - Interval:
[a-z]
= - Set:
[abc]
which is the same asa|b|c
= - Negation of set:
[^abc]
= - Char classes, for example:
\d
is the same as[0-9]
=\w
,\s
,\S
etc.
REwLA
RE with backreferences
Some modern regular expressions “engines” may contain extensions which do not correspond to regular languages - see RE with backreferences.
Related
- Brzozowski derivative
- Antimirov partial derivative
References
- CS4114 Formal Languages Spring 2021, Chapter 4 Regular Languages ↗
- Foundations of Computation (Critchlow and Eck), 3.2: Regular Expressions ↗
Kleene algebra
A Kleene algebra is an algebraic structure
satisfying the following equations and equational implications:
where refers to the natural partial order on :
- Axioms 1-4 say that is an idempotent commutative monoid.
- Axioms 5-7 say that is a monoid.
- Axioms 1-11 say that is an idempotent semiring.
References
- A Completeness Theorem for Kleene Algebras and the Algebra of Regular Events ↗
- KLEENE ALGEBRAS: THE ALGEBRA OF REGULAR EXPRESSIONS ↗
Closure properties
TODO
Decidability
TODO
Pumping lemma
TODO