Grammar
Chomsky generative grammar
Chomsky generative grammar is a system used for generating strings over some alphabet. It is defined by a 4-tuple:
Where:
-
is a finite set of non-terminal symbols. These symbols are never present in the strings of the language, but are used in the production rules to generate those strings.
-
is a finite set of terminal symbols. These symbols appear in the strings generated by the grammar and are distinct from the non-terminals, i.e., . The symbols in form the alphabet of the language generated by the grammar.
-
is a finite set of production rules. Each production rule is of the form: Where is a string of symbols that contains at least one non-terminal (i.e., and contains at least one element from ), and is a string of symbols from , i.e., .
-
is the start symbol, where . This is the symbol from which derivation or string generation starts.
A grammar specifies how to construct valid strings of the language. A string is considered part of the language if there exists some sequence of production rule applications (a derivation) that starts with the start symbol and results in that string.
Chomsky hierarchy
The Chomsky hierarchy classifies grammars into four levels based on their generative power:
-
Type 0 - Recursively Enumerable (RE) Grammar
- Production Rules: where and
- Language: Recursively Enumerable Language
- Recognized by: Turing Machine
-
Type 1 - Context-Sensitive Grammar (CSG)
- Production Rules: where and
- Language: Context-Sensitive Language (CSL)
- Recognized by: Linear Bounded Automaton (LBA)
-
Type 2 - Context-Free Grammar (CFG)
- Production Rules: where and
- Language: Context-Free Language (CFL)
- Recognized by: Pushdown Automaton (PDA)
-
Type 3 - Regular Grammar
- Production Rules:
- or or for right-linear regular grammars.
- or or for left-linear regular grammars.
- Language: Regular Language
- Recognized by: Finite Automaton (either Deterministic (DFA) or Nondeterministic (FA))
- Production Rules:
However, this is a classical hierarchy proposed by Chomsky in 1956 in his paper “Three Models for the Description of Language.” Since then, it has been extended to capture more nuances.
Extended Chomsky hierarchy
- Linear grammar or Linear Context-Free grammar
- Is a Context-Free Grammar that has at most one non-terminal in the right-hand side of each of its productions
- Examples: , ,
- Right-Linear grammar
- Examples: or or
- Left-Linear grammar
- Examples: or or
- Deterministic Context Free
- A language is deterministic context-free (DCF) iff accepted by a determensitic pushdown automaton (DPDA)
- Example: “balanced” language
- Closed under negation?
- (Non-deterministic) Context Free
- A language is context-free (CF) iff accepted by a (non-determensitic) pushdown automaton (PDA)
- Example: set of all palindromes
- Indexed language
- A language is indexed iff accepted by a nested stack automaton (NSA)
- A language is indexed iff produced by a indexed grammar ↗
- Context-sensitive language
- A language is context-sensitive iff accepted by a non-determensitic TM with linearly bounded memory (LBA)
- A language is context-sensitive iff generated by a context-sensitive grammar
- Closed under negation?