Skip to content

Grammar

There are different ways to specify language, for example:

  • automata - if automata accpets string than it’s part of the language
  • regular expression
  • grmmar, for example
    • Chomsky generative grammar
    • BNF (EBNF, ABNF)
    • PEG
    • etc.
  • system of languge equations

Those all approaches equivalent. In this article we’re gonna focus on grammars.

Chomsky generative grammar

Chomsky generative grammar GG is a system used for generating strings over some alphabet. It is defined by a 4-tuple:

G=(N,Σ,P,S)G = (N, \Sigma, P, S)

Where:

  1. NN is a finite set of non-terminal symbols. These symbols are never present in the strings of the language, but are used in the production rules to generate those strings.

  2. Σ\Sigma is a finite set of terminal symbols. These symbols appear in the strings generated by the grammar and are distinct from the non-terminals, i.e., NΣ=N \cap \Sigma = \emptyset. The symbols in Σ\Sigma form the alphabet of the language generated by the grammar.

  3. PP is a finite set of production rules. Each production rule is of the form: αβ\alpha \rightarrow \beta Where α\alpha is a string of symbols that contains at least one non-terminal (i.e., α(NΣ)+\alpha \in (N \cup \Sigma)^+ and α\alpha contains at least one element from NN), and β\beta is a string of symbols from NΣN \cup \Sigma, i.e., β(NΣ)\beta \in (N \cup \Sigma)^*.

  4. SS is the start symbol, where SNS \in N. This is the symbol from which derivation or string generation starts.

A grammar specifies how to construct valid strings of the language. A string is considered part of the language if there exists some sequence of production rule applications (a derivation) that starts with the start symbol SS and results in that string.

Chomsky hierarchy

The Chomsky hierarchy classifies grammars into four levels based on their generative power:

  1. Type 0 - Recursively Enumerable (RE) Grammar

    • Production Rules: αβ\alpha \rightarrow \beta where α,β(NΣ)+\alpha, \beta \in (N \cup \Sigma)^+ and αβ|\alpha| \leq |\beta|
    • Language: Recursively Enumerable Language
    • Recognized by: Turing Machine
  2. Type 1 - Context-Sensitive Grammar (CSG)

    • Production Rules: αβ\alpha \rightarrow \beta where α,β(NΣ)+\alpha, \beta \in (N \cup \Sigma)^+ and αβ|\alpha| \leq |\beta|
    • Language: Context-Sensitive Language (CSL)
    • Recognized by: Linear Bounded Automaton (LBA)
  3. Type 2 - Context-Free Grammar (CFG)

    • Production Rules: AβA \rightarrow \beta where ANA \in N and β(NΣ)\beta \in (N \cup \Sigma)^*
    • Language: Context-Free Language (CFL)
    • Recognized by: Pushdown Automaton (PDA)
  4. Type 3 - Regular Grammar

    • Production Rules:
      • AaBA \rightarrow aB or AaA \rightarrow a or AϵA \rightarrow \epsilon for right-linear regular grammars.
      • ABaA \rightarrow Ba or AaA \rightarrow a or AϵA \rightarrow \epsilon for left-linear regular grammars.
    • Language: Regular Language
    • Recognized by: Finite Automaton (either Deterministic (DFA) or Nondeterministic (NFA))

However, this is a classical hierarchy proposed by Chomsky in 1956 in his paper “Three Models for the Description of Language.” Since then, it has been extended to capture more nuances.

Extended Chomsky hierarchy

  1. Linear grammar or Linear Context-Free Grammar
    • Is a Context-Free Grammar that has at most one non-terminal in the right-hand side of each of its productions
    • Examples: AaBA \rightarrow aB, ABaA \rightarrow Ba, AaBaA \rightarrow aBa
  2. Right-Linear grammar
    • Examples: AaBA \rightarrow aB or AaA \rightarrow a or AϵA \rightarrow \epsilon
  3. Left-Linear grammar
    • Examples: ABaA \rightarrow Ba or AaA \rightarrow a or AϵA \rightarrow \epsilon