Ambiguity
a word or expression that can be understood in two or more possible ways
In regards to parsing ambiguity means that there is more than one parse tree for the same string.
For example:
For string 1 + 1 + 1
we can produce two parse trees:
Note: this grammar would work only for parser which can handle Left recursion and ambigious grammars, which is not always the case. For example, classical PEG can’t do either.
Why ambiguity is a problem
- You need only one interpretation of a string in order to create practical parser for programming language
- Ambigious parsing requires more computation power (higher time complexity) and more space. Number of possible parse trees grows according to Catalan Number ↗ (it grows very fast)
Dealing with ambiguity
SPPF
First of all we can take care about space. Instead of producing list of separate arrays we can re-use shared parts across different trees. This data structure is called SPPF.
Rewrite grammar
Second approach is to rewrite grammar to unambiguous version. But:
- this is manual process
- this will (most likely) change the shape of the tree - due to introfuction of new none-terminals
For example, grammar above can be changed to
As the result:
- PRO grammar takes into account precedence and associativity of operators, which makes it unambiguous
- CON parse tree became more convoluted
- CON grammar became more convoluted
We added only one level of precedence - MD (*
, /
) more than PM (+
, -
). Real-life languages can use 7 or more levels. Imagine how much more convoluted grammar and tree would become.
Disambiguating rules
Third approach is to add disambiguating rules on top of ambigious grammar. Typical example of such rules would be precedence and associativity. Something like this:
For example precedence and/or associativity supported by:
Other
Another well-known example of ambiguity is “dangling else”.