Parsing Expressions | Vihan Bhargava

So you’re writing a parser? Expressions are probably one of the more difficult items you’ll come upon to parse but hopefully this article will explain how to handle them.

So here’s the grammar from my The grammar to end all grammars! post:

\begin{array}{rl} E \rightarrow & E_\alpha;E_\beta \\ E_\alpha \rightarrow & (;E;) \\ & \gamma \\ & \sigma_{pre};E \\ E_\beta \rightarrow& \sigma_{in};E\\ & \sigma_{post};\\ & \epsilon \end{array}

$\sigma$ represents an expression “item”. What is an expression item? I’ll explain

So an expression item is basically what goes inside the expression, this can be any of your literals, for example I might define $\gamma$ as:

\begin{aligned} N &\in \lbrace{E\rbrace} \\ \Sigma &\in \gamma \cup \sigma \\ S &\in \sigma \end{aligned}

This would work for a pretty standard expression allowing for basic literals but what if you want each item to be allowed to have certain suffixes/prefixes? What do I mean by a “suffix” or “prefix”? I mean:

foo + bar? + baz

Where ? could be applied upon any expression item:

foo + (bar + baz)?

Now this may seem completely unnecessary for your parser but realize these are what properties are:

foo.bar.baz

The variable is “foo” but it’s suffix is .baz.bar. Having . as an operator would make no sense as;

. is not an operator
. does not allow entire expressions between them, 1 + 1 . 1 + 1 makes no sense.

Here’s a typical C-like expression item written out:

\begin{aligned} \gamma &= \text{Terminal literals} \\ \sigma &= \begin{cases} \begin{array}{ll} a_{pre} &= \text{Prefix operators} \\ a_{in} &= \text{Infix operators} \\ a_{post} &= \text{Postfix operators} \end{array} \end{cases} \\ \end{aligned}

Where $\rho_{\beta}$ is the suffix and $\rho_{\alpha}$ is the root. In this case you’d want to remove the $(;E;)$ from the $E_\alpha$ rule

So hopefully that made sense and cleared things up. You’ll want to allow whitespace between all the above tokens listed. (You usually want to allow whitespace between all tokens in a non-terminal rule).