
# Defining a parser

The file `lalr.scm` declares a macro called `lalr-parser`:

```scheme
(lalr-parser [options] tokens rules ...)
```

To use this macro, you must first load `lalr.scm` into your Scheme
system using either `load` or the `include` special form.


This macro, when given appropriate arguments, generates an LALR(1)
syntax analyzer. Apart from any options, the macro takes at least two
arguments. The first is a list of symbols that represent the terminal
symbols of the grammar. The remaining arguments are the grammar
production rules.


# Running the parser

The parser generated by the `lalr-parser` macro is a function that
takes two parameters. The first parameter is a lexical analyzer while
the second is an error procedure.

The lexical analyzer is a zero-argument function (a thunk)
invoked each time the parser needs to look ahead in the token stream.
A token is usually a pair whose `car` is the symbol corresponding to
the token (the same symbol as used in the grammar definition). The
`cdr` of the pair is the semantic value associated with the token. For
example, a string token would have the `car` set to `'string`
while the `cdr` is set to the string value `"hello"`.

Once the end of input is reached, the lexical analyzer must
return the symbol `'*eoi*` every time it is invoked.
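
As an illustration of this protocol, here is a minimal sketch of a lexical analyzer that reads from a list of tokens prepared in advance. The helper name `make-list-lexer` is hypothetical and not part of `lalr.scm`; a real lexer would normally read characters from a port instead.

```scheme
;; Hypothetical helper (not provided by lalr.scm): build a lexer thunk
;; that hands out the tokens of a pre-tokenized list one at a time.
(define (make-list-lexer tokens)
  (let ((remaining tokens))
    (lambda ()
      (if (null? remaining)
          '*eoi*                          ; exhausted: always answer *eoi*
          (let ((token (car remaining)))
            (set! remaining (cdr remaining))
            token)))))

;; A token is a pair: (terminal-symbol . semantic-value).
(define lexer
  (make-list-lexer (list (cons 'string "hello"))))

(lexer) ; the pair (string . "hello")
(lexer) ; the symbol *eoi*
(lexer) ; *eoi* again, on every later call
```

Keeping the remaining tokens in a closure variable means the thunk needs no arguments, which is the shape the parser expects.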

The error procedure must be a function that accepts at least two
parameters.
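
A possible error procedure is sketched below. The text above only fixes the minimum number of arguments, so the interpretation used here (a message first, followed by related values such as the offending token) is an assumption, and the name `report-parse-error` is hypothetical.

```scheme
;; Hypothetical error procedure. Assumption: the first argument is a
;; message and any further arguments are values related to the error.
(define (report-parse-error message . args)
  (display message)
  (for-each (lambda (arg)
              (display " ")
              (write arg))
            args)
  (newline))

;; Called by hand here, only to show the printed form.
(report-parse-error "Syntax error: unexpected token" 'ID)
```

Using a rest argument keeps the procedure callable with two or more arguments, whatever the parser passes after the message.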

# The grammar format

The grammar is specified by giving first the list of terminals and then
the list of non-terminal definitions. Each non-terminal definition
is a list where the first element is the non-terminal and the other
elements are the right-hand sides (lists of grammar symbols). In
addition, each right-hand side can be followed by a colon and a
semantic action.

For example, consider the following (yacc) grammar for a very simple
expression language:

```yacc
   e : e '+' t
     | e '-' t
     | t
     ;
   t : t '*' f
     | t '/' f
     | f
     ;
   f : ID
     ;
```

The same grammar, written for the Scheme parser generator, would look
like this (with semantic actions):

```scheme
(define expr-parser
  (lalr-parser
    ; Terminal symbols
    (ID + - * /)
    ; Productions
    (e (e + t)    : (+ $1 $3)
       (e - t)    : (- $1 $3)
       (t)        : $1)
    (t (t * f)    : (* $1 $3)
       (t / f)    : (/ $1 $3)
       (f)        : $1)
    (f (ID)       : $1)))
```

In semantic actions, the symbol `$n` refers to the synthesized
attribute value of the *nth* symbol in the production. The value
associated with the non-terminal on the left is the result of
evaluating the semantic action (it defaults to `#f`).
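
One way to picture a semantic action is as a procedure whose parameters are the `$n` values. The sketch below is only a mental model and not the code that `lalr-parser` generates; the name `e-plus-t-action` is hypothetical.

```scheme
;; Mental model only: the action of the production
;;   (e + t) : (+ $1 $3)
;; seen as a procedure of the three right-hand-side values.
(define (e-plus-t-action $1 $2 $3)
  (+ $1 $3))

;; $1 is the value of e, $2 the value of the + token (unused here),
;; and $3 the value of t.
(e-plus-t-action 1 #f 6) ; evaluates to 7
```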

# Operator precedence and associativity

The above grammar implicitly handles operator precedence. It is also
possible to explicitly assign precedences and associativity to
terminal symbols and productions *à la* Yacc. Here is a modified
(and augmented) version of the grammar:

```scheme
(define expr-parser
  (lalr-parser
   ;; Terminal symbols
   (ID
    (left: + -)
    (left: * /)
    (nonassoc: uminus))
   (e (e + e)              : (+ $1 $3)
      (e - e)              : (- $1 $3)
      (e * e)              : (* $1 $3)
      (e / e)              : (/ $1 $3)
      (- e (prec: uminus)) : (- $2)
      (ID)                 : $1)))
```

The `left:` directive is used to specify a set of left-associative
operators of the same precedence level, the `right:` directive for
right-associative operators, and `nonassoc:` for operators that
are not associative. Note the use of the (apparently) useless
terminal `uminus`. It is only defined in order to assign to the
penultimate rule a precedence level higher than that of `*` and `/`.
The `prec:` directive can only appear as the last element of a
rule. Finally, note that precedence levels are incremented from
left to right, i.e. the precedence level of `+` and `-` is less
than the precedence level of `*` and `/` since the former appear
first in the list of terminal symbols (token definitions).
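
To make the effect of these declarations concrete, the sketch below writes out by hand how the semantic actions would be expected to nest for a few inputs, assuming each `ID` carries a number as its semantic value. It is plain Scheme arithmetic and does not call the parser; the variable names are hypothetical.

```scheme
;; Hand-written groupings, assuming numeric ID values.

;; 1 - 2 - 3 with (left: + -): the leftmost operator groups first.
(define left-grouping (- (- 1 2) 3))   ; -4

;; The grouping that left-associativity rules out, for comparison.
(define right-grouping (- 1 (- 2 3)))  ; 2

;; 1 + 2 * 3: * and / are declared after + and -, so they bind tighter.
(define product-first (+ 1 (* 2 3)))   ; 7

;; - 2 * 3 with (prec: uminus): the unary minus binds tighter than *.
(define negation-first (* (- 2) 3))    ; -6
```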

# Options

The following options are available.

- `(output: name filename)` - copies the parser to the given
   file. The parser is given the name `name`.
- `(out-tables: filename)`  - outputs the parsing tables to
   `filename` in a more readable format.
- `(expect:  n)` - don't warn about conflicts if there are
   `n` or fewer conflicts.

# Error recovery

`lalr-scm` implements a very simple error recovery strategy. A production can
be of the form

```scheme
   (rulename
      ...
      (error TERMINAL) : action-code
   )
```

(There can be several such productions for a single rulename.) This will cause
the parser to skip the tokens produced by the lexer until it reaches
the given `TERMINAL`. For a C-like language, one can synchronize on
semicolons and closing curly brackets by writing error rules like these:

```scheme
   (stmt
      (expression SEMICOLON) : ...
      (LBRACKET stmt RBRACKET) : ...
      (error SEMICOLON)
      (error RBRACKET))
```

# A final note on conflict resolution

Conflicts in the grammar are handled in a conventional way.
In the absence of precedence directives,
Shift/Reduce conflicts are resolved by shifting, and Reduce/Reduce
conflicts are resolved by choosing the rule listed first in the
grammar definition.

You can print the states of the generated parser by evaluating
`(print-states)`. The format of the output is similar to the one
produced by bison when given the `-v` command-line option.
