# The grammar notation

Grammars can be written in a textual notation:

    A : "0" A "0"
    A : "1" A "1"
    A : B
    B : "0"
    B : "1"
    B :

This grammar contains two **nonterminals**, `A` and `B`, which both have three **productions**. **Terminals** can be written as quoted strings, such as `"0"` (see also regular expressions below). The first nonterminal in a grammar is the **start** nonterminal. Terminals and nonterminals on the right-hand side of a production are called **entities**.
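To make the meaning of the example grammar concrete, the following is a minimal Python sketch of a recognizer for it. The dictionary encoding, the names `GRAMMAR`, `START`, and `derives`, and the position-set technique are illustrative assumptions and are not part of the grammar tool. The sketch only works for grammars without left recursion, which holds for this example.

```python
from functools import lru_cache

# Productions of the example grammar (hypothetical encoding):
# terminals keep their quotes, nonterminals are bare names.
GRAMMAR = {
    "A": [['"0"', "A", '"0"'], ['"1"', "A", '"1"'], ["B"]],
    "B": [['"0"'], ['"1"'], []],  # the empty list is the empty production
}
START = "A"  # the first nonterminal is the start nonterminal


def derives(text: str) -> bool:
    """Return True if the start nonterminal derives exactly `text`."""

    @lru_cache(maxsize=None)
    def ends(symbol: str, i: int) -> frozenset:
        # All positions j such that `symbol` derives text[i:j].
        if symbol.startswith('"'):
            literal = symbol[1:-1]
            if text.startswith(literal, i):
                return frozenset({i + len(literal)})
            return frozenset()
        result = set()
        for production in GRAMMAR[symbol]:
            positions = {i}
            for entity in production:
                positions = {j for p in positions for j in ends(entity, p)}
            result |= positions
        return frozenset(result)

    return len(text) in ends(START, 0)
```

For this grammar, `derives` accepts exactly the binary palindromes, for example `"0110"` and `"010"`, and rejects strings such as `"01"`.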

Multiple productions of the same nonterminal can be written in a shorter form:

    A : "0" A "0"
      | "1" A "1"
      | B
    B : "0"
      | "1"
      |

Terminals can also be defined by **regular expressions**:

    NUMERAL = [0-9]+
    ...
    N : <NUMERAL>

The regular expressions use the RegExp notation from dk.brics.automaton, except that character escaping can also be done with `\uXXXX` and `\n` notation (representing Unicode UTF-16 code units and special symbols as in Java).
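In Java, each `\uXXXX` escape denotes a single UTF-16 code unit, so a character outside the Basic Multilingual Plane needs two escapes (a surrogate pair). Assuming the grammar notation follows Java here, as the text states, the small Python sketch below produces the escapes for a given string. The helper name `to_u_escapes` is hypothetical and not part of any tool.

```python
def to_u_escapes(text: str) -> str:
    """Write every UTF-16 code unit of `text` as a \\uXXXX escape."""
    data = text.encode("utf-16-be")  # two bytes per code unit, no byte-order mark
    units = [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]
    return "".join(f"\\u{unit:04X}" for unit in units)
```

For example, `to_u_escapes("A")` gives `\u0041`, while the single character U+1D11E gives the pair `\uD834\uDD1E`.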

`EOF` is a predefined expression that matches the empty string but only at end-of-file.

Note that these regular expressions do not define tokens; the formalism is scannerless.
However, a regular expression can be declared `MAX`\*, which means that it only matches maximal substrings:

    TEXT = [a-z]* (MAX)

**Comments** can be written as in Java:

    // this is a one line comment
    /* this is a
       multi line comment */

Productions can be **labeled**:

    A[zeros] : "0" A "0"
     [ones]  | "1" A "1"
     [done]  | B
    B[zero]    : "0"
     [one]     | "1"
     [epsilon] |

These labels are used in syntax trees and in ambiguity analysis reports. If omitted, the productions are automatically labeled `#1`, `#2`, etc. for each nonterminal. The ambiguity analyzer by default skips vertical ambiguity checks of pairs of non-explicitly labeled productions, unless no productions at all have labels.
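The automatic labeling can be sketched in a few lines of Python. The function name `assign_labels` and the input encoding are hypothetical. The sketch also assumes that the number in `#n` is the position of the production among all productions of its nonterminal; the text does not say whether explicitly labeled productions are counted, so treat that detail as a guess.

```python
from collections import defaultdict


def assign_labels(productions):
    """Fill in default labels.

    `productions` is a list of (nonterminal, label_or_None) pairs in grammar
    order. Returns a list of (nonterminal, label) pairs where every missing
    label is replaced by "#n", with n counted per nonterminal (assumption:
    explicitly labeled productions are counted too).
    """
    counters = defaultdict(int)
    labeled = []
    for nonterminal, label in productions:
        counters[nonterminal] += 1
        if label is None:
            label = f"#{counters[nonterminal]}"
        labeled.append((nonterminal, label))
    return labeled
```

Applied to the unlabeled six-production grammar from the beginning of this page, the sketch labels the productions of `A` as `#1`, `#2`, `#3` and restarts the numbering at `#1` for `B`.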

Nonterminal entities and regular expression entities can similarly be labeled:

    A : "0" A[a] "0"
      | "1" A[a] "1"
      | B[b]
    B : "0"
      | "1"
      |

Entities that are *not* labeled are called **ignorable** and are omitted from the syntax trees. (String entities are always ignorable.) However, dummy labels are assumed for all entities if the grammar contains no entity labels at all.

As an experimental feature, two entities within the same production are **equality**\* entities if their labels are the same:

    X : Y[q] Y[q]

The parse trees of such equality entities must unparse to identical strings.

Productions can be **prioritized** using the `>` marker:

    A : A1
      | A2
     >| A3
      | A4

In this case, the first two productions have higher priority than the latter two.

Productions can be **unordered**\* using the `&` marker:

    A :& B C D

which means the same as

    A : B C D | B D C | C B D | C D B | D B C | D C B
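The expansion of an unordered production is simply the list of all permutations of its entities. As an illustration only (the function name `expand_unordered` is hypothetical and not part of the tool), it can be reproduced in Python with `itertools.permutations`:

```python
from itertools import permutations


def expand_unordered(nonterminal, entities):
    """Spell out `nonterminal :& entities...` as ordinary alternatives."""
    alternatives = [" ".join(order) for order in permutations(entities)]
    return f"{nonterminal} : " + " | ".join(alternatives)
```

Calling `expand_unordered("A", ["B", "C", "D"])` returns the six-alternative production shown above. Note that the number of alternatives grows factorially with the number of entities.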

Ignorable nonterminal entities and regular expression entities can have **example strings**, which are used in unparsing:

    IF = [iI][fF]
    stm : <IF>["if"] exp stm

The example string must be in the language of the entity. If an example string is not provided for such an entity, the unparser picks a representative string from the language of the entity.

See also the "grammar for grammars" and the example grammars.

*\* The ambiguity analyzer currently does not support unordered productions, equality entities, or* `MAX` *regexps.*