<Formats>
<bigwig> has built-in support for regular expressions, which can be defined once and used for several purposes, such as testing strings and scanning files.
Regular expressions are declared using the format construct. The syntax and semantics of regular expression formats are described below.
- id (format reference)
- This rule refers to a regular expression defined by another
format. The effect is the same as if that expression had been written
directly in place of the reference. Since regular expressions are
non-recursive, a format may not refer to itself, either directly or
through mutual recursion.
- stringconst (constant)
- This regular expression will only match the string constant itself.
- anychar
- This constant regular expression matches any single ASCII character.
- complement
- This regular expression matches exactly the strings that its regular
expression argument does not match. The complement is taken with
respect to ASCII*, the set of all strings of ASCII characters.
- concat
- This will be the regular expression corresponding to the
concatenation of the regular expression arguments.
- fix
- This will be the regular expression matching any number in the
interval (both end-points included) specified by the two intconst
arguments.
- intersection
- This will be the regular expression corresponding to the
intersection of the regular expression arguments. That is, a string is
matched if and only if it is matched by all of the regular expression
arguments.
- range
- This regular expression will only match characters in the interval
(both end-points included) specified by the two charconst arguments.
- regexp
- This construct allows a regular expression to be written in Perl
style. Escaping a character requires prefixing it with two backslash
characters.
- relax
- This will be the regular expression matching any number in the
interval (both end-points included) specified by the two intconst
arguments. However, it does not match numbers with leading zeros (as
in "007").
- star
- This is the Kleene star of the supplied regular expression: it
matches any number (including zero) of repetitions of that regular
expression.
- union
- This will be the regular expression corresponding to the union of
the regular expression arguments. That is, a string is matched if and
only if it is matched by at least one of the regular expression
arguments.
- record
- This regular expression matches exactly what the regexp argument to
the right of the equals sign matches, but it has one side effect: when
it is used to test a string, the part of the string matched by that
regexp is "recorded" and can be retrieved under the name given by the
identifier.
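To make the semantics of the constructs above concrete, they can be modelled as predicates that decide whether a whole string belongs to a language. The Python sketch below is not <bigwig> syntax; the function names (const, anychar, char_range, concat, union, intersection, complement, star, relax) are hypothetical stand-ins chosen for illustration. fix and regexp are left out, and record is omitted because it affects what gets recorded rather than which strings match.

```python
from typing import Callable

# Hypothetical model: a regular expression is a predicate that decides
# whether a *whole* string belongs to its language.
Regex = Callable[[str], bool]

def is_ascii(t: str) -> bool:
    return all(ord(c) < 128 for c in t)

def const(s: str) -> Regex:
    # stringconst: matches exactly the string s
    return lambda t: t == s

def anychar() -> Regex:
    # anychar: any single ASCII character
    return lambda t: len(t) == 1 and is_ascii(t)

def char_range(lo: str, hi: str) -> Regex:
    # range: a single character between lo and hi, both included
    return lambda t: len(t) == 1 and lo <= t <= hi

def concat(*rs: Regex) -> Regex:
    # concat: t splits into consecutive pieces, the i-th matched by rs[i]
    if not rs:
        return const("")
    head, rest = rs[0], concat(*rs[1:])
    return lambda t: any(head(t[:i]) and rest(t[i:]) for i in range(len(t) + 1))

def union(*rs: Regex) -> Regex:
    # union: matched by at least one argument
    return lambda t: any(r(t) for r in rs)

def intersection(*rs: Regex) -> Regex:
    # intersection: matched by all arguments
    return lambda t: all(r(t) for r in rs)

def complement(r: Regex) -> Regex:
    # complement with respect to ASCII*: ASCII strings not matched by r
    return lambda t: is_ascii(t) and not r(t)

def star(r: Regex) -> Regex:
    # Kleene star: the empty string, or a non-empty piece matched by r
    # followed by a string matched by star(r) again
    def m(t: str) -> bool:
        return t == "" or any(r(t[:i]) and m(t[i:]) for i in range(1, len(t) + 1))
    return m

def relax(lo: int, hi: int) -> Regex:
    # relax: decimal numerals in [lo, hi] with no leading zeros
    def m(t: str) -> bool:
        if t == "" or not all("0" <= c <= "9" for c in t):
            return False
        if len(t) > 1 and t[0] == "0":
            return False
        return lo <= int(t) <= hi
    return m

# id (format reference): reusing a previously defined expression by name.
# A name must be defined before it is used, so recursion cannot arise here.
digit = char_range("0", "9")
number = concat(digit, star(digit))
```

This model tries every split point and is exponential in the worst case, so it only serves to pin down meaning; a practical implementation would typically compile formats to finite automata, where complement and intersection are also cheap to support.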
For every defined format, the match construct can be used to test
whether a string complies with the regular expression defined by the
format. The result is a boolean stating whether or not the string is
in the language denoted by the regular expression. Any recordings are
available as the right-hand sides of the assignments in the
comma-separated list enclosed in square brackets.
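The combination of a boolean test with recordings resembles full-string matching with named capture groups. The Python sketch below uses the standard `re` module as an analogy only: the `match` function and the `DATE` pattern are hypothetical, and each named group `(?P<name>...)` stands in for a record construct. Python's `re` has no complement or intersection operators, and its backtracking may assign ambiguous substrings differently from <bigwig>.

```python
import re
from typing import Dict, Optional, Tuple

def match(pattern: str, s: str) -> Tuple[bool, Dict[str, Optional[str]]]:
    """Hypothetical analogue of match: test whether the whole string s is
    in the language of pattern, and return the recorded substrings."""
    m = re.fullmatch(pattern, s)
    if m is None:
        return False, {}
    # Each named group (?P<name>...) plays the role of a record construct.
    return True, m.groupdict()

# A hypothetical date format that records its day, month and year parts.
DATE = r"(?P<day>[0-9]{2})-(?P<month>[0-9]{2})-(?P<year>[0-9]{4})"

ok, rec = match(DATE, "24-12-2024")
# ok is True; rec == {"day": "24", "month": "12", "year": "2024"}
```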
Formats can also be used for scanning files. Syntactically, scan
takes an expression designating a file and an identifier naming a
format.
A call to scan returns the longest (possibly empty) string that
starts at the current position in the specified file and belongs to
the language of the format's regular expression. The current file
position is then advanced past the returned string.
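The longest-match rule can be sketched in Python by trying every prefix of the remaining input from longest to shortest. This is a minimal illustration, assuming the file is an `io.StringIO` (whose positions are plain character offsets) and a Python `re` pattern in place of a format; the `scan` function here is hypothetical, and returning `None` when not even the empty string matches is an assumption, since the text does not cover that case.

```python
import io
import re
from typing import Optional

def scan(f: io.StringIO, pattern: str) -> Optional[str]:
    """Sketch of scan: return the longest prefix of the remaining input that
    is in the language of pattern, and advance the position past it.
    Assumption: if not even the empty string matches, return None and
    leave the position unchanged."""
    start = f.tell()          # StringIO positions are character offsets
    rest = f.read()
    # Try prefixes from longest to shortest; the first full match wins.
    for end in range(len(rest), -1, -1):
        if re.fullmatch(pattern, rest[:end]):
            f.seek(start + end)
            return rest[:end]
    f.seek(start)
    return None

f = io.StringIO("123abc")
first = scan(f, r"[0-9]*")   # "123"
second = scan(f, r"[0-9]*")  # "" (empty match, position unchanged)
remaining = f.read()         # "abc"
```

Note the use of `re.fullmatch` on each prefix rather than `re.match`: Python's `re.match` returns the leftmost-first alternative, which is not always the longest one. Trying every prefix costs quadratic time in the remaining input; an automaton-based scanner could find the longest match in a single pass.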