<Formats>


Introduction

<bigwig> has built-in support for regular expressions, which can be defined and used for purposes such as string matching and file scanning.

Regular expressions are declared using the format construct. The syntax and semantics of regular expression formats are described below.


Syntax for format related constructs

format      ::= format id = regexp ;                    format definition
regexp_list ::= regexp
regexp      ::= id                                      format reference
              | stringconst                             constant
              | anychar                                 anychar
              | complement ( regexp )                   complement
              | concat ( regexp_list )                  concatenation
              | fix ( intconst , intconst )             fixed integer interval
              | intersection ( regexp_list )            intersection
              | range ( charconst , charconst )         character range
              | regexp ( stringconst )                  (Perl) regexp
              | relax ( intconst , intconst )           relaxed integer interval
              | star ( regexp )                         Kleene's star
              | union ( regexp_list )                   union
              | [ id = regexp ]                         record regexp


Semantics for format related constructs

id (format reference)
This rule references a regular expression defined by another format. Referring to another regular expression has the same effect as writing that expression directly at the point of reference. Because regular expressions are non-recursive, an id may not refer to itself, either directly or through mutual recursion between formats.
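
The substitution reading of format references can be illustrated outside <bigwig>. The following Python sketch is only a toy model, not the <bigwig> implementation: it assumes a format is a list of ("const", text) and ("ref", name) parts, and the names expand, Number and Pair are hypothetical. It inlines each reference on the spot and rejects direct or mutual recursion.

```python
def expand(name, formats, active=()):
    """Inline every reference in formats[name]; reject recursive references."""
    if name in active:
        chain = " -> ".join(active + (name,))
        raise ValueError("recursive format reference: " + chain)
    parts = []
    for kind, value in formats[name]:
        if kind == "ref":
            # Behave as if the referenced expression were written here directly.
            parts.append(expand(value, formats, active + (name,)))
        else:
            # kind == "const": the text stands for itself.
            parts.append(value)
    return "".join(parts)


# Hypothetical formats: Pair refers to Number twice.
formats = {
    "Number": [("const", "42")],
    "Pair": [("ref", "Number"), ("const", "-"), ("ref", "Number")],
}
expanded = expand("Pair", formats)
```

In this model a pair of formats that refer to each other raises ValueError, which mirrors the restriction on mutual recursion.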

stringconst (constant)
This regular expression will only match the string constant itself.

anychar
This is a constant regular expression that matches any single ASCII character.

complement
This regular expression matches every string that its regular expression argument does not match. The complement is taken with respect to ASCII*, the set of all strings of ASCII characters.

concat
This will be the regular expression corresponding to the concatenation of the regular expression arguments.

fix
This will be the regular expression matching any number in the interval (both end-points included) specified by the two intconst arguments.

intersection
This will be the regular expression corresponding to the intersection of the regular expression arguments. That is, a string is matched if and only if it is matched by all of the regular expression arguments.

range
This regular expression will only match characters in the interval (both end-points included) specified by the two charconst arguments.

regexp
This construct allows a regular expression to be specified in Perl style. Escaping a character requires prefixing it with two backslash characters.

relax
This will be the regular expression matching any number in the interval (both end-points included) specified by the two intconst arguments. It will however not match numbers prefixed with zeros (as in "007").
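
The relax rule can be modelled as a whole-string predicate. The Python sketch below is an illustration under stated assumptions, not <bigwig> code: it assumes non-negative bounds and decimal notation, and the function name relax is reused here only for readability.

```python
def relax(low, high):
    """Model of relax(low, high): accept decimal numbers in [low, high]."""
    def accepts(s):
        # Only non-empty strings of ASCII digits are candidate numbers.
        if not s or not s.isascii() or not s.isdigit():
            return False
        # Numbers prefixed with zeros, such as "007", are rejected.
        if len(s) > 1 and s[0] == "0":
            return False
        # Both end-points are included.
        return low <= int(s) <= high
    return accepts


day = relax(1, 31)  # hypothetical day-of-month format
```

With this model, day("7") and day("31") hold, while day("07") and day("32") do not.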

star
This will be the regular expression corresponding to the Kleene star of the supplied regular expression. That is, it matches any number (including zero) of repetitions of the supplied regular expression.

union
This will be the regular expression corresponding to the union of the regular expression arguments. That is, a string is matched if and only if it is matched by at least one of the regular expression arguments.
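
The set-theoretic reading of union, intersection and complement given above can be sketched with whole-string predicates. This Python fragment is only a model of the semantics, not the <bigwig> implementation; the helper fmt and the example formats are hypothetical, and Python regular expressions stand in for <bigwig> formats.

```python
import re


def fmt(pattern):
    """Turn a Python regex into a predicate over whole ASCII strings."""
    compiled = re.compile(pattern)
    return lambda s: s.isascii() and compiled.fullmatch(s) is not None


def union(*formats):
    # Accepted if at least one argument accepts the string.
    return lambda s: any(f(s) for f in formats)


def intersection(*formats):
    # Accepted if and only if every argument accepts the string.
    return lambda s: all(f(s) for f in formats)


def complement(f):
    # Complement with respect to ASCII*: ASCII strings the argument rejects.
    return lambda s: s.isascii() and not f(s)


digits = fmt(r"[0-9]+")
letters = fmt(r"[a-z]+")
three_chars = fmt(r".{3}")

three_digits = intersection(digits, three_chars)
word_or_number = union(digits, letters)
not_digits = complement(digits)
```

Here three_digits accepts "123" but neither "12a" nor "1234", and not_digits accepts the empty string because digits requires at least one character.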

record
This regular expression matches exactly what the regexp argument to the right of the equality character matches, with one side effect: when the format is used for testing strings, the substring matched by the regexp is "recorded" and can be retrieved using the name given by the identifier.


String Matching

exp ::= match ( exp , exp ) [ getinput_list ]           string match

For all defined formats, the match construct is available for testing whether a string complies with the regular expression defined by the format. The result is a boolean stating whether or not the string is in the language induced by the regular expression. Any recordings are available as the right-hand sides of assignments in the comma-separated list enclosed in square brackets.
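
As a rough analogy, matching with recordings resembles whole-string matching with named groups in other regular expression libraries. The Python sketch below is not <bigwig> syntax: the pattern DATE, the group names year and month, and the function match_with_recordings are hypothetical, with each named group playing the role of a record regexp.

```python
import re

# Hypothetical date format; each named group stands in for [ id = regexp ].
DATE = re.compile(r"(?P<year>[0-9]{4})-(?P<month>[0-9]{2})")


def match_with_recordings(text):
    """Return (matched, recordings) for a whole-string test against DATE."""
    m = DATE.fullmatch(text)
    if m is None:
        # The string is not in the language; nothing was recorded.
        return False, {}
    # groupdict() maps each recording name to the substring it matched.
    return True, m.groupdict()


ok, recordings = match_with_recordings("2001-11")
```

After the call, ok is the boolean result of the test and recordings holds the recorded substrings by name, which corresponds loosely to the assignments in the square-bracketed list.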


File Scanning

exp ::= scan ( exp , id )                               file scan

Formats can also be used for scanning files. Syntactically, scan takes an expression designating a file and an identifier naming a format. A call to scan returns the longest (possibly empty) string, starting at the current position in the specified file, that belongs to the language of the regular expression defined by the format. The current file position is then advanced past the returned string.
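
The longest-match behaviour of scan can be sketched in Python. This is a minimal model under stated assumptions, not the <bigwig> implementation: the file is a seekable binary stream, the format is given as a Python bytes pattern, and returning None without moving the position when no prefix matches is a choice made for this sketch only.

```python
import io
import re


def scan(stream, pattern):
    """Return the longest matching prefix at the current position, or None."""
    compiled = re.compile(pattern)
    start = stream.tell()
    rest = stream.read()
    # Try every prefix, longest first, so the first hit is the longest match.
    for end in range(len(rest), -1, -1):
        if compiled.fullmatch(rest, 0, end):
            # Advance the file position past the scanned string.
            stream.seek(start + end)
            return rest[:end]
    # Assumption of this sketch: no match leaves the position unchanged.
    stream.seek(start)
    return None


f = io.BytesIO(b"123abc")
first = scan(f, rb"[0-9]*")    # scans the leading digits
second = scan(f, rb"[0-9]*")   # no digits remain, so the empty string is scanned
third = scan(f, rb"[a-z]+")    # scans the remaining letters
```

Testing prefixes from longest to shortest is quadratic in the worst case; it is used here because it states the longest-match rule directly.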


bigwig@brics.dk
Last updated: November 2, 2001