The DSD Schema Language and its Applications
XML (eXtensible Markup Language), a linear syntax for trees, has attracted a remarkable amount of interest in industry. The acceptance of XML opens new avenues for the application of formal methods, such as the specification of sets of abstract syntax trees and of tree transformations.
A user domain may be specified as a set of trees. For example, XHTML is a user domain corresponding to the set of XML documents that make sense as HTML. A notation for defining such a set of XML trees is called a schema language. We believe that a useful schema notation must identify most of the syntactic requirements that the documents in the user domain follow; allow efficient parsing; be readable to the user; provide a declarative notation for defaults à la CSS; and be modular and extensible to support evolving classes of XML documents.
In the present paper, we give a tutorial introduction to the DSD (Document Structure Description) notation as our proposal for meeting these requirements. The DSD notation was inspired by industrial needs, and we show how DSDs help manage aspects of complex XML software through a case study about interactive voice response systems (automated telephone answering systems, where input is through the telephone keypad or speech recognition).
The expressiveness of DSDs goes beyond the DTD schema concept that is already part of XML. We advocate the use of nonterminals in a top-down manner, coupled with boolean logic and regular expressions to describe how constraints on tree nodes depend on their context. We also support a general, declarative mechanism for inserting default elements and attributes that is reminiscent of Cascading Style Sheets (CSS), a way of manipulating formatting instructions in HTML that is built into all modern browsers. Finally, we include a simple technique for evolving DSDs through selective redefinitions. DSDs are in many ways much more expressive than XML Schema (the schema language proposed by the W3C), yet their syntactic and semantic definition in English is only one-eighth the size. Also, the DSD notation is self-describable: the syntax of legal DSD documents and all static semantic requirements can be captured in a DSD document, called the meta-DSD.
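As a rough illustration of two of the ideas above, context-dependent constraints expressed with regular expressions and CSS-like insertion of default attributes, the following Python sketch may help. It is not DSD syntax and not the authors' implementation: the rule tables, the helper functions, and the element names (`ivr`, `menu`, `prompt`, `option`) are all hypothetical, chosen only to echo the voice response setting, and "context" is simplified here to the name of the parent element.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical rule tables (NOT DSD syntax).
# Constraint: (element, required parent or None, attribute, regex for its value)
CONSTRAINTS = [
    ("option", "menu", "key", r"[0-9#*]"),
]
# Default: (element, required parent or None, attribute, default value).
# Later entries override earlier ones; explicitly written attributes always win.
DEFAULTS = [
    ("prompt", None, "voice", "standard"),
    ("prompt", "menu", "voice", "menu-voice"),
]

def parent_map(root):
    """Map each element to its parent (ElementTree keeps no parent links)."""
    return {child: parent for parent in root.iter() for child in parent}

def matches(elem, parent, name, context):
    """True if elem has the given name and, if required, the given parent."""
    if elem.tag != name:
        return False
    return context is None or (parent is not None and parent.tag == context)

def apply_defaults(root):
    """Insert default attributes wherever the document did not set them."""
    parents = parent_map(root)
    for elem in root.iter():
        explicit = set(elem.attrib)  # attributes written in the document
        for name, context, attr, value in DEFAULTS:
            if matches(elem, parents.get(elem), name, context) and attr not in explicit:
                elem.set(attr, value)

def validate(root):
    """Return one message per violated context-dependent constraint."""
    parents = parent_map(root)
    errors = []
    for elem in root.iter():
        for name, context, attr, pattern in CONSTRAINTS:
            if not matches(elem, parents.get(elem), name, context):
                continue
            value = elem.get(attr)
            if value is None or re.fullmatch(pattern, value) is None:
                errors.append(f"<{name}> in <{context}>: bad or missing '{attr}'")
    return errors

# A made-up document: the second <option> violates the key pattern.
DOC = """<ivr>
  <prompt>Welcome</prompt>
  <menu>
    <prompt>Press a key</prompt>
    <option key="1"/>
    <option key="ten"/>
  </menu>
</ivr>"""

if __name__ == "__main__":
    root = ET.fromstring(DOC)
    apply_defaults(root)
    print([p.get("voice") for p in root.iter("prompt")])
    print(validate(root))
```

In this sketch the more specific default (a `prompt` inside a `menu`) overrides the general one, loosely mirroring CSS cascading, and the constraint on `key` applies only to `option` elements whose parent is `menu`.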