Version: 2006-11-20
Author: Oliver Horn
This document describes ERNC, a syntactic extension of RELAX NG's Compact Syntax (RNC). ERNC introduces a lightweight syntax that makes the definition of so-called “content classes” more convenient. All extensions are pure syntactic sugar: every ERNC grammar can be transformed into a pure RELAX NG grammar, and a simple processor is available for download.
Content classes are named groups of related element types. Such content classes occur frequently in documentation-oriented schemata like DocBook, TEI or XHTML. For example, in XHTML the list element types and the heading element types each form a “class”. There are many other examples of content classes; many can be found in the current RELAX NG grammars for XHTML 2 or DocBook.
In conventional RELAX NG schemata, content classes are just named patterns that contain all member element types as a choice. Organizing element types in such a way makes a schema more modular and flexible. However, the current solution for handling content classes in RELAX NG is
A named pattern can be declared as a member of a content class by using the <: operator in its definition. A content class is just a named pattern representing the combination of all members by means of the choice operator |.
The following ERNC grammar looks very similar to a pure RNC grammar. It defines two patterns, UnorderedList and OrderedList. The new part is the <: List following the pattern name on the left-hand side of the definition. With this declaration, each pattern becomes a member of the List content class.
UnorderedList <: List = element ul { ... }
OrderedList <: List = element ol { ... }
An equivalent pure RNC grammar is:
UnorderedList = element ul { ... }
List |= UnorderedList
OrderedList = element ol { ... }
List |= OrderedList
It is even possible to associate more than one content class with a pattern. For example, the following definition puts UnorderedList into both content classes List and Block.
UnorderedList <: List, Block = element ul { ... }
The schema is equivalent to:
UnorderedList = element ul { ... }
List |= UnorderedList
Block |= UnorderedList
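The rewriting shown in these examples can be sketched in a few lines of Python. This is only an illustration of the transformation, not the actual ERNC processor: the function name expand_membership and the regular expression are assumptions made here, and the sketch handles only definitions written on a single line.

```python
import re

# Hypothetical matcher for a one-line ERNC definition of the form
#   Name <: Class1, Class2 = body
# Definitions using |= or &= do not match and are passed through unchanged.
MEMBER_DEF = re.compile(
    r"^\s*(?P<name>[A-Za-z_][\w.-]*)\s*<:\s*"
    r"(?P<classes>[\w.,\s-]+?)\s*=\s*(?P<body>.+)$"
)

def expand_membership(line):
    """Rewrite 'Name <: A, B = body' into the equivalent pure RNC lines."""
    m = MEMBER_DEF.match(line)
    if m is None:
        return [line]  # not a membership definition
    name = m.group("name")
    classes = [c.strip() for c in m.group("classes").split(",") if c.strip()]
    # First the plain definition, then one combining definition per class.
    result = [f"{name} = {m.group('body')}"]
    result += [f"{cls} |= {name}" for cls in classes]
    return result

for out in expand_membership("UnorderedList <: List, Block = element ul { ... }"):
    print(out)
```

Each class named after <: contributes one `Class |= Name` line, which is why a pattern can belong to any number of classes without changing its own definition.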
Note: Only regular definitions can be associated with a content class. Definitions using the combining operators &= or |= cannot be associated with a content class.
A content class can be declared with the (new) keyword class followed by the name of the class. There is no requirement to declare a content class explicitly. If it is not declared, the content class is created implicitly by means of the class associations.
However, an explicit declaration of a content class may be useful, e.g. for documentation purposes. Like other constructs, a content class declaration can be decorated with documentation or annotations. For example, the following declaration defines the content class List and annotates it with a short documentation comment:
## Class for list elements
class List
Further, the declaration can also be used to make the content class itself a member of other content classes. In the following example, the content class List is declared and also becomes a member of the content class Block:
## Class for list elements
class List <: Block
This example is equivalent to the following pure RNC grammar:
## Class for list elements
List = notAllowed
Block |= List
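As a rough sketch of this expansion, the following Python function turns a class declaration into the pure RNC lines shown above. The name expand_class_declaration and the regular expression are assumptions made for illustration; the real processor may work differently, and documentation comments are simply passed through here.

```python
import re

# Hypothetical matcher for 'class Name' or 'class Name <: Parent1, Parent2'.
CLASS_DECL = re.compile(
    r"^\s*class\s+(?P<name>[A-Za-z_][\w.-]*)"
    r"(?:\s*<:\s*(?P<parents>.+?))?\s*$"
)

def expand_class_declaration(line):
    """Rewrite a class declaration into pure RNC; other lines pass through."""
    m = CLASS_DECL.match(line)
    if m is None:
        return [line]
    name = m.group("name")
    parents = m.group("parents") or ""
    # The class starts out as notAllowed; members are added later via |=.
    result = [f"{name} = notAllowed"]
    result += [f"{p.strip()} |= {name}" for p in parents.split(",") if p.strip()]
    return result

for source_line in ["## Class for list elements", "class List <: Block"]:
    for out in expand_class_declaration(source_line):
        print(out)
```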
Note: The default content of a content class is notAllowed. This is correct, because the members are combined by means of the choice-operator |. The notAllowed alternative is removed when the grammar is processed (see RELAX NG specification, section 4.20).
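To see why the notAllowed default is harmless, the simplification can be modelled on a flat list of choice alternatives. This is a simplified sketch and not how a RELAX NG validator is implemented; the function name simplify_choice and the list representation are assumptions made here.

```python
def simplify_choice(alternatives):
    """Drop notAllowed alternatives from a choice.

    A choice consisting only of notAllowed stays notAllowed.
    """
    kept = [alt for alt in alternatives if alt != "notAllowed"]
    return kept if kept else ["notAllowed"]

# List = notAllowed, List |= UnorderedList, List |= OrderedList
print(simplify_choice(["notAllowed", "UnorderedList", "OrderedList"]))
```

A class without any members keeps its notAllowed content, so a pattern that refers to it can never match through that class.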
In documentation formats it is common for some content classes to have mixed content, i.e. they contain text as part of their content models. For this purpose, ERNC allows the text pattern to be declared as a member of a content class by means of the membership operator.
text <: Inline
Note: The text pattern is treated as a normal member of a content class, i.e. it is combined by means of choice. It is not equivalent to the mixed pattern, which combines text by interleaving. However, it is a common way to deal with mixed content; see e.g. the RELAX NG schemata for DocBook or XHTML 2.
The ERNC processor is a small Python script which takes an ERNC grammar as input and generates a pure RNC grammar as output.
Download: ernc-0.1zip (License: MIT License, see file LICENSE in the distribution)
The ERNC processor preserves additional information about the content classes as annotations: content class declarations are annotated with ernc:class and class memberships are annotated with ernc:member.
For example, the simple declaration class List <: Block is transformed to something like:
[ ernc:class ]
List |= notAllowed
[ ernc:member ]
Block |= List
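Because the class information survives as annotations, a downstream tool could in principle recover it from the generated grammar. The following sketch assumes that each annotation sits on its own line directly before the definition it annotates, exactly as in the example above; the actual output layout may differ, and the function name read_ernc_annotations is hypothetical.

```python
def read_ernc_annotations(lines):
    """Collect class names and (class, member) pairs from annotated RNC lines.

    Assumes '[ ernc:class ]' or '[ ernc:member ]' appears on the line
    immediately before a 'Name |= pattern' definition.
    """
    classes, members = [], []
    pending = None
    for raw in lines:
        line = raw.strip()
        if line in ("[ ernc:class ]", "[ ernc:member ]"):
            pending = line  # remember the annotation for the next line
            continue
        if pending and "|=" in line:
            left, right = (part.strip() for part in line.split("|=", 1))
            if pending == "[ ernc:class ]":
                classes.append(left)
            else:
                members.append((left, right))
        pending = None
    return classes, members

generated = [
    "[ ernc:class ]",
    "List |= notAllowed",
    "[ ernc:member ]",
    "Block |= List",
]
print(read_ernc_annotations(generated))
```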
The preprocessor generates some additional definitions suffixed with _ernc. You can ignore them. The reason is simply that the current version of the preprocessor is little more than a tokenizer. It does not parse a grammar file and hence cannot detect the end of a definition. However, the generated definitions have to be inserted after the original definition; otherwise they would break documentation and annotations. The workaround used by the preprocessor is to introduce a useless interim definition each time (those suffixed with _ernc).