XML mirrors in Scheme: XML in LAML

Kurt Nørmark ©     normark@cs.aau.dk
Department of Computer Science, Aalborg University

Abstract.

This is the chapter in which you learn how to make a mirror of an XML language, which is formally defined via a Document Type Definition (DTD).

The LAML DTD parser and the XML-in-LAML mirror generation tools are available in the LAML distribution from version 19.00.

Some of the other XML transformation examples, not addressed in this tutorial, are also available.

 

1     Introduction

The starting point of this part of the tutorial is a standard XML Document Type Definition - also known as a DTD. We will see how to parse it using the LAML DTD parser. After that we will generate a fully validating Scheme mirror of the DTD, and we will see how to make use of the XML language in Scheme and LAML.

1.1     Overview

The example we will study is written in a very simple XML language for the description of bikes. It is not a very interesting example. In fact, we have a number of more interesting examples around; see, for instance, the DTDs of the transformation examples.

We define the grammar (DTD) of the language in 2.1, parse it in 2.2, and make a mirror of it in Scheme in 3.1. The major effort here is to define a validation predicate for one of the elements in the language; see the discussion of validation in 3.2.

In 4.1 we illustrate how to make a simple transformation of a bike document to XHTML.

Finally, in 5.1 we study a couple of bike documents and their transformations to XHTML.

All the example source files are available in the tutorial/xml-in-laml directory of the LAML examples.

2     The DTD

We start by defining and parsing the DTD.

2.1     The DTD
2.2     Parsing the DTD
 


2.1     The DTD

The bikes document type definition (DTD) is the first thing to be constructed. It is very simple; we have written it only for the purposes of this tutorial, and it is not used in other contexts. Here is the DTD:

<!ENTITY % Number "CDATA">
    <!-- one or more digits -->

<!ENTITY % Boolean "(true | false)">
    <!--  spaces -->



<!ELEMENT bikes (bike)*>
<!ATTLIST bikes
>

<!ELEMENT bike (frame, wheel+, brake*, lock*)>
<!ATTLIST bike
  kind   (mountain-bike | racer-bike | tourist-bike | other)  "tourist-bike"
>

<!ELEMENT frame EMPTY>

<!ATTLIST frame
  frame-number CDATA #REQUIRED
>

<!ELEMENT wheel EMPTY>

<!ATTLIST wheel
  size        %Number; #REQUIRED
  tube-kind   CDATA    #IMPLIED
>

<!ELEMENT brake EMPTY>

<!ATTLIST brake
  kind    CDATA   #IMPLIED
  brand   CDATA   #IMPLIED
>

<!ELEMENT lock EMPTY>

<!ATTLIST lock
  brand   CDATA   #IMPLIED
  insurance-approved  %Boolean; #REQUIRED
>


Let us briefly explain what the DTD means. The first two clauses are entities. Basically, you can think of them as textual macros named 'Number' and 'Boolean'. DTDs are weak with respect to predefined types. Therefore it is common to introduce some ad hoc types, as we have done in bikes.dtd. In addition, entities should be used whenever you need the same fragment of text more than once.
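
The view of entities as textual macros can be illustrated outside LAML. The following is a minimal Python sketch, not part of the LAML distribution. The function name expand_entities is hypothetical, and the sketch ignores details of XML 1.0 such as nested entity references.

```python
import re

# Parameter entity declarations copied from bikes.dtd.
ENTITIES = {
    "Number": "CDATA",
    "Boolean": "(true | false)",
}

def expand_entities(text, entities):
    """Replace each %Name; reference with the entity's replacement text."""
    def substitute(match):
        return entities[match.group(1)]
    return re.sub(r"%([A-Za-z][\w.-]*);", substitute, text)

# One attribute definition from the lock element in bikes.dtd.
line = "insurance-approved  %Boolean; #REQUIRED"
expanded = expand_entities(line, ENTITIES)
print(expanded)
```

After expansion the attribute definition reads as if the enumeration (true | false) had been written directly in the ATTLIST clause.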

Next we encounter the elements and the attributes. The elements define a conventional context-free grammar. We see that bikes consists of zero, one, or more bike clauses. In turn, a single bike clause is an aggregation of a frame, one or more wheels, zero or more brakes, and zero or more locks. Frame, wheel, brake, and lock are all terminal concepts. In DTD parlance, they are EMPTY elements. All of them have a number of attributes, however.
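
To make the grammar reading concrete, the content model of bike can be rendered as a regular expression over the names of the child elements. The Python sketch below is only an illustration and does not show how LAML validates. The sample document and its attribute values are made up, and the function name bike_children_ok is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Content model of bike from the DTD: (frame, wheel+, brake*, lock*),
# written as a regular expression over space-terminated child names.
BIKE_MODEL = re.compile(r"(frame )(wheel )+(brake )*(lock )*")

def bike_children_ok(bike):
    """Check the order and number of the children of a bike element."""
    names = "".join(child.tag + " " for child in bike)
    return BIKE_MODEL.fullmatch(names) is not None

# Hypothetical sample document; the attribute values are invented.
doc = ET.fromstring("""
<bikes>
  <bike kind="racer-bike">
    <frame frame-number="X1"/>
    <wheel size="28"/>
    <wheel size="28"/>
    <brake kind="disc"/>
    <lock insurance-approved="true"/>
  </bike>
  <bike>
    <wheel size="26"/>
    <frame frame-number="X2"/>
  </bike>
</bikes>
""")

results = [bike_children_ok(bike) for bike in doc.findall("bike")]
print(results)  # [True, False]
```

The second bike is rejected because its wheel comes before its frame, which the sequence (frame, wheel+, ...) does not allow.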


2.2     Parsing the DTD

It must be stressed that the LAML DTD parser is an ad hoc parser, which does not recognize all aspects of a DTD. However, it is good enough to handle all the XHTML DTDs (strict, transitional, frameset), the SVG DTD, as well as the DTDs we have made in the scope of the LAML project. Early versions of the parser (which are not part of the LAML distributions) have also been used to parse the HTML 4.01 DTD (which is a non-XML DTD).

In order to use the information in the DTD it is necessary to parse it. That is, the structure of the DTD needs to be revealed and represented in some kind of hierarchical structure.

The central parsing command is parse-dtd. This procedure can be called from a Scheme prompt if the appropriate software is loaded. We find it easier to make a little script that first loads the software and then calls parse-dtd. The parsing script is parsing-script. After loading laml and the DTD parser from the tools/dtd-parser directory of the LAML distribution, the script calls parse-dtd with the proper name of the DTD file as input.

As a result, the DTD parser writes a file bikes.lsp (parsed-bikes-dtd-raw), which contains the parsed DTD data structure. No pretty printing is performed, so at first glance the result may be difficult to grasp. To make it a little easier to understand, we provide a somewhat pretty printed version of bikes.lsp in parsed-bikes-dtd (made, in part, by using the LAML Scheme pretty printer scheme-pp on the file). One of the main things to notice is the content models in the element clauses. The content model is element number five in each of the element forms. It is a symbol (empty or any) or a list structure prefixed by either element-content or mixed-content. The symbol or list is a natural and straightforward Lisp representation of the content model, as defined in section 3.2 of the XML 1.0 Recommendation. As examples, the parsed representation of the bikes and bike elements are

respectively.

You should read section 1 of dtd-parser for additional details about the format of the parsed DTD file.

3     Mirror synthesis

In this section we will describe how to make the mirror of the bikes language in Scheme.

3.1     Making the mirror
3.2     The resulting mirror
 


3.1     Making the mirror

Given the parsed DTD from above it is easy to synthesize a fully validating mirror of the XML language in Scheme. Stated in simple terms, we go for a one-to-one mapping between elements of the DTD and functions in Scheme. Thus, for each element in the XML language there will be a mirror function in Scheme.
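
The one-to-one mapping between elements and functions can be sketched independently of LAML. The Python fragment below is an illustration under simplifying assumptions: the names make_mirror_function and mirror are hypothetical, the AST is a plain dictionary rather than the LAML representation, and no validation is performed, unlike in the real mirror.

```python
def make_mirror_function(element_name):
    """Return a function that builds an AST node for element_name."""
    def mirror_function(*children, **attributes):
        return {"element": element_name,
                "attributes": attributes,
                "children": list(children)}
    mirror_function.__name__ = element_name
    return mirror_function

# One mirror function per element declared in bikes.dtd.
ELEMENT_NAMES = ["bikes", "bike", "frame", "wheel", "brake", "lock"]
mirror = {name: make_mirror_function(name) for name in ELEMENT_NAMES}

bikes, bike, frame, wheel = (mirror[n] for n in ("bikes", "bike", "frame", "wheel"))

# A document is written as nested function calls. Attribute names with
# hyphens are passed via dictionary unpacking; the values are invented.
ast = bikes(
    bike(frame(**{"frame-number": "X1"}),
         wheel(size="28"),
         wheel(size="28"),
         kind="racer-bike"))

print(ast["children"][0]["attributes"])
```

The point of the sketch is only that a document in the XML language becomes an expression in the host language, and that evaluating the expression yields a tree.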

As we did for parsing, we also make a simple script which activates the mirror synthesizer. This is mirroring-script.

We explain it briefly. After the initial loading of laml and tools-xml-in-laml we set the tool parameters in the section of the script called tool-parameters. All these parameters are described in Section 1 of laml.scm. We give the name of the mirror (mirror-name), the full path to the parsed DTD (parsed-dtd-path), and the full path of the mirror target directory (mirror-target-dir).

The main mirror generation procedure is generate-mirror from tools-xml-in-laml; see its documentation for the formal parameters. See also the activation of generate-mirror and the actual parameters in the section of the script called tool-activation.

The action element bikes is the root element in a bikes document. See section 5.1 for an example of such a document. The consequence of announcing the bikes element as an action element is that a procedure bikes! will be called with the purpose of initiating some kind of transformation of the bikes AST; see section 4.

Notice also the default language properties, described in Section 2 of tools/xml-in-laml.scm. In case the default values do not fit your needs, you can change them before generate-mirror is called. The values of the variables are strings, which are inserted in the synthesized mirror library. This aspect of the mirror synthesis is a little primitive, and we may be able to improve it in a future version.


3.2     The resulting mirror

In earlier versions of LAML the DTD author had to be prepared to write some of the XML validation predicates manually. From version 20, this is fully automated.

It is now time to take a look at the generated mirror. Like many other auto-generated programs, it is not really intended to be read. More importantly, you should never edit it!

Take a look at the mirror in bikes-mirror. There are two sections:

  1. The validation procedures, starting with bikes-bike-management-laml-validate!. The vectors inside some of the validation procedures represent the automatically generated finite state automata, as supported by the Finite State Automaton library.
  2. The mirror functions starting with bikes.
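
To give an impression of how a finite state automaton can validate a content model, here is a small Python sketch for the bike model (frame, wheel+, brake*, lock*). It is hand-written for illustration. The transition table, the state numbering, and the function name check_bike are assumptions and do not reproduce the vectors in the generated mirror.

```python
# States: 0 = start, 1 = frame seen, 2 = in wheels, 3 = in brakes, 4 = in locks.
# TRANSITIONS[state] maps the name of the next child to the next state.
TRANSITIONS = [
    {"frame": 1},
    {"wheel": 2},
    {"wheel": 2, "brake": 3, "lock": 4},
    {"brake": 3, "lock": 4},
    {"lock": 4},
]
ACCEPTING = {2, 3, 4}  # at least one wheel must have been seen

def check_bike(child_names):
    """Run the automaton over the child names of a bike element."""
    state = 0
    for name in child_names:
        if name not in TRANSITIONS[state]:
            return "misplaced " + name
        state = TRANSITIONS[state][name]
    return "ok" if state in ACCEPTING else "incomplete"

print(check_bike(["frame", "wheel", "wheel", "brake", "lock"]))
print(check_bike(["lock", "frame", "wheel"]))
print(check_bike(["frame"]))
```

A table-driven automaton of this kind visits each child once, and the first child without a transition is a natural place to report a misplaced element.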

For convenience we usually make a LAML style for easy loading of the software which belongs to a given XML-in-LAML language. These styles are located in styles/xml-in-laml/ in the LAML distribution. The bikes.scm LAML style can be seen in bikes-style. Notice that the action procedure of the bikes element, called bikes!, is defined here. It must be defined before the generated mirror functions are loaded. In this file we also program the appropriate transformations of a bikes AST. This is the theme of section 4.

At this point it is possible to play with bikes documents in Scheme/LAML syntax. If you are curious, you can jump to section 5 right away.

4     Transformation of the mirror AST

In this section we will study a simple transformation of a bikes AST.

4.1     Transformation of the mirror AST

At this point we are able to construct a bikes AST. In order to make it useful, we must somehow transform the AST, typically to an HTML page. We will here see a very simple example of a transformation which illustrates some useful LAML functions for this purpose.

The bikes-1 document gives rise to an abstract syntax tree. Normally, we do not look at the internal list representation of such a tree. It may, however, be instructive to see what it looks like, see bikes-1-ast. The boolean #t constants are white space markers, and really of no relevance for the bikes language. (You may get rid of the white space markers once and for all via definition of default-xml-represent-white-space.)

The xml-in-laml library, which is shared between all the XML-in-LAML languages, defines functions for constructing and accessing abstract syntax trees (see section 4).

We program the transformation functions in the bikes style file bikes-style which, for support of this tutorial, is located in the styles/xml-in-laml/tutorials-and-demos/ directory of the LAML distribution. The transformation is initiated in the action procedure of the bikes element, called bikes!. This procedure writes an HTML file by means of the write-html procedure from laml. In bikes! we traverse the bikes AST in order to locate the bike subclauses. The traverse-and-collect-all-from-ast function applies the bike-table function to each bike clause. At the HTML level the bikes style uses XHTML 1.0 transitional, and the HTML file is written in pretty printed mode. You can inspect the source of the resulting HTML document via 'view source' in your browser.

The function bike-table accesses some of the constituents of a bike clause, and in turn some of their attributes. The function returns a list of tr elements, which in bikes! can conveniently be passed to a modified XHTML table mirror function.
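
The overall shape of such a transformation, collecting the bike clauses and producing one table row per bike, can be sketched in Python. This is not the code of bikes-style. The helper names node, collect, and bike_row are hypothetical, the dictionary AST is a stand-in for the LAML representation, and the choice of table columns is an assumption.

```python
from html import escape

def node(element, attributes, *children):
    """Build a simple AST node (a stand-in for the LAML AST)."""
    return {"element": element, "attributes": attributes,
            "children": list(children)}

def collect(ast, element_name):
    """Collect all nodes named element_name, in document order."""
    found = []
    if ast["element"] == element_name:
        found.append(ast)
    for child in ast["children"]:
        found.extend(collect(child, element_name))
    return found

def bike_row(bike):
    """Return one HTML table row describing a bike clause."""
    frame = collect(bike, "frame")[0]
    wheels = collect(bike, "wheel")
    cells = [bike["attributes"].get("kind", "tourist-bike"),  # DTD default
             frame["attributes"]["frame-number"],
             str(len(wheels))]
    return "<tr>" + "".join("<td>" + escape(c) + "</td>" for c in cells) + "</tr>"

# Hypothetical document with invented attribute values.
ast = node("bikes", {},
           node("bike", {"kind": "racer-bike"},
                node("frame", {"frame-number": "X1"}),
                node("wheel", {"size": "28"}),
                node("wheel", {"size": "28"})),
           node("bike", {},
                node("frame", {"frame-number": "X2"}),
                node("wheel", {"size": "26"})))

rows = [bike_row(b) for b in collect(ast, "bike")]
table = "<table>" + "".join(rows) + "</table>"
print(table)
```

In the sketch the rows are plain strings. In the LAML setting the rows are themselves AST fragments produced by the XHTML mirror, so the resulting table can be validated as well.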

Notice that bikes-style loads the XHTML transitional mirror (xhtml-loading), such that the resulting transformation returns an XHTML document. In case of XHTML validation problems, you will get warnings.

5     Mirror usage

In this section we will look at an example usage of the generated mirror functions.

5.1     Using the mirror

Let us look at a couple of bikes documents, see bikes-1 and bikes-2. In both we first load laml.scm and then the bikes style, which we discussed in sections 3.2 and 4.

In bikes-1 we see a bikes clause with two bike clauses. The document is valid relative to the DTD, and an HTML document is generated from it.

In bikes-2 we also see a bikes clause with two bike clauses. This document is invalid, however. When we process it we get several validation errors, relative to the bikes DTD. The error report looks like this:

LAML Emacs Processing
Welcome to MzScheme version 101, Copyright (c) 1995-99 PLT (Matthew Flatt)
XML Warning: The XML attribute  tube-kind-attribute  is not valid in the wheel element.
XML Warning: Encountered a misplaced frame in a bike construct:
    <frame frame-number="IQ7W36-56"/> SPACE <wheel size="25" tube-kind-attribute="standard"/> SPACE <wheel size="25" tube-kind="standa...
XML Warning: Encountered a misplaced lock in a bike construct:
    <lock brand="basta" insurance-approved="true"/> SPACE <frame frame-number="IQ98T33-00"/> SPACE <wheel size="22" tube-kind="standar...

Therefore we cannot (and should not) transform it to an HTML document.
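
The first warning in the report concerns an undeclared attribute. A check of that kind can be sketched in Python from the ATTLIST clauses of the DTD. The sketch is an illustration only: the function name attribute_problems is hypothetical, the first message is loosely modeled on the report above, and the wording of the second message is invented.

```python
# Declared attributes per element, taken from the ATTLIST clauses of
# bikes.dtd. True means #REQUIRED, False means #IMPLIED.
ATTRIBUTES = {
    "frame": {"frame-number": True},
    "wheel": {"size": True, "tube-kind": False},
    "brake": {"kind": False, "brand": False},
    "lock": {"brand": False, "insurance-approved": True},
}

def attribute_problems(element, attributes):
    """Return a list of messages about invalid or missing attributes."""
    declared = ATTRIBUTES[element]
    problems = []
    for name in attributes:
        if name not in declared:
            problems.append("The XML attribute %s is not valid in the %s element."
                            % (name, element))
    for name, required in declared.items():
        if required and name not in attributes:
            # Invented wording; not taken from the LAML error report.
            problems.append("The required XML attribute %s is missing in the %s element."
                            % (name, element))
    return problems

# The offending wheel from bikes-2, as shown in the error report.
print(attribute_problems("wheel", {"size": "25", "tube-kind-attribute": "standard"}))
```

Attribute checking and content model checking are independent of each other, which is why a single document can give rise to warnings of both kinds.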

6     Summary

To summarize, the major work is two-fold. First, an XML DTD has to be written. Second, the transformation of a document in the new XML language to HTML has to be programmed in Scheme. The DTD parsing and the mirror synthesis (including full XML validation) are done automatically.

You are invited to study some of the other XML transformation examples together with the accompanying paper called XML Transformations in Scheme with LAML - a Minimalistic Approach.