Using the HTML Mirror Functions

Kurt Nørmark ©     normark@cs.aau.dk
Department of Computer Science, Aalborg University

Abstract. In this chapter we will take a systematic look at the HTML mirror functions, which are derived from an XML DTD (Document Type Definition).
 

1     What is a HTML mirror function
We first describe what makes a function an HTML mirror function.
1.1     What is a HTML mirror function
1.2     The context of a HTML mirror fragment.
 


1.1     What is a HTML mirror function

The building blocks of HTML are called elements. In the following HTML document we see instances of the head, title, body and p elements.

 <html>
   <head><title>Doc Title</title></head>
   <body><h1>Title</h1> 
      <p> First paragraph </p>
   </body>
 </html>

LAML provides a specialized function for each element in HTML. In the early days of LAML these functions were programmed manually, in an ad hoc fashion. In the current version of LAML, they are all generated automatically from the formal definition of an XML language, such as XHTML. In another chapter of the tutorial we will see how to generate such functions from our own XML languages. The HTML mirror functions are pretty advanced, not least because they carry out a comprehensive validation of the generated HTML code. As we will see below, the mirror functions all observe some special parameter passing and interpretation rules.

In this tutorial we use and explain the most recent version of the HTML mirrors in Scheme. These are the family of mirrors known as XHTML 1.0 transitional, strict, and frameset. We recommend that all LAML users stick to the XHTML mirrors, and more generally to the software based on 'XML in LAML'. Only use the HTML 4 mirrors if you absolutely need to due to legacy concerns.

The document above corresponds to doc0. Take a look at the write-clause (more specifically the html clause). As we see, the direct counterpart to the HTML fragment shown above is:

(html 
   (head 
    (title "Doc Title") 
   )
  
   (body
    (h1 "Title")
    (p "First paragraph")
   )
 )

We notice the following:


1.2     The context of a HTML mirror fragment.

It is easy to recognize the HTML mirror fragment in the write-clause of doc0. But what about the other aspects of doc0? Let us explain each of them, from first to last line.

Please notice that all the LAML processing parameters, like xml-validate-contents? and xml-check-attributes?, have default values. Therefore you probably do not need to give them explicitly in most of your documents.

2     Parameters to mirror functions
In this section we will study the basic parameter passing conventions of the HTML mirror functions.
2.1     Parameters
2.2     More Parameters
2.3     Whitespace handling
2.4     Character references
2.5     CSS attributes
2.6     HTML Comments, CDATA Sections, and Processing Instructions
2.7     Delayed Procedural Content Items
 


2.1     Parameters

Now that we have seen the overall picture we will study the parameter conventions of the HTML mirror functions in LAML. At the raw Scheme level, HTML mirror functions accept an arbitrary parameter list by virtue of (lambda parameter-list ...). At the LAML level, however, parameter-list is interpreted and error checked in a number of ways. In this section we will concentrate on the interpretation of parameter-list.

We will turn our interest to another document, namely mirror1. As promised, we load the necessary libraries directly, thus circumventing the use of the simple document style 'simple-xhtml1.0-strict-validating'.

The resulting, and frankly not very interesting, HTML document is mirror1.html.

The most basic parameter passing rules are the following:

For a concrete illustration take a look at item one in the-write-clause, namely:

(a 'href laml-url "LAML")

This corresponds to the HTML fragment:

<a href="laml-url">LAML</a>

The first parameter of the Scheme form is the symbol href, which plays the role of an attribute name. Therefore, the following parameter should be a string, and it serves as the attribute value. And it is, namely the value of the variable laml-url. The third parameter is a string which is not preceded by a symbol; as such, the third parameter "LAML" is part of the contents of the instance of the a element.

The XML-in-LAML processing parameter xml-accept-only-string-valued-attributes is normally true (#t). This parameter is set by the procedure set-xml-accept-only-string-valued-attributes-in, because it needs a language argument. (If you mix two or more XML-in-LAML languages, you can have different values of xml-accept-only-string-valued-attributes for the languages involved). As the name indicates, this variable controls the rigidness of LAML with respect to the attribute values. If you set this variable to false (#f) LAML will use the function as-string on the parameter which follows an attribute name (a symbol).

In the second list element of the-write-clause we have rotated the parameters, but there is still one contents parameter, "LAML", and a single attribute, href. In LAML it is OK to have the attributes after the element contents as long as Rule 1 is obeyed. You can even have some attributes before, some in the middle, and some after the element contents.
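The interpretation just described can be illustrated with a small stand-alone sketch in plain Scheme. It is not LAML's implementation, and the name split-parameters is hypothetical. A symbol together with the parameter that follows it is taken as an attribute, and everything else is taken as contents, regardless of position.

```scheme
; Hypothetical sketch, not part of LAML: separate attributes from contents.
; Returns a pair (attributes . contents), where attributes is a list of
; (name . value) pairs in the order in which they were given.
(define (split-parameters parameter-list)
  (let loop ((rest parameter-list) (attributes '()) (contents '()))
    (cond ((null? rest)
           (cons (reverse attributes) (reverse contents)))
          ((and (symbol? (car rest)) (pair? (cdr rest)))
           ; a symbol is an attribute name; the next parameter is its value
           (loop (cddr rest)
                 (cons (cons (car rest) (cadr rest)) attributes)
                 contents))
          (else
           ; anything else counts as contents in this sketch
           (loop (cdr rest) attributes (cons (car rest) contents))))))

(define laml-url "http://www.cs.auc.dk/~normark/laml/")

; Both calls give the same result: one href attribute and the contents "LAML".
(split-parameters (list 'href laml-url "LAML"))
(split-parameters (list "LAML" 'href laml-url))
```

The sketch does no error checking at all; the real mirror functions check both the attribute names and the contents.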

Notice one difference between

(a 'href laml-url "LAML")

and

<a href="laml-url">LAML</a>

In the Scheme form laml-url is a variable, the value of which is the string "http://www.cs.auc.dk/~normark/laml/". The corresponding HTML fragment is much longer:

<a href="http://www.cs.auc.dk/~normark/laml/">LAML</a>

In XML it is generally not allowed to apply the same attribute more than once in a given element instance. Therefore, we issue a warning if you attempt to do so. Thus

(a 'href laml-url 'href "http://www.google.com" "Google")

will give a warning. From a pragmatic point of view, this XML convention is in fact not very practical. We have found that it sometimes is quite useful to pass an attribute more than once. (This is the case if we work with elements that have many attributes, some of which are set at a general level). At mirror generation time it is decided if all attributes (even duplicates) are passed on, if only the first is passed on, or if only the last is passed. The function xml-duplicated-attribute-handling informs you about the handling of duplicated attributes in a given XML language. In all three XHTML mirrors, duplicated attributes are passed on.
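The three possible treatments of duplicated attributes can be sketched in plain Scheme as follows. This is only an illustration: the policy symbols keep-all, keep-first and keep-last, and all function names, are hypothetical and are not taken from LAML.

```scheme
; Hypothetical sketch: three ways to treat duplicated attributes in a list
; of (name . value) pairs. Not LAML's implementation.
(define (keep-first attributes)
  (let loop ((rest attributes) (seen '()) (result '()))
    (cond ((null? rest) (reverse result))
          ((memq (caar rest) seen) (loop (cdr rest) seen result))
          (else (loop (cdr rest)
                      (cons (caar rest) seen)
                      (cons (car rest) result))))))

(define (keep-last attributes)
  ; keep the last occurrence of each name, preserving relative order
  (reverse (keep-first (reverse attributes))))

(define (handle-duplicates policy attributes)
  (cond ((eq? policy 'keep-first) (keep-first attributes))
        ((eq? policy 'keep-last) (keep-last attributes))
        (else attributes)))          ; keep-all: pass everything on

(define attrs
  '((href . "http://www.cs.auc.dk/~normark/laml/")
    (href . "http://www.google.com")))

(handle-duplicates 'keep-all attrs)    ; both href attributes
(handle-duplicates 'keep-first attrs)  ; only the LAML URL
(handle-duplicates 'keep-last attrs)   ; only the Google URL
```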

We see that LAML/Scheme programs can use all Scheme forms and functions side by side with use of the HTML mirror functions. This is useful in many cases, such as in situations where URLs are defined as symbolic constants, and used several times in a document.


2.2     More Parameters

Let us now illustrate another mirror function parameter rule:

We still look at the mirror1 document, which is rendered as mirror1.html (and which is part of the tutorial example documents).

In the third unordered list item in the-write-clause we pass a single list to the mirror function, namely

(list "LAML" 'href laml-url)

In the fourth item only the attributes, as bound to the local variable my-attributes, are passed as a list, and this item is therefore equivalent to the previous one. By use of a variable like my-attributes a collection of attributes can be handled as a first class citizen.

The parameter convention just described makes it possible to organize part of the content, or a collection of attributes, in lists, and just pass the list as a parameter side by side with content and attribute parameters. This is important for a flexible handling of Lisp data, which typically is organized in lists.

Let us finally see a more elaborate example in the last item. Here the variable target-attributes is bound to the list

(target "t" href "http://www.cs.auc.dk/~normark/laml/")

in the surrounding let* special form. The textual contents of the given a element mirror function is

LAML Power

By the way, you will find out that the target attribute of the a element is not valid in XHTML1.0 strict. LAML tells you that when you process the document. We should really use XHTML1.0 transitional if we want a valid document. Try it out yourself!

In summary, we have seen that data, which is organized in lists, can be easily passed as both contents, attributes, and mixed contents/attributes.
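The list convention of this section can be modelled by splicing nested lists into one flat parameter list before the attributes and contents are interpreted. The following plain Scheme sketch shows the idea. The name flatten-parameters is hypothetical, and the sketch ignores element instances, which a real implementation would have to tell apart from ordinary lists.

```scheme
; Hypothetical sketch: splice nested lists into one flat parameter list.
(define (flatten-parameters parameter-list)
  (cond ((null? parameter-list) '())
        ((list? (car parameter-list))
         ; a list parameter contributes its own (flattened) elements
         (append (flatten-parameters (car parameter-list))
                 (flatten-parameters (cdr parameter-list))))
        (else (cons (car parameter-list)
                    (flatten-parameters (cdr parameter-list))))))

(define laml-url "http://www.cs.auc.dk/~normark/laml/")
(define my-attributes (list 'href laml-url))

; Contents and attributes in one list, and attributes in a list of their own.
; Both calls give the same flat list: "LAML", href, and the URL.
(flatten-parameters (list (list "LAML" 'href laml-url)))
(flatten-parameters (list "LAML" my-attributes))
```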

To see more advanced, but extremely useful examples of this, please take a look at a set of additional examples, especially those about tables. (These additional examples accompany one of the papers I have written about LAML). Notice that these particular examples use the HTML4.01 transitional mirror.


2.3     Whitespace handling

Let us now turn to a crucial detail, namely the handling of white space. We illustrate the discussion with yet another document, namely mirror2. You can also see the rendering of the document. In this document we have not given any LAML processing attributes at all. This means that we use the default values.

The relevant mirror rules are:

First notice that the choice of boolean value false (#f) as the space suppress value is rather arbitrary. Any value which can be distinguished (via a type predicate) from symbols and strings at run time would be equally good. (The value of the variable explicit-space-suppress controls which value to use for white space suppressing).

In the first paragraph of html-write-clause three content strings are passed to the mirror of the p element. Notice that the interspacing after "the" and before "paragraph" is due to the first of the rules.

Normally there is no space just in front of punctuations. The second line

(p "This is the second" (b "paragraph") _ ".")

illustrates the second rule, where the space between "paragraph" and "." is suppressed by the variable named underscore.

The third line shows that the space handling also applies in the case where the contents is passed as a list of words:

(p (list (kbd "This") "is" "the" "third" 
   "paragraph" _ "."))

Again, take a look at the rendered HTML page.

The fourth paragraph shows that we can have attributes, here a class attribute, side by side with the elements of the list.

The fifth paragraph illustrates that the lists can be nested, and that the attributes can be buried in the inner list.

The seventh and last paragraph (a very strange example, I should admit) shows that white space suppression works well together with content lists.
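The effect of the white space handling described in this section can be imitated in a few lines of plain Scheme. The sketch below is only an assumption about the observable behavior, not LAML's implementation: content items are separated by a single space, unless the suppress value #f occurs between them. The names suppress and join-contents are hypothetical; suppress plays the role of the underscore variable.

```scheme
; Hypothetical sketch: join content strings with single spaces, except where
; the suppress value #f occurs between two items.
(define suppress #f)

(define (join-contents items)
  (let loop ((rest items) (result "") (need-space #f))
    (cond ((null? rest) result)
          ((eq? (car rest) suppress)
           ; the marker cancels the space in front of the next item
           (loop (cdr rest) result #f))
          (else
           (loop (cdr rest)
                 (string-append result (if need-space " " "") (car rest))
                 #t)))))

; The b element is written out by hand here, because the sketch has no mirrors.
(join-contents (list "This is the second" "<b>paragraph</b>" suppress "."))
```

The call returns the string "This is the second <b>paragraph</b>." with no space in front of the final punctuation.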


2.4     Character references

Character references are used to denote special characters, which cannot be typed directly due to the limitation of the keyboards in normal use. Character references are also used to get access to characters such as <, > and & which have special meanings in HTML.

In LAML there is no technical need to give special interpretations of <, >, and &. Therefore these characters are transformed behind the scenes, such that they appear in the HTML document as entered in the input. This is done through use of the so-called character transformation table. The transformations are as follows:

< is transformed to &lt;
> is transformed to &gt;
& is transformed to &amp;

The HTML character transformation table is described in section 5.

Imagine that <, > and & were output verbatim in the HTML text. This would compromise the validation (see section 3) because we could in that way emit unbalanced and arbitrary HTML markup. Using LAML, the only way to produce markup is via use of the HTML mirror functions.
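The three transformations listed above can be written directly in plain Scheme. This is a minimal sketch with hypothetical function names; LAML itself works through the character transformation table discussed in section 5.

```scheme
; Hypothetical sketch of the three transformations, without any table.
(define (transform-char ch)
  (cond ((char=? ch #\<) "&lt;")
        ((char=? ch #\>) "&gt;")
        ((char=? ch #\&) "&amp;")
        (else (string ch))))

(define (transform-string str)
  (apply string-append (map transform-char (string->list str))))

(transform-string "<b>bold</b> & more")
```

Here the call returns "&lt;b&gt;bold&lt;/b&gt; &amp; more", so the text is shown by the browser exactly as it was typed, and it cannot introduce markup of its own.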

In the XHTML mirrors a character reference is denoted (char-ref x). The parameter x can be a number or a symbolic name. The document char-ref shows a number of examples. The resulting HTML page is also available. Internally, a character reference is a structural entity in the same way as element instances and white space markers. When the structure (the AST) is linearized to text, the character references are translated to HTML character references.

XML attribute values may also contain (native, HTML) character references. The document char-ref shows a couple of examples. In order to make this available, the XML-in-LAML rendering functions enforce that character number 38 ('&') is never transformed within attribute values. This ensures that textually represented XML character entities within attribute values are rendered exactly as they are written in the LAML source.

Character references in CSS attribute values are currently treated in the same way, but it is probably not meaningful to have character references in CSS attributes. CSS follows its own rules, and CSS validation is currently not supported by LAML.

As a somewhat related issue, XML prescribes attribute normalization. In the current version of LAML (version 24) attribute normalization is not implemented. In a future version we may implement this normalization.


2.5     CSS attributes

Inline CSS attributes are handled in a special way in a LAML document. Such attributes are often useful when we form HTML presentational abstractions in which we want to make use of the special rendering made possible with Cascading Style Sheet attributes.

The CSS mirror rule in LAML is the following:

Let us now study a document with CSS attributes, namely mirror3. You can also see the rendering of the document and hereby get a concrete feeling for the effect of the CSS attributes in the document.

Take a look at a-write-clause. When we use inline CSS attributes in a HTML document it normally happens through the HTML style attribute. The value of a style attribute is a CSS fragment which uses a slightly different syntax than HTML and XML. The document illustrates this.

The document also shows the preferred handling of CSS attributes in LAML. We see that CSS attributes can be given side by side with HTML attributes. In other words, all attributes are handled uniformly, and in the same syntax.

Notice that LAML does not currently check that the names of the CSS attributes make sense. Thus, the attribute checking applies only to the HTML attributes.


2.6     HTML Comments, CDATA Sections, and Processing Instructions

When you use LAML your document source is written in Scheme, and therefore it is natural to use Scheme comments in your documents:

 (p "Some text") ; Some comment

If you wish to use native XML comments in your XHTML documents this can also be done:

 (p "Some text") (xml-comment "Some comment")

This generates the following XHTML fragment:

 <p>Some text</p> <!--Some comment-->

CDATA sections can be used for portions of a document that contain scripting program fragments. As an example, the LAML fragment

 (cdata-section "if (x < y) &x else &y")

generates

 <![CDATA[if (x < y) &x else &y]]>

CDATA Sections are not rendered by browsers.

XML Processing Instructions are "a rarely used XML technicality". Processing instructions can also be dealt with in LAML. I have never used them, however... See the function processing-instruction for some details.

You can consult the extra.laml document and its rendering extra.html in the XHTML1.0 transitional example directory if you wish to explore XML comments, CDATA sections, and processing instructions.


2.7     Delayed Procedural Content Items

A LAML document is constructed bottom up, because each (mirror) function needs its parameters (content items) before it can construct the surrounding document structure. In some situations it is useful to include some document part that depends on its context. In this section we will show how to produce an index of all links from the current document.

Delayed procedural content items give rise to the following mirror rule:

Let us study the example document procedural-items and the rendered HTML page. Most important, notice that the function link-index is inserted at the very top of the document, as the first constituent of body. Please observe that link-index is not called explicitly. The procedural object (the closure) is simply referred to as one of the document content items, side by side with for instance (hr) (which becomes a LAML AST), text strings, and white space markers.

When the entire html document is built, it is scanned for delayed procedural content items. The procedure expand-procedural-content-items-in-ast does that. write-html calls expand-procedural-content-items-in-ast, so the document author does not have to be concerned with this. In case the document is not handled by write-html or write-xml, the document author is responsible for calling expand-procedural-content-items-in-ast. When link-index is called it traverses the entire (root) document by means of find-asts, it locates all a elements, and it produces a table by means of the XHTML convenience function multi-column-list. The table is inserted instead of the procedure in the document. That's it! And that is the way LAML provides for context sensitive document items.
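The idea of delayed procedural content items can be shown in miniature with nested lists in plain Scheme. The sketch below is not LAML code. The names expand-procedures, count-links and link-count-item are hypothetical, and the convention that a delayed procedure receives the complete document as its single argument is an assumption made for the sketch only.

```scheme
; Hypothetical sketch: a document is a nested list of symbols, strings and
; procedures. Each procedure is replaced by the result of calling it with
; the complete (unexpanded) document.
(define (expand-procedures tree root)
  (cond ((procedure? tree) (tree root))
        ((pair? tree)
         (cons (expand-procedures (car tree) root)
               (expand-procedures (cdr tree) root)))
        (else tree)))

; Count the lists that start with the symbol a. This stands in for the
; traversal which collects all a elements.
(define (count-links tree)
  (if (pair? tree)
      (+ (if (eq? (car tree) 'a) 1 0)
         (count-links (car tree))
         (count-links (cdr tree)))
      0))

; A delayed item: it is only mentioned in the document, not called there.
(define (link-count-item root)
  (string-append "Links: " (number->string (count-links root))))

(define document
  (list 'body link-count-item (list 'a "LAML") (list 'p (list 'a "Scheme"))))

(expand-procedures document document)
```

The result is the list (body "Links: 2" (a "LAML") (p (a "Scheme"))). The item at the top of the document depends on parts that come after it, which is possible only because its evaluation is postponed until the whole document exists.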

Delayed procedural content items were introduced in LAML version 29. In LAML version 33, delayed procedural content items were enabled in empty elements. (Actually, this corrects an error in earlier versions). This supports "delayed attributes" in elements without contents.

3     Validation aspects
Here we will discuss the validation aspects of the XHTML mirrors.
3.1     Validation
 


3.1     Validation

As one of the attractive properties of the XHTML mirror functions in Scheme (and similarly mirrors of other XML languages) these functions validate the generated HTML code on the fly.

Take a look at the document in mirror4. Please notice that this document is invalid in several respects. Can you already now spot the problematic places in the document?

Well, if not, LAML will help you. We get the following feedback when processing the document:

LAML Emacs Processing with mzscheme-200
Welcome to MzScheme version 205, Copyright (c) 1995-2003 PLT
Welcome to LAML Version 24.00 (December, 2003, development).
(C) Kurt Normark, Aalborg University, Denmark.
XML Warning: Encountered a misplaced  em  element within a  head  element: 
    (title "An Invalid document")  (em "Emphasis not allowed here")
XML Warning: The XML attribute  x  is not valid in the  p  element.
XML Warning: The attribute  id  is not allowed to appear more than once in a  p  element.
XML Warning: Encountered a misplaced  ol  in a  p  element. 
    (ol (li "Ordered list not allowed here"))
XML Warning: Encountered a misplaced  li  element within a  body  element: 
    (p 'x "y" "A paragraph with an invalid attribute")  (p 'id "p1" 'id "p2" (ol (li "Ordered list not allowed here")))  (li "li item ...
LAML processing time: 470 milliseconds.
End of LAML processing

You can control the amount of checking via the boolean variables and settings in the top of the document. Try it out yourself by processing the file mirror4.laml in the accompanying example directory.

The variable xml-check-error, which originally is defined in the common part of XML-in-LAML, determines the system's reaction to validation errors. By default, LAML displays a warning. You can reassign or redefine it, however:

(set! xml-check-error laml-error)

The procedure laml-error is a function akin to Scheme's error procedure.

With this redefinition, the first encountered validation error stops the LAML processing. Try it out!

You can also control the length of the error messages via the variable xml-error-truncation-length.
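The mechanism behind xml-check-error, a variable that holds the procedure to call when a problem is found, can be sketched in plain Scheme. All names below are hypothetical stand-ins and not LAML's own. The procedure error is assumed to be available, as it is in R7RS and in most Scheme systems.

```scheme
; Hypothetical sketch: a checker that reports problems through a procedure
; held in a variable, so that the reaction can be changed by assignment.
(define (warn-only message)
  (display "Warning: ")
  (display message)
  (newline)
  'warned)

(define check-error warn-only)   ; default reaction: warn and continue

(define (check-attribute name allowed-names)
  (if (memq name allowed-names)
      'ok
      (check-error
        (string-append "The attribute " (symbol->string name)
                       " is not valid"))))

(check-attribute 'x '(id class))   ; displays a warning, processing continues

; From now on the first problem stops the processing.
(set! check-error (lambda (message) (error message)))
```

Because check-attribute looks up check-error each time it is called, the assignment takes effect for all checks that follow it.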

4     Using more than one mirror
In this section we will show how to use more than one mirror in the same document.
4.1     Using more than one mirror
 


4.1     Using more than one mirror

The XML-in-LAML facilities of LAML permit use of two or more sets of mirror functions in the same document. In this section we will illustrate an often occurring situation where both the XHTML 1.0 transitional and the XHTML1.0 frameset mirrors are used (loaded) in the same LAML document.

The LAML document frameset-1 serves as the example. See also the resulting web page. Three documents are generated: frameset-doc, left-doc, and right-doc. The frameset-doc uses the XHTML1.0 frameset mirror functions. The two others use the XHTML1.0 transitional mirror functions. The crucial observation is that there is a substantial overlap between these two sets of mirror functions (in the meaning of similarly named functions).

In order to deal with this, we need to use the so-called language map. As an example, the name html is ambiguous in the sense that it occurs in both the XHTML transitional mirror and in the XHTML frameset mirror. Using the language map, it is possible to get hold of a specific mirror function in the following way:

(xhtml10-frameset 'html)

In frameset-1 we have used the respective language maps to extract unambiguous mirror functions from both XHTML languages. See the functions such as fs:html and tr:html.

Strictly speaking, it is possible to use the language map for only the first loaded mirror library. In our example, this is the XHTML frameset library. LAML will complain or warn about possible language overlap when we use the transitional functions, but we can safely ignore the warnings. In frameset-2 we have turned off the language overlap checking (cf. no-overlap-checking) by assigning the variable xml-check-language-overlap?.
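A language map can be thought of as a procedure from element names to mirror functions. The following plain Scheme sketch illustrates how two such maps resolve a name clash. Everything in it is a hypothetical stand-in: the two html functions only label their contents, and make-language-map is not a LAML function.

```scheme
; Hypothetical sketch: two "languages" which both define an element named
; html. A language map is a procedure from element names to functions.
(define (make-language-map entries)
  (lambda (name)
    (let ((entry (assq name entries)))
      (if entry (cdr entry) #f))))

; Stand-ins for real mirror functions; they only label their contents.
(define (frameset-html . contents) (cons 'frameset-html contents))
(define (transitional-html . contents) (cons 'transitional-html contents))

(define frameset-map
  (make-language-map (list (cons 'html frameset-html))))
(define transitional-map
  (make-language-map (list (cons 'html transitional-html))))

; Unambiguous names, in the style of fs:html and tr:html.
(define fs:html (frameset-map 'html))
(define tr:html (transitional-map 'html))

(fs:html "frames")   ; the frameset variant
(tr:html "text")     ; the transitional variant
```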

5     Character transformation
Here we will discuss use of the character transformation table.
5.1     Transformations
 


5.1     Transformations

We will now illustrate the impact of the html-char-transformation-table. Every character in the contents of a HTML document can be transformed through this table. Needless to say, most characters are mapped to themselves via the table. Only characters such as < and > are normally transformed to the corresponding character entities &lt; and &gt;. Let us return to the mirror2 document, which is rendered as mirror2.html. We will play a little with the document by using a slightly non-standard html-char-transformation-table.

The mirror2 document is copied to mirror5, which we now work on. Please consult it!

In the fundamental LAML library laml.scm the HTML character transformation table, referred to by html-char-transformation-table, is created as the identity mapping

(define html-char-transformation-table
    (list->vector (make-list 256 #t)))

which next is mutated at a few locations, such as

(set-html-char-transformation-entry!
  html-char-transformation-table (char->integer #\<) "&lt;")

(set-html-char-transformation-entry!
  html-char-transformation-table (char->integer #\>) "&gt;")

You can find the actual mutations of the HTML character transformation table in the file lib/xml-in-laml/xml-in-laml.scm in your LAML directory.

This situation reflects the default setup. Now we rotate all entries in html-char-transformation-table:

(define html-char-transformation-table
    (list->vector (append (number-interval 1 255) (list 0)) ))

The original definition corresponds to

(define html-char-transformation-table
    (list->vector (append (number-interval 0 255)) ))

Numbers are possible entries in the table, cf. the documentation of html-char-transformation-table.

Now take a look at mirror5.html and compare it with mirror2.html. Can you read it?
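To get a feeling for what a rotated table does without processing mirror5, the table lookup can be imitated in plain Scheme. The sketch below is not the code in laml.scm. It assumes, in line with the text above, that an entry is either #t (keep the character), a number (the code of the replacement character), or a string (the replacement text). The function names are hypothetical.

```scheme
; Hypothetical sketch of a table-driven character transformation.
; An entry is #t (keep), a number (new character code) or a string.
(define (lookup-char table ch)
  (let* ((code (char->integer ch))
         (entry (if (< code 256) (vector-ref table code) #t)))
    (cond ((string? entry) entry)
          ((number? entry) (string (integer->char entry)))
          (else (string ch)))))

(define (transform-with-table table str)
  (apply string-append
         (map (lambda (ch) (lookup-char table ch)) (string->list str))))

; The default setup in miniature: identity, mutated at a single location.
(define default-table (make-vector 256 #t))
(vector-set! default-table (char->integer #\<) "&lt;")

; A table in which every character code is rotated one position.
(define rotated-table (make-vector 256 #t))
(let loop ((i 0))
  (if (< i 256)
      (begin
        (vector-set! rotated-table i (modulo (+ i 1) 256))
        (loop (+ i 1)))))

(transform-with-table default-table "a < b")
(transform-with-table rotated-table "HAL")
```

With the rotated table the string "HAL" becomes "IBM", because each character is replaced by its successor in the character set.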

This exercise is - of course - just for fun. As a more serious application, we use the HTML character transformation table to convert national characters to the corresponding character entities in HTML. This is useful, because we can now type the national characters in our web documents, and they will be converted to safe and good values in the resulting HTML document. Because I often write in Danish, I have added the following to my .laml initialization file (which is loaded automatically at startup time when laml.scm is loaded):

 (set-html-char-transformation-entry! html-char-transformation-table 
                                      197 "&Aring;")
 (set-html-char-transformation-entry! html-char-transformation-table
                                      198 "&AElig;")
 (set-html-char-transformation-entry! html-char-transformation-table
                                      216 "&Oslash;")

 (set-html-char-transformation-entry! html-char-transformation-table
                                      229 "&aring;")
 (set-html-char-transformation-entry! html-char-transformation-table
                                      230 "&aelig;")
 (set-html-char-transformation-entry! html-char-transformation-table
                                      248 "&oslash;")

   

We control character transformation by set-xml-transliterate-character-data-in and set-xml-char-transformation-table-in. The first form enables character transformation if the boolean #t is passed as second parameter. The second one defines the transformation table. When we define a new XML-in-LAML language it is possible to define a set of elements for which we never use character transformations.

Character transformation works for both raw and pp writing mode.

6     Building abstractions on top of mirror functions
We will here see how we can use the power of Scheme programming together with the HTML mirror functions.
6.1     Building abstractions
6.2     A CD archive
 


6.1     Building abstractions

We have now seen how to write simple documents with the HTML mirror functions. More or less, the LAML documents are simple counterparts to the corresponding HTML documents. But this need not be the case.

The coast is clear for definition of all kinds of abstractions on top of the mirror functions. We have done a lot of work in this direction. LENO, the Scheme Elucidator, the Manual document style, and other useful tools have all been created by writing abstractions on top of the mirror functions.

You can probably imagine that a LAML source document, which makes heavy use of your own functions, can be quite difficult to approach for others than the original author and developer. The original LENO system is a good example. In order to avoid arbitrary drifting of LAML document styles, we have come up with the XML-in-LAML framework. With this, the higher level abstractions are defined by an XML DTD (document type definition), which gives rise to a new XML-in-LAML language. As such, the high level LAML document is closely connected to XML. Still in the XML document, it is possible to develop arbitrary (non-XML related) abstractions - both on the content side and the attribute side. The Scheme Elucidator 2 (which is used to write the source of this tutorial) is a good example of non-trivial and high level document handling in LAML.

We will discuss the XML-in-LAML framework in much more detail later in this tutorial.


6.2     A CD archive

In order to illustrate the basic idea of forming abstractions on top of the mirror functions we will make yet another document, namely abstract1. Our domain will be a CD archive overview. The final document, abstract1.html, is also available for viewing.

At the application level, document-body, we see an instance of cd-archive and three nested cd-entry clauses. This is an example of a relatively high level document that enumerates a small CD collection.

The implementation of cd-archive and cd-entry is done in the separate file cd-stuff. Let us first study the function cd-entry. First it is noticed that cd-entry takes a number of subentries, each of which is defined by its own function: cd-number, cd-artist, cd-title, and cd-playing-time. These simple functions just tag the parameter(s) by a unique symbol. Thus, these four functions serve as syntactic sugar, which makes the source document abstract1 nice and comprehensible. For internal purposes in cd-number, cd-artist, cd-title, and cd-playing-time the function tag-data is used. As an example,

(cd-playing-time 3 4)

returns the list (cd-playing-time "3:4"). cd-entry binds the constituents to local names in the let clause and it outputs a table, which is made by use of the XHTML mirror functions.
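The tagging idea can be sketched in plain Scheme as follows. The real definitions are found in cd-stuff; the bodies below, the helper get-tagged, and the sample data are hypothetical and serve only to show how tagged subentries can be given in any order and still be found by name.

```scheme
; Hypothetical sketch of the tagging idea; see cd-stuff for the real code.
(define (tag-data tag data) (list tag data))

(define (cd-artist name) (tag-data 'cd-artist name))
(define (cd-title title) (tag-data 'cd-title title))

(define (cd-playing-time minutes seconds)
  (tag-data 'cd-playing-time
            (string-append (number->string minutes)
                           ":"
                           (number->string seconds))))

; Find the data which is tagged with a given symbol among the subentries.
(define (get-tagged tag subentries)
  (let ((entry (assq tag subentries)))
    (if entry (cadr entry) #f)))

(define subentries
  (list (cd-playing-time 3 4) (cd-title "Some Title") (cd-artist "Some Artist")))

(get-tagged 'cd-artist subentries)
(get-tagged 'cd-playing-time subentries)
```

Because each constituent carries its own tag, a function like cd-entry does not have to rely on the position of its parameters.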

In a real situation we would recommend keeping the data free of HTML markup until they are encountered by the top-level function, here cd-archive. But for the demo purpose here, our solution is OK.

The top-level function cd-archive returns the "HTML envelope". Due to the flexible parameter passing of the HTML mirror functions, the list

(map spacy cd-entries)

can be passed as an HTML content contribution just after (h1 page-title).

With these remarks we are about to end this example. Notice, however, that the CD example is only a simple example of how to form an ad hoc language (a CD archive language), and to implement it by means of Scheme functions. In this example we have not touched on XML-in-LAML at all. We will give an XML-in-LAML example in a later part of this tutorial.

7     Epilogue
We finally conclude and summarize this part of the tutorial.
7.1     Epilogue
 


7.1     Epilogue

We have now in relatively great detail studied various aspects of the XHTML 1.0 strict mirror. LAML also contains a fully validating HTML4.01 mirror, and XHTML 1.0 transitional and frameset mirrors. We strongly recommend that you stick to the XHTML mirrors! In addition there are a number of older mirrors, which you should never use for your development. They are only included in the distribution, because a number of document styles and tools still use them internally.

And what are then the most important aspects of the HTML mirror functions? Well, here is my answer:

  1. The flexible parameter passing conventions, most important the handling of attributes and data in lists.
  2. The completeness of the mirror - all elements and all attributes are accounted for in a precise manner.
  3. The validation done by the mirror - if you make a composition error or an attribute error in the document it will be discovered right away, at document generation time. You are either stopped or warned (your choice).
  4. The possibility of programming your own functions and abstractions on top of the mirror functions. This can be done 'XML systematically', using the XML-in-LAML framework, or in a more ad hoc fashion, as illustrated by the CD archive example from above.

This ends this part of the tutorial.