Document Description and Processing in Scheme

Kurt Nørmark ©
Department of Computer Science, Aalborg University, Denmark


Abstract

This is a presentation of document description and processing in Scheme, given in Görlitz, December 16, 2005.


Introduction

Plan of this talk

  • Doing web work in Scheme - approaches

  • XHTML Mirrors in LAML

  • Working with higher-order functions in LAML


Doing web work in Scheme

Different Approaches

There are several different ways to do web work in Scheme


XHTML Mirrors in LAML

The idea of mirroring

The mirror of XHTML brings this particular XML language into the programming language Scheme

XML concepts are given Scheme counterparts

Mirroring as precise as possible

Figure: The idea of bringing the XML language XHTML into the programming language Scheme by mirroring.

Mirroring of XHTML (1)

Attributes are handled by simulated keyword parameters

Textual content is passed as quoted strings

There is white space between content strings unless explicitly suppressed

Program: An illustration of the most basic mirror rules of LAML.
(a 'href "http://www.cs.aau.dk" 'target "main" "A link to" "CS at Aalborg")

Program: The XHTML counterpart.
<a href="http://www.cs.aau.dk" target="main">A link to CS at Aalborg</a>
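
Program: A toy sketch of the basic mirror rules. This is not LAML's implementation; it assumes plain portable Scheme and renders directly to strings instead of abstract syntax trees. The names split-parameters, join-strings, toy-element and toy-a are made up for this illustration.

```scheme
; Toy sketch only; this is not LAML's implementation.
; Split a mirror-style parameter list into attributes and contents.
; A symbol is taken as an attribute name and consumes the next parameter.
(define (split-parameters params)
  (let loop ((ps params) (attrs '()) (contents '()))
    (cond ((null? ps)
           (cons (reverse attrs) (reverse contents)))
          ((and (symbol? (car ps)) (pair? (cdr ps)))
           (loop (cddr ps)
                 (cons (cons (car ps) (cadr ps)) attrs)
                 contents))
          (else
           (loop (cdr ps) attrs (cons (car ps) contents))))))

; Join strings with a separator between neighbours.
(define (join-strings strs sep)
  (cond ((null? strs) "")
        ((null? (cdr strs)) (car strs))
        (else (string-append (car strs) sep
                             (join-strings (cdr strs) sep)))))

; Render an element as text: attributes first, then the content
; strings separated by single spaces.
(define (toy-element name . params)
  (let* ((split (split-parameters params))
         (attr-text
          (apply string-append
                 (map (lambda (a)
                        (string-append " " (symbol->string (car a))
                                       "=\"" (cdr a) "\""))
                      (car split)))))
    (string-append "<" name attr-text ">"
                   (join-strings (cdr split) " ")
                   "</" name ">")))

; Hypothetical stand-in for the a mirror function.
(define (toy-a . params) (apply toy-element "a" params))

(display (toy-a 'href "http://www.cs.aau.dk" 'target "main"
                "A link to" "CS at Aalborg"))
(newline)
```

With the parameters of the program above, the sketch prints the same text as the XHTML counterpart. The toy version only handles string contents.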

We will make actual LAML demos while we go along...

Mirroring of XHTML (2)

Instead of specifying where to add white space we tell where to suppress it

Program: Illustration of white space suppression.
(p "Use" (kbd "HTML") _ ","  (kbd "XHTML") _ ","
    (kbd "XML")  _ ","  "or" (kbd "LAML") _ ".")

Program: The XHTML counterpart.
<p>
  Use <kbd>HTML</kbd>, <kbd>XHTML</kbd>, 
  <kbd>XML</kbd>, or <kbd>LAML</kbd>.
</p>
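
Program: A toy sketch of white space suppression. In LAML the marker is the variable _; the sketch uses a made-up marker called no-space and a made-up function join-contents, and it works on strings only. It is an illustration of the rule, not LAML's code.

```scheme
; Toy sketch only; LAML's real marker is the variable _ and its
; internals may differ. The marker here is the hypothetical no-space.
(define no-space 'no-space-marker)

(define (no-space? x) (eq? x no-space))

; Concatenate content strings, putting one space between neighbours
; unless the marker sits between them.
(define (join-contents items)
  (let loop ((rest items) (result "") (need-space #f))
    (cond ((null? rest) result)
          ((no-space? (car rest))
           (loop (cdr rest) result #f))
          (else
           (loop (cdr rest)
                 (string-append result (if need-space " " "") (car rest))
                 #t)))))

(display (join-contents
          (list "Use" "<kbd>HTML</kbd>" no-space "," "or"
                "<kbd>LAML</kbd>" no-space ".")))
(newline)
```

The call prints Use <kbd>HTML</kbd>, or <kbd>LAML</kbd>. with no space in front of the comma and the full stop.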

Mirroring of XHTML (3)

Lists of contents and lists of attributes are processed recursively and spliced together with their context

Program: Passing of attribute lists and lists of textual contents.
(body
  (ul
    (map li (list "one" "two" "three"))
  )
)

Program: The XHTML counterpart.
<body>
  <ul><li>one</li> <li>two</li> <li>three</li></ul>
</body>
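
Program: A toy sketch of how nested content lists could be spliced into their context. The names splice-contents, toy-li and toy-ul are made up, and the sketch works on strings; the way LAML does this internally may differ.

```scheme
; Toy sketch only; not LAML's implementation.
; Flatten arbitrarily nested lists of content items into one flat
; list, preserving the order of the items.
(define (splice-contents items)
  (cond ((null? items) '())
        ((pair? (car items))
         (append (splice-contents (car items))
                 (splice-contents (cdr items))))
        ((null? (car items)) (splice-contents (cdr items)))
        (else (cons (car items) (splice-contents (cdr items))))))

; String-based stand-ins for the li and ul mirror functions.
(define (toy-li text) (string-append "<li>" text "</li>"))

(define (toy-ul . contents)
  (string-append "<ul>"
                 (apply string-append (splice-contents contents))
                 "</ul>"))

; The list returned by map is passed as a single parameter
; and spliced into the contents of the ul element.
(display (toy-ul (map toy-li (list "one" "two" "three"))))
(newline)
```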


Mirroring of XHTML (4) - More of the same...

Lists of contents and lists of attributes are processed recursively and spliced together with their context

Program: Passing of attribute lists and lists of textual contents.
(body
  (let ((attributes (list 'start "3"))
        (contents   (map li (list "one" "two" "three"))))
     (ol 'id "demo" contents attributes)
  )
)

Program: The XHTML counterpart.
<body>
  <ol id="demo" start="3">
    <li>one</li> 
    <li>two</li> 
    <li>three</li>
  </ol>
</body>


Mirroring of XHTML (5)

CSS attributes and XHTML attributes are uniformly specified

CSS attributes are prefixed with css:

Program: Using CSS attributes together with XHTML attributes.
(em 'class "c1" 'css:background-color "yellow" "Görlitz, December, 2005")

Program: The XHTML counterpart.
<em style="background-color: yellow;" class="c1">Görlitz, December, 2005</em>
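
Program: A toy sketch of how css: prefixed attributes could be collected into a style attribute value. The association list representation and the names css-attribute?, css-property and style-value are assumptions made for this illustration; LAML's own handling may differ.

```scheme
; Toy sketch only; not LAML's implementation.
; Does the attribute name start with "css:"?
(define (css-attribute? name)
  (let ((s (symbol->string name)))
    (and (> (string-length s) 4)
         (string=? (substring s 0 4) "css:"))))

; The CSS property name without the "css:" prefix.
(define (css-property name)
  (let ((s (symbol->string name)))
    (substring s 4 (string-length s))))

; Build the value of a style attribute from an association list
; of (name . value) pairs. Attributes without the prefix are skipped.
(define (style-value attributes)
  (cond ((null? attributes) "")
        ((css-attribute? (caar attributes))
         (string-append (css-property (caar attributes)) ": "
                        (cdar attributes) ";"
                        (style-value (cdr attributes))))
        (else (style-value (cdr attributes)))))

(display (style-value
          (list (cons 'class "c1")
                (cons 'css:background-color "yellow"))))
(newline)
```

For the attributes of the em example the sketch prints background-color: yellow; which is the style value shown in the XHTML counterpart.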


Character transformation

Every character can be transliterated by means of an HTML character transformation table.

This transformation is done during the final textual document rendering of the abstract syntax tree

Program: A sample XHTML document with a custom HTML character transformation table.
/user/normark/scheme/slides/goerlitz-05/document-description-in-scheme/includes/html-char-transf/transf-1.laml


We will play with different XHTML character transformation tables.
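
Program: A toy sketch of character transliteration. The table format (an association list from characters to strings) and the names toy-char-table and transliterate are assumptions for this illustration; they are not the format or the functions of LAML's character transformation tables.

```scheme
; Toy sketch only; not LAML's table format.
; Hypothetical table: characters mapped to their replacement strings.
(define toy-char-table
  (list (cons #\< "&lt;")
        (cons #\> "&gt;")
        (cons #\& "&amp;")))

; Replace every character found in the table; keep all others.
(define (transliterate str table)
  (apply string-append
         (map (lambda (ch)
                (let ((entry (assv ch table)))
                  (if entry (cdr entry) (string ch))))
              (string->list str))))

(display (transliterate "a < b & c" toy-char-table))
(newline)
```

The call prints a &lt; b &amp; c. Applying such a table at rendering time is one way to keep '<' and '>' in textual contents from being emitted as markup.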

Summarizing the XHTML mirrors

An XHTML mirror maps each HTML element to a named function in Scheme

An XHTML mirror is automatically derived from an XML DTD

  • Properties:

    • Each mirror function accepts parameters in a flexible way

    • Generates well-formed and valid HTML documents

    • Prevents accidental emission of '<' and '>' as part of the textual contents

    • The mirror functions return abstract syntax trees which, for instance, can be rendered as 'HTML text'

    • Supports optional pretty printing of the resulting HTML code

    • Supports checking of both relative and absolute links

All this is generalized to arbitrary XML languages in LAML


Working with higher-order functions in LAML

Combination of mirror functions: Tables (1)

An idiomatic table presentation pattern in LAML

Program: An idiomatic table presentation pattern in LAML.
(load (string-append laml-dir "laml.scm"))
(laml-style "simple-xhtml1.0-transitional-validating")

(define map (curry-generalized map))
(define row list)

(define sample-table
 (list
  (row "This" "is" "row" "1")
  (row "This" "is" "row" "2")
  (row "This" "is" "row" "3")
  (row "This" "is" "row" "4")))

(write-html '(pp prolog)
 (html 
  (head 
   (title "Table Examples"))
  (body 
    (table 'border "1" (map (compose tr (map td)) sample-table) )
  )
 )
)

(end-laml)
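
Program: A toy sketch of the two higher-order helpers behind the table pattern. The definitions of toy-curry-generalized and toy-compose are written for this illustration and may differ from LAML's curry-generalized and compose; toy-td and toy-tr are string-based stand-ins for the mirror functions.

```scheme
; Toy sketch only; LAML's own definitions may differ.
; Called with one parameter, the result waits for the remaining ones.
; Called with more parameters, it applies f right away.
(define (toy-curry-generalized f)
  (lambda args
    (if (= (length args) 1)
        (lambda more (apply f (car args) more))
        (apply f args))))

; Composition of any number of one-parameter functions,
; applied from right to left.
(define (toy-compose . fs)
  (if (null? fs)
      (lambda (x) x)
      (lambda (x)
        ((car fs) ((apply toy-compose (cdr fs)) x)))))

(define gmap (toy-curry-generalized map))

; String-based stand-ins for the td and tr mirror functions.
(define (toy-td text) (string-append "<td>" text "</td>"))
(define (toy-tr cells)
  (string-append "<tr>" (apply string-append cells) "</tr>"))

(define toy-rows (list (list "This" "is") (list "row" "1")))

; Each row becomes a list of cells, which is then wrapped in a row.
(define toy-table-rows
  (gmap (toy-compose toy-tr (gmap toy-td)) toy-rows))
```

Here (gmap toy-td) is a function from a list of strings to a list of cells, so the composition maps one row of data to one rendered table row.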


Combination of mirror functions: Tables (2)

A similar table example

Switching the first two columns of the table

Program: An idiomatic table presentation pattern in LAML.
(define row list)  (define cell list)

(define (col-switch row-lst) 
 (cons (second row-lst)
  (cons (first row-lst) (cddr row-lst))))

(define sample-table
 (list
  (row "This" "is" "row" "1")
  (row "This" "is" "row" "2")
  (row "This" "is" "row" "3")
  (row "This" "is" "row" "4")))

(write-html '(pp prolog)
 (html 
  (head 
   (title "Table Examples"))
  (body 
    (table 'border "1" (map (compose tr (map td) col-switch) sample-table) )
   )))   


Open table forms

In the early days of LAML I used specialized table convenience functions that abstracted away both tr and td

In that way, it was not possible to affect the tr and td attributes


Program: An idiomatic table presentation pattern in LAML.
(define row list)  (define cell list)

(write-html '(pp prolog)
 (html 
  (head 
   (title "Table Examples"))
  (body 
    (table 
	 'border "1" 
	 (map (compose tr (map td))
	       (list
		(row (cell "This" 'rowspan "2") "is" "row" "1")
		(row                            "is" "row" "2")
		(row "This"                     "is" "row" "3")
		(row (cell "This is" 'colspan "2")   "row" "4"))) ) )))

Program: Entire document.
(load (string-append laml-dir "laml.scm"))
(laml-style "simple-xhtml1.0-transitional-validating")

(define map (curry-generalized map))
(define row list)  (define cell list)

(write-html '(pp prolog)
 (html 
  (head 
   (title "Table Examples"))
  (body 
    (table 
	 'border "1" 
	 (map (compose tr (map td))
	       (list
		(row (cell "This" 'rowspan "2") "is" "row" "1")
		(row                            "is" "row" "2")
		(row "This"                     "is" "row" "3")
		(row (cell "This is" 'colspan "2")   "row" "4"))) ) )))

(end-laml)

XML-in-LAML abstractions

How can I make my own functions with mirror function parameter passing?

Program: The indent-pixels function.
(define indent-pixels
 (xml-in-laml-abstraction
  (lambda (content attributes)
    (let ((i (get-prop 'indentation attributes))
          (reduced-attributes (but-props attributes '(indentation))))
      (table 'border 0
         (tr (td 'width i)
             (td 'width "*" content reduced-attributes)))))
  (required-implied-attributes '(indentation) '(*) "indent-pixels")
  "indent-pixels"))


Program: The use of the indent-pixels function.
(write-html '(pp prolog)
 (html 
  (head 
   (title "Table Examples"))
  (body
    (p "First paragraph")
    (indent-pixels 'indentation "50" 'id "x"
      (p "Second paragraph")
      (p "Third paragraph"))
    (p "Fourth paragraph"))))


How to query HTML documents

LAML supports a number of simple tree traversal functions that can extract information from HTML documents

Program: A sample HTML document.
(define doc
 (html 
  (head 
   (title "Table Examples"))
  (body 
    (p 'id "5" "First paragraph") 
    (p (em "Second") "paragraph")
    (div
      (p "Third paragraph")))))

  • All text in all paragraphs

    • (find-asts doc "p" ast-text)

  • The first paragraph

    • (find-first-ast doc "p")

  • All constituents that satisfy a given predicate

    • (traverse-and-collect-all-from-ast doc (lambda (ast) ...) id-1)
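
Program: A toy sketch of the tree traversal idea. The document is represented as nested lists of the form (element-name child ...), which is an assumption for this illustration and not LAML's abstract syntax tree representation. The names toy-doc, toy-find-all and toy-text-deep are made up.

```scheme
; Toy sketch only; not LAML's AST representation or functions.
; Toy tree: (element-name child ...), children are strings or trees.
(define toy-doc
  '("html"
    ("head" ("title" "Table Examples"))
    ("body"
     ("p" "First paragraph")
     ("p" ("em" "Second") "paragraph")
     ("div" ("p" "Third paragraph")))))

; Collect all subtrees with the given element name, in document order.
(define (toy-find-all tree name)
  (if (string? tree)
      '()
      (let ((below (apply append
                          (map (lambda (child) (toy-find-all child name))
                               (cdr tree)))))
        (if (string=? (car tree) name)
            (cons tree below)
            below))))

; All text below a tree, joined by single spaces.
(define (toy-text-deep tree)
  (if (string? tree)
      tree
      (let loop ((children (cdr tree)) (result ""))
        (if (null? children)
            result
            (loop (cdr children)
                  (string-append result
                                 (if (string=? result "") "" " ")
                                 (toy-text-deep (car children))))))))

; The text of all paragraphs in the document.
(define paragraph-texts
  (map toy-text-deep (toy-find-all toy-doc "p")))
```

The value of paragraph-texts is the list of the three paragraph texts, including the paragraph nested inside the div element.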

Another way to query HTML documents

LAML supports extraction of information from HTML documents in a way similar to XPath

Program: A sample HTML document.
(define doc
 (html 
  (head 
   (title "Table Examples"))
  (body 
    (p 'id "5" "First paragraph") 
    (p (em "Second") "paragraph")
    (div
      (p "Third paragraph")))))

  • The div instances in the body instance

    • (match-ast doc (location-step 'child "body") (location-step 'child "div"))

  • All p instances

    • (match-ast doc (location-step 'descendant "p"))

  • All p instances that have an id attribute

    • (match-ast doc (location-step 'descendant "p" (nt:for-which (location-step 'attribute "id"))))
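
Program: A toy sketch of matching a sequence of child steps, in the spirit of the first query above. It uses a nested list representation (element-name child ...) that is assumed for this illustration; child-step and toy-match are made-up names and cover only the child axis, not the descendant or attribute axes.

```scheme
; Toy sketch only; not LAML's match-ast.
; Toy tree: (element-name child ...), children are strings or trees.
(define toy-doc
  '("html"
    ("head" ("title" "Table Examples"))
    ("body"
     ("p" "First paragraph")
     ("div" ("p" "Third paragraph")))))

; Child step: the children named name of each tree in trees.
(define (child-step trees name)
  (apply append
         (map (lambda (tree)
                (let loop ((cs (cdr tree)) (hits '()))
                  (cond ((null? cs) (reverse hits))
                        ((and (pair? (car cs))
                              (string=? (caar cs) name))
                         (loop (cdr cs) (cons (car cs) hits)))
                        (else (loop (cdr cs) hits)))))
              trees)))

; Apply a sequence of child steps, starting from the root.
(define (toy-match root . names)
  (let loop ((trees (list root)) (ns names))
    (if (null? ns)
        trees
        (loop (child-step trees (car ns)) (cdr ns)))))

; The div instances in the body instance.
(define toy-divs (toy-match toy-doc "body" "div"))
```

Each step narrows the current set of trees, so a query is a left-to-right pipeline of steps.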


Being a little more advanced...

Delayed procedural content items

A Scheme function within the document contents serves as a delayed procedural content item.

Such a function is automatically called with two parameters: the whole document and the immediate parent.

The function may deliver both document contents and attributes.

Program: Use of a text stealing function.
(load (string-append laml-dir "laml.scm"))
(laml-style "simple-xhtml1.0-transitional-validating")
(lib-load "xhtml1.0-convenience.scm")

; Higher-order function that 'steals' text from the first
; occurrence (if any) of element-name.
(define (steel-text-from element-name)
  (lambda (root-ast parent-ast)
    (let ((relevant-ast (find-first-ast root-ast element-name)))
      (if relevant-ast (ast-text-deep relevant-ast) "???"))))


(write-html '(pp prolog)
 (html
  (head 
    (title (steel-text-from "h1")))
  (body 
    (h1 "Demo of simple document reflection")

    (p "This is a demo of simple document introspection aided by" 
       (em "delayed procedural content items")_"." )

    (when-generated)
  )))

(end-laml)
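
Program: A toy sketch of how delayed procedural content items could be expanded. The nested list representation and the names expand-tree, first-text and toy-steal-text are assumptions for this illustration. Unlike LAML, the toy procedures return a single content item and no attributes.

```scheme
; Toy sketch only; not LAML's expansion mechanism.
; Toy tree: (element-name child ...); a child is a string, a tree,
; or a procedure of two parameters (the root and the parent).
(define (expand-tree tree root)
  (cons (car tree)
        (map (lambda (child)
               (cond ((procedure? child) (child root tree))
                     ((pair? child) (expand-tree child root))
                     (else child)))
             (cdr tree))))

; The first child of the first element called name,
; provided that child is a string. Otherwise #f.
(define (first-text tree name)
  (cond ((not (pair? tree)) #f)
        ((and (string=? (car tree) name)
              (pair? (cdr tree))
              (string? (cadr tree)))
         (cadr tree))
        (else
         (let loop ((children (cdr tree)))
           (cond ((null? children) #f)
                 ((first-text (car children) name))
                 (else (loop (cdr children))))))))

; Returns a delayed content item that looks up text in the root.
(define (toy-steal-text name)
  (lambda (root parent)
    (or (first-text root name) "???")))

(define toy-doc
  (list "html"
        (list "head" (list "title" (toy-steal-text "h1")))
        (list "body" (list "h1" "Demo of simple document reflection"))))

; The procedure in the title is called during expansion.
(define expanded (expand-tree toy-doc toy-doc))
```

After expansion the title element holds the text of the h1 element. The procedure can see the h1 element because it is called only after the whole tree has been built.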

Program: Use of meta keywords.
(load (string-append laml-dir "laml.scm"))
(laml-style "simple-xhtml1.0-transitional-validating")
(lib-load "xhtml1.0-convenience.scm")

(define meta-props 
 (list 'http-equiv "Content-Type" 'content "text/html; charset=iso-8859-1"))

; Return a meta element whose keywords are collected from the keyword spans in the entire document.
(define (meta-from-keywords root-ast parent-ast)
  (let ((meta-contributions 
          (traverse-and-collect-all-from-ast 
             root-ast
             (lambda (ast)
               (and (equal? (ast-element-name ast) "span")
                    (equal? (ast-attribute ast 'class #f) "keyword")))
             ast-text)))
    (meta 'name "keywords" 'content (list-to-string meta-contributions ","))))

(define keyword 
  (xml-in-laml-abstraction 
    (lambda (c a) (span 'class "keyword" c a))))


(write-html '(pp prolog)
 (html 
  (head 
   (meta meta-props) 
   meta-from-keywords    
   (title "Demo of simple document reflection"))
  (body 
    (h1 "Illustration of 'distributed meta'")

    (p "This is a demo of simple document introspection aided by" 
       (keyword "delayed procedural content items")_ "." )

    (p "A delayed procedural content item is a" (keyword "closure") 
       "which is evaluated at" (keyword "documentation expansion time") _ "." )

    (p "We show how to extract" (keyword "meta keywords") 
       "from designated and marked up keywords in the text. Notice that the keyword
        element mirror function is produced by an XML-in-LAML abstraction." )

    (when-generated))))

(end-laml)

Program: Element counting.
(load (string-append laml-dir "laml.scm"))
(laml-style "simple-xhtml1.0-transitional-validating")
(lib-load "xhtml1.0-convenience.scm")
(set-xml-accept-extended-contents-in 'xhtml10-transitional #t)

(define meta-props 
 (list 'http-equiv "Content-Type" 'content "text/html; charset=iso-8859-1"))

; Return a delayed procedural content item that counts the
; elements named el-name in the entire document.
(define (count-elements el-name)
  (lambda (root-ast parent-ast)
    (length (match-ast root-ast (location-step 'descendant el-name)))))

(define keyword 
  (xml-in-laml-abstraction 
    (lambda (c a) (span 'class "keyword" c a))))

(write-html '(pp prolog)
 (html 
  (head 
   (meta meta-props) 
   (title "Demo of simple document reflection"))
  (body 
    (h1 "Illustration of element counting")

    (p "This document has" (count-elements "p") "paragraphs.")

    (p "This is a demo of simple document introspection aided by" 
       (keyword "delayed procedural content items")_ "." )

    (p "A delayed procedural content item is a" (keyword "closure") 
       "which is evaluated at" (keyword "documentation expansion time") _ "." )

    (p "We show how to extract" (keyword "meta keywords") 
       "from designated and marked up keywords in the text. Notice that the keyword
        element mirror function is produced by an XML-in-LAML abstraction." )

    (when-generated))))

(end-laml)

Program: Elements that depend on their immediate context.
(load (string-append laml-dir "laml.scm"))
(laml-style "simple-xhtml1.0-transitional-validating")
(lib-load "xhtml1.0-convenience.scm")
(set-xml-accept-extended-contents-in 'xhtml10-transitional #t)

(define meta-props 
 (list 'http-equiv "Content-Type" 'content "text/html; charset=iso-8859-1"))

; Return a delayed procedural content item that counts the
; elements named el-name in the entire document.
(define (count-elements el-name)
  (lambda (root-ast parent-ast)
    (length (match-ast root-ast (location-step 'descendant el-name)))))

(define word
 (xml-in-laml-abstraction
   (lambda (c a)
     (lambda (root-ast parent-ast)
       (cond ((equal? (ast-element-name parent-ast) "b") 
                 (span 'css:font-size "150%" c a))
             ((equal? (ast-element-name parent-ast) "em") 
                 (font 'color (rgb-color-encoding red) c a))
             (else (list c a)))))))

(write-html '(pp prolog)
 (html 
  (head 
   (meta meta-props) 
   (title "Demo of simple document reflection"))
  (body 
    (h1 "Illustration of elements that depend on their immediate contexts." )

    (p "This document shows how to define context sensitive elements")

    (p "This is a demo of simple document introspection aided by" 
       (word "delayed procedural content items")_ "." )

    (p "A delayed procedural content item is a" (b (word "closure")) 
       "which is evaluated at" (em (word "documentation expansion time")) _ "." )

    (p "We show how to extract" (word "meta" "keywords") 
       "from designated and marked up keywords in the text. Notice that the keyword
        element mirror function is produced by an XML-in-LAML abstraction." )

    (when-generated))))

(end-laml)



Close to the end

So, what is LAML really?

LAML is a set of XML language mirrors in Scheme together with the Scheme programs that process these

LAML is a software package of Internet related (and mostly educational) Scheme software, which I have written over the last seven years.

  • Substantial components of LAML:

    • LENO - Lecture Notes

    • Course Plan - Course Home Pages

    • The Scheme Elucidator - Internal documentation of Scheme programs

    • SchemeDoc - External interface documentation of Scheme programs

    • And more...

LAML can be used together with different Scheme Systems, on different operating systems, and on different platforms

Status

  • LAML is

    • relatively mature

    • regularly maintained

    • now and then extended

    • supported if errors are reported

    • regularly downloaded for a number of purposes

LAML is free and open source, and it is available from

http://www.cs.aau.dk/~normark/laml/

Exercises

Exercise 1. Time tables

In this exercise we will make a local Cottbus - Görlitz train time table.

Assume, for instance, that we have the following list of defined stations:

 (define stations
  (list "Cottbus" "Spremberg" "Weisswasser" "Horka" "Görlitz"))

Also assume that the following list gives the number of minutes between consecutive stations:

 (define minutes
  (list 22 12 26 18))

Write a function (time-table start-time) that generates a time table given that the train leaves Cottbus at start-time. Thus for instance this table.

Next, write a function (time-table list-of-start-times) that can generate a more comprehensive time table. Like this one.

I suggest that you give start-time in the universal time format. The Scheme function (current-time) gives the current time.

You can use these functions if you prefer.

Exercise 2. A color-text function

Write a Scheme markup function color-text of the following form:

  (color-text r g b . content-and-attributes)

The first three parameters are integers between 0 and 255. The tail parameter list content-and-attributes is arbitrary XML contents and attributes.

Examples of calls:

  (color-text 255 0 0 "This is some nice text")
  (color-text 255 0 0 "This is some" (em "nice") "text")
  (color-text 255 0 0 'class "my-class" "This is some" (em "nice") "text" _ ".")

Alternatively you may go for a form:

  (color-text 'red r 'green g 'blue b content-and-attributes)

such as

  (color-text 'red 255 'green 0 'blue 0 "This is some nice text")

Implement the desired function in terms of an HTML span element. Use the function xml-in-laml-positional-abstraction to implement the mixed positional and LAML parameter passing. The function rgb-color-encoding is also useful.

Exercise 3. Table Column Exercise

This exercise is oriented towards columns of tabular data.

We are given three columns of numeric data. Each column is a list of numbers or boolean false values.

The column length may vary from column to column.

As an example, the first column col1 may be the list:

  (1 5 #f 9 17)

This column may, together with two other columns, give the following table:


   1 15  8
   5 #f #f
  #f  0 11
   9  6 
  17 

Now, write a Scheme program supported by LAML that presents the table of columns. The table can be represented as the three individual columns. (This is of course a little tricky, because HTML works with lists of rows.) #f values should be presented as blank table entries. In a fourth column, you should add the values in each row. When adding, #f values and missing values (due to short columns) should be treated as 0.

Thus, a function call like (present-table table) should show this table.



 

Generated: December 19, 2005, 14:16:58