Generated: September 4, 2002, 15:50:06. Copyright ©2002, Kurt Nørmark.

Reference Manual of the HTML parser and pretty printer for LAML

Kurt Nørmark ©, normark@cs.auc.dk, Department of Computer Science, Aalborg University, Denmark

Source file: tools/xml-html-support/html-support.scm
LAML Version 18.00 (August 31, 2002) full

This tool provides a non-validating HTML parser built on top of the simple XML parser for LAML, together with a set of HTML pretty printing procedures. The parser is implemented by redefining selected functions of the XML parser; most of the XML parser is reused as is.

The top-level node is called an html-tree, which may hold top-level comment nodes and declaration nodes (doctype nodes). The parser represents HTML comments within the document as special comment nodes.

The parser will be badly confused if it meets a less-than (<) or greater-than (>) character that is not part of a tag. Such characters must be HTML-protected, i.e., written as the HTML character entities &lt; and &gt;.
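A minimal sketch of that protection step, assuming the running text is available as a Scheme string before it is placed in the document. protect-html-text is a hypothetical helper written for illustration, not a procedure documented in this manual (LAML may well provide its own). It also escapes &, so that a literal ampersand is not read as the start of a character entity.

```scheme
;; Hypothetical helper (not part of the documented LAML API):
;; replace characters that would confuse the parser with HTML
;; character entities. & is escaped too, so a literal ampersand
;; is not mistaken for the start of an entity.
(define (protect-html-text str)
  (let loop ((chars (string->list str))
             (acc '()))                       ; escaped pieces, in reverse order
    (if (null? chars)
        (apply string-append (reverse acc))
        (let ((c (car chars)))
          (loop (cdr chars)
                (cons (cond ((char=? c #\<) "&lt;")
                            ((char=? c #\>) "&gt;")
                            ((char=? c #\&) "&amp;")
                            (else (string c)))  ; any other character is kept
                      acc))))))
```

For example, (protect-html-text "if a < b && b > c") yields "if a &lt; b &amp;&amp; b &gt; c", which the parser can read as ordinary text.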

This tool assumes that laml.scm and the general library are loaded. The tool itself loads xml-support (which is the starting point of this HTML support tool) as well as the collect-skip and file-read libraries.

See the documentation of the XML support tool for information about the format of parse trees and the variables that control the pretty printing. See also the illustrative examples of the HTML parsing and pretty printing tools.

Typographical re-breaking and re-indenting of running text are not yet implemented.

The interactive LAML procedures html-pp and html-parse, defined in laml.scm, are convenient top-level procedures for pretty printing and parsing, respectively.

Known problem: whitespace immediately after a start tag and immediately before an end tag is not handled correctly.

Please note that this is not a production-quality parser and pretty printer; it is currently used for internal purposes only.

Table of Contents:
1. Top level HTML parsing functions.
2. HTML pretty printing functions.

Alphabetic index:
parse-html (parse-html file-path): This function parses a file and returns the parse tree.
parse-html-file (parse-html-file in-file-path out-file-path): Parse the file in-file-path and write the parse tree to out-file-path.
parse-html-string (parse-html-string str): Parse the string str, which is supposed to contain an HTML document.
pretty-print-html-parse-tree (pretty-print-html-parse-tree parse-tree): Pretty prints an HTML parse tree and returns the result as a string.
pretty-print-html-parse-tree-file (pretty-print-html-parse-tree-file in-file-path [out-file-path]): Pretty prints the HTML parse tree stored as a Lisp file in in-file-path.

 

1.   TOP LEVEL HTML PARSING FUNCTIONS.


parse-html-file


Form
(parse-html-file in-file-path out-file-path)

Description
Parse the file in-file-path and write the parse tree to out-file-path. If in-file-path has an empty file extension, the extension html is added.
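The extension rule can be sketched as follows, under two stated assumptions: a path "has an extension" when its last path component contains a dot, and / is the only directory separator. ensure-html-extension and string-last-index are hypothetical names for illustration; this manual does not show how LAML actually implements the check.

```scheme
;; Hypothetical helper: index of the last occurrence of ch in str, or #f.
(define (string-last-index str ch)
  (let loop ((i (- (string-length str) 1)))
    (cond ((< i 0) #f)
          ((char=? (string-ref str i) ch) i)
          (else (loop (- i 1))))))

;; Hypothetical helper: append ".html" when the last path component
;; contains no dot. Assumes "/" is the only directory separator and
;; treats a trailing dot (as in "index.") as an existing extension.
(define (ensure-html-extension file-path)
  (let ((dot (string-last-index file-path #\.))
        (slash (string-last-index file-path #\/)))
    (if (and dot (or (not slash) (> dot slash)))
        file-path                              ; extension already present
        (string-append file-path ".html"))))   ; no extension: add html
```

With this rule, "index" becomes "index.html" and "docs.v2/page" becomes "docs.v2/page.html" (the dot in the directory name does not count), while "index.htm" is left unchanged.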


parse-html


Form
(parse-html file-path)

Description
This function parses a file and returns the parse tree. Thus, unlike parse-html-file, this function returns the parse tree instead of writing it to a file. file-path is a relative or absolute file path. An html extension is added if necessary.


 

2.   HTML PRETTY PRINTING FUNCTIONS.


pretty-print-html-parse-tree-file


Form
(pretty-print-html-parse-tree-file in-file-path [out-file-path])

Description
Pretty prints the HTML parse tree stored as a Lisp file in in-file-path, and writes the pretty printed result to out-file-path.


pretty-print-html-parse-tree


Form
(pretty-print-html-parse-tree parse-tree)

Description
Pretty prints an HTML parse tree and returns the result as a string.


parse-html-string


Form
(parse-html-string str)

Description
Parse the string str, which is supposed to contain an HTML document. The parsing is done by writing str to a file in the temp directory of the LAML directory and then applying parse-html-file to that file. Precondition: the temp directory of the LAML directory must exist.


This documentation has been extracted automatically from the Scheme source file by means of the Schemedoc tool