This is a non-validating HTML parser built on top of the simple XML parser for LAML.
The tool also provides HTML pretty printing procedures.
The parser is implemented by redefining functions from the XML parser;
most of the XML parser code is reused in this parser.
The top-level node is called an html-tree, which may hold top-level comment nodes and
declaration nodes (doctype nodes).
The parser represents HTML comments within the document as special comment nodes.
The parser will be very confused if it meets a less-than or greater-than character that is not part of a tag.
Such characters must be HTML protected (use the HTML character entities &lt; and &gt;).
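
As a rough illustration of this protection step, the following is a minimal sketch in Python. It is not part of LAML, and the helper names protect_text and paragraph are hypothetical. It assumes the running text is protected before any markup is wrapped around it, so that the only less-than and greater-than characters left in the document belong to tags.

```python
import html


def protect_text(text: str) -> str:
    """Replace &, < and > in running text with HTML character entities."""
    # quote=False leaves " and ' untouched; only &, < and > are replaced.
    return html.escape(text, quote=False)


def paragraph(text: str) -> str:
    """Wrap protected running text in a p element (hypothetical helper)."""
    return "<p>" + protect_text(text) + "</p>"


# The only raw < and > characters in the result belong to the p tags.
print(paragraph("if a < b and b > c then a & c differ"))
# prints: <p>if a &lt; b and b &gt; c then a &amp; c differ</p>
```

Protecting the text first and adding tags afterwards keeps the two concerns apart; protecting an already tagged document would also rewrite the tags themselves.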
This tool assumes that laml.scm and the general library are loaded.
The tool loads xml-support (which is the starting point of this HTML support tool) and the collect-skip and file-read libraries.
See the XML support for information about the format of
parse trees and the variables that control the pretty printing. See also
the illustrative examples of the HTML parsing and pretty printing tools.
The typographical rebreaking and re-indenting of running text is still missing.
The LAML interactive tool procedures html-pp and html-parse
in laml.scm are convenient top-level pretty printing and parse procedures respectively.
Known problem: The handling of spaces after the start tag and before the end tag is not correct.
Please notice that this is not a production quality parser and pretty printer! It is currently used for
internal purposes.