Copyright © 2011, Kurt Nørmark
The top-level node is called an html-tree. It may hold top-level comment nodes and declaration nodes (doctype nodes). The parser represents HTML comments within the document as special comment nodes.
The parser gets confused if it meets a less-than or greater-than character that is not part of a tag. Such characters must be HTML-protected, i.e. written with the HTML character entities &lt; and &gt;.
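The following minimal Scheme sketch shows what HTML protection means in practice: a helper rewrites such characters as entities before the text reaches the parser. The name html-protect is hypothetical and does not come from the LAML documentation. The helper also protects &, because an unprotected ampersand would otherwise start an entity reference.

```scheme
;; html-protect is a hypothetical helper, not a documented LAML procedure.
;; It replaces <, > and & in running text by the corresponding
;; HTML character entities, so the text can be parsed safely.
(define (html-protect str)
  (let loop ((chars (string->list str))
             (acc '()))                 ; pieces collected in reverse order
    (if (null? chars)
        (apply string-append (reverse acc))
        (let ((c (car chars)))
          (loop (cdr chars)
                (cons (cond ((char=? c #\<) "&lt;")
                            ((char=? c #\>) "&gt;")
                            ((char=? c #\&) "&amp;")
                            (else (string c)))
                      acc))))))

(display (html-protect "if a < b && b > c"))
(newline)
;; prints: if a &lt; b &amp;&amp; b &gt; c
```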
As of LAML version 31, the parser can parse certain non-well-formed HTML documents (documents with crossing tags). This tool assumes that laml.scm and the general library are loaded. It loads xml-support (the starting point of this HTML support tool) and the collect-skip and file-read libraries.
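Crossing tags are start and end tags that do not nest, as in <b><i>x</b></i>. The following sketch uses a stack-based check over a hypothetical list of tag events to show how such documents differ from well-formed ones. It is not the LAML parser's algorithm. It also ignores empty elements such as br, which have no end tag.

```scheme
;; Hypothetical sketch: decide whether a sequence of tag events is properly
;; nested. Each event is (start . name) or (end . name); this representation
;; is an assumption for illustration, not LAML's internal format.
(define (properly-nested? events)
  (let loop ((events events)
             (open '()))                  ; stack of currently open tag names
    (cond ((null? events) (null? open))   ; every opened tag must be closed
          ((eq? (caar events) 'start)
           (loop (cdr events) (cons (cdar events) open)))
          ;; an end tag must close the most recently opened tag
          ((and (pair? open) (string=? (cdar events) (car open)))
           (loop (cdr events) (cdr open)))
          (else #f))))                    ; crossing or unmatched end tag

;; <b><i>x</i></b> is well-formed; <b><i>x</b></i> has crossing tags.
(display (properly-nested? '((start . "b") (start . "i") (end . "i") (end . "b"))))
(newline)  ; prints #t
(display (properly-nested? '((start . "b") (start . "i") (end . "b") (end . "i"))))
(newline)  ; prints #f
```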
See the XML support documentation for information about the format of parse trees and the variables that control pretty printing. See also the illustrative examples of the HTML parsing and pretty printing tools.
Typographical re-breaking and re-indenting of running text are not yet supported.
The LAML interactive tool procedures html-pp and html-parse in laml.scm are convenient top-level pretty printing and parse procedures respectively.
Known problem: whitespace immediately after a start tag and immediately before an end tag is not handled correctly.
Please note that this is not a production-quality parser and pretty printer; it is currently used for internal purposes.
parse-html | (parse-html file-path) | This function parses a file and returns the parse tree. |
parse-html-file | (parse-html-file in-file-path out-file-path) | Parse the file in in-file-path and deliver the parse tree in out-file-path. |
parse-html-string | (parse-html-string str) | Parse the string str, which is supposed to contain an HTML document. |
pretty-print-html-parse-tree | (pretty-print-html-parse-tree parse-tree) | Pretty prints an HTML parse tree and returns the result as a string. |
pretty-print-html-parse-tree-file | (pretty-print-html-parse-tree-file in-file-path [out-file-path]) | Pretty prints the HTML parse tree (lisp file) in in-file-path. |
1 Top-level HTML parsing functions. | |||
parse-html-file | |||
Form | (parse-html-file in-file-path out-file-path) | ||
Description | Parse the file in in-file-path and deliver the parse tree in out-file-path. If in-file-path has an empty file extension, the extension html is added. | ||
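The extension rule can be sketched as follows. The procedure names are hypothetical, and the sketch assumes / as the directory separator. LAML's actual handling of file paths may differ in details.

```scheme
;; Hypothetical helpers illustrating the "add html if the extension is
;; empty" rule; they are not LAML procedures.

;; Index of the last occurrence of ch in str, or #f if there is none.
(define (last-index-of ch str)
  (let loop ((i (- (string-length str) 1)))
    (cond ((< i 0) #f)
          ((char=? (string-ref str i) ch) i)
          (else (loop (- i 1))))))

(define (ensure-html-extension path)
  (let ((dot (last-index-of #\. path))
        (slash (last-index-of #\/ path)))
    (cond ((or (not dot) (and slash (< dot slash)))
           (string-append path ".html"))  ; no extension in the file name
          ((= dot (- (string-length path) 1))
           (string-append path "html"))   ; trailing dot: empty extension
          (else path))))                  ; extension present: keep as-is

(display (ensure-html-extension "docs/index")) (newline)  ; docs/index.html
(display (ensure-html-extension "page.htm")) (newline)    ; page.htm
```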
See also | Scheme source file | parse-html-file | |
parse-html | |||
Form | (parse-html file-path) | ||
Description | This function parses a file and returns the parse tree. The difference between this function and parse-html-file is thus that this function returns the parse tree rather than writing it to a file. file-path is a relative or absolute file path. An html extension is added if necessary. | ||
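The parse tree is ordinary Lisp data, and the pretty printer below reads it from a Lisp file. Returning the tree and writing it to a file therefore presumably carry the same information. The following minimal round-trip sketch uses string ports and a made-up tree shape, not LAML's real parse tree format.

```scheme
;; Hypothetical sketch: sample-tree only imitates a parse tree; see the
;; XML support documentation for the real format.
(define sample-tree
  '(html-tree (tag "p" () "Hello") (comment " note ")))

;; Serialize a tree as Lisp text, as it would be stored in a file.
(define (tree->lisp-text tree)
  (let ((port (open-output-string)))
    (write tree port)
    (get-output-string port)))

;; Read the tree back, as a later tool (e.g. a pretty printer) would.
(define (lisp-text->tree text)
  (read (open-input-string text)))

(display (equal? (lisp-text->tree (tree->lisp-text sample-tree)) sample-tree))
(newline)  ; prints #t
```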
See also | Scheme source file | parse-html | |
2 HTML pretty printing functions. | |||
pretty-print-html-parse-tree-file | |||
Form | (pretty-print-html-parse-tree-file in-file-path [out-file-path]) | ||
Description | Pretty prints the HTML parse tree (lisp file) in in-file-path and writes the result to out-file-path, which defaults to in-file-path if not passed explicitly. | ||
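The optional out-file-path parameter follows a common Scheme convention for trailing optional arguments. Here is a hypothetical sketch of how the default could be resolved. resolve-out-path is not a LAML procedure, and the file names are made up.

```scheme
;; Hypothetical sketch: out-file-path falls back to in-file-path
;; when it is not passed.
(define (resolve-out-path in-file-path . optional-out)
  (if (null? optional-out)
      in-file-path          ; overwrite the input file
      (car optional-out)))  ; explicit target

(display (resolve-out-path "tree.lsp")) (newline)               ; tree.lsp
(display (resolve-out-path "tree.lsp" "pretty.lsp")) (newline)  ; pretty.lsp
```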
See also | Scheme source file | pretty-print-html-parse-tree-file | |
pretty-print-html-parse-tree | |||
Form | (pretty-print-html-parse-tree parse-tree) | ||
Description | Pretty prints an HTML parse tree and returns the result as a string. | ||
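The technique of pretty printing a tree into a string can be sketched as below. Indentation grows with nesting depth, and the output goes to a string port that is returned at the end. The tree format (tag-name child ...) is a simplification assumed for illustration, not LAML's parse tree format. Like the tool itself, the sketch does no re-breaking of running text.

```scheme
;; Hypothetical sketch: pretty print a simplified tree of the form
;; (tag-name child ...), where each child is a string or another tree.
(define (pretty-print-tree tree)
  (let ((out (open-output-string)))
    ;; two spaces per nesting level
    (define (indent depth)
      (display (make-string (* 2 depth) #\space) out))
    (define (pp node depth)
      (indent depth)
      (if (string? node)
          (begin (display node out) (newline out))   ; text node
          (begin
            (display "<" out) (display (car node) out) (display ">" out)
            (newline out)
            (for-each (lambda (child) (pp child (+ depth 1))) (cdr node))
            (indent depth)
            (display "</" out) (display (car node) out) (display ">" out)
            (newline out))))
    (pp tree 0)
    (get-output-string out)))

(display (pretty-print-tree '("html" ("body" ("p" "Hello")))))
```

The call above displays the tags on separate lines, with each level indented two spaces more than its parent and the text Hello indented six spaces.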
See also | Scheme source file | pretty-print-html-parse-tree | |
parse-html-string | |||
Form | (parse-html-string str) | ||
Description | Parse the string str, which is supposed to contain an HTML document. The parsing is done by writing str to a file in the temp directory of the LAML directory and then applying parse-html-file. Precondition: the temp directory of the LAML directory must exist. | ||
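Here is a sketch of the temp-file technique. The file location and the file-based parser are passed in as hypothetical parameters. As with the precondition above, the directory that holds the temporary file must already exist.

```scheme
;; Hypothetical sketch: temp-path and file-parser are illustrative
;; parameters. In LAML the file would live in the temp directory of the
;; LAML directory, and the parser would be the file-based HTML parser.
(define (parse-string-via-file str temp-path file-parser)
  (if (file-exists? temp-path)
      (delete-file temp-path))        ; avoid unspecified overwrite behavior
  (call-with-output-file temp-path
    (lambda (port) (display str port)))
  (file-parser temp-path))

;; Stand-in "parser" for demonstration: reads the file back as a string.
(define (read-file-as-string path)
  (call-with-input-file path
    (lambda (port)
      (let loop ((c (read-char port)) (acc '()))
        (if (eof-object? c)
            (list->string (reverse acc))
            (loop (read-char port) (cons c acc)))))))

(display (parse-string-via-file "<p>Hello</p>" "laml-sketch-temp.html"
                                read-file-as-string))
(newline)  ; prints <p>Hello</p>
```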
See also | Scheme source file | parse-html-string | |