Copyright © 2011, Kurt Nørmark |
Given a well-formed XML document, this parser returns a Lisp tree structure that represents the parse tree of the XML document. The parser handles start tags, end tags, and empty tags (called start-end tags in this parser). Entities and their declarations are not handled at all.
The top level functions are parse-xml and parse-xml-file. The XML parser can also be loaded as a library.
There exists elucidative documentation of this parser. See also the HTML parsing and pretty printing support, which is built on top of the XML tools, and the illustrative examples of the XML parser and pretty printer.
This tool assumes that laml.scm and the general library are loaded. The tool loads the collect-skip and the file-read libraries.
The typographical rebreaking and re-indenting of running text is still missing.
The LAML interactive tool procedures xml-pp and xml-parse in laml.scm are convenient top-level pretty printing and parse procedures respectively.
Please notice that this is not a production quality parser and pretty printer! It is currently used for internal purposes.
From LAML version 20, the XML pretty printing in lib/xml-in-laml/xml-in-laml.scm replaces the XML pretty printing in this library.
collect-attributes-in-tree | (collect-attributes-in-tree tree attr-key) | Traverse the parse tree, tree, and return the list of all attribute values of the attribute attr-key found in the tree. |
indentation-delta | indentation-delta | An integer which gives the level of indentation |
is-tag-of-kind? | (is-tag-of-kind? tag-kind) | Return a predicate which tests whether a subtree or node is of tag-kind (a symbol or string). |
parse-tree-to-ast | (parse-tree-to-ast pt language) | Convert an HTML/XML parse tree to a LAML abstract syntax tree in language. |
parse-tree-to-element-structure | (parse-tree-to-element-structure pt) | Convert an HTML/XML parse tree to an element structure à la LENO. |
parse-tree-to-laml | (parse-tree-to-laml tree output-file) | Transform an XML or HTML parse tree to a similar surface LAML expression on output-file. |
parse-xml | (parse-xml file-path) | This function parses a file and returns the parse tree. |
parse-xml-file | (parse-xml-file in-file-path out-file-path) | Top level parse function which takes an XML file name as input, and delivers a parse tree on out-file-path. |
parse-xml-file-to-ast | (parse-xml-file-to-ast in-file-path out-file-path xml-language) | Top level parse function which takes an XML file name as input, and delivers an XML-in-LAML AST on out-file-path. |
parse-xml-string | (parse-xml-string xml-string) | This function parses a string with XML contents and returns the parse tree. |
parse-xml-string-to-ast | (parse-xml-string-to-ast xml-string xml-language) | This function parses a string with XML contents and returns an XML-in-LAML AST. |
parse-xml-to-ast | (parse-xml-to-ast file-path xml-language) | This function parses an XML file and returns the corresponding XML-in-LAML AST. |
parser-status | (parser-status) | Display parser status in case of error in the parse process. |
prefered-maximum-width | prefered-maximum-width | An integer that expresses the preferred maximum column width |
pretty-print-xml-parse-tree | (pretty-print-xml-parse-tree parse-tree) | Pretty prints an HTML parse tree and returns the result as a string. |
pretty-print-xml-parse-tree-file | (pretty-print-xml-parse-tree-file in-file-path [out-file-path]) | Pretty prints the XML parse tree (Lisp file) in in-file-path. |
resulting-parse-tree | resulting-parse-tree | A global variable holding the latest produced parse tree |
traverse-and-collect-from-parse-tree | (traverse-and-collect-from-parse-tree tree node-interesting? result-transformer) | Traverse the parse tree, tree, and return a list of result-transformed nodes that satisfy the node-interesting? predicate in the parse tree. |
use-single-lining | use-single-lining | A boolean which controls the application of single line pretty printing. |
white-space-preserving-tags | white-space-preserving-tags | A list of tag names for which white space is preserved. |
xml-parser-preserve-white-space | xml-parser-preserve-white-space | A constant that controls if white space is uniformly preserved by the XML parser. |
1 The format of the parse tree. | |||
A parse tree T produced by this tool is of the form
(tree N ST1 ST2 ... STn)
where STi, i=1..n, are parse trees (recursively) and N is a node (see below). A leaf node N may be of the form
(tree N)
or just N if N is a string (corresponding to textual contents) or an empty tag (a tag without contents).
An inner node of a parse tree corresponds to a tag (an element) with contents. Such a node is represented by the following 'tag structure':
(tag kind tag-name . attr-info)
Here tag is a symbol (for tagging), kind is either start or start-end (both symbols), tag-name is a string, and attr-info is the attribute list in property list format. A terminal node may be a start-end node, a comment node, or just a contents string. End tags are not represented in the parse tree.
Here is an example of a start-end node (empty node) with two properties:
(tag start-end "title" role "xxx" size "5")
Comments are represented as comment nodes of the form
(comment comment-string)
Declaration nodes of the form
(declaration kind value)
are also possible. They are for instance used for document type (???) information in HTML. Finally, nodes of the form
(xml-declaration attribute-property-list)
are supported. | |||
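As an illustration of the format, the following sketch writes a small parse tree by hand and picks it apart with plain list operations. The tree literal and the helper names (tree-root-node, tree-subtrees, node-kind, node-name, node-attributes) are assumptions made for this example; they are not functions of the parser.

```scheme
;; A hand-written parse tree in the format described above, roughly
;; corresponding to
;;   <doc><title role="xxx" size="5"/>Hello<!-- note --></doc>
(define sample-tree
  '(tree (tag start "doc")
         (tag start-end "title" role "xxx" size "5")
         "Hello"
         (comment " note ")))

;; Hypothetical accessors, defined directly on the list structure.
(define (tree-root-node t) (cadr t))        ; N in (tree N ST1 ... STn)
(define (tree-subtrees t) (cddr t))         ; (ST1 ... STn)
(define (node-kind node) (cadr node))       ; start or start-end
(define (node-name node) (caddr node))      ; tag name, a string
(define (node-attributes node) (cdddr node)); property list, possibly empty

(define title-node (car (tree-subtrees sample-tree)))
```

With these definitions, (node-name (tree-root-node sample-tree)) gives "doc", and (node-attributes title-node) gives the property list (role "xxx" size "5").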
2 Constants. | |||
Constants that affect the working of the parser. | |||
xml-parser-preserve-white-space | |||
Form | xml-parser-preserve-white-space | ||
Description | A constant that controls if white space is uniformly preserved by the XML parser. | ||
See also | Scheme source file | xml-parser-preserve-white-space | |
white-space-preserving-tags | |||
Form | white-space-preserving-tags | ||
Description | A list of tag names for which white space is preserved. A list of strings. No angle brackets ('<' or '>')should be given in the names. Must be assigned to the appropriate value before a parse function is called. | ||
See also | Scheme source file | white-space-preserving-tags | |
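A typical setup might look as follows. This is only a sketch: it assumes that laml.scm, the general library, and this parser are already loaded, and the tag names and the XML string are made up for the example.

```scheme
;; Assumption: LAML and the XML parser are loaded, so that the variable
;; white-space-preserving-tags and the function parse-xml-string exist.
;; Tag names are given as strings without angle brackets.
(set! white-space-preserving-tags (list "pre" "textarea"))

;; The assignment must precede the call of a parse function.
(define my-tree
  (parse-xml-string "<doc><pre>  keep   this  </pre></doc>"))
```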
3 Native low-level parser functions. | |||
The top level parser functions in this section deliver specialized low level parse trees (in some ad hoc list structure). | |||
parse-xml-file | |||
Form | (parse-xml-file in-file-path out-file-path) | ||
Description | Top level parse function which takes an XML file name as input, and delivers a parse tree on out-file-path. in-file-path is a file path (relative or absolute) with or without an extension. The default extension is xml. The parse tree is written on the file out-file-path. | ||
See also | Scheme source file | parse-xml-file | |
parse-xml | |||
Form | (parse-xml file-path) | ||
Description | This function parses a file and returns the parse tree. file-path is a file path (relative or absolute) without any extension. | ||
Returns | The parse tree in the original, low level parse tree format (a list structure) | ||
See also | Scheme source file | parse-xml | |
parse-xml-string | |||
Form | (parse-xml-string xml-string) | ||
Description | This function parses a string with XML contents and returns the parse tree. xml-string is a string with XML contents. | ||
Returns | The parse tree in the original, low level parse tree format (a list structure) | ||
See also | Scheme source file | parse-xml-string | |
4 AST-level parser functions. | |||
The top level parser functions in this section deliver XML-in-LAML abstract syntax trees. These trees are much more useful than the low-level parse trees delivered by the functions in the previous section. | |||
parse-xml-file-to-ast | |||
Form | (parse-xml-file-to-ast in-file-path out-file-path xml-language) | ||
Description | Top level parse function which takes an XML file name as input, and delivers an XML-in-LAML AST on out-file-path. The AST is written on the file out-file-path. | ||
Parameters | in-file-path | a file path (relative or absolute) with or without an extension. The default extension is xml. | |
out-file-path | the path of the output file. | ||
xml-language | the name of the XML language in LAML, to which the resulting AST belongs. A symbol. | ||
See also | Scheme source file | parse-xml-file-to-ast | |
parse-xml-to-ast | |||
Form | (parse-xml-to-ast file-path xml-language) | ||
Description | This function parses an XML file and returns the corresponding XML-in-LAML AST. | ||
Parameters | file-path | a file path (relative or absolute) without any extension. | |
xml-language | the name of the XML language in LAML, to which the resulting AST belongs. A symbol. | ||
Returns | An XML-in-LAML AST. | ||
See also | Scheme source file | parse-xml-to-ast | |
parse-xml-string-to-ast | |||
Form | (parse-xml-string-to-ast xml-string xml-language) | ||
Description | This function parses a string with XML contents and returns an XML-in-LAML AST. | ||
Parameters | xml-string | a string with XML contents. | |
xml-language | the name of the XML language in LAML, to which the resulting AST belongs. A symbol. | ||
Returns | An XML-in-LAML AST. | ||
See also | Scheme source file | parse-xml-string-to-ast | |
resulting-parse-tree | |||
Form | resulting-parse-tree | ||
Description | A global variable holding the latest produced parse tree | ||
See also | Scheme source file | resulting-parse-tree | |
5 Utility parser functions. | |||
The functions in this section are miscellaneous utility functions of the parser. | |||
traverse-and-collect-from-parse-tree | |||
Form | (traverse-and-collect-from-parse-tree tree node-interesting? result-transformer) | ||
Description | Traverse the parse tree, tree, and return a list of result-transformed nodes that satisfy the node-interesting? predicate in the parse tree. In other words, apply the node-interesting? predicate to all subtrees of the tree during the traversal, and return the result-transformed list of subtrees. Both the functions node-interesting? and result-transformer are applied on trees and subtrees. | ||
Examples | (traverse-and-collect-from-parse-tree resulting-parse-tree (is-tag-of-kind? 'a) parse-tree-to-laml-expression) | ||
See also | Scheme source file | traverse-and-collect-from-parse-tree | |
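The traversal idea can be sketched on a hand-written tree without loading the parser. The code below is not the library's implementation: collect-from-tree, root-node, and named? are hypothetical names, and the choice to keep descending into subtrees that already matched is an assumption.

```scheme
;; The node of a tree: N for (tree N ST1 ...), otherwise the item itself.
(define (root-node x)
  (if (and (pair? x) (eq? (car x) 'tree)) (cadr x) x))

;; Returns a predicate that accepts trees or nodes whose tag name is name.
(define (named? name)
  (lambda (x)
    (let ((n (root-node x)))
      (and (pair? n) (eq? (car n) 'tag) (string=? (caddr n) name)))))

;; Apply interesting? to the tree and to all of its subtrees, and return
;; the list of transformed matches in document order.
(define (collect-from-tree tree interesting? transform)
  (let ((here (if (interesting? tree) (list (transform tree)) '()))
        (below (if (and (pair? tree) (eq? (car tree) 'tree))
                   (apply append
                          (map (lambda (st)
                                 (collect-from-tree st interesting? transform))
                               (cddr tree)))
                   '())))
    (append here below)))

(define links
  '(tree (tag start "p")
         "See "
         (tree (tag start "a" href "one.html") "one")
         " and "
         (tree (tag start "a" href "two.html") "two")))

;; The transformer picks the first subtree of each matching (tree N ST1 ...).
(define anchor-texts
  (collect-from-tree links (named? "a") (lambda (t) (caddr t))))
```

Here anchor-texts is the list ("one" "two"). Both the predicate and the transformer receive whole subtrees, which mirrors the description above.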
collect-attributes-in-tree | |||
Form | (collect-attributes-in-tree tree attr-key) | ||
Description | Traverse the parse tree, tree, and return the list of all attribute values of the attribute attr-key found in the tree. | ||
Examples | (collect-attributes-in-tree tree 'href) | ||
See also | Scheme source file | collect-attributes-in-tree | |
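The following sketch shows how such a collection could work on the property list representation of attributes. It is a stand-alone approximation, not the library's code; plist-get and attribute-values are hypothetical names, and the sample tree is written by hand.

```scheme
;; Look up key in a property list (k1 v1 k2 v2 ...); #f if absent.
(define (plist-get plist key)
  (cond ((or (null? plist) (null? (cdr plist))) #f)
        ((eq? (car plist) key) (cadr plist))
        (else (plist-get (cddr plist) key))))

;; Collect the values of the attribute key from all tag nodes in tree,
;; in document order.
(define (attribute-values tree key)
  (let* ((is-tree (and (pair? tree) (eq? (car tree) 'tree)))
         (node (if is-tree (cadr tree) tree))
         (here (if (and (pair? node) (eq? (car node) 'tag))
                   (let ((v (plist-get (cdddr node) key)))
                     (if v (list v) '()))
                   '()))
         (below (if is-tree
                    (apply append
                           (map (lambda (st) (attribute-values st key))
                                (cddr tree)))
                    '())))
    (append here below)))

(define page
  '(tree (tag start "p")
         (tree (tag start "a" href "one.html") "one")
         (tag start-end "img" src "pic.png")
         (tree (tag start "a" href "two.html") "two")))
```

With this tree, (attribute-values page 'href) gives ("one.html" "two.html") and (attribute-values page 'src) gives ("pic.png").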
is-tag-of-kind? | |||
Form | (is-tag-of-kind? tag-kind) | ||
Description | Return a predicate which tests whether a subtree or node is of tag-kind (a symbol or string). The returned predicate is useful as the second parameter to traverse-and-collect-from-parse-tree. | ||
See also | Scheme source file | is-tag-of-kind? | |
related function | traverse-and-collect-from-parse-tree | ||
parser-status | |||
Form | (parser-status) | ||
Description | Display parser status in case of error in the parse process. | ||
See also | Scheme source file | parser-status | |
6 Top level XML pretty printing functions. | |||
pretty-print-xml-parse-tree-file | |||
Form | (pretty-print-xml-parse-tree-file in-file-path [out-file-path]) | ||
Description | Pretty prints the XML parse tree (Lisp file) in in-file-path. Outputs the pretty printed result in out-file-path, which defaults to in-file-path if not explicitly passed. | ||
See also | Scheme source file | pretty-print-xml-parse-tree-file | |
Note | For XML-in-LAML ASTs use pretty-render-to-output-port instead of this function | ||
pretty-print-xml-parse-tree | |||
Form | (pretty-print-xml-parse-tree parse-tree) | ||
Description | Pretty prints an HTML parse tree and returns the result as a string. | ||
See also | Scheme source file | pretty-print-xml-parse-tree | |
Note | For XML-in-LAML ASTs use pretty-xml-render instead of this function | ||
7 Variables that control the pretty printing. | |||
These variables apply to both HTML and XML. | |||
indentation-delta | |||
Form | indentation-delta | ||
Description | An integer which gives the level of indentation | ||
See also | Scheme source file | indentation-delta | |
use-single-lining | |||
Form | use-single-lining | ||
Description | A boolean which controls the application of single line pretty printing. If true, the pretty printer will pretty print short list forms on a single line | ||
See also | Scheme source file | use-single-lining | |
prefered-maximum-width | |||
Form | prefered-maximum-width | ||
Description | An integer that expresses the preferred maximum column width | ||
See also | Scheme source file | prefered-maximum-width | |
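The three variables could be adjusted along the following lines before pretty printing. This is a configuration sketch only: it assumes that the parser and pretty printer are already loaded, and the values are illustrative, not the library's defaults.

```scheme
;; Assumption: the pretty printer is loaded, so these variables exist.
(set! indentation-delta 2)        ; illustrative indentation value
(set! use-single-lining #t)       ; print short list forms on a single line
(set! prefered-maximum-width 80)  ; illustrative column width
```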
8 Parse tree conversions. | |||
In this section we provide a number of conversion functions that work on parse trees. | |||
parse-tree-to-laml | |||
Form | (parse-tree-to-laml tree output-file) | ||
Description | Transform an XML or HTML parse tree to a similar surface LAML expression on output-file. This function accepts parse trees rooted by the symbols html-tree and xml-tree, as well as the symbol tree. | ||
Parameters | tree | an XML or HTML parse tree | |
output-file | The name of the file on which to write the LAML expression. Can be full path. Must include extension. | ||
See also | Scheme source file | parse-tree-to-laml | |
laml.scm function | html-to-laml | ||
Note | When the resulting LAML file, say f.laml, is LAML processed, it writes f.html in the same directory as the LAML file. | ||
parse-tree-to-ast | |||
Form | (parse-tree-to-ast pt language) | ||
Description | Convert an HTML/XML parse tree to a LAML abstract syntax tree in language. The returned LAML abstract syntax tree will have positive spacing (which means that a white space is represented explicitly by the #t value). This function accepts parse trees rooted by the symbols html-tree and xml-tree, as well as the symbol tree. Recall that the syntax trees are used as the internal format by the validating mirrors of LAML. | ||
Parameters | pt | The parse tree | |
language | The name of the XML language, such as xhtml10-transitional. A symbol. | ||
See also | Scheme source file | parse-tree-to-ast | |
related function | parse-tree-to-element-structure | ||
parse-tree-to-element-structure | |||
Form | (parse-tree-to-element-structure pt) | ||
Description | Convert an HTML/XML parse tree to an element structure à la LENO. This function accepts parse trees rooted by the symbols html-tree and xml-tree, as well as the symbol tree. Modelled after parse-tree-to-ast. | ||
See also | Scheme source file | parse-tree-to-element-structure | |
related function | parse-tree-to-ast | ||