The Scheme Elucidator

Kurt Normark ©     normark@cs.aau.dk
Aalborg University, Denmark

Abstract.

This program implements an Elucidator for the Scheme programming language. An elucidator is a programming tool which supports elucidative programming, which in turn is a practical and modern variant of literate programming. The main functionality of the elucidator is oriented towards generation of HTML pages via which we can present documentation and source programs in an Internet browser. In addition, the program generates information which the editor part of the elucidator can use for navigation purposes. The editor part, which is implemented in Emacs Lisp, is not described in this documentation.

This document describes the internal working of the Scheme Elucidator. There exists a Reference Manual which describes how to produce documentation like the documentation you are reading now. A brief user guide to the Scheme Elucidator (as it appears in a browser) can be reached via the yellow question mark icons in the top frame of the elucidator.

This elucidative program primarily documents The Scheme Elucidator 2. However, the overall evolution of the Scheme Elucidator is also touched upon, and some historical documentation is included in specially marked sections.

 

1     Overall understanding of the Elucidator

The Scheme Elucidator is primarily based on the doc-prog relation. Secondarily, the Elucidator depends on the prog-prog and doc-doc relations. In this section we will describe the overall picture of the tool, and especially how the doc-prog relation is dealt with.

1.1     The overall picture
 


1.1     The overall picture

When an elucidative program is processed by the Scheme Elucidator, the documentation-section elements and the documentation-entry elements are first processed by their action procedures do-documentation-section! and do-documentation-entry!. This collects the raw documentation in the list documentation-elements. At this stage, the numbering of documentation sections and entries is also dealt with.

The remaining processing of the elucidative program is initiated in the action procedure of end-documentation - in reality by do-end-documentation!. This processing is elaborated in section 2. In this procedure, many different things happen. Among the more interesting is the reading of the program source files, as enumerated in the source-files clause. This is done by the function read-source. Hereby the source files become available as Lisp list structures, via the variable source-list-list-process. Notice, however, that this structure does not represent the concrete program layout! Before the call of read-source the source files have been preprocessed with the purpose of eliminating lexical comments.

Now we traverse the read source files in order to find the defined names. This work is done by make-defining-name-occurences. The list of defining name entries is stored in the variable defining-name-occurences.

Next the documentation is presented. The list documentation-elements is traversed linearly, and the documentation is rendered as HTML. This is done imperatively by laml-documentation-contents!. During this process we know the defined names, and we are hereby able to link to them. Via a few levels of calls we encounter present-and-process-section-or-entry-body-ast!, in which the program references are processed. The most central procedure for the handling of the program references is destructured-linking-from-doc-to-prog. This procedure generates the anchored links from the documentation to the source programs. As an important side effect, which matters for the elucidation of the program source files, we collect information about the definitions which have been documented. The information is collected in the list documented-name-occurences. As a central point, the procedure destructured-linking-from-doc-to-prog finds a number of linking targets within defining-name-occurences.

Now the documentation has been created, and it is time to make the HTML program files. The main procedure for this is make-source-program-file, and further on elucidate-program-source, elucidate-program-source-1, and not least elucidate-program-form. This is the absolute heart of the Scheme Elucidator, and it represents the most challenging aspect of the tool. The parsed source files and the actual textual program sources are processed in parallel, and a large number of anchored links are introduced. Some of these represent the prog-doc relation, which is the reverse of the doc-prog relation available in the list documented-name-occurences. The procedure match-symbol is the one that makes most of the prog-doc links. The link banner in front of source file definitions is made by total-doc-navigator.
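
The following outline summarizes the pipeline described above. It is a schematic overview using the procedure and variable names from this section, not actual code from the tool:

  ;; Schematic outline of the overall pipeline (overview only):
  ;;
  ;; do-documentation-section! / do-documentation-entry!
  ;;   -> collect raw documentation in documentation-elements
  ;; do-end-documentation!
  ;;   read-source                    -> source-list-list-process
  ;;   make-defining-name-occurences  -> defining-name-occurences
  ;;   laml-documentation-contents!   -> documentation HTML pages
  ;;     destructured-linking-from-doc-to-prog
  ;;                                  -> documented-name-occurences
  ;;   make-source-program-file (one call per source file)
  ;;     elucidate-program-source -> elucidate-program-form
  ;;                              -> HTML program pages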

2     Overall software organization.

In order to understand the overall organization of the elucidator program it is a good idea to start with the organization of the LAML setup file. This is the file which directly controls the processing of documentation and programs in a documentation bundle.

2.1     An example of an elucidator setup file.
2.2     Overall documentation processing forms.
2.3     File structure overview
2.4     Software Evolution Notes
2.5     Organization of the setup file
2.6     The documentation-entry and documentation-section clauses
 


2.1     An example of an elucidator setup file.

The documentation bundle is processed by executing an LAML file, which at least contains setup information. The documentation text is also typically authored in this LAML file. The elucidator setup file is a Scheme program, written relative to the Scheme Elucidator XML DTD. Thus, you can think of the setup file as an XML document written in Scheme.

As an example, take a look at a demo setup file. The elucidator setup information is contained in the elucidator-front-matters clause. Notice in particular all the attributes and the enumeration of the source files involved, see source-files. The documentation appears in between the begin-documentation and the end-documentation clauses.


2.2     Overall documentation processing forms.

The setup, as it occurs in elucidator-front-matters, is processed imperatively by a so-called action procedure. In reality, this action procedure is do-elucidator-front-matters!. In this function, most attributes are transferred to global variables.

The begin-documentation and end-documentation clauses surround the documentation text. In the original Elucidator, begin-documentation and end-documentation were hand-coded functions. In the Scheme Elucidator they are mirror functions of XML elements in the Elucidator XML language. Both of these have accompanying action procedures, in reality do-begin-documentation! and do-end-documentation!. The latter of these is the starting point for much of the processing in the Elucidator tool, and it is discussed thoroughly below.

The documentation is typically authored at Scheme and LAML level. Textual documentation, authored in a text file with use of the original, simple ad hoc markup language, is still supported in the Elucidator 2, but we do not use it anymore for any significant documentation. With the Scheme and LAML approach, a number of documentation-section and documentation-entry clauses are located in between begin-documentation and end-documentation.

The action procedure behind the end-documentation function is the long function do-end-documentation!, where almost everything is initiated. (We should consider breaking this function into several parts, if not for other reasons then to improve its documentation in this description). The different parts of the function can be seen from the comments in the source program. Here we briefly describe, in overview form, the interesting and most important parts of the processing done in do-end-documentation!.


2.3     File structure overview

We recommend that the elucidator material is organized in a doc directory inside the directory in which the central program source files are found. The doc directory must contain a number of sub-directories.

In order to be concrete let us assume that we document the program p.scm found in the directory p-dir. We get the following directory structure:

p-dir
  p.scm
  doc
    p.laml
    p.txt
    html
      images
      stylesheets
    internal

First notice that the Emacs editor command make-elucidator constructs all the files and directories of doc, including templates of p.laml (the setup file) and p.txt (the documentation file). In case we use LAML-style documentation, p.txt is not used. The doc directory itself must be made manually by the user.

By executing the Elucidator, all the icons are copied from the software directory into the doc/html/images directory, see section 9.2.

The internal directory is used for files generated by the elucidator; some of these are used for transferring information from the Elucidator to Emacs' elucidator mode.


2.4     Software Evolution Notes

At the overall level, the Scheme Elucidator 2 relies on XML-in-LAML, whereas the original Scheme Elucidator used a raw, ad hoc Scheme format. The difference can be explained as follows.

The author of documentation in the original Scheme Elucidator writes a setup file, which is a Scheme program that calls a variety of procedures defined by the original Elucidator Scheme program. Here is an example. These procedures are, for instance, set-source-directory, set-documentation-name, program-source, begin-documentation, documentation-from, and end-documentation. In addition, a number of variables, such as toc-columns-detail, toc-columns-overall, and elucidator-color-scheme, are explicitly (re)defined.

In the Scheme Elucidator 2 - the newest version of the tool - the author of the documentation writes an XML-in-LAML document. Here is an example, similar to the one shown for the original Elucidator above. This is an XML document, defined by an XML DTD, but written in Scheme using the conventions of LAML. All the forms in the setup file - as well as in the documentation - are Scheme functions that serve as mirrors of the underlying XML elements. Examples of these are elucidator-front-matters, source-files, program-source, manual-source, begin-documentation, end-documentation, documentation-section, and documentation-entry (all of which are linked to their SchemeDoc documentation file).

Version 1

2.5     Organization of the setup file

This section documents the functions used for setup purposes in the original Scheme Elucidator. In the original Scheme Elucidator the setup was handled by application of simple Scheme functions, such as set-documentation-name and program-source. In addition, a number of variables, such as toc-columns-detail and underline-program-links, were defined in ordinary define forms.

The first interesting part of the setup file defines the program files of the documentation bundle. This is done through a number of program-source clauses. The parameters of the function program-source describe the source key, the file location of a source program, and the programming language of the source program. (As of now, the programming language information is not used, but it might become useful in the future). In addition, the group field defines the group to which the given source file belongs. Groups are used to determine the coloring of program frames (see 9.3).
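
As a sketch, a clause following the parameter description just given could look as follows; the parameter order and the concrete values are assumptions for illustration, not taken from the tool:

  (program-source "p" "p-dir/p.scm" "scheme" "core")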

The source key is meant to be a handy and unique identification of a single source file. Internally in the function program-source we just accumulate the program-source in the variable program-source-list.

Following the program-source clauses we meet the documentation body, enclosed by the begin-documentation and end-documentation clauses. The documentation text can either be inlined between the begin and end clauses as a sequence of documentation-entry and documentation-section clauses, or, more typically, it can be imported from a separate file via the documentation-from clause.

We will next take a closer look at the functionality of the mentioned clauses.

Version 1

2.6     The documentation-entry and documentation-section clauses

The functions documentation-section and documentation-entry are top-level forms which can contain the documentation text embedded in LAML markup. As already explained above in section 2.5, these forms are not normally used directly; rather we use a special textual form of the documentation, see section 5. However, the textual documentation is parsed and passed as input to the functions documentation-section and documentation-entry. In that way the functions are important anyway.

Take a look at the manual pages of the elucidator for user level documentation of these forms.

If documentation-section and documentation-entry are used directly we need functions which implement the constituent forms, such as (title ...), (body ...) etc. These functions (see for instance title and body ) are all generated via the higher order function make-syntax-function. This function and the generated functions are all trivial.

From an internal point of view the functions documentation-section and documentation-entry are almost trivial. They basically collect information and put it into a number of useful global variables, which are used in end-documentation (see section 2.2 ).

In both functions we extract the id and the title, and we add these to the elements (see the assignment to document-elements). In that way this information is available in a convenient way. We also make the section numbering, and we add it to the elements too. The numbering is done by the functions section-numbering and subsection-numbering. These functions are based on two global variables, section-number and subsection-number, that hold the section and subsection numbers. The element called raw-numbering is a list of section number and subsection number. (Section n has raw section number (n 0)). The variables are assigned in the beginning of documentation-section and documentation-entry.

3     Making the program pages

In this section we will study one of the central aspects of the Elucidator, namely the decoration and WEB presentation of the source programs. This is one of the language dependent parts of the Elucidator; in our case the programming language is Scheme. In section 2 we described how the source files are enumerated in the setup file via program-source clauses. As one of the many tasks of end-documentation we make the program sources via calls of the function make-source-program-file for each source file which needs processing. This function is our starting point in this section.

3.1     Getting started: the top level functions
3.2     The overall program traversal and scanning.
3.3     Traversing and scanning lists
3.4     More lexical troubles
3.5     Making links from the program to the documentation
3.6     Marking detailed places in a program
3.7     Preparing the linking to program source markers.
3.8     Linking from source markers in the program.
 


3.1     Getting started: the top level functions

The function make-source-program-file accepts a source key (the gentle name of a program file), the file location (file path), and the programming language to which the program belongs. The fourth parameter, source-list, is the parsed list of source expressions. The fifth and sixth parameters are defining-name-occurences and documented-name-occurences, which hold the defined names in the entire documentation bundle, and the relations between documented definitions and sections in the documentation, respectively. The last parameter specifies the font size of the resulting HTML file (a symbol, either 'large or 'small).

The function make-source-program-file calls elucidate-program-source. We use the source key information to make the name of the HTML output file, the destination path, which becomes the second parameter. Apart from that, the two functions are quite similar.

The function elucidate-program-source opens the input and output files. The original source text is read from the input file, and the HTML decorated source text is written to the output file. In this function we prepare for imperative processing of the output file. Thus, instead of forming one large HTML expression which represents the output, we write piece by piece of HTML output to the output port op. The functions pre-page and post-page from the html library, together with the start-tag and end-tag functions from the html-v1 library are used for the imperative output of the necessary tags. Now the function elucidate-program-source-1 takes over.

The function elucidate-program-source-1 iterates until we have reached the end of the input file. The function elucidate-program-form is called for (but not on) each top-level program construct (Scheme top-level form) in the input. We investigate this function in the next section.


3.2     The overall program traversal and scanning.

The real and serious processing of the program source file starts in the function elucidate-program-form. The first two parameters ip and op are the input port and output port respectively. The raw program source text is taken from ip, and the decorated output is written to op. The parameter f is the parsed form (a Lisp sexpr). The remaining parameters are just transferred from the caller.

The basic idea is to traverse the form f (tree traversal) and scan the characters on the input port simultaneously. A decorated version of the input is written to the output port op. The decoration consists of coloring, linking, and insertion of a few special icons into the program text. Notice that we do not go for any kind of pretty printing. The source file, as presented in the browser, should basically appear as written in the text editor. The necessary decoration is made possible because we can look ahead in the input via the parsed form f. We know what is in front of us...

The function elucidate-program-form basically dispatches on the type of the form f. (This is not entirely true, but as of now we will tell the story this way. In section 3.4 we will return to a lexical special case). As we see from the large conditional we handle symbols, strings, numbers, chars, booleans, and a number of list variants.

In the simple cases we call a matching function, such as match-symbol. Via one of the Lisp reading functions (which reads a very well-defined portion of the input at the current location) we read the symbol which must be ahead of us. This knowledge is due to the synchronous scanning and tree traversal. Because some symbols may be anchors of links to program definitions we look the symbol up in the defined names. If it is there, we output an HTML anchor tag with a link to the corresponding definition. If not, we just output the symbol.

The functions match-string, match-char, match-number, and match-boolean are similar, and in reality trivial, because there is no possible linking from these lexemes.
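
The following is a minimal sketch of the match-symbol idea, not the actual implementation. anchor-for-definition is a hypothetical stand-in for the real anchor tag generation, and the defined names are simplified to a flat list of symbols:

  (define (anchor-for-definition sym)             ; hypothetical helper
    (string-append "<a href=\"#" (symbol->string sym) "\">"
                   (symbol->string sym) "</a>"))

  (define (match-symbol ip op defined-names)
    (let ((sym (read ip)))                        ; reads exactly the symbol ahead of us
      (display (if (memq sym defined-names)
                   (anchor-for-definition sym)    ; link to the corresponding definition
                   (symbol->string sym))
               op)))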


3.3     Traversing and scanning lists

We go on with the explanation of elucidate-program-form.

The matching of lists, which in the Lisp world represent program constructs, is of course more complicated. As an overall concern we need to keep the traversal of the program form f and the input synchronized. Here the lexical special elements such as quote, backquote, unquote, and comments cause a number of problems.

Take a look at the conditional clause which takes care of define forms in the function elucidate-program-form. (This is the case which traverses and scans a Scheme define form). As can be seen we call the function skip-white-space in order to read over white space elements in the input. Recall that such elements do not have a counterpart in the form f. As can also be seen in skip-white-space we handle comments in a special way by means of the function skip-comment (because a comment is a rather lengthy lexical element).

The functions match-start-parenthesis and match-end-parenthesis are low-level helping functions which deal with the start and end parentheses of lists.

The call of the function total-doc-navigator should be noticed; this is the function which generates links from the program to the documentation (the yellow left arrows). We have more to say about this in section 3.5.

The recursive nature of lists causes a recursive processing: elucidate-program-form calls itself on subforms. There is one very noteworthy thing in this respect. The fourth parameter of the function holds the defined names. This parameter is, as already seen in section 3.2, used to link from applied names to Scheme definitions. There may be local name definitions in a Scheme form. These are parameters and local name binding forms. According to the usual scope rules the local name definitions overrule the global name definitions, as found in defined-names. Therefore we want to subtract the locally defined names from the defined names when they are passed to the recursive call of elucidate-program-form. This is done by the function list-difference-2, but only for parameter bindings (as located by bounded-names), see the sketch below. In section 4.2 we will explain the function bounded-names in some detail.
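
A minimal sketch of what list-difference-2 must accomplish (the real function may differ in details):

  (define (list-difference-2 lst subtract)
    (cond ((null? lst) '())
          ((memq (car lst) subtract) (list-difference-2 (cdr lst) subtract))
          (else (cons (car lst) (list-difference-2 (cdr lst) subtract)))))

  ;; (list-difference-2 '(f g x) '(x))  =>  (f g)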

In an early version we did not subtract locally defined names stemming from let bindings. Unfortunately, this caused some mis-bindings of applied names in our WEB presentations of programs. It would probably be rather tedious to implement the subtraction of let-defined names. In the current version of the elucidator this problem has, however, been remedied; see section 10 for the details.

The processing of a define form comes before the processing of other proper lists, which comes before the processing of more general lists. As usual we handle special cases before the more general cases. The processing of pairs in elucidate-program-form reflects the recursive nature of lists. The most tricky thing is to decide when to apply dot notation and when to apply more conventional list notation.

Vectors are also handled in a special case.


3.4     More lexical troubles

As promised in section 3.2 we will return to the scanning and processing of the special lexical elements of Scheme. We have already seen how to process white space and comments. What is missing is (at least) quoted expressions, such as 'symbol. In the parsed forms this will appear as (quote symbol). It should be clear that we need to explicitly match these two representations in order to succeed.

The relevant place to look is in the beginning of elucidate-program-form; more specifically the first case in the conditional. The function quote-in-input? returns whether the input port contains a quote character in front of us. (We also check whether the corresponding Lisp form is a quote expression; if not, a fatal error occurs). If we encounter a quote character in the input we output a similar quote on the output, and we process the quoted expression recursively.

In late 2003 we also implemented support of backquote and unquote (comma notation) in elucidate-program-form.


3.5     Making links from the program to the documentation

We will now take a look at the mechanisms which allow us to link from the source programs to documentation sections and entries. Recall that there are no explicit links represented in the program. The information behind these links is the relation from the documentation to the program definitions, now used in the reverse direction.

The relevant function to study is total-doc-navigator, which is called by elucidate-program-form, as discussed in section 3.3. The function returns a sequence of icon anchors, or an empty string. The parameters to total-doc-navigator are:

  1. The name of the Scheme definition we want to link from.
  2. documented-names, which in reality is the value of the global variable documented-name-occurences.
  3. size, which is either the symbol 'large or 'small (large or small font).
  4. source-key, the gentle name of the program source file.

We first find the relevant elements from documented-names, that is, the tuples which mention the name as the first component. Recall here that documented-names is a list of triples: (program-name doc-id strong/weak). We do not want more than one reference to a given documentation section from a definition. Therefore we remove duplicates based on the name of the documentation component of a documented name. We also want to avoid a weak reference if a similar strong reference is available. The function remove-redundant-weak-entries makes this special filtering.

The rest of total-doc-navigator returns the icon which toggles between small and large font (if wanted), the icon which links from the definition to the cross-reference entry, and following that the documentation navigators. The function doc-link makes the documentation navigation icon anchor tags.
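
The following sketch shows the weak/strong filtering idea, assuming SRFI-1's filter and entries of the form (program-name doc-id strength); the removal of plain duplicates is left out:

  (define (remove-redundant-weak-entries entries)
    (let ((strong-doc-ids
           (map cadr (filter (lambda (e) (eq? (caddr e) 'strong)) entries))))
      (filter (lambda (e)
                (or (eq? (caddr e) 'strong)                 ; keep all strong entries
                    (not (member (cadr e) strong-doc-ids))))  ; keep weak ones with no strong twin
              entries)))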


3.6     Marking detailed places in a program

Basically, we are able to explain a single abstraction. In the Scheme elucidator, this is a Scheme function (at top level). We have, until now, not introduced any means by which we can address a particular place inside an abstraction.

In this subsection we will explain the mechanism that allows us to mark a particular place in the program. In some sense, this runs counter to the principle of leaving the source file unaffected by our documentation needs. The marking takes place in the comments of a source program, using very minimal means.

In the first version of this mechanism the markers were entirely visual. In a later version we link from the markers (see section 6.7).

At the concrete level, a mark in a source file comment has the form

     @a

where a is a one-character entity (a letter or a digit).

At the program side, in the function skip-comment-1 we recognize this pattern and output an identifying image. This is done via the function source-marker-image.

At the documentation side, we use the same notation. The function program-linking-transition, which implements the state machine that governs the documentation linking, is the relevant place to implement the markers at the documentation side. From the state normal-text we may enter a new state, inside-marker. The input character encountered in this state determines the mark. We can use the same function as above, source-marker-image, to produce the marker. This is done via a call of source-marker-glyph. The result of this function call is just used as the output string in the state machine.


3.7     Preparing the linking to program source markers.

In section 6.7 we will explain how we have realized the linking from source markers in the documentation to the source marks appearing in a program. This needs preparation in the program in terms of the naming of the source marker places. Here we explain this detail.

The thing to arrange is that the source markers in the programs are tagged with anchor names. Recall from section 3.6 that skip-comment-1 is the relevant function, because source markers are embedded in comments in a program. In addition to the source marker itself we need to output an anchor name, of the same form as shown above. In order to do so we need access to the name of the definition in which we are located. This information is not immediately available in the function skip-comment-1.

We can solve this problem in two ways: Either we pass the name of the definition through all the functions as a parameter - from elucidate-program-form to skip-comment-1. This could (and perhaps should) be done, but not right now. Or, as always the easiest thing to do, we make an imperative solution. We go for this solution here.

The function elucidate-program-form sets the global variable enclosing-definition-name, both for define forms and for sectional, syntactical comments. Now, in skip-comment-1, and more importantly, in the state inside-marker of the function program-linking-transition (which has taken over the work of skip-comment-1) we can easily emit an a-name tag.


3.8     Linking from source markers in the program.

We also want to link from the program source markers to the source markers in the documentation. This is the opposite of the linking described in section 6.7, and it requires the symmetric preparation relative to section 3.7. This preparation is naturally a documentation side concern, and as such it is described in section 6.8.

We need to save some additional bookkeeping information about the documentation source markers in order to relate a program source mark to the proper documentation source mark. The necessary information is akin to the information in the list documented-name-occurences, which describes the relations between program-definition-id, documentation-id, and weak/strong relationship. Here we need a triple relation

(program-id doc-id source-mark)-list

which we save in the variable documentation-source-marker-occurences. There is one entry for each documentation source marker.

The definition of this variable is really a documentation side preparation, see again 6.8 for details.

The place to introduce the link to the documentation source mark is skip-comment-1. We introduce the function doc-source-marker-link, the responsibility of which is to return the documentation-linked source mark.

We pass the information which is necessary for the function to work:

  1. The bookkeeping information in documentation-source-marker-occurences.
  2. The source marker char.
  3. The name of the program definition in which we are located.

The last information is found in the variable enclosing-definition-name, which is defined by the function elucidate-program-form (both for define forms and for sectional comments).

Let us now describe the inner working of doc-source-marker-link. We first find the relevant entries in documentation-source-marker-occurences: the entries that deal with the given definition (referred to by enclosing-definition-name) and the given marker char. Next we check whether there are 0, 1, or more relevant entries. In case of 0 or more than one we issue a warning. In case of 1 we return the link (an anchor tag from the source mark glyph). We also return a link in case there is more than one relevant entry, namely to the first one.
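
A sketch of this entry selection, assuming entries of the form (program-id doc-id marker-char) and SRFI-1's filter:

  (define (relevant-marker-entries entries definition-name marker-char)
    (filter (lambda (entry)
              (and (eq? (car entry) definition-name)     ; same enclosing definition
                   (eqv? (caddr entry) marker-char)))    ; same marker character
            entries))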

The caption of the link reports the ambiguity via the function report-ambiguous-doc-source-markers, which is called in doc-source-marker-link.

4     Extracting defined names

In this section we describe the task of extracting the defined names from a parsed source file. This is a relatively easy task compared with some of the other tasks of the elucidator.

4.1     The function defined-names
4.2     The function bounded-names
 


4.1     The function defined-names

The function defined-names is called in end-documentation, as explained in section 2.2. The starting point is the parameter, which contains the parsed Scheme expressions from a source file. Via the iterative helping function defined-names-1 we iterate through all the forms in the list. We only care about the top-level forms which are definitions. These are identified by the predicate is-define-form?.

The function defined-name extracts the defined name from a Scheme define form, which can be one of these two kinds:

  (define name value)
  (define (function-name par) body)
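
A minimal sketch of the extraction (the real defined-name may handle more cases):

  (define (defined-name define-form)
    (let ((x (cadr define-form)))
      (if (pair? x)
          (car x)     ; (define (function-name par) body)
          x)))        ; (define name value)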

4.2     The function bounded-names

In this context it is natural to explain the function bounded-names, which we mentioned in section 3.3 above. Recall from there that bounded-names returns the names bound (in the parameter list) of a define form. Because there can be other names bound in a definition (in let forms), the function name parameter-names would probably have been a better choice. (This has been settled as part of the continued development of the program, see section 10. The original version of bounded-names is now, indeed, called parameter-names).

There are two cases, corresponding to the two forms of definition shown above. Let us first assume that the second element is a pair (proper or improper list). We have now two possibilities:

  (define (function-name p1 p2 p3) body)
  (define (function-name p1 p2 . p3) body)

In both cases we want to return the list (p1 p2 p3). We use the functions proper-part and first-improper-part from the general library to extract the proper and improper part of an improper list.

If the second element of the define form is a symbol we have the following possibilities:

  (define name (lambda (p1 p2 p3) body))
  (define name (lambda (p1 p2 . p3) body))

Again we want to return (p1 p2 p3). In any other case we return the empty list.
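
The following sketch captures this case analysis; proper-part and first-improper-part are simplified stand-ins for the real library functions:

  (define (proper-part lst)                   ; proper prefix of a possibly improper list
    (if (pair? lst) (cons (car lst) (proper-part (cdr lst))) '()))

  (define (first-improper-part lst)           ; the final non-pair tail, or ()
    (if (pair? lst) (first-improper-part (cdr lst)) lst))

  (define (bounded-names define-form)         ; sketch of the case analysis above
    (define (names-of parameter-list)
      (let ((rest (first-improper-part parameter-list)))
        (if (null? rest)
            (proper-part parameter-list)
            (append (proper-part parameter-list) (list rest)))))
    (let ((x (cadr define-form)))
      (cond ((pair? x)                        ; (define (function-name p1 p2 [. p3]) body)
             (names-of (cdr x)))
            ((and (pair? (cddr define-form))  ; (define name (lambda (p1 p2 [. p3]) body))
                  (pair? (caddr define-form))
                  (eq? (car (caddr define-form)) 'lambda))
             (names-of (cadr (caddr define-form))))
            (else '()))))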

5     Parsing the textual documentation

In this section we will explain the processing of the textual documentation format. Recall that this is the preferred format of documentation in an Elucidator. (It is, by the way, the source format used for the markup of the text you are reading here). The alternative is to use documentation-entry and documentation-section forms with LAML markup (see section 2.6). This alternative is more complicated, but also more powerful, because all the possible LAML abstractions are available.

5.1     Introduction to the textual documentation format
5.2     The overall ideas
5.3     The top level functions.
5.4     Organizing the parsing process
5.5     The accept functions
5.6     The collection functions
5.7     The skipping functions
5.8     Summary of parsing process
 


5.1     Introduction to the textual documentation format

Using the textual documentation format, which we describe below, we are restricted to two kinds of fixed and non-extensible markup: The specialized dot markup and HTML markup.

The textual documentation format is somewhat inspired by the good old roff notation, with dot markup. Here we show an introduction, a section, and an entry (a subsection):


  .TITLE       title
  .AUTHOR      author
  .EMAIL       email
  .AFFILIATION affiliation
  .ABSTRACT
  abstract
  .END
  -----------------------------------------------------------------------------
  .SECTION section-id
  .TITLE section title
  .BODY
  Section text
  More section text
  .END
  -----------------------------------------------------------------------------
  .ENTRY entry-id
  .TITLE entry title
  .BODY
  Entry text
  More entry text
  .END
  -----------------------------------------------------------------------------

The dashed lines in between sections are just for separation purposes; they play the role of comments. section-id and entry-id are section and unit identification symbols, used for cross-reference purposes. HTML markup can appear in bodies and titles. The body text usually starts at the line following the .BODY keyword, but it may also start just after the keyword itself.

The dot markup is line-oriented. The dotted keywords must be at the beginning of a line, and the text after .SECTION, .ENTRY, and .TITLE runs to the end of the line.

By the way, this is the only reason that the dotted keywords aren't interpreted in the text above. We do not, at this level, support "escape mechanisms" which allow us to have the dotted keywords in front positions of a line. However, we do support escaping of the linking characters, see section 5.2.


5.2     The overall ideas

Before we describe the details, it is relevant to see the overall lines of the processing. As we will see in the next section, the top-level function is documentation-from. This function defines (via other functions) the global variables documentation-elements, documentation-title, and documentation-author. The relations defined in the clauses [x], {x} and {*x} have not been processed at this point. This is done indirectly by the function documentation-contents!, which is called by end-documentation. The details of this are described in section 6.

If we want to use the characters

   [  ]  {  }

and * inside curly brackets we need to escape them with a backslash character:

 \[  \]  \{  \}

The implementation of the escaping mechanism is realized through the state machine, which we discuss in section 6.3 .


5.3     The top level functions.

The top-level function for the processing of the textual format is documentation-from. The parameter is a file name; as such we ask for "documentation from a given file".

A documentation-from form appears between begin-documentation and end-documentation in the setup file, see section 2.1. We have already touched on documentation-from in section 2.5 and section 2.2.

The function documentation-from calls functions which process the intro part (title, author, etc.), and the remaining documentation units (sections and entries). Besides this, documentation-from is responsible for file opening and closing. The remaining functions take input from an input port, ip.

The function documentation-intro-from-port processes the introduction. It eats the necessary white space in front of it. The function accept-documentation-intro does the real extraction and parsing work (see section 5.5). The function define-documentation-intro! calls documentation-intro with the extracted constituents. In turn, this function just assigns the title, author, etc. to global variables, which are used by the function documentation-contents!, which we explain in section 6.1.

Similarly, the function documentation-units-from-port eats initial white space, parses a unit (section or entry), and eats the separator. The function accept-documentation-unit does the real work (see again section 5.5). The collected unit is passed to define-unit!. The function define-unit! imperatively evaluates (by means of Scheme eval) the Lisp form made by make-documentation-form. This is the function which aggregates a documentation-section or documentation-entry form from the extracted information. Notice the iterative nature of documentation-units-from-port.


5.4     Organizing the parsing process

Before we explain the details of the parsing and text extraction functions we will take the opportunity to discuss the parsing problem which we face here.

We could go for the application of a general parser. However, this is not attractive. There is only a tiny set of syntactic constructs, and ordinary lexical analysis would not be very useful on input which is more or less free text.

We could alternatively take the text through a state machine which collects the necessary constituents while reading the individual characters of the textual documentation. This could be done, but there would be many states, and it would be quite difficult to make and maintain such a state machine. (We use state machines in other places, also in the Elucidator - see section 6.3. We could, of course, use the general template and approach from there).

We decided to make a special set of procedures which accept well-defined portions of the documentation. This approach is quite similar to recursive descent parsing, although in our case there is no recursion involved (the language is so simple that it does not invite recursive constructs). In the next section we will explain this approach.


5.5     The accept functions

The two top-level accept functions are accept-documentation-intro and accept-documentation-unit.

The function accept-documentation-intro accepts, in turn, the title, author, email, affiliation, and abstract by means of lower-level accept functions. After successful acceptance and recognition of these it returns the list of the constituents.

The function accept-documentation-unit generically accepts a section or an entry. This is also done by lower-level, specialized accept functions. The similar structure of sections and entries allows for a single function doing the job. The function accepts id, title, and body.

There are a number of lower level accept functions, as mentioned above. These realizes the kernel of the parsing in terms of collections and skippings. Let us look at one of these, accept-doc-id as a typical representative. The text which is accepted is one of the following:

  .SECTION id
  .ENTRY id 

The function accept-doc-id has an important precondition established by the context: it must be called just before the appearance of the keyword .ENTRY or .SECTION. The function collect-until collects the keyword by reading until white space is encountered. We check whether the collected text is either the string .ENTRY or .SECTION. If not, we stop the processing via doc-check, which causes a fatal error and an error message. Next we skip white space via the function skip-while, and the id is collected via collect-until. Accept-doc-id returns a list reflecting the concrete syntax: (list unit id).
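
A sketch of this accept pattern, assuming simplified signatures of collect-until, skip-while, doc-check, and is-white-space? as described in this and the following sections:

  (define (accept-doc-id ip)
    (let ((keyword (collect-until ip is-white-space?)))
      (doc-check (or (equal? keyword ".SECTION") (equal? keyword ".ENTRY"))
                 "expected .SECTION or .ENTRY")
      (skip-while ip is-white-space?)
      (let ((id (collect-until ip is-white-space?)))
        (list keyword id))))     ; the list reflecting the concrete syntax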

Most of the other accept-functions work in the same way as accept-doc-id. These are accept-doc-title, accept-doc-author, accept-doc-email, accept-doc-affiliation, and accept-doc-abstract. The function which accepts the bodies of sections and units is a little different, so we will explain it briefly.

accept-doc-body first eats white space, after which it accepts the body keyword. It finally calls accept-body-text, which in turn calls the iterative accept-body-text-1. It collects lines, again using collect-until, until it meets the .END keyword on its own line. The predicate end-unit? identifies this situation. accept-doc-body reverses the collected lines, and appends them with string-merge.

Notice that we call a function eat-eol-chars in accept-body-text-1. When called we have encountered an end of line. The end of line handling is tricky, because we want the program to run both on Unix and Windows. On Windows, lines are ended by CR (character 13) and LF (character 10). The function eat-eol-chars reads the LF and prepares for a "good start" on the next line (emptying the one-character queue).


5.6     The collection functions

The central collection function is collect-until. From a given position in the input it collects a text string. The collection process stops when the predicate, which is passed as a parameter, becomes true. The real work is done by the iterative collect-until-1; this procedure is straightforward, however.

We don't know the length of the collected text in advance. We accumulate read characters in a variable, collection-buffer, but we cannot easily determine the length of this buffer. We could handle this by allocating longer and longer strings (or more and more strings), much as we do in the function read-text-file in the file-read (and write) library, which reads text from a file and returns it as a string.

As an important observation, we collect line by line in the accepting functions. Therefore we can live with a fixed upper limit, defined by buffer-length.

Quite often we read a character which we really did not want to read. This is a classical problem when handling input. We want to put the character back, such that the next reading will re-encounter it. Some libraries support a put-back operation and a queue of put-back characters. We only have a "one character queue", next-doc-char. The function read-next-doc-char takes the character from next-doc-char if there is one; otherwise it reads a character from the input port. Because this is the central place where we read characters from the input, we can also handle the administration of line numbers here. If we read a CR we increase the variable doc-line-number. By means of this we can give relatively good and precise error messages in the doc-check procedure.
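
A sketch of this one character queue and the line number administration (the real functions may differ in details):

  (define next-doc-char #f)                   ; the one character queue: #f or a character
  (define doc-line-number 1)

  (define (read-next-doc-char ip)
    (let ((ch (if next-doc-char
                  (let ((c next-doc-char)) (set! next-doc-char #f) c)
                  (read-char ip))))
      (if (and (char? ch) (char=? ch (integer->char 13)))   ; CR: one more line read
          (set! doc-line-number (+ doc-line-number 1)))
      ch))

  (define (put-back-doc-char ch)              ; put a character back into the queue
    (set! next-doc-char ch))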

Let us also mention here some generally useful predicates which we conveniently pass to collect-until and skip-while. These identify white space (is-white-space?), end of line (end-of-line?), and similar boundary conditions.


5.7     The skipping functions

The central skipping function is skip-while. It skips characters in the input port while a predicate p holds. The skipping function is similar to the collection function collect-until from the previous section. However, skipping is much easier, because we don't use the read characters at all. Notice that this function also calls read-next-doc-char.
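
A sketch in terms of the queue functions sketched in the previous section:

  (define (skip-while ip p)
    (let ((ch (read-next-doc-char ip)))
      (cond ((eof-object? ch) 'eof)           ; nothing more to skip
            ((p ch) (skip-while ip p))        ; still skipping
            (else (put-back-doc-char ch)))))  ; went one too far: put it back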


5.8     Summary of parsing process

We have now seen how the textual documentation format is parsed and organized as the values of a number of global variables, most dominantly documentation-elements. In the next section - section 6 - we will see how the value of documentation-elements and other similar variables is used to present the documentation in a browser. As an integral part of the presentation process we will also see how link clauses are expanded to anchor tags via the processing in a state machine.

6     Making the documentation page

In this section we describe the production of the documentation page given the variable documentation-elements (and others). Thus, the starting point is the parsed documentation page, as represented in the bunch of variables of which the most important is documentation-elements. The parsing process was described in section 5 and summarized above in section 5.8. The most serious challenge in this section is to convert curly brackets and square brackets to program and documentation references.

6.1     The function documentation-contents
6.2     The function do-program-link-documentation!
6.3     The state machines which transform the documentation bodies
6.4     The functions which return a link to a program unit or a documentation unit
6.5     Refined linking possibilities
6.6     Linking between documentation sections and entries.
6.7     Linking from source markers in the documentation.
6.8     Preparing the linking to the documentation source markers.
 


6.1     The function documentation-contents

The natural starting point is the procedure documentation-contents!, which is called by end-documentation.

In the transition to Scheme Elucidator 2 we went from a functional handling of the documentation to an imperative handling. With functional handling, we aggregated (on textual basis) the entire documentation, and we wrote it to the documentation HTML file in end-documentation.

The rationale behind the imperative handling is the following. With the use of the AST-based XHTML mirror, many small HTML fragments are represented as ASTs. These can, of course, be converted to text, but this causes either lots of garbage collection or allocation of large chunks of strings. (In LAML, the latter solution is used). Neither solution is attractive. It is better to render these small HTML ASTs directly to an open output port. Rooted in the procedure documentation-contents!, via the procedures present-documentation-section!, present-documentation-entry!, do-program-link-documentation!, and do-program-link-documentation-1!, this is what we have done.

The function documentation-contents! produces the title, author info, abstract, and the sections of the real documentation. The documentation sections (including the so-called entries) are made in the for loop. The most important work is initiated by the function present-documentation-element! within the for loop. It dispatches to present-documentation-section! and present-documentation-entry!. The function present-documentation-section! presents the introductory section text in a color-frame together with a numbered title. Similarly, the function present-documentation-entry! presents a documentation entry. Both of these functions call do-program-link-documentation!, in which the really interesting work is done, namely conversion of the linking brackets to HTML anchor tags. This function is described in section 6.2.

From an overall point of view, do-program-link-documentation! is geared towards a textual representation of the documentation. In the original Scheme elucidator (elucidator 1), this holds both for textually authored documentation and for documentation authored in Scheme. (The reason is that, in the original elucidator, we used mirror functions that produced text). Therefore, it makes sense for do-program-link-documentation! to traverse its first parameter as text, and do certain transformations on this text.

In Elucidator 2, we still support the textual documentation format, but we also support authoring with Scheme and LAML. Using the latter approach, the first parameter of do-program-link-documentation! is now a LAML AST. As an important decision, we do not want to use the textual linking syntax, such as {*...} and {+....}, within LAML-authored documentation. Rather, in Elucidator 2, we use Scheme-level referencing forms, and we have a special form for source marker references. This is a new development, compared for instance with the LAML tutorial, in which we have used LAML/Scheme authoring with textual references. The hybrid format of the original LAML tutorial is therefore not supported any more.

The appropriate location for this branch point is the activation of do-program-link-documentation! in present-documentation-entry! and present-documentation-section!. If the intro/body parameter of do-program-link-documentation! (the first parameter) is text, we proceed more or less as in the original elucidator (but imperatively), entering the state machine implemented by program-linking-transition. If the intro/body is an AST, we need to write a new contribution which bypasses the state machine.

In both a documentation section and a documentation entry we show links to parent and sibling sections/entries. In section 6.6 we describe how this is done.


6.2     The function do-program-link-documentation!

The function do-program-link-documentation! takes the documentation body and the documentation id as parameters. The function translates the input body to an output body in which the bracket linking is transformed to real links. It calls a tail-recursive (iterative) variant of the function, do-program-link-documentation-1!, which takes a number of parameters:

  1. the documentation id (doc-id)
  2. the documentation body (instr)
  3. a pointer into instr (inptr)
  4. the length of instr (inlength)
  5. the current state of the state machine, which we discuss below (current-state)
  6. a collected word which is used inside the state machine
  7. the open output port

(Eliminated problem: the length of the output string was a rough estimate which might not be sufficient (2 times the length of the input plus 500). The best we could do was to report on problems if the output exceeded this size. This was done via the first test in do-program-link-documentation-1!.)

The function do-program-link-documentation-1! works via a state machine which reads each character in the input (the body) and translates it to an appropriate output. The central function which realizes the state machine is program-linking-transition. It is activated on a state, an input character, a collected word, and the documentation id. The collected word serves as a collector string for the bracketed linking words; when we see the rear end of a linking word we can call the function linking-from-doc-to-prog or linking-from-doc-to-doc, which insert the anchor tags. Details about these follow in section 6.4. In the next section we will describe the state machine in some detail.
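
A sketch of such a driving loop, in terms of the simplified transition function sketched in the next section; the real function carries more parameters, as listed above:

  (define (do-program-link-documentation-sketch! doc-id instr op)
    (let loop ((inptr 0) (state 'normal-text) (collected ""))
      (if (< inptr (string-length instr))
          (let* ((step (transition state (string-ref instr inptr) collected doc-id))
                 (new-state (car step))
                 (new-collected (cadr step))
                 (output (caddr step)))
            (display output op)              ; imperative output, piece by piece
            (loop (+ inptr 1) new-state new-collected)))))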


6.3     The state machines which transform the documentation bodies

As mentioned above the function program-linking-transition implements a state machine. We have used the same kind of state machine in several other pieces of LAML software.

The state machine has a number of states; we have already met normal-text and inside-marker in section 3.6.

The most interesting places in the function are those where we have recognized a whole linking word. When we are dealing with program link words (typographic-prog-ref 'name) we call the function linking-from-doc-to-prog with the collected word and the documentation id as parameters. Similarly, we call linking-from-doc-to-doc when we are dealing with documentation link words (d-link-words). These two functions are discussed in the next section.
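
The following schematic sketch conveys the shape of such a transition function. Only the curly bracket (program link) and square bracket (documentation link) cases are shown; the state names inside-curly and inside-bracket, and the return convention (a list of new state, collected word, and output string), are simplifications of the real program-linking-transition:

  (define (transition state ch collected doc-id)
    (case state
      ((normal-text)
       (cond ((char=? ch #\{) (list 'inside-curly "" ""))
             ((char=? ch #\[) (list 'inside-bracket "" ""))
             (else (list 'normal-text collected (string ch)))))
      ((inside-curly)                      ; collecting a program link word
       (if (char=? ch #\})
           (list 'normal-text "" (linking-from-doc-to-prog collected doc-id))
           (list 'inside-curly (string-append collected (string ch)) "")))
      ((inside-bracket)                    ; collecting a documentation link word
       (if (char=? ch #\])
           (list 'normal-text "" (linking-from-doc-to-doc collected doc-id))
           (list 'inside-bracket (string-append collected (string ch)) "")))
      (else (list state collected (string ch)))))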


6.4     The functions which return a link to a program unit or a documentation unit

The function linking-from-doc-to-prog starts by determining the possible link targets for the given word on the program side. We find all defining names that match the first parameter (the link word). Next we see a conditional in which we distinguish between zero, one, and several targets. Zero or several targets result in a warning message, but no fatal errors. In case of zero matching program definitions we do not link at all. In case of several we link to the first one.

As an important side effect of this function we accumulate the linking words in the global variable documented-name-occurences. Besides this we return an a-tag-target string.

The function linking-from-doc-to-doc is simpler. The word is looked up in the variable documentation-key-numbering-alist. This is the variable which maps documentation ids to the assigned section numbers. The section number becomes the anchor text of the a-tag-target URL. In order to handle errors (again non-fatal) we test if the collected word is a known one in the association list documentation-key-numbering-alist. If not, we issue a warning and just return the word collected-word without any linking from it.


6.5     Refined linking possibilities

From preliminary experiences with the Elucidator we have learned that it is valuable to distinguish between strong and weak program references. A strong reference explains the referenced program in some detail. A weak program reference is just a kind of convenient, navigatable cross reference. We decided that strong cross references include an initial star character following the start curly bracket:

  {*reference}

The necessary program modifications were the following: In the function linking-from-doc-to-prog we test for the initial star in the first parameter, word. This is done by the function strong-program-link?. The variable strong-link-char is introduced in order to be able to change the strong link character to another character.

We distinguish weak and strong links by using different colors in the documentation frame. Strong links are red, and weak links are dark blue. We also need to extract the real linking word in case there is a leading star. This is easily done by the function linking-word-of-strong-link.
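
Minimal sketches of these two helpers, matching the description above:

  (define strong-link-char #\*)               ; the configurable strong link character

  (define (strong-program-link? word)
    (and (> (string-length word) 0)
         (char=? (string-ref word 0) strong-link-char)))

  (define (linking-word-of-strong-link word)  ; strip the leading star
    (substring word 1 (string-length word)))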

On the program side we want the left arrows to indicate whether they are involved in strong or weak program links. In order to do so we need to remember the strong/weak distinction of a given link. This affects the registrations done in the variable documented-name-occurences. We decided to add an extra symbol 'strong or 'weak to the association. Thus an association may now be (program-name doc-id strong/weak). Notice that it is still an association list, associating the program-name to a list of two elements.

The association list stored in documented-name-occurences is used in a variety of functions under the formal name documented-names. The only place the strong/weak information in the association list is used is in the function total-doc-navigator, and further on in doc-link. (The function doc-navigator is outdated, and not used.) Here we introduce the distinction between weak and strong links by showing different left arrow icons for the two of them.


6.6     Linking between documentation sections and entries.

We chain the documentation sections and entries together with sibling and parent links (the small yellow arrows presented on the blue background, just above the section titles). We call these links documentation link banners. We want all sections chained, and all subsections chained. Also we want up-links. Thus, the section tree structure is linked together in a natural way.

The documentation link banners are produced by the function section-navigation-banner, which takes the documentation elements of the section/entry as parameter. section-navigation-banner is called by present-documentation-entry and present-documentation-section.

Internally in section-navigation-banner we use the function doc-section-url to produce URLs to sections and subsections (entries). This function just traverses the variable documentation-elements, and by means of filtering it finds the relevant documentation element (a section or entry). The predicate section-subsection? is useful here (it is similar to subsections?, which we discuss in section 8.5).

A URL to section n.m is produced by (doc-section-url n m). Notice that n.0 denotes section n. In this respect, in the function section-navigation-banner, there are special cases when we calculate the previous (blind) links from section 1 and section i.1.

Whereas the function section-navigation-banner produces URLs, the function section-navigation-banner-1 produces the graphical appearance of the documentation link banners. In that way there is a clear division of responsibility between the two of them.


6.7     Linking from source markers in the documentation.

In this section we will describe how we have implemented the linking from source markers in the documentation to the source marks in the program.

We have already prepared for this in section 3.7, where we introduced anchor names of the program source marks.

From a design point of view we decide that a source marker is associated with the nearest strong relation (a red one). Only strong relations earlier than the source mark are taken into consideration.

In the function program-linking-transition we need to output an anchor tag instead of just the source marker. This is done by the function source-mark-anchor. This function is modelled after linking-from-doc-to-prog, and it is really straightforward once linking-from-doc-to-prog is understood. source-mark-anchor depends on the global variable previous-strong-program-word, which is assigned by linking-from-doc-to-prog.

We use the following naming scheme for identification of source markers:

program.html#definition-@m

where definition is the name of a definition (a function name, typically) and m is a marker name.
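
As an illustration, such an anchored URL can be formed by simple string concatenation. The function source-marker-url is a hypothetical name, introduced only for this example:

  (define (source-marker-url program-file definition marker)
    (string-append program-file ".html#" definition "-@" marker))

  ; (source-marker-url "elucidator" "linking-from-doc-to-prog" "a")
  ;   => "elucidator.html#linking-from-doc-to-prog-@a"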


6.8     Preparing the linking to the documentation source markers.

In this section we introduce anchor names of the source markers in the documentation, such that we can link from program source markers via a definition name and a documentation section to a documentation source marker. The linking itself is done in section 3.8.

We also define a variable documentation-source-marker-occurences which relates program-ids, doc-ids and source mark characters. The variable is assigned by the procedure source-mark-register (registration of a new entry in the list).

The function to care about in order to introduce anchor names of documentation source markers is program-linking-transition. At the same place as we introduced the linking to the program (see section 6.7) we now also insert an anchor name, via use of the a-name LAML tag. The anchor name is the following:

    docid-@x

where docid is the documentation id of the section/entry and x is the source mark character.

The programming challenge here is how to get access to the documentation id of the section/entry in which we are located. We are lucky here! The function program-linking-transition carries this information as the last parameter.

In section 3.8 we will link to the anchor names (from program source markers to documentation source markers).

7     Extracting applied names.

In this section we will study the extraction of applied names from the parsed program files.

7.1     Overview
7.2     The function applied-names-multiple-sources
7.3     Extracting applied names from a single form.
 


7.1     Overview

The result of the efforts described in this section is the variable defined-applied-names. The variable is defined in the function end-documentation. This variable is a list of the form ((applied-name . defined-name)...). The meaning is that applied-name is used in the definition of the form (define (defined-name...) ...).

The function applied-names-multiple-sources initiates the extraction task. At the calling place in end-documentation we see that the list of parsed source forms is made by appending source-list-list-process (the list of parsed sources processed in this 'Elucidation') and the list of source forms read via read-source (corresponding to the non-processed source files).


7.2     The function applied-names-multiple-sources

The function applied-names-multiple-sources accumulates and sorts the contributions from the various source files, as represented in source-list-list-process.

The function applied-names, and in particular its helper function applied-names-1, extracts applied name pairs from a single source list, representing a single source file. The latter function accumulates the results in the last parameter, res. We see that applied name pairs are only collected from definitions, identified by the predicate is-define-form?. In reality applied-names-1 iterates through the definitions of source-list, skipping the remaining top level forms. Under the definition of this-contribution we see the construction of the pairs of applied and defined names. We also see that the function applied-names-one-form extracts the applied names from a single form. This function is explained in the next section.
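
A minimal sketch of the accumulation may be helpful. is-define-form?, defined-name, and applied-names-one-form are the elucidator's own functions; the body below is an assumption about how they combine, not the actual definition:

  (define (applied-names-1 source-list res)
    (cond ((null? source-list) res)
          ((is-define-form? (car source-list))
           (let* ((def-name (defined-name (car source-list)))
                  (this-contribution
                   (map (lambda (applied) (cons applied def-name))
                        (applied-names-one-form (car source-list)))))
             (applied-names-1 (cdr source-list)
                              (append res this-contribution))))
          (else (applied-names-1 (cdr source-list) res))))  ; skip other forms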


7.3     Extracting applied names from a single form.

The function applied-names-one-form extracts applied names from a single form. This function returns a list of applied names. The function is heavily recursive; it traverses the parse tree.

For each symbol we encounter we return that symbol (i.e., a list consisting of the symbol) if the symbol is defined in the current documentation bundle. The function defining-in-batch? implements this condition.

Later in the conditional we process various special cases of lists (from most specialised to most general):

  (define (f p1 p2) ...)
  (define f ...)
  (lambda (p1 p2) ...)
  (let ((n1 v1) (n2 v2) ...) ...)

In all of these we want to skip defining name occurrences (defined names, parameters, and bound let names). These are f, p1, p2, n1, and n2 above. The function let-vals returns the forms corresponding to v1 and v2 above.

The remaining cases are simple, exhaustive traversals and collection.
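
The following sketch captures the overall recursion of applied-names-one-form; it deliberately leaves out the special cases for define, lambda, and let discussed above, in which the binding name occurrences must be skipped. defining-in-batch? is the elucidator's own predicate:

  (define (applied-names-one-form form)
    (cond ((symbol? form)
           (if (defining-in-batch? form) (list form) '()))
          ((pair? form)
           (append (applied-names-one-form (car form))
                   (applied-names-one-form (cdr form))))
          (else '())))  ; numbers, strings, etc. contribute nothing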

8     Making the indexes

The elucidator supports a number of indexes: an index of program definitions, a cross reference index of defined and applied names, an index of multiply defined names, and a table of contents covering the documentation. In this section we will explain each of these.

8.1     The cross reference index
8.2     Alphabetically organized cross reference indexes
8.3     The duplicated name index
8.4     Making the table of contents
8.5     Local table of contents
 


8.1     The cross reference index

Via the cross reference index we are able to answer the question: in which definitions is a given name applied? Thus, given some name we can via this index get to all the definitions in which the name occurs.

The function end-documentation writes an HTML page with the cross reference index. Here the function present-cross-reference-index is called with the list defined-applied-names as parameter (an association list mapping names to all definitions in which they occur, sorted by the first element in the list). The creation of this list was addressed in section 7.

Besides forming the real and actual list of applied/defined names (see below in this subsection) the function present-cross-reference-index makes the outer table of the cross reference. The function takes a list of pairs as parameter; each pair (a . d) represents an applied name a which is applied in the definition of d. The list is sorted by applied name (the car position). In this function, we first "sublist" the parameter list such that all entries belonging to the same applied name become a sublist; in other words, all occurrences of an applied name are grouped together in a sublist. Hereby the list passed as parameter becomes one level deeper. Next we eliminate multiple applications of the same name in a single definition. This is done by (essentially) mapping remove-duplicates-by-predicate over all sublists of the list formed just above. The rest of the task in present-cross-reference-index is presentation of the result. The left column shows an applied name; the right "fat column" presents all the definitions in which the name occurs. All the entries (an applied name and all the definitions in which it occurs) are produced by the function present-applied-sublist. The "fat column" just mentioned is made in this function. Each entry in this inner table is made by present-defined-entry.

The names which are defined but not applied in the current documentation bundle do not occur in the list defined-applied-names. It would be quite informative to include these, e.g., in order to illustrate that a given definition is not used (at all, or at least in the current bundle). Therefore we merge the lists defining-name-occurences and defined-applied-names to a list of the same data format as defined-applied-names. This takes place in the function merge-defined-and-defined-applied-lists. The pair

   (name . #f)

is a legal entry in the list, meaning that the name is applied nowhere in the documentation bundle.

Symmetrically, the names which are applied but never defined would be useful in the cross reference index; such a name may indicate an error. As of now, these do not appear, because these names are not extracted at all. If we tried to do so, we would probably end up getting far too many names. It could be complicated to hit exactly the symbols which are in evaluating position and which relate to global definitions.


8.2     Alphabetically organized cross reference indexes

The cross reference index described in section 8.1 becomes large for non-trivial elucidative programs. Large HTML files take a relatively long time to bring up in a browser. And it takes a long time for the user to scroll to a particular place in such an index. Therefore we now want to split the cross reference index into a number of smaller indexes, one for each letter in the alphabet.

The split cross reference index facility is controlled by a boolean variable, alphabetic-cross-reference-index?.

As can be seen in end-documentation the generation of the split index is, in principle, straightforward. First we split the value of extended-defined-applied-names into alphabetical sections by means of the function split-defined-applied-names. Next we call the function make-cross-reference-index over the split list, thus generating a number of smaller index files. Finally, the function make-overall-cross-reference-index makes the overall index, with links to the individual small index files.

Now to some of the details, first split-defined-applied-names. It is easy to do the splitting via an application of the function sublist-by-predicate. We just need to make a predicate which identifies different front letters in two consecutive elements of the car position in defined-applied-names (called dan here, for brevity).
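
As a sketch, assuming that the applied names in the car positions are symbols, and that the predicate receives two consecutive elements and answers whether a new sublist must be started, it could look as follows:

  (define (new-letter? e1 e2)
    (not (eqv? (string-ref (symbol->string (car e1)) 0)
               (string-ref (symbol->string (car e2)) 0))))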

However, there is one problem: In case there are no names with a particular starting letter we get fewer sublists than there are letters in the alphabet. We see two solutions:

  1. Either we insert empty lists at appropriate positions, or
  2. We produce a partial alphabetic overview

We go for solution number 2.

It is not hard to make the partial alphabet list; we just map the function first-letter-of over an appropriate list formed by another mapping over the split list of defined applied names.

Next we map a procedure make-cross-reference-index over the split name list and over the partial alphabet. The first parameter passed to this function is a list of name pairs; each pair (a . d) represents an applied name a which is applied in the definition of d; d may be false (#f) in case a is not applied in the current documentation bundle at all.

The procedure make-cross-reference-index produces a small index file (for names with a given initial letter). It uses the function present-cross-reference-index to present the cross reference index. Recall from section 8.1 that this function produces the table which presents the cross reference index. We also present an alphabet link array - by means of alphabetic-link-array-1 - allowing for easy navigation to other indexes from an arbitrary index.

Finally, we have to make the overall index, which just contains an alphabet navigator. We take the already existing library function alphabetic-link-array as the starting point. This function needed generalization with respect to the linking target, the alphabet, and more. This gives a variant, called alphabetic-link-array-1.


8.3     The duplicated name index

The function duplicated-definitions produces an index of definitions which appear two or more times in the documentation bundle. In Lisp it is possible to redefine a name. In some situations a redefinition is intended; in others it is an error. The duplicated name index of the elucidator makes it possible, in an easy way, to find which names are defined more than once within the current documentation bundle. Using this index, the unintended double definitions can easily be eliminated.

The elucidator uses the names of definitions as identifications. This is a very simple decision (probably also too simple). In case of double definitions we cannot distinguish between two or more definitions. Needless to say, this causes problems. Therefore we also use the duplicated name index to remind the user of the Elucidator of the size of this problem (or "flaw", you could say).

Internally, the function duplicated-definitions sorts all defined names, and next we attempt to identify duplicates. Given that we have the sorted definitions f, g, g, h, i, and j (with g double defined) we identify duplicates by pairing the list

  (f g g h i j)

with the tail of the list

  (g g h i j)

giving

  ((f . g) (g . g) (g . h) (h . i) (i . j))

The element (g . g) represents a duplicate.
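
A minimal, self-contained sketch of this pairing technique (filter as in SRFI-1; in strict R5RS the two lists passed to map must have equal length, hence the trimming of the last element):

  (define (duplicates-in-sorted sorted-names)
    (let* ((all-but-last (reverse (cdr (reverse sorted-names))))
           (pairs (map cons all-but-last (cdr sorted-names))))
      (map car (filter (lambda (p) (eq? (car p) (cdr p))) pairs))))

  ; (duplicates-in-sorted '(f g g h i j)) => (g)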

The function present-duplicated-definitions (called by end-documentation) presents the duplicated definitions in a straightforward way.


8.4     Making the table of contents

The table of contents is generated by the function present-documentation-contents. We generate both a detailed and an overall table of contents. The present-documentation-contents function is called from end-documentation, inside a write-text-file clause. As a parameter we pass documentation-elements.

As can be seen in present-documentation-contents we present the table of contents in a two column list, made by the function two-column-list. The second parameter determines whether we show both sections and entries, or only sections (good for long documents).

The function which presents a single entry is present-documentation-content-element. We use the information in the first parameter, element, to access the kind (entry or section), the doc-id symbol, the section number, and the title. We return a string which represents a (possibly indented) anchor tag.


8.5     Local table of contents

From the overall table of contents we can navigate to a selected section. Under the section body we find a local table of contents of this section.

Recall that a documentation section is made by the function present-documentation-section. In this function we find the relevant subsections by filtering the documentation-elements list with the predicate (subsections? section-number). The function subsections? generates a predicate which returns true on exactly the proper subsections of section section-number.
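
A hedged sketch of the predicate generator, assuming that a section number is represented as a list - (2) for section 2, (2 3) for section 2.3 - and that section-number-of is an accessor on documentation elements (both are assumptions made for this example):

  (define (prefix? p l)
    (or (null? p)
        (and (pair? l) (equal? (car p) (car l)) (prefix? (cdr p) (cdr l)))))

  (define (subsections? section-number)
    (lambda (doc-element)
      (let ((n (section-number-of doc-element)))
        (and (= (length n) (+ 1 (length section-number)))
             (prefix? section-number n)))))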

The local table of contents is made by the function present-documentation-subsection-element. This function is straightforward; it returns a string, in which the important substring is an anchor tag which is associated with a link to the appropriate subsection.

9     Constructing the HTML files.

In this section we will discuss the most interesting aspects of the HTML file construction, including the images on which the HTML depends.

9.1     Some HTML details.
9.2     The icons
9.3     The program file menu and coloring schemes
9.4     The Help page
 


9.1     Some HTML details.

The elucidator generates a lot of HTML files in the html subdirectory of the directory which contains the setup file and the textual documentation file, cf. section 2.3.

Almost all the html files are generated in end-documentation. The responsibilities are divided into three parts:

  1. Writing the text file to a particular directory: write-text-file and html-destination.
  2. Making the outer structure of the HTML page: page.
  3. Making the body of the page: to this end various presentation functions in the elucidator are called, such as documentation-contents, present-duplicated-definitions, present-defined-name-index, present-cross-reference-index, and present-documentation-contents. A hypothetical composition is sketched below.
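
A hypothetical composition which only serves to illustrate the three responsibilities; the exact argument conventions of page, write-text-file, and html-destination are assumptions here:

  (write-text-file
    (page "Cross references"                                  ; outer structure
          (present-cross-reference-index extended-defined-applied-names))
    (html-destination "cross-reference-index"))               ; target file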

As part of the elucidator we have made a function make-frame-file-in-html-dir which makes an HTML frame file. An accompanying function make-frame-file-in-source-dir makes HTML files in the source directory, such that the result of the Elucidator process can be activated from a file beside the setup and documentation files.

The frames as such (technically the frameset) are constructed by specialized functions such as elucidator-frame and elucidator-frame-horizontal. Here we use the functions from the html-v1 library for aggregation of the frame elements. These are the functions which are responsible for the overall layout of the Elucidator, as presented in a browser.


9.2     The icons

If you take a look at an elucidator running in an Internet browser you will see a number of icons. These icons reside in the images directory of the elucidator software directory.

If or when we introduce a new icon it must be saved in the images directory. In addition, it must be put into the list elucidator-image-files. As part of the processing in end-documentation the icons are copied from the software directory to the html/images subdirectory of the directory in which a concrete elucidator resides. This is done by means of copy-files from the general library.

This organization ensures that all relevant icons appear in any elucidator instance. The icons will physically exist in many places, but this is the price of self-contained html directories, which can easily be copied and transported.

The icons appear on a number of different WWW pages. The icons, and the links behind them, are produced by the function icon-bar.


9.3     The program file menu and coloring schemes

In most elucidators there will be a menu of program files from the documentation bundle in the top menu and index frame. The menu is produced by the function source-file-links, which is called by the function icon-bar.

It may be difficult to realize which program file is being presented in the program-frame to the right in an elucidator browser. Therefore we support a background color scheme of programs in the documentation bundle. In order to be general, the documentation and index frames can also have distinct colors. The colors of the frames are controlled by the variable elucidator-color-scheme, which is #f (use default-background-color) or an association list that maps group names (see section 2.5) to colors.

The function make-color-scheme is meant to be a high-level function via which to define elucidator-color-scheme. The function returns an association list, given a property list as input. make-color-scheme is called from the setup file. The function color-of-group maps color groups (as used in the program-source forms) to colors, as defined by the color scheme.
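
A minimal sketch of make-color-scheme, assuming the property list calling convention (make-color-scheme 'group1 color1 'group2 color2 ...); the real function may validate its input further:

  (define (make-color-scheme . props)
    (if (null? props)
        '()
        (cons (cons (car props) (cadr props))    ; (group . color)
              (apply make-color-scheme (cddr props)))))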

If there are many programs in the documentation bundle it will not work out nicely to have a horizontal table with all program files in the index frame. Therefore we have introduced yet another frame, the program-menu frame, which holds a menu (table) of all programs. The color scheme, discussed above, is used in that table too. The function source-file-links-for-program-menu produces the table. This function is very similar to source-file-links, which produces the horizontal table. (The main difference is the use of different table functions, table-1 and table-4.)

The boolean variable separate-program-menu? controls whether to use the original horizontal table of programs, or the menu frame. The function control-frame produces the control frame, if the boolean variable is false, or a column frameset consisting of the control-frame and the program-menu.


9.4     The Help page

The elucidator help page is made by the function make-elucidator-help-page. Quite naturally, this function is called from end-documentation.

The help page uses LAML markup, as can be expected. The elucidator generates the help page every time it is executed.

10     Handling of bounded names

This section elaborates on the problem raised in section 3.3. Let us repeat the point here:

We have not yet subtracted locally defined names from let bindings. Unfortunately, this causes some mis-bindings of applied names in our WEB presentations of programs. It would probably be rather tedious to implement the subtraction of let-defined names.

As we will see below, it turns out that it is relatively easy to implement a solution.

10.1     Introduction to the problem
10.2     A solution
 


10.1     Introduction to the problem

In the early versions of the Scheme Elucidator locally bound names could interfere with the top-level bindings. Here is an example:

 (define a ...)
 
 (define c ...)

 (define (f a b)
   (let ((c d)
         (e f))
     (x a c)))

First, binding occurrences should never be linked as applied names to the top-level definitions of a and c. In the early versions of the Elucidator the binding occurrence of c in the let-construct is linked to the top-level definition of c. This is - of course - wrong.

Second, the names a and c in the form (x a c) should not be linked to the top-level definitions of a and c. Within the body of the let clause in f, a and c refer to the a parameter and the local binding of c, of course. The early versions of the Elucidator would make wrong links here on c. However, a is identified as a parameter, and no mis-linking is done on the name a in the early versions of the elucidator.


10.2     A solution

The relevant existing functions dealing with these aspects of the Elucidator are elucidate-program-form and bounded-names.

In elucidate-program-form we subtract the locally bound names from defining names in the recursive calls of the function. The form f is here a define form, on which the function bounded-names returns the list of binding name occurrences.

Our solution is now to weaken the precondition on bounded-names such that it works on any Scheme form. If we pass a name binding construct to bounded-names it will return the name bindings of the form. If we pass another construct to bounded-names it returns the empty list. The existing version of bounded-names is renamed to parameter-names, as suggested in section 4.2.
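
A hedged sketch of the generalized bounded-names; the actual case analysis in the elucidator is richer (named lets, rest parameters, and more are ignored here):

  (define (bounded-names form)
    (cond ((and (pair? form) (eq? (car form) 'define) (pair? (cadr form)))
           (cadr form))            ; (define (f p1 p2) ...) => (f p1 p2)
          ((and (pair? form) (eq? (car form) 'lambda) (pair? (cadr form)))
           (cadr form))            ; (lambda (p1 p2) ...) => (p1 p2)
          ((and (pair? form) (eq? (car form) 'let) (pair? (cadr form)))
           (map car (cadr form)))  ; (let ((n1 v1) ...) ...) => (n1 ...)
          (else '())))             ; any other form binds nothing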

Now, in elucidate-program-form, we subtract the bounded names from defining names more uniformly in the recursive calls. Concretely, this is now also done in the cases on lists which are not define forms. Hereby we eliminate the names which happen to be bound locally from the names which are considered as source anchors of applied-defined name links.

11     Dealing with comments.

In this section we will improve the Scheme Elucidator's way of dealing with comments. We start with a section, in which we discuss the problem, and we direct the reader's attention to relevant places in the existing documentation and programs.

11.1     Problems and existing descriptions
11.2     Ideas to improved handling of comments
11.3     Solution
11.4     Extracting sectional names from comments.
11.5     Look-ahead through comments for a define form
11.6     Presenting syntactical comments.
11.7     Printing the anchor name
11.8     Pretty printing syntactical comments
 


11.1     Problems and existing descriptions

Comments are lexical elements in most programming languages, and therefore the comments are not represented at all in the data structures defined by a parsing process. This causes problems, because we want to represent important elucidator relevant information in the comments. We have already seen the source mark information, which necessarily has to be embedded into comments at the program source side. Besides source markers we need to represent the following information in program comments in the Scheme elucidator:

  • Interface comments of definitions, which should share anchor name with the definition they describe (see sections 11.5 and 11.7).
  • Sectional names, which divide a program source file into named sections (see section 11.4).

We have discussed the problems of comments and other special lexical problems in sections 3.3 and 3.4.

In the Scheme Elucidator program the parsing of the Scheme program is done by the function read-source, which is called from end-documentation. The handling of comments is done via the function skip-comment, which is called exclusively by skip-white-space, which in turn is called from elucidate-program-form.

We will now discuss our ideas for improved handling of comments in the Scheme Elucidator.


11.2     Ideas to improved handling of comments

The key to an improvement is to handle comments as syntactical constructs. Fortunately, we have already implemented a procedure lexical-to-syntactical-comments! which converts lexical comments to syntactical comments. We have used this procedure to extract interface documentation from Scheme comments, in preparation for manual page production. See the manual page of the SchemeDoc tool for further information.

Let us illustrate the idea via an example. First a small Scheme program:

  ;;;; This is just an example
  
  ;; The function f adds a constant
  ;; c to its first parameter
  (define (f a c)
    (+ a c)  ; The result
  )
  
  ;; This function just calls f
  (define (g a)
    (f a 5))

and here the transformation, in which the comments have become syntactical elements:

  (comment 4 "This is just an example")
  
  (comment 2 "The function f adds a constant 
  c to its first parameter ")
  
  (define (f a c)
    (+ a c)  (comment 1 "The result ")
  )
  
  (comment 2 "This function just calls f ")
  
  (define (g a)
    (f a 5))

(Actually, we use a slightly different designation for the comment, syntactical-comment-designator, in order not to risk a name conflict with the name 'comment'.)

The transformation illustrated above has been carried out by the function lexical-to-syntactical-comments! mentioned above. We see that the number of semicolons is represented as an integer argument of the syntactical comment form. We also see that consecutive comment lines are folded into a single syntactic comment construct.

Now the overall idea is to pre-process all Scheme source files by means of the function lexical-to-syntactical-comments!. This will affect the program presentation, as realized by elucidate-program-form, which therefore needs modification.


11.3     Solution

We first pre-process the Scheme source files. This is done by the function pre-process-comments-in-files! called by end-documentation. In turn, this function calls pre-process-comments!, which calls the function lexical-to-syntactical-comments! of the SchemeDoc tool. This creates a comment-transformed source file in the doc/internal directory for each source file we process. Notice that we now load the SchemeDoc tool from the Scheme Elucidator just after library loading.

Now, the function read-source, as called by end-documentation, should read the internal comment-transformed source files instead of the original source files. This is arranged for in the function source-file-determinator, which takes a program source file descriptor (with key, file-location, and language components). The variable comment-handling determines whether to invoke and use syntactical comments.


11.4     Extracting sectional names from comments.

We here present the design of sectional names in comments. Recall from section 11.1 that sectional comments were one of the reasons for the extensions discussed here. A sectional comment is of the following form:

  ; ::section-name:: Remaining comment

which - according to section 11.2 - is transformed to

  (comment 1 "::section-name:: Remaining comment")

There may be one or more semicolons (typically, there will be three, according to our SchemeDoc conventions). However, the section name must be the first thing in the comment (following possible white space). Thus, we only look for section names in the prefix of the comment string. This is a natural decision seen from a design perspective, and it allows us to make a reasonably efficient predicate to determine whether a comment holds a section name.

At the overall level, we now want to extract section names from comments, and add these as contributions to defining-name-occurences .

We start with the predicate syntactical-comment?, which recognizes a syntactical comment. This function is straightforward. Next, we need a predicate which identifies a section name comment: section-name-comment?. The input is a string, such as

  "::section-name:: Remaining comment"

The function first skips white space in the string, whereafter it looks for a name in double colons. We could use a function such as substring-index, but we want to be reasonably efficient with respect to determining the existence of a section name in a comment. The programming of this function involves skip-chars-in-string and looking-at-substring? (programmed for this particular purpose), which we put in the general-lib because of their general applicability.
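
A self-contained sketch of the idea follows. The real predicate uses skip-chars-in-string and looking-at-substring? from general-lib, whose exact signatures are not relied on here, and it presumably also requires a terminating "::" after the name:

  (define (looking-at? str i prefix)
    (let ((n (string-length prefix)))
      (and (<= (+ i n) (string-length str))
           (string=? (substring str i (+ i n)) prefix))))

  (define (section-name-comment? str)
    (let skip ((i 0))  ; skip leading white space
      (cond ((>= i (string-length str)) #f)
            ((char-whitespace? (string-ref str i)) (skip (+ i 1)))
            (else (looking-at? str i "::")))))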

We use syntactical-comment? and section-name-comment? in the function defined-names-1, which has been revised to extract the section name from a sectional comment. The section name is treated exactly like a defining name in a Scheme define form.


11.5     Look-ahead through comments for a define form

Before we discuss how to present and render the syntactical comment we want to prepare the context around elucidate-program-form and elucidate-program-source-1 such that we obtain the possibility to look one form ahead. This is useful in order to insert anchor names before an interface comment of a definition, cf. the first item in section 11.1. We therefore want to point out the formal parameter nf (meaning "next form") in elucidate-program-form. This parameter is either the next form after f (the previous formal parameter relative to nf) or #f if no such form exists.


11.6     Presenting syntactical comments.

We need to identify syntactical comment forms in elucidate-program-form, and to present them properly. Now that comments are syntactical we need to present these instead of relying on the processing done in skip-comment and in particular in skip-comment-1. We keep skip-comment, and still call it via skip-white-space from elucidate-program-source-1. If all comments are syntactical we will never meet lexical comments when we "eat" white space, so there is no reason to remove this handling from the elucidator.

We are now ready to present the syntactical comment in elucidate-program-form. We introduce the conditional clause which captures syntactical comments before any lists (define, non-define, and pairs) are caught. The idea is to skip the comment (real skipping, without outputting anything on op), and then to pretty print the syntactical comment string with respect to source markers and section names (and more, perhaps).

Here we will address the skipping. The pretty printing will be discussed in section 11.8. The procedure match-syntactical-comment-without-output is specialized, and simple. It reads through all characters of the syntactical comment on the input port ip without outputting anything on the output port op. The procedure depends on the exact form of a syntactical comment, as explained in section 11.2. Just after calling match-syntactical-comment-without-output in elucidate-program-form we read a single char. As explained in the program comment, this eats the newline character after the syntactical comment. Now it is time to explain the pretty printing of the comment string of the syntactical comment.


11.7     Printing the anchor name

Here we will describe how to insert the anchor name tag of a definition before the comment. Recall that this placement will make it possible to see the interface comment of a Scheme define form without needing to scroll backwards from the definition.

The key to this is the next form (nf) parameter of elucidate-program-form, as introduced in section 11.5. In the conditional case we test whether the next form, nf, is a define form. If it is, we write the anchor name on the output port, and we imperatively set the flag last-define-a-name, which remembers the fact that we have already written the anchor tag of the next definition. The value of the flag is the name of the define form. The flag is initialized in elucidate-program-source-1.

In the define form case of elucidate-program-form we now only write the anchor tag if it has not already been written together with the previous (interface) comment. Thus, the writing of the anchor tag is conditional, by the expression (not (eq? last-define-a-name (defined-name f))). After using the value of last-define-a-name we reset it to its initial value, #f.

In case we use lexical comments this will also work, because in that case last-define-a-name will always be false.


11.8     Pretty printing syntactical comments

The task here is to format the syntactical comment, so as to present source markers and sectional comments in a nice way. We have the source text of the comment in a string, which stems from the third parameter of a comment form. This information can be extracted by the selector comment-string-of-syntactical-comment.

It is worth noticing that we have already done a similar processing in the function skip-comment-1, but there we read from the input port and wrote to the output port. As mentioned above, we now have the comment in a string. This calls for a functional solution using a state machine, similar to the state machine in do-program-link-documentation! from section 6.3.

Let us look at the desired transformation. The input might be

 ::sect-name:: This is a comment with a @a source mark. 
It is a two line comment.

We need to identify the double colon syntax of section names, and the source marker. Source markers only take effect if there is a white space character after them, cf. section 3.6. A newline should be followed by lexical comment characters (one or more semicolons).

The top-level function for the current comment processing is render-syntactical-comment. It calls the function do-render-syntactical-comment, which is quite similar to do-program-link-documentation!. (We could consider abstracting over the functions which realize the state machine, especially because they have been used five or more times in various places in the LAML software. However, the experience seems to be that they are all slightly different when it comes to the fine details. So I make the state machine by hand, also this time.)

The function do-render-syntactical-comment in turn calls the function syntactical-comment-transition, which implements the central state machine of the comment rendering.
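
A hedged sketch of the driver; the transition function is assumed to map a state and a character to a pair of the next state and the output string, which may not match the actual protocol in the elucidator:

  (define (do-render-syntactical-comment str)
    (let loop ((i 0) (state 'normal) (out '()))
      (if (>= i (string-length str))
          (apply string-append (reverse out))      ; emit accumulated output
          (let* ((ch (string-ref str i))
                 (next (syntactical-comment-transition state ch)))
            (loop (+ i 1) (car next) (cons (cdr next) out))))))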

There will be a number of states in the state machine of syntactical-comment-transition.

We could and should have additional notes here about these states; as of now, this information is simply not here. It is tricky to get all details of such a relatively complicated state machine right. I used most of a Sunday to battle the details. As of now, it works reasonably, but I am only 95% sure that it is indeed correct and appropriate. Time will show.

12     Addressing definitions in specific source files.

In this section we will describe a modification which allows for addressing of definitions and sections in specific source files.

12.1     Background
12.2     The solution to the problem
 


12.1     Background

In the original version of the Elucidator we were able to refer to program entities from the documentation via the syntax

{*reference}

The star character could be '+', '-', or empty as well. (An empty modifier defaults to '+').

We would like to be able to determine in which file the reference should be looked up. We will use the syntax:

{*source-key$reference}

The source-key acts as a source key qualification. The special syntax (in which the reference part is empty, and in which the *, +, - may be omitted)

{*source-key$}

generates a reference to the program as such.

The original and simple reference should still be legal. In cases where there is only one file containing a definition of reference everything works out as usual. In case two or more files contain reference we are now able to refer to a specific instance, in a specific source file.


12.2     The solution to the problem

The function affected by this idea is linking-from-doc-to-prog. As we can see, this function distinguishes between a number of cases: zero targets, exactly one target, and several targets. The latter case needs to be refined into two cases: one where the reference is qualified with a source key, and one where it is not.

This refinement has now been carried out.

The function qualified-program-link? returns the source key qualification (a string) if it is applied to a qualified "word" which happens to contain a source key. However, it only returns a source key if the candidate is a member of the list source-key-list. Recall that all the source keys of the documentation bundle are found in the list source-key-list. If the word is not qualified, the function returns #f (false).

The function proper-linking-word returns the proper linking word. This is the string "reference" in the example of section 12.1. The result is without the modifier (such as +, -, or *) and without the qualification.
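
A minimal sketch of proper-linking-word; char-position is a hypothetical helper introduced only for this example:

  (define (char-position ch str)
    (let loop ((i 0))
      (cond ((>= i (string-length str)) #f)
            ((char=? (string-ref str i) ch) i)
            (else (loop (+ i 1))))))

  (define (proper-linking-word word)
    (let* ((w (if (memv (string-ref word 0) '(#\* #\+ #\-))  ; drop the modifier
                  (substring word 1 (string-length word))
                  word))
           (d (char-position #\$ w)))                        ; drop the qualification
      (if d (substring w (+ d 1) (string-length w)) w)))

  ; (proper-linking-word "*source-key$reference") => "reference"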

The new case in linking-from-doc-to-prog is relevant if there is more than one possible linking target, and if the reference is qualified. A warning is issued if an illegal qualification is given. In that case we link to 'the first' source key.

13     Ideas for future work on the Elucidator tool.

In this section we will describe and enumerate the ideas for future work on the Elucidator.

13.1     The ideas
 


13.1     The ideas
  1. Navigation from documentation to program

    It is irritating that the (doc)comment just prior to the program definition cannot be seen. The reason is the way navigation is implemented in a browser: the destination is 'highlighted' by means of scrolling. Either we should place the anchor name before the comment, or we should arrange that the browser scrolls a little back (perhaps via Javascript means). We could also move the comment inside the definition, but this would be major work in relation to our current commenting practice.

  2. Differential documentation.

    Minor program modifications may call for major modifications of the program documentation, in order for the documentation to stay at a high level of quality. More specifically, we may need to read through major parts of the documentation in order to ensure that the program modifications are properly reflected at all relevant places in the documentation. This is not always realistic. As a consequence we propose to introduce particular sections in the documentation which document the modifications. We could talk about a differential documentation approach. In that way, the latest documentation is the "base documentation" plus the documentation of the modifications.

    The question is now:

    • How can we support differential documentation in the browser? An index of modifications? Special marking of modifications?
    • How can we support it in the editor?
    • And how can we support an editing phase in which the documentation of the modifications is worked into the base documentation, creating a new base documentation? Calculation of which sections are affected. Maybe a special browser which relates differences to the affected sections in the 'main documentation'.

    This part of our work is related to version control of the documentation bundle.

  3. Loose ends

    When we implement and document a program there are often some "loose ends" which need additional attention later in the development process. Rather than polishing the documentation to completion by ignoring these loose ends, we are better off if we represent and document them. In that way the documentation can help us to understand and get an overview of the loose ends in the program.

    Support

    • Special marking of loose ends in the documentation. Maybe sections.
    • Index of loose ends.
  4. Links to interface documentation

    Internal documentation and interface documentation are two different kinds of documentation. However, they are not unrelated. If both kinds of documentation exist for programs in a documentation bundle, we could consider how to make good use of the interface documentation from the internal documentation.

  5. Overlapping name spaces

    If several programs are documented together, and if there are overlapping name spaces in these programs, it would be necessary to provide for source key qualified program name references, such as

        {source-key:program-name}
  6. References to non-define top-level forms

    If we document imperative programs with many non-define top-level forms we need ways to refer to such top level forms. We can use source markers, but as of now these are associated with the previous define form. It would be relatively easy at the program side to associate them with the right top level form. At the documentation side we need means to refer to such places. A special {*TOP} relation could be invented. This needs to be clarified before it is implemented.

  7. Linking defined names to the cross reference index

    Defining names in the program are not linked to anything. It would be useful to link them to an entry in the (alphabetical) cross reference index. In that way it would be easy to follow potential call chains in the program. Implementation of this is now done.

  8. Elucidative programming as a diary.

    This is a different documentation structure than the traditional literate one.

14     Problems and Errors in the Scheme Elucidator

Here we enumerate the problems we have identified with the Scheme elucidator:

14.1     Problems and errors
 


14.1     Problems and errors
  1. Names with question marks ('?') and exclamation marks ('!') cause problems when represented as symbols in Emacs. We might consider changing the elisp reader (if possible). A more promising approach is probably to avoid symbols in the list of defined names and use strings instead.

    Solution: We make sure that store-defined-names stores the defined name as a string. The reverse function, restore-defined-names, converts the string back to a symbol. As such, the elucidator (the Scheme software) is not affected by the change. The internal files with defined names have been changed, however.

  2. Within backquotes, there should only be linking to defined names from unquote contexts. As of now, we may also link from the constant parts of the form. This is an error. The problem is solved within (forward) quotes, simply by passing the empty list of defined names.