Copyright © 2011, Kurt Nørmark |
It is assumed that the variable ip references an open input port, or a string. The assignment of ip must be done externally to this library, and after the library is loaded. If ip references an open input stream (made by the Scheme function open-input-file, for instance), input is read from that port. If ip is a string, the variable pstring-ip-pointer is used as a pointer into the string.
The functions in section one are generic, low-level reading functions that read either from the open input port or from a string (at the location determined by the variable pstring-ip-pointer). The functions in section two are the basic reading and peeking functions. Section three provides a number of convenient collection and skipping functions.
This library has been developed as part of an SGML/XML Document Type Definition (DTD) parser, but it is useful in many other parsing situations. There exists some early internal documentation of the DTD parser (on the www.cs.auc.dk site), and as such also of some aspects of the functions in this library.
In earlier versions of LAML, this library was called 'the text collection and skipping library'.
advance-look-ahead | (advance-look-ahead n) | Provided that there are at least n characters in the buffer, advance next-read with n positions. |
char-predicate | (char-predicate ch) | Return a predicate function which matches the character ch. |
collect-balanced-until | (collect-balanced-until char-pred-1 char-pred-2) | This collection procedure returns a balanced collection given two char predicates. |
collect-until | (collect-until p) | Read and collect a string from the input, controlled by a predicate. |
collect-until-string | (collect-until-string str . inclusive) | Collect characters until str is encountered. |
end-of-line? | (end-of-line? ch) | Is ch an end of line character? |
ensure-look-ahead | (ensure-look-ahead n) | Make sure that there are at least n characters in the buffer. |
eof? | (eof? ch) | Is ch an end of file character? |
generic-eof-object? | (generic-eof-object? x) | Is x the designated end-of-file value relative to the implicitly given input port ip? |
generic-read-char | (generic-read-char ip) | Reads a single character from ip, and advances the input pointer. |
is-white-space? | (is-white-space? ch) | Is ch a white space character? |
look-ahead-char | (look-ahead-char) | Return the first character from the "read end" of the buffer. |
look-ahead-prefix | (look-ahead-prefix lgt) | Return a string of length lgt from the "read end" of the buffer. |
match-look-ahead? | (match-look-ahead? str) | Return whether the buffer matches the string str. |
max-look-ahead | max-look-ahead | The length of the cyclic look ahead buffer. |
max-look-ahead-prefix | (max-look-ahead-prefix) | Return the entire look ahead queue as a string. |
peek-a-char | (peek-a-char) | Peek a character from the input port, and queue it for subsequent reading at "the peek end". |
peek-chars | (peek-chars n) | Peeks n characters, by n calls of peek-a-char. |
put-back-a-char-read-end | (put-back-a-char-read-end ch) | Put ch into the "read end" of the buffer (where read-a-char operates). |
put-back-a-char-write-end | (put-back-a-char-write-end ch) | Put ch into the "peek end" of the buffer (where peek-a-char operates). |
put-back-a-string | (put-back-a-string str which-end) | Put str back into the buffer. |
read-a-char | (read-a-char) | Read from the look ahead buffer. |
read-a-string | (read-a-string n) | Read and return a string of length n by means of repeated activations of read-a-char. |
reset-look-ahead-buffer | (reset-look-ahead-buffer) | Reset the look ahead buffer. |
skip-string | (skip-string str if-not-message) | Assume that str is just in front of us. |
skip-until-string | (skip-until-string str . inclusive) | Skip characters until str is encountered. |
skip-while | (skip-while p) | Skip characters while p holds. |
1 Low-level, generic input functions. | |||
The functions in this section read from either an input port or a string. | |||
generic-read-char | |||
Form | (generic-read-char ip) | ||
Description | Reads a single character from ip, and advances the input pointer. | ||
See also | Scheme source file | generic-read-char | |
generic-eof-object? | |||
Form | (generic-eof-object? x) | ||
Description | Is x the designated end-of-file value relative to the implicitly given input port ip? | ||
See also | Scheme source file | generic-eof-object? | |
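The dual behavior described above (read from a port, or from a string through a pointer) can be illustrated with a minimal sketch. It is not the library's source: the names sketch-ip, sketch-pointer, sketch-eof, sketch-generic-read-char and sketch-generic-eof-object? are hypothetical, and the symbol used as end-of-file marker for string input is an assumption made for this example.

```scheme
;; Sketch only, not the library's implementation. All sketch-* names are hypothetical.
(define sketch-ip "ab")              ; the input: here a string rather than a port
(define sketch-pointer 0)            ; plays the role of pstring-ip-pointer
(define sketch-eof 'end-of-string)   ; assumed end-of-file marker for string input

;; Read one character from ip and advance the input pointer.
(define (sketch-generic-read-char ip)
  (if (string? ip)
      (if (< sketch-pointer (string-length ip))
          (let ((ch (string-ref ip sketch-pointer)))
            (set! sketch-pointer (+ sketch-pointer 1))
            ch)
          sketch-eof)
      (read-char ip)))               ; a real port: defer to Scheme's read-char

;; Is x the end-of-file value relative to the kind of input in sketch-ip?
(define (sketch-generic-eof-object? x)
  (if (string? sketch-ip)
      (eq? x sketch-eof)
      (eof-object? x)))
```

With the string "ab" as input, two calls of sketch-generic-read-char return the two characters, and a third call returns the value recognized by sketch-generic-eof-object?.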
2 Look ahead buffer and queue. | |||
The functions in this section manipulate a look ahead queue, which sits between the input port ip and the applications. Via this buffer it is possible to implement look ahead in the input port. Imagine an input buffer of (actual) size n:

c1 c2 c3 ... cn

When characters are read from the input stream, they enter to the right (the peek end). When characters are read by an application they are taken from the left (the read end). Thus, cn is the last character read from the input port (or from the input string); this is done by peek-a-char. c1 is the next char to leave the buffer, and to be read by the client application; this will be done by read-a-char.

A few words about terminology in relation to R4RS or R5RS. The Scheme procedure read-char corresponds roughly to read-a-char. The former always reads a character from an input port. The latter reads from an input stream via the buffer; only if the buffer is empty is a character read from the port or string. The proper Scheme function peek-char returns the next char from the input port, without updating the 'input pointer'. The function peek-a-char of this library is different, because it reads a character from the file and puts it into the buffer. This use of terminology is unfortunate, and it may be confusing for some readers. | |||
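The two ends of the queue can be made concrete with a small model. This is a sketch under stated assumptions, not the library's cyclic buffer: the queue is a plain list, the input is a string, and the names input, pos, queue, next-from-input, model-peek-a-char and model-read-a-char are hypothetical.

```scheme
;; Sketch only: a list-based model of the look ahead queue (the library uses a cyclic buffer).
(define input "abc")   ; hypothetical string input
(define pos 0)         ; next unread position in input
(define queue '())     ; the car of the list is the "read end"

;; Take the next character from the input, or the symbol eof when exhausted.
(define (next-from-input)
  (if (< pos (string-length input))
      (let ((ch (string-ref input pos)))
        (set! pos (+ pos 1))
        ch)
      'eof))

;; Always reads from the input; the character enters at the "peek end" (the right).
(define (model-peek-a-char)
  (let ((ch (next-from-input)))
    (set! queue (append queue (list ch)))
    ch))

;; Takes from the "read end" (the left); reads the input only if the queue is empty.
(define (model-read-a-char)
  (if (null? queue)
      (next-from-input)
      (let ((ch (car queue)))
        (set! queue (cdr queue))
        ch)))

(define first-peeked (model-peek-a-char))    ; queue is now (#\a)
(define second-peeked (model-peek-a-char))   ; queue is now (#\a #\b)
(define first-read (model-read-a-char))      ; #\a leaves at the read end
```

After these three calls the queue holds only #\b: the character peeked first is also the first one handed to the application.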
max-look-ahead | |||
Form | max-look-ahead | ||
Description | The length of the cyclic look ahead buffer. Predefined to 20000 characters. A constant. | ||
See also | Scheme source file | max-look-ahead | |
reset-look-ahead-buffer | |||
Form | (reset-look-ahead-buffer) | ||
Description | Reset the look ahead buffer. You should always call this function after you have re-assigned ip to a new input stream. | ||
See also | Scheme source file | reset-look-ahead-buffer | |
peek-a-char | |||
Form | (peek-a-char) | ||
Description | Peek a character from the input port, and queue it for subsequent reading at "the peek end". This function always reads one character via generic-read-char, and puts it into the "peek end" of the buffer. | ||
See also | Scheme source file | peek-a-char | |
peek-chars | |||
Form | (peek-chars n) | ||
Description | Peeks n characters, by n calls of peek-a-char. In other words, the buffer is extended with n characters read from the input stream. | ||
See also | Scheme source file | peek-chars | |
relies on | peek-a-char | ||
read-a-char | |||
Form | (read-a-char) | ||
Description | Read from the look ahead buffer. Only if this buffer is empty, read from the port. Reads from "the read end" of the queue. In case the buffer is non-empty, this procedure takes a character out of the buffer in the "read end". In any case, it advances the implicit input pointer of the input stream. | ||
See also | Scheme source file | read-a-char | |
read-a-string | |||
Form | (read-a-string n) | ||
Description | Read and return a string of length n by means of repeated activations of read-a-char. Takes eof into account such that a string shorter than n can be returned. | ||
See also | Scheme source file | read-a-string | |
look-ahead-prefix | |||
Form | (look-ahead-prefix lgt) | ||
Description | Return a string of length lgt from the "read end" of the buffer. A proper function. | ||
Precondition | lgt cannot be larger than the number of characters in the buffer. | ||
See also | Scheme source file | look-ahead-prefix | |
max-look-ahead-prefix | |||
Form | (max-look-ahead-prefix) | ||
Description | Return the entire look ahead queue as a string. A proper function. | ||
See also | Scheme source file | max-look-ahead-prefix | |
look-ahead-char | |||
Form | (look-ahead-char) | ||
Description | Return the first character from the "read end" of the buffer. A proper function. | ||
Precondition | The buffer is not empty. | ||
See also | Scheme source file | look-ahead-char | |
match-look-ahead? | |||
Form | (match-look-ahead? str) | ||
Description | Return whether the buffer matches the string str. Matching is done by the function equal?. This is a proper function. | ||
See also | Scheme source file | match-look-ahead? | |
ensure-look-ahead | |||
Form | (ensure-look-ahead n) | ||
Description | Make sure that there are at least n characters in the buffer. If there are fewer than n characters, enter a sufficient number of characters with peek-chars. | ||
See also | Scheme source file | ensure-look-ahead | |
relies on | peek-chars | ||
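The interplay between ensure-look-ahead, peek-chars and look-ahead-prefix can be sketched as follows. This is a model under assumptions, not the library's code: the queue is a plain list, the input is a string, peeking simply stops at the end of the input, and all names (input, pos, queue, take-first and the model-* procedures) are hypothetical.

```scheme
;; Sketch only: ensuring look ahead and inspecting it without consuming anything.
(define input "<!DOCTYPE html>")   ; hypothetical string input
(define pos 0)                     ; next unread position in input
(define queue '())                 ; the car of the list is the "read end"

;; Enter up to n characters from the input at the "peek end".
(define (model-peek-chars n)
  (if (and (> n 0) (< pos (string-length input)))
      (begin
        (set! queue (append queue (list (string-ref input pos))))
        (set! pos (+ pos 1))
        (model-peek-chars (- n 1)))))

;; Peek only as many characters as are missing to reach n queued characters.
(define (model-ensure-look-ahead n)
  (let ((missing (- n (length queue))))
    (if (> missing 0)
        (model-peek-chars missing))))

;; The first n elements of lst. Precondition: lst has at least n elements.
(define (take-first lst n)
  (if (= n 0)
      '()
      (cons (car lst) (take-first (cdr lst) (- n 1)))))

;; Proper function: returns a prefix of the queue and leaves the queue unchanged.
(define (model-look-ahead-prefix lgt)
  (list->string (take-first queue lgt)))

(model-ensure-look-ahead 9)
(define prefix (model-look-ahead-prefix 9))   ; the string "<!DOCTYPE"
```

A parser can use this pattern to decide which construct is in front of it before committing to read any of it.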
put-back-a-char-write-end | |||
Form | (put-back-a-char-write-end ch) | ||
Description | Put ch into the "peek end" of the buffer (where peek-a-char operates). | ||
See also | Scheme source file | put-back-a-char-write-end | |
put-back-a-char-read-end | |||
Form | (put-back-a-char-read-end ch) | ||
Description | Put ch into the "read end" of the buffer (where read-a-char operates). | ||
See also | Scheme source file | put-back-a-char-read-end | |
put-back-a-string | |||
Form | (put-back-a-string str which-end) | ||
Description | Put str back into the buffer. The second parameter which-end controls whether to put back in the read end or the write end (equivalent to the peek end). Possible values of which-end are the symbols read-end or write-end. | ||
See also | Scheme source file | put-back-a-string | |
advance-look-ahead | |||
Form | (advance-look-ahead n) | ||
Description | Provided that there are at least n characters in the buffer, advance next-read with n positions. Hereby n queued characters are skipped from the buffer at the "read end". | ||
See also | Scheme source file | advance-look-ahead | |
3 Collection and skipping functions. | |||
This section contains a number of higher level collection and skipping functions. These functions use the functions from the previous section. The functions in this section are the most important of this library. | |||
collect-until | |||
Form | (collect-until p) | ||
Description | Read and collect a string from the input, controlled by a predicate. The collection stops when the predicate p holds on the character read. The last read character (the first character on which p holds) is left as the oldest character in the queue. | ||
See also | Scheme source file | collect-until | |
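The behavior of collect-until, in particular that the stopping character stays in the queue, can be illustrated by a small model. It is a sketch, not the library's source: the queue is a list, the input is a string, end of input is represented by the symbol eof and stops the collection (an assumption of this example), and the names input, pos, queue and the model-* procedures are hypothetical.

```scheme
;; Sketch only: collecting characters until a predicate holds.
(define input "name=value")   ; hypothetical string input
(define pos 0)                ; next unread position in input
(define queue '())            ; the car of the list is the "read end"

;; Take from the queue if possible, otherwise from the input; eof when exhausted.
(define (model-read-a-char)
  (cond ((pair? queue)
         (let ((ch (car queue)))
           (set! queue (cdr queue))
           ch))
        ((< pos (string-length input))
         (let ((ch (string-ref input pos)))
           (set! pos (+ pos 1))
           ch))
        (else 'eof)))

;; Collect until p holds. The stopping character is put back at the "read end".
(define (model-collect-until p)
  (let loop ((collected '()))
    (let ((ch (model-read-a-char)))
      (if (or (eq? ch 'eof) (p ch))
          (begin
            (set! queue (cons ch queue))
            (list->string (reverse collected)))
          (loop (cons ch collected))))))

(define key (model-collect-until (lambda (ch) (char=? ch #\=))))   ; the string "name"
(define separator (model-read-a-char))                              ; #\= is still available
```

Because the stopping character is left in the queue, the caller decides whether to read it, skip it, or start the next collection from it.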
collect-balanced-until | |||
Form | (collect-balanced-until char-pred-1 char-pred-2) | ||
Description | This collection procedure returns a balanced collection given two char predicates. Return the string collected from the input port ip. The collection stops when the predicate char-pred-2 holds on the character read. However, if char-pred-1 becomes true it has to be matched by char-pred-2 without causing a termination of the collection. The last read character (the first character on which char-pred-2 holds) is processed by this function. As a precondition assume that if char-pred-1 holds then char-pred-2 does not hold, and vice versa. | ||
See also | Scheme source file | collect-balanced-until | |
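One possible reading of the balanced collection is sketched below with a nesting counter. Several points are assumptions of this example rather than documented behavior: the collection starts at depth zero, both the opening and closing characters are included in the result, and the closing character that brings the depth back to zero is consumed and ends the collection. The names input, pos, next-char and model-collect-balanced-until are hypothetical, and the input is read directly from a string for brevity.

```scheme
;; Sketch only: balanced collection with a depth counter, under the assumptions stated above.
(define input "(a (b) c) rest")   ; hypothetical string input
(define pos 0)                    ; next unread position in input

;; Take the next character from the input, or the symbol eof when exhausted.
(define (next-char)
  (if (< pos (string-length input))
      (let ((ch (string-ref input pos)))
        (set! pos (+ pos 1))
        ch)
      'eof))

;; open? and close? must not both hold on the same character.
(define (model-collect-balanced-until open? close?)
  (let loop ((depth 0) (collected '()))
    (let ((ch (next-char)))
      (cond ((eq? ch 'eof)
             (list->string (reverse collected)))
            ((open? ch)
             (loop (+ depth 1) (cons ch collected)))
            ((and (close? ch) (<= depth 1))          ; outermost close: stop here
             (list->string (reverse (cons ch collected))))
            ((close? ch)                             ; inner close: matches an inner open
             (loop (- depth 1) (cons ch collected)))
            (else
             (loop depth (cons ch collected)))))))

(define balanced
  (model-collect-balanced-until
   (lambda (ch) (char=? ch #\())
   (lambda (ch) (char=? ch #\)))))   ; the string "(a (b) c)"
```

The inner closing parenthesis after b does not end the collection, because it is matched against the inner opening parenthesis.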
skip-while | |||
Form | (skip-while p) | ||
Description | Skip characters while p holds. The first character on which p fails is left as the oldest character in the queue. At end of file the predicate is considered not to hold, so skipping stops. | ||
See also | Scheme source file | skip-while | |
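Skipping leading white space is a typical use of skip-while. The following is a model, not the library's code: the queue is a list, the input is a string, end of input is the symbol eof, the standard Scheme predicate char-whitespace? stands in for a white space predicate, and the names input, pos, queue and the model-* procedures are hypothetical.

```scheme
;; Sketch only: skipping characters while a predicate holds.
(define input "   <tag>")   ; hypothetical string input
(define pos 0)              ; next unread position in input
(define queue '())          ; the car of the list is the "read end"

;; Take from the queue if possible, otherwise from the input; eof when exhausted.
(define (model-read-a-char)
  (cond ((pair? queue)
         (let ((ch (car queue)))
           (set! queue (cdr queue))
           ch))
        ((< pos (string-length input))
         (let ((ch (string-ref input pos)))
           (set! pos (+ pos 1))
           ch))
        (else 'eof)))

;; Skip while p holds. The first character on which p fails is left in the queue.
;; At end of input, p is treated as not holding.
(define (model-skip-while p)
  (let ((ch (model-read-a-char)))
    (if (and (not (eq? ch 'eof)) (p ch))
        (model-skip-while p)
        (set! queue (cons ch queue)))))

(model-skip-while char-whitespace?)
(define first-significant (model-read-a-char))   ; #\< , the first non-blank character
```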
skip-string | |||
Form | (skip-string str if-not-message) | ||
Description | Assume that str is just in front of us. Skip through it. If str is not in front of us, a fatal error occurs with if-not-message as error message. | ||
See also | Scheme source file | skip-string | |
skip-until-string | |||
Form | (skip-until-string str . inclusive) | ||
Description | Skip characters until str is encountered. If inclusive, also skip str. It is assumed as a precondition that the length of str is at least one. | ||
See also | Scheme source file | skip-until-string | |
collect-until-string | |||
Form | (collect-until-string str . inclusive) | ||
Description | Collect characters until str is encountered. If inclusive, also collect str. It is assumed as a precondition that the length of str is at least one. | ||
See also | Scheme source file | collect-until-string | |
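The effect of the optional inclusive parameter can be shown with a simplified model. This sketch works directly on a string with an index instead of the look ahead queue, treats inclusive as an optional boolean that defaults to false (an assumption), and stops at the end of the input if str never occurs (also an assumption). The names input, pos, at-string? and model-collect-until-string are hypothetical.

```scheme
;; Sketch only: collecting until a given string is in front of us.
(define input "<!-- note -->tail")   ; hypothetical string input
(define pos 0)                       ; next unread position in input

;; Does the unread input start with str?
(define (at-string? str)
  (let ((end (+ pos (string-length str))))
    (and (<= end (string-length input))
         (string=? (substring input pos end) str))))

;; Collect up to str; if inclusive is given and true, collect str as well.
(define (model-collect-until-string str . inclusive)
  (let ((start pos))
    (let loop ()
      (cond ((at-string? str)
             (if (and (pair? inclusive) (car inclusive))
                 (set! pos (+ pos (string-length str))))
             (substring input start pos))
            ((>= pos (string-length input))
             (substring input start pos))
            (else
             (set! pos (+ pos 1))
             (loop))))))

(define before-end (model-collect-until-string "-->"))      ; stops in front of "-->"
(define end-marker (model-collect-until-string "-->" #t))   ; collects "-->" itself
```

A skipping variant would follow the same loop and simply discard the characters instead of returning them.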
4 Useful predicates for skipping and collecting. | |||
is-white-space? | |||
Form | (is-white-space? ch) | ||
Description | Is ch a white space character? | ||
See also | Scheme source file | is-white-space? | |
end-of-line? | |||
Form | (end-of-line? ch) | ||
Description | Is ch an end of line character? | ||
See also | Scheme source file | end-of-line? | |
eof? | |||
Form | (eof? ch) | ||
Description | Is ch an end of file character? | ||
See also | Scheme source file | eof? | |
char-predicate | |||
Form | (char-predicate ch) | ||
Description | Return a predicate function which matches the character ch. A higher order function. | ||
See also | Scheme source file | char-predicate | |
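A higher order function of this kind can be sketched in a few lines. The definition below is an illustration, not the library's source: the name model-char-predicate is hypothetical, and the guard with char? (so that a non-character such as an end-of-file value never matches) is an assumption of this example.

```scheme
;; Sketch only: a function that returns a predicate matching one specific character.
(define (model-char-predicate ch)
  (lambda (c)
    (and (char? c) (char=? c ch))))   ; non-characters never match

;; A predicate suitable as the argument of a collecting or skipping function.
(define greater-than? (model-char-predicate #\>))

(define matches-itself (greater-than? #\>))   ; true
(define matches-other (greater-than? #\<))    ; false
```

Predicates built this way save writing a lambda expression at each call of a collecting or skipping function.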