Generated: Monday, November 14, 2011, 09:18:08 Copyright © 2011, Kurt Nørmark

Reference Manual of Buffered Input Streams

Kurt Nørmark © normark@cs.aau.dk Department of Computer Science, Aalborg University, Denmark.

LAML Source file: lib/collect-skip.scm

This library supports buffered input streams. A buffered input stream is convenient for various kinds of parsing tasks.

It is assumed that the variable ip references an open input port or a string. The assignment of ip must be done externally to this library, and after the library is loaded. If ip references an open input port (made by the Scheme function open-input-file, for instance), input is read from that port. If ip is a string, the variable pstring-ip-pointer is used as a pointer into the string.

The functions in section one are generic, low-level reading functions that read either from the open input port or from a string (at the location determined by the variable pstring-ip-pointer). The functions in section two are the basic reading and peeking functions. Section three provides a number of convenient collection and skipping functions. Section four contains predicates that are useful as arguments to the functions of section three.

This library has been developed as part of an SGML/XML Document Type Definition (DTD) parser, but it is useful in many other parsing situations. There exists some early internal documentation of the DTD parser (on the www.cs.auc.dk site), which also covers some aspects of the functions in this library.

In earlier versions of LAML, this library was called 'the text collection and skipping library'.

Table of Contents:
1. Low-level, generic input functions.
2. Look ahead buffer and queue.
3. Collection and skipping functions.
4. Useful predicates for skipping and collecting.

Alphabetic index:
advance-look-ahead (advance-look-ahead n) Provided that there are at least n characters in the buffer, advance next-read by n positions.
char-predicate (char-predicate ch) Return a predicate function that matches the character ch.
collect-balanced-until (collect-balanced-until char-pred-1 char-pred-2) This collection procedure returns a balanced collection given two char predicates.
collect-until (collect-until p) Read and collect a string from the input, controlled by a predicate.
collect-until-string (collect-until-string str . inclusive) Collect characters until str is encountered.
end-of-line? (end-of-line? ch) Is ch an end of line character?
ensure-look-ahead (ensure-look-ahead n) Make sure that there are at least n characters in the buffer.
eof? (eof? ch) Is ch an end of file character?
generic-eof-object? (generic-eof-object? x) Is x the designated end-of-file value relative to the implicitly given input port ip?
generic-read-char (generic-read-char ip) Reads a single character from ip, and advances the input pointer.
is-white-space? (is-white-space? ch) Is ch a white space character?
look-ahead-char (look-ahead-char) Return the first character from the "read end" of the buffer.
look-ahead-prefix (look-ahead-prefix lgt) Return a string of length lgt from the "read end" of the buffer.
match-look-ahead? (match-look-ahead? str) Return whether the buffer matches the string str.
max-look-ahead max-look-ahead The length of the cyclic look ahead buffer.
max-look-ahead-prefix (max-look-ahead-prefix) Return the entire look ahead queue as a string.
peek-a-char (peek-a-char) Peek a character from the input port, and queue it for subsequent reading at "the peek end".
peek-chars (peek-chars n) Peeks n characters, by n calls of peek-a-char.
put-back-a-char-read-end (put-back-a-char-read-end ch) Put ch into the "read end" of the buffer (where read-a-char operates).
put-back-a-char-write-end (put-back-a-char-write-end ch) Put ch into the "peek end" of the buffer (where peek-a-char operates).
put-back-a-string (put-back-a-string str which-end) Put str back into the buffer.
read-a-char (read-a-char) Read from the look ahead buffer.
read-a-string (read-a-string n) Read and return a string of length n by means of repeated activations of read-a-char.
reset-look-ahead-buffer (reset-look-ahead-buffer) Reset the look ahead buffer.
skip-string (skip-string str if-not-message) Assume that str is just in front of us.
skip-until-string (skip-until-string str . inclusive) Skip characters until str is encountered.
skip-while (skip-while p) Skip characters while p holds.


1 Low-level, generic input functions.
The functions in this section read from either an input port or a string.

generic-read-char
Form (generic-read-char ip)
Description Reads a single character from ip, and advances the input pointer.
See also Scheme source file generic-read-char

generic-eof-object?
Form (generic-eof-object? x)
Description Is x the designated end-of-file value relative to the implicitly given input port ip?
See also Scheme source file generic-eof-object?
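
The dispatch between port input and string input can be pictured with the following minimal sketch. It is not the library source: all names prefixed with demo- are hypothetical, demo-pointer stands in for pstring-ip-pointer, and the end-of-input marker used for strings is an assumption made only for this illustration.

```scheme
;; Hypothetical model of generic reading; not the code in lib/collect-skip.scm.
(define demo-ip "ab")        ; stands in for ip, here bound to a string
(define demo-pointer 0)      ; stands in for pstring-ip-pointer
(define demo-eof 'demo-eof)  ; assumed end-of-input marker for string input

;; Read one character from a port or from the string, and advance the pointer.
(define (demo-generic-read-char ip)
  (if (input-port? ip)
      (read-char ip)
      (if (< demo-pointer (string-length ip))
          (let ((ch (string-ref ip demo-pointer)))
            (set! demo-pointer (+ demo-pointer 1))
            ch)
          demo-eof)))

;; The end-of-file test depends on the kind of value bound to demo-ip.
(define (demo-generic-eof-object? x)
  (if (input-port? demo-ip)
      (eof-object? x)
      (eq? x demo-eof)))

(define demo-first (demo-generic-read-char demo-ip))   ; first character of "ab"
(define demo-second (demo-generic-read-char demo-ip))  ; second character of "ab"
(define demo-third (demo-generic-read-char demo-ip))   ; the string is exhausted here
```

In this model the end-of-file predicate consults the global binding rather than taking the port as a parameter, which mirrors the phrase "implicitly given input port" in the description above.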


2 Look ahead buffer and queue.
The functions in this section manipulate a look ahead queue, which sits between the input port ip and the application. Via this buffer it is possible to implement look ahead in the input port.

Imagine an input buffer of (actual) size n:

    c1 c2 c3 ... cn 
When characters are read from the input stream, they enter at the right (the peek end). When characters are read by an application, they are taken from the left (the read end). Thus, cn is the last character read from the input port (or from the input string); this is done by peek-a-char. c1 is the next character to leave the buffer and to be read by the client application; this will be done by read-a-char.

A few words about terminology in relation to R4RS or R5RS. The Scheme procedure read-char corresponds roughly to read-a-char. The former always reads a character from an input port. The latter reads from an input stream via the buffer; only if the buffer is empty is a character read from the port or string. The standard Scheme procedure peek-char returns the next character from the input port without updating the 'input pointer'. The function peek-a-char of this library is different, because it reads a character from the input port or string and puts it into the buffer. This use of terminology is unfortunate, and it may be confusing for some readers.
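
The queue discipline described above can be modelled in a few lines of plain Scheme. This is only a sketch under stated assumptions: it uses a list instead of the cyclic buffer, it ignores end of file, it assumes that peeking returns the queued character, and every name prefixed with demo- is hypothetical rather than part of the library.

```scheme
;; Hypothetical model of the look ahead queue; not the library's cyclic buffer.
(define demo-source "abc")  ; stands in for the input behind ip
(define demo-pos 0)         ; next unread position in demo-source
(define demo-queue '())     ; head of the list = read end, tail = peek end

;; Plays the role of generic-read-char (no end-of-file handling in this sketch).
(define (demo-port-read)
  (let ((ch (string-ref demo-source demo-pos)))
    (set! demo-pos (+ demo-pos 1))
    ch))

;; Always reads from the source and enqueues the character at the peek end.
(define (demo-peek-a-char)
  (let ((ch (demo-port-read)))
    (set! demo-queue (append demo-queue (list ch)))
    ch))

;; Takes from the read end; reads from the source only if the queue is empty.
(define (demo-read-a-char)
  (if (null? demo-queue)
      (demo-port-read)
      (let ((ch (car demo-queue)))
        (set! demo-queue (cdr demo-queue))
        ch)))

(define demo-peek-1 (demo-peek-a-char))  ; queue now holds the first character
(define demo-peek-2 (demo-peek-a-char))  ; queue now holds the first two characters
(define demo-read-1 (demo-read-a-char))  ; leaves the queue at the read end
```

Note that two peeks followed by one read deliver the first character of the input, not the third: the read is served from the queue, and the source is only consulted again once the queue has been drained.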


max-look-ahead
Form max-look-ahead
Description The length of the cyclic look ahead buffer. Predefined to 20000 characters. A constant.
See also Scheme source file max-look-ahead

reset-look-ahead-buffer
Form (reset-look-ahead-buffer)
Description Reset the look ahead buffer. You should always call this function after you have re-assigned ip to a new input stream.
See also Scheme source file reset-look-ahead-buffer

peek-a-char
Form (peek-a-char)
Description Peek a character from the input port, and queue it for subsequent reading at "the peek end". This function always reads one character via generic-read-char, and puts it into the "peek end" of the buffer.
See also Scheme source file peek-a-char

peek-chars
Form (peek-chars n)
Description Peeks n characters, by n calls of peek-a-char. In other words, the buffer is extended with n characters read from the input stream.
See also Scheme source file peek-chars
relies on peek-a-char

read-a-char
Form (read-a-char)
Description Read from the look ahead buffer. Only if this buffer is empty, read from the port. Reads from "the read end" of the queue. In case the buffer is non-empty, this procedure takes a character out of the buffer at the "read end". In any case, it advances the implicit input pointer of the input stream.
See also Scheme source file read-a-char

read-a-string
Form (read-a-string n)
Description Read and return a string of length n by means of repeated activations of read-a-char. Takes eof into account such that a string shorter than n can be returned.
See also Scheme source file read-a-string

look-ahead-prefix
Form (look-ahead-prefix lgt)
Description Return a string of length lgt from the "read end" of the buffer. A proper function.
Precondition lgt cannot be larger than the number of characters in the buffer.
See also Scheme source file look-ahead-prefix

max-look-ahead-prefix
Form (max-look-ahead-prefix)
Description Return the entire look ahead queue as a string. A proper function.
See also Scheme source file max-look-ahead-prefix

look-ahead-char
Form (look-ahead-char)
Description Return the first character from the "read end" of the buffer. A proper function.
Precondition The buffer is not empty.
See also Scheme source file look-ahead-char

match-look-ahead?
Form (match-look-ahead? str)
Description Return whether the buffer matches the string str. Matching is done by the Scheme function equal?. A proper function.
See also Scheme source file match-look-ahead?

ensure-look-ahead
Form (ensure-look-ahead n)
Description Make sure that there are at least n characters in the buffer. If there are fewer than n characters, enter a sufficient number of characters with peek-chars.
See also Scheme source file ensure-look-ahead
relies on peek-chars

put-back-a-char-write-end
Form (put-back-a-char-write-end ch)
Description Put ch into the "peek end" of the buffer (where peek-a-char operates).
See also Scheme source file put-back-a-char-write-end

put-back-a-char-read-end
Form (put-back-a-char-read-end ch)
Description Put ch into the "read end" of the buffer (where read-a-char operates).
See also Scheme source file put-back-a-char-read-end

put-back-a-string
Form (put-back-a-string str which-end)
Description Put str back into the buffer. The second parameter which-end controls whether to put back at the read end or the write end (equivalent to the peek end). Possible values of which-end are the symbols read-end or write-end.
See also Scheme source file put-back-a-string

advance-look-ahead
Form (advance-look-ahead n)
Description Provided that there are at least n characters in the buffer, advance next-read by n positions. Hereby n queued characters are skipped from the buffer at the "read end".
See also Scheme source file advance-look-ahead


3 Collection and skipping functions.
This section contains a number of higher level collection and skipping functions. These functions use the functions from the previous section. The functions in this section are the most important of this library.

collect-until
Form (collect-until p)
Description Read and collect a string from the input, controlled by a predicate. The collection stops when the predicate p holds on the character read. The last read character (the first character on which p holds) is left as the oldest character in the queue.
See also Scheme source file collect-until
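
The stopping rule of collect-until can be illustrated with a small stand-alone model. It is a sketch, not the library implementation: the input is represented as a plain list of characters (read end first), end of input is treated as a stopping condition by assumption, and the demo- names are hypothetical.

```scheme
;; Hypothetical model of collect-until over a list of characters.
(define demo-input (string->list "width=100"))  ; remaining input, read end first

;; Collect characters until p holds; the character on which p holds stays in the input.
(define (demo-collect-until p)
  (let loop ((collected '()))
    (if (or (null? demo-input) (p (car demo-input)))
        (list->string (reverse collected))
        (let ((ch (car demo-input)))
          (set! demo-input (cdr demo-input))
          (loop (cons ch collected))))))

(define demo-name
  (demo-collect-until (lambda (ch) (char=? ch #\=))))
```

After the call, the equals sign is still the first element of demo-input, which corresponds to the terminating character being left as the oldest character in the queue.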

collect-balanced-until
Form (collect-balanced-until char-pred-1 char-pred-2)
Description This collection procedure returns a balanced collection given two char predicates. Return the string collected from the input port ip. The collection stops when the predicate char-pred-2 holds on the character read. However, if char-pred-1 becomes true it has to be matched by char-pred-2 without causing a termination of the collection. The last read character (the first character on which char-pred-2 holds) is processed by this function. As a precondition assume that if char-pred-1 holds then char-pred-2 does not hold, and vice versa.
See also Scheme source file collect-balanced-until
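
The balancing rule can be sketched with a nesting counter. The model below is an illustration only, with two loudly stated assumptions: the terminating character is consumed but not included in the returned string (the description above only says that it is "processed"), and the input is a plain list of characters. The demo- names are hypothetical.

```scheme
;; Hypothetical model of balanced collection; not the library source.
(define demo-input (string->list "a (b (c)) d) rest"))  ; remaining input

(define (demo-collect-balanced-until char-pred-1 char-pred-2)
  (let loop ((collected '()) (depth 0))
    (if (null? demo-input)
        (list->string (reverse collected))
        (let ((ch (car demo-input)))
          (set! demo-input (cdr demo-input))  ; every examined character is consumed
          (cond ((char-pred-1 ch)             ; opener: one more level to match
                 (loop (cons ch collected) (+ depth 1)))
                ((and (char-pred-2 ch) (= depth 0))
                 ;; unmatched closer: the collection ends (assumed not included)
                 (list->string (reverse collected)))
                ((char-pred-2 ch)             ; closer that matches an earlier opener
                 (loop (cons ch collected) (- depth 1)))
                (else (loop (cons ch collected) depth)))))))

(define demo-result
  (demo-collect-balanced-until
   (lambda (ch) (char=? ch #\())
   (lambda (ch) (char=? ch #\)))))
```

The two inner closing parentheses only decrease the counter; the collection ends at the first closing parenthesis seen while the counter is zero.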

skip-while
Form (skip-while p)
Description Skip characters while p holds. The first character on which p fails is left as the oldest character in the queue. The predicate is considered not to hold at end of file.
See also Scheme source file skip-while

skip-string
Form (skip-string str if-not-message)
Description Assume that str is just in front of us. Skip through it. If str is not in front of us, a fatal error occurs with if-not-message as error message.
See also Scheme source file skip-string

skip-until-string
Form (skip-until-string str . inclusive)
Description Skip characters until str is encountered. If inclusive, also skip str. It is assumed as a precondition that the length of str is at least one.
See also Scheme source file skip-until-string

collect-until-string
Form (collect-until-string str . inclusive)
Description Collect characters until str is encountered. If inclusive, also collect str. It is assumed as a precondition that the length of str is at least one.
See also Scheme source file collect-until-string
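
The effect of the optional inclusive parameter can be shown with a stand-alone model of collect-until-string. This is a sketch under assumptions: the input is a plain list of characters, reaching the end of the input simply returns what has been collected, and all demo- names are hypothetical.

```scheme
;; Hypothetical model of collect-until-string; not the library source.
(define demo-input (string->list "<!-- note --> tail"))  ; remaining input

;; Does the list chars start with the list pattern?
(define (demo-prefix? pattern chars)
  (cond ((null? pattern) #t)
        ((null? chars) #f)
        ((char=? (car pattern) (car chars))
         (demo-prefix? (cdr pattern) (cdr chars)))
        (else #f)))

;; Consume n characters from the input.
(define (demo-drop n)
  (if (> n 0)
      (begin (set! demo-input (cdr demo-input))
             (demo-drop (- n 1)))))

(define (demo-collect-until-string str . inclusive)
  (let ((pattern (string->list str))
        (incl (and (pair? inclusive) (car inclusive))))
    (let loop ((collected '()))
      (cond ((null? demo-input) (list->string (reverse collected)))
            ((demo-prefix? pattern demo-input)
             (if incl (demo-drop (length pattern)))  ; inclusive: consume str as well
             (list->string
              (reverse (if incl
                           (append (reverse pattern) collected)
                           collected))))
            (else
             (let ((ch (car demo-input)))
               (set! demo-input (cdr demo-input))
               (loop (cons ch collected))))))))

(define demo-exclusive (demo-collect-until-string "-->"))    ; str stays in the input
(define demo-inclusive (demo-collect-until-string "-->" #t)) ; str is collected too
```

The first call stops in front of the end marker and returns the text before it. The second call starts at that same position, so it returns just the marker and leaves only the tail in the input.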


4 Useful predicates for skipping and collecting.

is-white-space?
Form (is-white-space? ch)
Description Is ch a white space character?
See also Scheme source file is-white-space?

end-of-line?
Form (end-of-line? ch)
Description Is ch an end of line character?
See also Scheme source file end-of-line?

eof?
Form (eof? ch)
Description Is ch an end of file character?
See also Scheme source file eof?

char-predicate
Form (char-predicate ch)
Description Return a predicate function that matches the character ch. A higher order function.
See also Scheme source file char-predicate
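
A higher order function of this kind might look as follows. This is a sketch rather than the library definition: the name demo-char-predicate is hypothetical, and the guard against non-character values (such as an end-of-file object) is an assumption of the model.

```scheme
;; Hypothetical model of a predicate generator; not the library source.
(define (demo-char-predicate ch)
  (lambda (x)
    (and (char? x) (char=? x ch))))  ; non-characters never match

(define demo-semicolon? (demo-char-predicate #\;))
(define demo-hit (demo-semicolon? #\;))   ; the generated predicate applied to a semicolon
(define demo-miss (demo-semicolon? #\a))  ; the generated predicate applied to another character
```

Predicates produced this way are the kind of argument that collect-until and skip-while expect.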

Generated by LAML SchemeDoc using LAML Version 38.0 (November 14, 2011, full)