Generated: August 27, 2003, 22:48:17. Copyright © 2003, Kurt Nørmark.

Reference Manual of Buffered Input Streams

Kurt Nørmark ©, normark@cs.auc.dk, Department of Computer Science, Aalborg University, Denmark

Source file: lib/collect-skip.scm
LAML Version 21.00 (August 27, 2003, PP edition)

This library supports buffered input streams. A buffered input stream is convenient for various kinds of parsing tasks.

It is assumed that the variable ip references an open input port or a string. The assignment of ip must be done externally to this library, and after the library is loaded. If ip references an open input port (made by the Scheme function open-input-file, for instance), input is read from that port. If ip is a string, the variable pstring-ip-pointer is used as a pointer into the string.

The functions in section one are generic, low-level reading functions that read either from the open input port or from a string (at the location determined by the variable pstring-ip-pointer). The functions in section two are the basic reading and peeking functions. Section three provides a number of convenient collection and skipping functions.

This library has been developed as part of an SGML/XML Document Type Definition (DTD) parser, but it is useful in many other parsing situations. Some early internal documentation of the DTD parser exists (on the www.cs.auc.dk site), and it also covers some aspects of the functions in this library.

In earlier versions of LAML, this library was called 'the text collection and skipping library'.

Table of Contents:
1. Low-level, generic input functions.
2. Look ahead buffer and queue.
3. Collection and skipping functions.
4. Useful predicates for skipping and collecting.

Alphabetic index:
advance-look-ahead    (advance-look-ahead n)    Provided that there are at least n characters in the buffer, advance next-read by n positions.
char-predicate    (char-predicate ch)    Return a predicate function which matches the character ch.
collect-balanced-until    (collect-balanced-until char-pred-1 char-pred-2)    This collection procedure returns a balanced collection given two char predicates.
collect-until    (collect-until p)    Return the string collected from the input port ip.
collect-until-string    (collect-until-string str . inclusive)    Collect characters until str is encountered.
end-of-line?    (end-of-line? ch)    Is ch an end of line character?
ensure-look-ahead    (ensure-look-ahead n)    Make sure that there are at least n characters in the buffer.
eof?    (eof? ch)    Is ch an end of file character?
generic-eof-object?    (generic-eof-object? x)    Is x the designated end-of-file value relative to the implicitly given input port ip?
generic-read-char    (generic-read-char ip)    Reads a single character from ip, and advances the input pointer.
is-white-space?    (is-white-space? ch)    Is ch a white space character?
look-ahead-char    (look-ahead-char)    Return the first character from the "read end" of the buffer.
look-ahead-prefix    (look-ahead-prefix lgt)    Return a string of length lgt from the "read end" of the buffer.
match-look-ahead?    (match-look-ahead? str)    Return whether the buffer matches the string str.
max-look-ahead    max-look-ahead    The length of the cyclic look ahead buffer.
max-look-ahead-prefix    (max-look-ahead-prefix)    Return the entire look ahead queue as a string.
peek-a-char    (peek-a-char)    Peek a character from the input port, and queue it for subsequent reading at "the peek end".
peek-chars    (peek-chars n)    Peeks n characters, by n calls of peek-a-char.
put-back-a-char-read-end    (put-back-a-char-read-end ch)    Put ch into the "read end" of the buffer (where read-a-char operates).
put-back-a-char-write-end    (put-back-a-char-write-end ch)    Put ch into the "peek end" of the buffer (where peek-a-char operates).
put-back-a-string    (put-back-a-string str which-end)    Put str back into the buffer.
read-a-char    (read-a-char)    Read from the look ahead buffer.
read-a-string    (read-a-string n)    Read and return a string of length n by means of repeated activations of read-a-char.
reset-look-ahead-buffer    (reset-look-ahead-buffer)    Reset the look ahead buffer.
skip-string    (skip-string str if-not-message)    Assume that str is just in front of us.
skip-until-string    (skip-until-string str . inclusive)    Skip characters until str is encountered.
skip-while    (skip-while p)    Skip characters while p holds.

 

1.   LOW-LEVEL, GENERIC INPUT FUNCTIONS.
The functions in this section read from either an input port or a string.


generic-read-char


Form
(generic-read-char ip)

Description
Reads a single character from ip, and advances the input pointer.


generic-eof-object?


Form
(generic-eof-object? x)

Description
Is x the designated end-of-file value relative to the implicitly given input port ip?


 

2.   LOOK AHEAD BUFFER AND QUEUE.
The functions in this section manipulate a look ahead queue, which sits between the input port ip and the application. Via this buffer it is possible to implement look ahead on the input port.

Imagine an input buffer of (actual) size n:

    c1 c2 c3 ... cn 
When characters are read from the input stream, they enter at the right (the peek end). When characters are read by an application, they are taken from the left (the read end). Thus, cn is the character most recently read from the input port (or from the input string); this is done by peek-a-char. c1 is the next character to leave the buffer and to be read by the client application; this is done by read-a-char.

A few words about terminology in relation to R4RS and R5RS. The Scheme procedure read-char corresponds roughly to read-a-char. The former always reads a character from an input port. The latter reads from an input stream via the buffer; only if the buffer is empty is a character read from the port or string. The standard Scheme function peek-char returns the next character from the input port without updating the 'input pointer'. The function peek-a-char of this library is different, because it reads a character from the input and puts it into the buffer. This use of terminology is unfortunate and may confuse some readers.
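The difference between the two ends of the queue can be illustrated with a toy model. This is only a sketch under stated assumptions: the names model-source, model-queue, model-next!, model-peek-a-char and model-read-a-char are hypothetical, the queue is a plain list rather than the cyclic buffer the library uses, and end of input is modelled by the symbol eof.

```scheme
;; Toy model only, not the library's implementation.
;; model-source stands in for the port or string referenced by ip.
(define model-source (string->list "abc"))
;; The look ahead queue: the read end is the front of the list,
;; the peek end is the back.
(define model-queue '())

;; Take the next character from the source, or the symbol eof.
(define (model-next!)
  (if (null? model-source)
      'eof
      (let ((ch (car model-source)))
        (set! model-source (cdr model-source))
        ch)))

;; Like peek-a-char: always consumes one character from the source
;; and appends it at the peek end of the queue.
(define (model-peek-a-char)
  (let ((ch (model-next!)))
    (set! model-queue (append model-queue (list ch)))
    ch))

;; Like read-a-char: take from the read end of the queue if possible,
;; and only touch the source when the queue is empty.
(define (model-read-a-char)
  (if (null? model-queue)
      (model-next!)
      (let ((ch (car model-queue)))
        (set! model-queue (cdr model-queue))
        ch)))

;; Two peeks followed by three reads. let* forces left-to-right order.
(define demo
  (let* ((p1 (model-peek-a-char))
         (p2 (model-peek-a-char))
         (r1 (model-read-a-char))
         (r2 (model-read-a-char))
         (r3 (model-read-a-char)))
    (list p1 p2 r1 r2 r3)))
;; demo is (#\a #\b #\a #\b #\c)
```

The two peeked characters are delivered again by the first two reads, and only the third read goes to the source itself.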


max-look-ahead


Form
max-look-ahead

Description
The length of the cyclic look ahead buffer. Predefined to 2000 characters. A constant.


reset-look-ahead-buffer


Form
(reset-look-ahead-buffer)

Description
Reset the look ahead buffer. You should always call this function after you have re-assigned ip to a new input stream.
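The rule above might be packaged as follows. This is a minimal sketch, assuming the library is already loaded so that ip and reset-look-ahead-buffer are defined; the helper name with-input-file-as-ip and its parameters file-name and thunk are hypothetical and not part of the library.

```scheme
;; Hypothetical helper, not part of the library.
;; Opens file-name, makes the port the current input stream ip,
;; resets the look ahead buffer, runs thunk, and closes the port.
(define (with-input-file-as-ip file-name thunk)
  (set! ip (open-input-file file-name))
  (reset-look-ahead-buffer)   ; drop characters queued from the previous ip
  (let ((result (thunk)))
    (close-input-port ip)
    result))
```

A parser could then be started with (with-input-file-as-ip "some-file" parse-it), where parse-it is a parameterless procedure built from the functions of this library. For string input the manual does not say whether pstring-ip-pointer is reset as well, so the sketch is limited to ports.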


peek-a-char


Form
(peek-a-char)

Description
Peek a character from the input port, and queue it for subsequent reading at "the peek end". This function always reads one character via generic-read-char, and puts it into the "peek end" of the buffer.


peek-chars


Form
(peek-chars n)

Description
Peeks n characters, by n calls of peek-a-char. In other words, the buffer is extended with n characters read from the input stream.

See also
relies on: peek-a-char


read-a-char


Form
(read-a-char)

Description
Read from the look ahead buffer. Only if this buffer is empty, read from the port. Reads from "the read end" of the queue. If the buffer is non-empty, this procedure takes a character out of the buffer at the "read end". In any case, it advances the implicit input pointer of the input stream.


read-a-string


Form
(read-a-string n)

Description
Read and return a string of length n by means of repeated activations of read-a-char. Takes eof into account such that a string shorter than n can be returned.


look-ahead-prefix


Form
(look-ahead-prefix lgt)

Description
Return a string of length lgt from the "read end" of the buffer. A proper function.

Preconditions
lgt cannot be larger than the number of characters in the buffer.


max-look-ahead-prefix


Form
(max-look-ahead-prefix)

Description
Return the entire look ahead queue as a string. A proper function.


look-ahead-char


Form
(look-ahead-char)

Description
Return the first character from the "read end" of the buffer. A proper function.

Preconditions
The buffer is not empty.


match-look-ahead?


Form
(match-look-ahead? str)

Description
Return whether the buffer matches the string str. Matching is done by the function equal?. A proper function.


ensure-look-ahead


Form
(ensure-look-ahead n)

Description
Make sure that there are at least n characters in the buffer. If there are fewer than n characters, enter a sufficient number of characters with peek-chars.

See also
relies on: peek-chars
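Since look-ahead-prefix requires that enough characters are queued, a look ahead test typically calls ensure-look-ahead first. A minimal sketch, assuming the library is loaded; the helper name looking-at? is hypothetical. What happens when the input ends before n characters are available is not specified in this manual, so the sketch assumes enough input remains.

```scheme
;; Hypothetical helper, not part of the library.
;; True if the upcoming input starts with str. Nothing is consumed:
;; the inspected characters stay queued for read-a-char.
(define (looking-at? str)
  (let ((n (string-length str)))
    (ensure-look-ahead n)                  ; satisfy the precondition below
    (equal? (look-ahead-prefix n) str)))   ; compare the first n queued chars
```

The library's own match-look-ahead? is presumably meant for the comparison step; the explicit equal? on look-ahead-prefix is used here only because its behaviour follows directly from the descriptions above.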


put-back-a-char-write-end


Form
(put-back-a-char-write-end ch)

Description
Put ch into the "peek end" of the buffer (where peek-a-char operates).


put-back-a-char-read-end


Form
(put-back-a-char-read-end ch)

Description
Put ch into the "read end" of the buffer (where read-a-char operates).


put-back-a-string


Form
(put-back-a-string str which-end)

Description
Put str back into the buffer. The second parameter which-end controls whether to put it back at the read end or the write end (equivalent to the peek end). Possible values of which-end are the symbols read-end and write-end.


advance-look-ahead


Form
(advance-look-ahead n)

Description
Provided that there are at least n characters in the buffer, advance next-read by n positions. Hereby n queued characters are skipped from the buffer at the "read end".


 

3.   COLLECTION AND SKIPPING FUNCTIONS.
This section contains a number of higher-level collection and skipping functions. These functions use the functions from the previous section. The functions in this section are the most important of this library.


collect-until


Form
(collect-until p)

Description
Return the string collected from the input port ip. The collection stops when the predicate p holds on the character read. The last read character (the first character on which p holds) is left as the oldest character in the queue.
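Because the terminating character stays in the queue, a caller that wants to move past it has to read it explicitly. A minimal sketch, assuming the library is loaded; the helper name read-line-from-ip is hypothetical. The manual does not list which characters end-of-line? accepts, nor what collect-until does if the input ends before p holds, so the sketch assumes a single end of line character is present.

```scheme
;; Hypothetical helper, not part of the library.
;; Returns the text up to, but not including, the next end of line
;; character, and then consumes that character.
(define (read-line-from-ip)
  (let ((line (collect-until end-of-line?)))
    (read-a-char)   ; the end of line character was left in the queue
    line))
```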


collect-balanced-until


Form
(collect-balanced-until char-pred-1 char-pred-2)

Description
This collection procedure returns a balanced collection given two char predicates. Return the string collected from the input port ip. The collection stops when the predicate char-pred-2 holds on the character read. However, if char-pred-1 becomes true, it has to be matched by char-pred-2 without causing a termination of the collection. The last read character (the first character on which char-pred-2 holds) is processed by this function. As a precondition, assume that if char-pred-1 holds then char-pred-2 does not hold, and vice versa.
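A typical pair of predicates is an opening and a closing bracket. A minimal sketch, assuming the library is loaded; the helper name collect-to-matching-paren is hypothetical. It reads the description as follows: nesting starts at level zero, every char-pred-1 character opens a level that a char-pred-2 character must close, and the first char-pred-2 character at level zero ends the collection. Whether that final character is part of the returned string is not stated here.

```scheme
;; Hypothetical helper, not part of the library.
;; Intended to be called just after an opening parenthesis has been
;; read, so that the collection ends at the matching close.
(define (collect-to-matching-paren)
  (collect-balanced-until
    (lambda (ch) (eqv? ch #\())     ; char-pred-1: opens a nested level
    (lambda (ch) (eqv? ch #\)))))   ; char-pred-2: closes a level
```

The two predicates never hold on the same character, which satisfies the stated precondition. eqv? is used instead of char=? so that a non-character end-of-file value does not cause an error.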


skip-while


Form
(skip-while p)

Description
Skip characters while p holds. The first character on which p fails is left as the oldest character in the queue. The predicate does not hold at end of file.
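Skipping and collecting combine naturally into a simple tokenizer step. A minimal sketch, assuming the library is loaded; the helper name read-token is hypothetical. It assumes that a white space character follows the token, since the behaviour of collect-until at end of input is not specified in this manual.

```scheme
;; Hypothetical helper, not part of the library.
;; Skips leading white space and returns the following run of
;; non-white-space characters as a string.
(define (read-token)
  (skip-while is-white-space?)       ; first non-blank stays in the queue
  (collect-until is-white-space?))   ; collect up to the next blank
```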


skip-string


Form
(skip-string str if-not-message)

Description
Assume that str is just in front of us. Skip through it. If str is not in front of us, a fatal error occurs with if-not-message as error message.


skip-until-string


Form
(skip-until-string str . inclusive)

Description
Skip characters until str is encountered. If inclusive, also skip str. It is assumed as a precondition that the length of str is at least one.


collect-until-string


Form
(collect-until-string str . inclusive)

Description
Collect characters until str is encountered. If inclusive, also collect str. It is assumed as a precondition that the length of str is at least one.
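The string-based functions fit delimited constructs such as SGML comments. A minimal sketch, assuming the library is loaded; the helper names skip-comment and collect-comment-body and the error message text are hypothetical. Both helpers pass #t as the optional inclusive argument, so the terminator is consumed as well.

```scheme
;; Hypothetical helpers, not part of the library.

;; Skip a whole comment, including its delimiters.
(define (skip-comment)
  (skip-string "<!--" "Expected the start of a comment")
  (skip-until-string "-->" #t))       ; #t: also skip the terminator

;; Return the text between the comment delimiters.
(define (collect-comment-body)
  (skip-string "<!--" "Expected the start of a comment")
  (let ((all (collect-until-string "-->" #t)))   ; body followed by "-->"
    ;; Drop the three terminator characters from the end.
    (substring all 0 (- (string-length all) 3))))
```

Collecting inclusively and trimming the terminator avoids relying on what is left in the queue after a non-inclusive collection, which the description above does not spell out.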


 

4.   USEFUL PREDICATES FOR SKIPPING AND COLLECTING.


is-white-space?


Form
(is-white-space? ch)

Description
Is ch a white space character?


end-of-line?


Form
(end-of-line? ch)

Description
Is ch an end of line character?


eof?


Form
(eof? ch)

Description
Is ch an end of file character?


char-predicate


Form
(char-predicate ch)

Description
Return a predicate function which matches the character ch. A higher order function.
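The idea of a predicate-returning function can be shown with a self-contained stand-in. This is an illustrative sketch, not the library's source: my-char-predicate, either-pred and comma-or-semicolon? are hypothetical names, and the real char-predicate may compare characters differently.

```scheme
;; Illustrative stand-in for char-predicate, not the library's source.
;; Returns a one-argument predicate that holds exactly on ch.
(define (my-char-predicate ch)
  (lambda (c) (eqv? c ch)))

;; Hypothetical combinator: holds if either p or q holds.
(define (either-pred p q)
  (lambda (c) (or (p c) (q c))))

(define comma-or-semicolon?
  (either-pred (my-char-predicate #\,) (my-char-predicate #\;)))

(comma-or-semicolon? #\;)   ; #t
(comma-or-semicolon? #\a)   ; #f
```

Predicates built this way can be passed wherever this library expects a character predicate, for instance as the argument p of collect-until or skip-while.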


This documentation has been extracted automatically from the Scheme source file by means of the Schemedoc tool.