Programming Languages and Compilers

 

Lecture 5

 

In this lecture we will look at LR parsers and parser generators.

 

The slides for this lecture can be found here and here.

 

Literature

 

Pratt and Zelkowitz, chapter 4.1

 

The paper: “SableCC, an Object-Oriented Compiler Framework”, by Etienne M. Gagnon and Laurie J. Hendren, Sable Research Group, McGill University, Canada. The paper can be downloaded from http://www.sablecc.org/tools-98.pdf

 

The JLex manual, by Elliot Berk. The manual can be downloaded from

http://www.cs.princeton.edu/~appel/modern/java/JLex/current/manual.html

 

The CUP user’s manual, by Scott Hudson. The manual can be downloaded from

http://www.cs.princeton.edu/~appel/modern/java/CUP/manual.html

 

Background material and further resources, including the SableCC, JLex and CUP systems, can be found at the following URLs:

http://www.sablecc.org/

 

http://www.cs.princeton.edu/~appel/modern/java/JLex/

 

http://www.cs.princeton.edu/~appel/modern/java/CUP/

 

As background reading on LR parsers and the above systems, I recommend chapters 2 and 3 of Andrew Appel’s book “Modern Compiler Implementation in Java” (second edition), Cambridge University Press.

Exercises

 

Exercises for lecture 5 will be done from 8.15 till 10.00 before Lecture 6 on Wednesday the 26th of March. Exercises will be published shortly.

 

  1.  The lexical symbols of a programming language can be recognized by deterministic finite automata (DFAs). These automata can be described by state/transition diagrams in which each node represents a state and each edge a state transition ("circles-and-arrows"). Each edge is labelled with the input symbols (characters) that the transition reads. The start state is marked by a special incoming arrow, and accepting (final) states are usually drawn as double circles (see the slides from the lecture).

Given the alphabet A = { 0, 1 } and the languages defined by the following rules (a) - (e), construct (by hand) a deterministic finite automaton recognizing each language. Represent your automata as state diagrams.

(a) The string of three characters, 101.

(b) All strings of arbitrary length that end in 101.

(c) All strings that contain 101 at least once, anywhere in the string.

(d) All strings that contain no consecutive ones.

(e) All strings in which the number of zeros is even.
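
To make the link between a state diagram and an executable recognizer concrete, here is a minimal sketch in Java of a table-driven DFA. It deliberately uses a language that is not one of (a) - (e): binary strings whose value is divisible by 3. The states are the remainders 0, 1 and 2 of the number read so far; the empty string is also accepted, because the start state is the accepting state. The class name DivisibleByThreeDfa and the table layout are illustrative assumptions, not something taken from the lecture slides.

```java
public class DivisibleByThreeDfa {
    // One row per state (remainder 0, 1, 2), one column per input symbol (0, 1).
    // Reading bit b in state r leads to state (2 * r + b) mod 3.
    private static final int[][] TRANSITION = {
        {0, 1}, // remainder 0: on 0 stay in 0, on 1 go to 1
        {2, 0}, // remainder 1: on 0 go to 2, on 1 go to 0
        {1, 2}, // remainder 2: on 0 go to 1, on 1 stay in 2
    };
    private static final int START = 0;
    private static final boolean[] ACCEPTING = {true, false, false};

    // Runs the DFA over the input and reports whether it ends in an accepting state.
    public static boolean accepts(String input) {
        int state = START;
        for (char c : input.toCharArray()) {
            if (c != '0' && c != '1') {
                return false; // symbol outside the alphabet { 0, 1 }
            }
            state = TRANSITION[state][c - '0'];
        }
        return ACCEPTING[state];
    }

    public static void main(String[] args) {
        for (String s : new String[] {"0", "11", "110", "101", "1001"}) {
            System.out.println(s + " -> " + accepts(s));
        }
    }
}
```

Each row of TRANSITION corresponds to one circle in the state diagram and each entry to one outgoing arrow, so a diagram drawn by hand can be checked by transcribing it into such a table.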

 

  1. Unsigned numbers in Algol-60 are given by the following regular description:

digit = '0'|'1'|'2'|'3'|'4'|'5'|'6'|'7'|'8'|'9'
integer = digit digit*
sign = '+'|'-'
exponent = 'e' (sign | empty) integer
number = integer('.' integer | empty) (exponent | empty) | exponent

Construct a DFA to recognize this language, and represent it as a state diagram. You may find it useful to first construct an NDFA-ε (a nondeterministic automaton with ε-transitions), then convert it to an NDFA, and finally convert that to a DFA.
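
The conversion from a nondeterministic automaton to a DFA is usually done with the subset construction. The following Java sketch shows one common variant that handles ε-transitions directly (via ε-closures) instead of removing them in a separate step. The example automaton recognizes (a|b)*ab, not the Algol-60 numbers; the class name, the helper names and the use of '\0' as a stand-in label for ε are all assumptions made for this illustration.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class SubsetConstruction {
    static final char EPSILON = '\0'; // assumed stand-in label for an ε-edge

    // nfa.get(state).get(symbol) is the set of successor states.
    static final Map<Integer, Map<Character, Set<Integer>>> nfa = new HashMap<>();

    static void edge(int from, char symbol, int to) {
        nfa.computeIfAbsent(from, k -> new HashMap<>())
           .computeIfAbsent(symbol, k -> new TreeSet<>())
           .add(to);
    }

    static Set<Integer> step(int state, char symbol) {
        return nfa.getOrDefault(state, Collections.emptyMap())
                  .getOrDefault(symbol, Collections.emptySet());
    }

    // All states reachable from the given states using ε-edges only.
    static Set<Integer> epsilonClosure(Set<Integer> states) {
        Set<Integer> closure = new TreeSet<>(states);
        Deque<Integer> work = new ArrayDeque<>(states);
        while (!work.isEmpty()) {
            for (int next : step(work.pop(), EPSILON)) {
                if (closure.add(next)) {
                    work.push(next);
                }
            }
        }
        return closure;
    }

    // States reachable by reading one symbol and then following ε-edges.
    static Set<Integer> move(Set<Integer> states, char symbol) {
        Set<Integer> targets = new TreeSet<>();
        for (int s : states) {
            targets.addAll(step(s, symbol));
        }
        return epsilonClosure(targets);
    }

    // Subset construction: every DFA state is a set of NFA states.
    static Map<Set<Integer>, Map<Character, Set<Integer>>> toDfa(int start, char[] alphabet) {
        Map<Set<Integer>, Map<Character, Set<Integer>>> dfa = new LinkedHashMap<>();
        Deque<Set<Integer>> work = new ArrayDeque<>();
        Set<Integer> startSet = epsilonClosure(Collections.singleton(start));
        dfa.put(startSet, new LinkedHashMap<>());
        work.add(startSet);
        while (!work.isEmpty()) {
            Set<Integer> current = work.remove();
            for (char symbol : alphabet) {
                Set<Integer> target = move(current, symbol);
                if (target.isEmpty()) {
                    continue; // no edge: an implicit dead state
                }
                dfa.get(current).put(symbol, target);
                if (!dfa.containsKey(target)) {
                    dfa.put(target, new LinkedHashMap<>());
                    work.add(target);
                }
            }
        }
        return dfa;
    }

    // Example automaton for (a|b)*ab; state 3 is the only accepting NFA state.
    static Map<Set<Integer>, Map<Character, Set<Integer>>> exampleDfa() {
        nfa.clear();
        edge(0, EPSILON, 1);
        edge(1, 'a', 1);
        edge(1, 'b', 1);
        edge(1, 'a', 2);
        edge(2, 'b', 3);
        return toDfa(0, new char[] {'a', 'b'});
    }

    public static void main(String[] args) {
        exampleDfa().forEach((state, edges) -> System.out.println(state + " : " + edges));
    }
}
```

A DFA state produced this way is accepting exactly when its set contains an accepting state of the original automaton (here, any set containing 3).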

  1. (a) Construct a state diagram for a DFA that accepts identifiers obeying the following rules.

The first character must be alphabetic (a letter); following characters may be alphabetic, numeric, or the underscore character; however, an underscore may not be the final character, and two underscores may not be adjacent.

(b) Express the automaton as a regular expression. Use concatenation, alternation (|), closure (*), and, if needed, parentheses for grouping the items. You may find it helpful to introduce short-hand notation to represent any character that is a member of a small specified set, and another notation for a character that is not a member of a given set.
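
As an illustration of such shorthand notations, most regular expression libraries provide character classes. The sketch below uses Java's java.util.regex, where [0-9A-Fa-f] stands for any one character in the listed set and [^"\n] for any one character not in the set. The two token definitions (a hexadecimal literal and a simple string literal without escape sequences) are assumed examples chosen for this note and are unrelated to the identifier rules above.

```java
import java.util.regex.Pattern;

public class CharacterClassDemo {
    // [0-9A-Fa-f] is shorthand for the alternation '0'|'1'|...|'9'|'A'|...|'f'.
    static final Pattern HEX_LITERAL = Pattern.compile("0[xX][0-9A-Fa-f]+");

    // [^"\n] matches any single character that is NOT a double quote or a newline.
    static final Pattern STRING_LITERAL = Pattern.compile("\"[^\"\n]*\"");

    // matches() requires the whole input to belong to the language.
    static boolean isHex(String s) {
        return HEX_LITERAL.matcher(s).matches();
    }

    static boolean isString(String s) {
        return STRING_LITERAL.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isHex("0x1F"));
        System.out.println(isHex("0x"));
        System.out.println(isString("\"abc\""));
        System.out.println(isString("\"a\"b\""));
    }
}
```

In a hand-written answer the same idea can be expressed by defining names such as letter or digit once and then using them inside the expression, as the Algol-60 description of unsigned numbers does.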

  1. Download and install JLex.

Try JLex on the sample specification sample.lex:

 http://www.cs.princeton.edu/~appel/modern/java/JLex/current/sample.lex

 

  1. Download and install CUP.

Try CUP (and JLex) on the MinimalExample: http://www.cs.princeton.edu/~appel/modern/java/CUP/minimal.zip

  1. Download and install SableCC.

Try SableCC on the postfix grammar and on the SmallLang grammar.

What happens if you add an if-then-else production to the SmallLang grammar?

Try SableCC on a bigger language, either your own language, MiniTriangle (see http://www.dcs.kcl.ac.uk/teaching/units/cs3plt/IntroductionToMiniTriangle.doc) or MiniJava (see http://geezy.cs.purdue.edu/~samanta/MCIIJ2E/grammar.html ).

  1. Try CUP and JLex on a bigger example, either your own language or

http://www.cs.princeton.edu/~appel/modern/java/CUP/javagrm.zip