In this lecture we will look at LR parsers and parser generators.
The slides
for this lecture can be found here and here.
Pratt and
Zelkowitz, chapter 4.1
The paper: “SableCC, an Object-Oriented Compiler Framework”, by Etienne M. Gagnon and
Laurie J. Hendren, Sable Research Group.
The JLex manual,
by Elliot Berk. The manual can be downloaded from
http://www.cs.princeton.edu/~appel/modern/java/JLex/current/manual.html
The CUP
user’s manual, by Scott Hudson. The manual can be downloaded from
http://www.cs.princeton.edu/~appel/modern/java/CUP/manual.html
Background material and further resources, including the SableCC, JLex and CUP systems,
can be found at the following URLs:
http://www.cs.princeton.edu/~appel/modern/java/JLex/
http://www.cs.princeton.edu/~appel/modern/java/CUP/
As background reading on LR parsers and the above systems, I can recommend chapters
2 and 3 of Andrew Appel’s book “Modern Compiler Implementation in Java
(Second Edition)”, Cambridge University Press.
Exercises
for lecture 5 will be done from 8.15 till 10.00 before Lecture 6 on Wednesday
the 26th of March. Exercises will be published shortly.
Given an alphabet A = { 0, 1 } and the languages defined by the following rules
(a) - (e), construct (by hand) a deterministic finite state automaton
recognizing each language. Represent your automata as state diagrams.
(a) The string of three characters, 101.
(b) All strings of arbitrary length that end in 101.
(c) All strings that contain a 101 at least once anywhere.
(d) All strings that contain no consecutive ones.
(e) All strings in which the number of zeros is even.
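A hand-drawn state diagram can be checked by encoding it as a transition table and running it on test strings. The sketch below is one possible encoding, not part of the exercise hand-out: the names `accepts`, `EVEN_ZEROS` and `even_zeros`, and the state labels `even` and `odd`, are assumptions chosen for illustration, and only language (e) is shown.

```python
from typing import Dict, Set, Tuple

# Assumed encoding: a DFA is a transition table keyed by (state, symbol),
# together with a start state and a set of accepting states.
Transitions = Dict[Tuple[str, str], str]


def accepts(delta: Transitions, start: str, accepting: Set[str], word: str) -> bool:
    """Run the DFA on `word`; a missing transition rejects the word."""
    state = start
    for symbol in word:
        key = (state, symbol)
        if key not in delta:
            return False
        state = delta[key]
    return state in accepting


# Language (e): strings over {0, 1} with an even number of zeros.
# The state records whether the number of zeros read so far is even or odd.
EVEN_ZEROS: Transitions = {
    ("even", "0"): "odd",
    ("even", "1"): "even",
    ("odd", "0"): "even",
    ("odd", "1"): "odd",
}


def even_zeros(word: str) -> bool:
    """Convenience wrapper for language (e)."""
    return accepts(EVEN_ZEROS, "even", {"even"}, word)
```

The same `accepts` function can be reused for (a) - (d) by writing out the transition table of your own diagram and comparing its answers with what you expect for a handful of strings.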
digit = '0'|'1'|'2'|'3'|'4'|'5'|'6'|'7'|'8'|'9'
integer = digit digit*
sign = '+'|'-'
exponent = 'e' (sign | empty) integer
number = integer('.' integer | empty) (exponent | empty) | exponent
Construct a DFA to recognize this language, and represent it as a state diagram. You may
find it useful to construct an NDFA-ε, then convert it to an NDFA, and
finally convert it to a DFA.
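The conversion can also be tried mechanically on a small piece of the language. The sketch below is a minimal illustration under stated assumptions: it goes from an NDFA-ε directly to a DFA by combining ε-closure with the subset construction (skipping the intermediate NDFA), it covers only the `exponent` definition, and it abstracts the input to three symbols, `e`, `s` (any sign) and `d` (any digit). All function names, the state numbers, and the use of the empty string as the ε label are choices made here for illustration.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

EPS = ""  # label used in this sketch for epsilon moves
NFA = Dict[Tuple[int, str], Set[int]]
DFA = Dict[Tuple[FrozenSet[int], str], FrozenSet[int]]


def eps_closure(nfa: NFA, states: Set[int]) -> FrozenSet[int]:
    """All states reachable from `states` using only epsilon moves."""
    closure = set(states)
    stack = list(states)
    while stack:
        s = stack.pop()
        for t in nfa.get((s, EPS), set()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)


def subset_construction(nfa: NFA, start: int, alphabet: str) -> Tuple[FrozenSet[int], DFA]:
    """Build a DFA whose states are sets of NDFA-epsilon states."""
    start_set = eps_closure(nfa, {start})
    dfa: DFA = {}
    seen = {start_set}
    todo: List[FrozenSet[int]] = [start_set]
    while todo:
        current = todo.pop()
        for a in alphabet:
            moved: Set[int] = set()
            for s in current:
                moved |= nfa.get((s, a), set())
            if not moved:
                continue  # no transition on `a`: leave it undefined
            target = eps_closure(nfa, moved)
            dfa[(current, a)] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    return start_set, dfa


def dfa_accepts(start: FrozenSet[int], dfa: DFA, nfa_accepting: Set[int], word: str) -> bool:
    """A DFA state accepts if it contains at least one accepting NDFA state."""
    state = start
    for symbol in word:
        if (state, symbol) not in dfa:
            return False
        state = dfa[(state, symbol)]
    return bool(state & nfa_accepting)


# exponent = 'e' (sign | empty) integer, with state 3 accepting.
EXPONENT_NFA: NFA = {
    (0, "e"): {1},
    (1, "s"): {2},
    (1, EPS): {2},  # the "empty" alternative for the sign
    (2, "d"): {3},
    (3, "d"): {3},
}
START, EXPONENT_DFA = subset_construction(EXPONENT_NFA, 0, "esd")


def exponent_accepts(word: str) -> bool:
    return dfa_accepts(START, EXPONENT_DFA, {3}, word)
```

Here `exponent_accepts("esdd")` stands for an input such as `e+12`. Extending the table to the full `number` definition follows the same pattern.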
The first character must be alphabetic (a letter); following characters may be alphabetic,
numeric, or the underscore character; however, an underscore may not be the final
character, and two underscores may not be adjacent.
(b)
Express the automaton as a regular expression. Use concatenation, alternation
(|), closure (*), and, if needed, parentheses for grouping the items. You may
find it helpful to introduce short-hand notation to represent any character
that is a member of a small specified set, and another notation for a character
that is not a member of a given set.
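One way to test a candidate regular expression is to compare it against a direct, non-regex check of the rules on every short string over a small alphabet. The sketch below assumes that "alphabetic" means ASCII letters and "numeric" means ASCII digits; the names `is_identifier` and `disagreements`, and the test alphabet `a1_`, are hypothetical choices for illustration. The candidate pattern must be written in Python's `re` syntax.

```python
import re
import string
from itertools import product
from typing import List

LETTERS = set(string.ascii_letters)  # assumption: ASCII letters only
DIGITS = set(string.digits)


def is_identifier(s: str) -> bool:
    """Direct check of the stated rules, written without a regular expression."""
    if not s or s[0] not in LETTERS:
        return False  # the first character must be a letter
    if s[-1] == "_" or "__" in s:
        return False  # no trailing underscore, no adjacent underscores
    return all(c in LETTERS or c in DIGITS or c == "_" for c in s)


def disagreements(pattern: str, alphabet: str = "a1_", max_len: int = 5) -> List[str]:
    """Strings up to `max_len` where the regex and the rule check disagree."""
    compiled = re.compile(pattern)
    bad = []
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            s = "".join(chars)
            if bool(compiled.fullmatch(s)) != is_identifier(s):
                bad.append(s)
    return bad
```

For example, `disagreements(r"[A-Za-z][A-Za-z0-9_]*")` reports strings such as `a_` and `a__a`, which that overly permissive pattern wrongly accepts. An empty result means the candidate agrees with the rules on every string tested, which is evidence of correctness but not a proof.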
Try JLex
on the sample grammar sample.lex:
http://www.cs.princeton.edu/~appel/modern/java/JLex/current/sample.lex
Try CUP
(and JLex) on the MinimalExample http://www.cs.princeton.edu/~appel/modern/java/CUP/minimal.zip
Try
SableCC on the postfix grammar and on the SmallLang grammar.
What
happens if you add an if-then-else production to the SmallLang grammar?
Try SableCC on a bigger language: either your own language, MiniTriangle (see http://www.dcs.kcl.ac.uk/teaching/units/cs3plt/IntroductionToMiniTriangle.doc),
or MiniJava (see http://geezy.cs.purdue.edu/~samanta/MCIIJ2E/grammar.html).
http://www.cs.princeton.edu/~appel/modern/java/CUP/javagrm.zip