Programming Languages and Compilers

Lecture 4

In this lecture we will look at scanners, also called lexers. We will see how scanners can be built by hand and how they can be generated automatically with tools such as JLex. We will also look at JavaCC, a compiler compiler that can help you generate (at least the front end of) recursive descent compilers.

The slides for this lecture can be found here.

Literature

Sebesta, sections 3.1 to 3.4 and sections 4.1 to 4.4

The JLex manual, by Elliot Berk. The manual can be downloaded from

http://www.cs.princeton.edu/~appel/modern/java/JLex/current/manual.html

The article in Java World: “Build your own language with JavaCC”, by Oliver Enseling, which can be downloaded from http://www.javaworld.com/javaworld/jw-12-2000/jw-1229-cooltools_p.html

As background reading, I recommend:

The JavaCC FAQ:

http://www.engr.mun.ca/~theo/JavaCC-FAQ/javacc-faq.htm

You can download a free copy of JavaCC from the JavaCC Home website.

A repository of grammars for languages including Java and SQL is available at the following URL:

http://www.cobase.cs.ucla.edu/pub/javacc/

The Java Tree Builder tool can be found at

http://www.cs.purdue.edu/jtb/

The JLex system can be found at the following URL:

http://www.cs.princeton.edu/~appel/modern/java/JLex/

An alternative LL(1) compiler generator is the Compiler Generator Coco/R. There are versions of Coco/R for Java, C#, C++, Oberon, Modula-2 and Pascal.

Exercises

Exercises for lecture 4 will be done from 12.30 till 14.15 before Lecture 5 on Thursday the 26th of February.

The lexical symbols of a programming language can be recognized by deterministic finite state automata (DFA). These automata can be described by state/transition diagrams in which each node represents a state and each edge a state transition ("circles-and-arrows"). Each edge in a state diagram is labelled with the input symbols that the transition reads. The start state is marked by a special incoming arrow, and final (accepting) states are usually drawn as "doubled" circles (see the slides from the lecture).
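A state diagram translates directly into a transition table, which may help when checking a diagram by hand. The following is a minimal sketch, not taken from the lecture: it simulates a two-state DFA over the alphabet { 0, 1 } that accepts strings ending in 0. The language, the class name DfaDemo and the table layout are assumptions chosen for illustration.

```java
public class DfaDemo {
    // Transition table indexed as DELTA[state][symbol]; column 0 is input '0', column 1 is input '1'.
    // State 0: start state; nothing read yet, or the last symbol read was '1'.
    // State 1: the last symbol read was '0' (the only accepting state).
    private static final int[][] DELTA = {
        {1, 0}, // from state 0
        {1, 0}, // from state 1
    };
    private static final int START = 0;
    private static final boolean[] ACCEPTING = {false, true};

    /** Runs the DFA over the input and reports whether it stops in an accepting state. */
    public static boolean accepts(String input) {
        int state = START;
        for (char c : input.toCharArray()) {
            if (c != '0' && c != '1') {
                return false; // symbol outside the alphabet: reject
            }
            state = DELTA[state][c - '0'];
        }
        return ACCEPTING[state];
    }

    public static void main(String[] args) {
        String[] samples = {"", "0", "1", "10", "0110", "011"};
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> " + accepts(s));
        }
    }
}
```

To try one of your own diagrams, replace DELTA and ACCEPTING with the rows and accepting states read off the diagram; the simulation loop stays the same.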

Given an alphabet A = { 0, 1 }, and the languages defined by the following rules (a) - (e), construct (by hand) a deterministic finite state automaton recognizing each language. Represent your automata as state diagrams.

(a)	The string of three characters, 101.
(b)	All strings of arbitrary length that end in 101.
(c)	All strings that contain a 101 at least once anywhere.
(d)	All strings that contain no consecutive ones.
(e)	All strings in which the number of zeros is even.

Unsigned numbers in Algol-60 are given by the following regular description:

digit = '0'|'1'|'2'|'3'|'4'|'5'|'6'|'7'|'8'|'9'
integer = digit digit*
sign = '+'|'-'
exponent = 'e' (sign | empty) integer
number = integer('.' integer | empty) (exponent | empty) | exponent

Construct a DFA to recognize this language, and represent it as a state diagram. You may find it useful to first construct an NDFA-ε, then convert it to an NDFA, and finally convert that to a DFA.
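When testing a DFA on sample inputs, it can help to have an independent check of which strings belong to the language. The sketch below transcribes the regular description of unsigned numbers into a java.util.regex pattern. It is only a test aid, not a solution to the exercise; the class name UnsignedNumberCheck and the method name isNumber are assumptions made for illustration.

```java
import java.util.regex.Pattern;

public class UnsignedNumberCheck {
    // integer = digit digit*
    private static final String INTEGER = "[0-9]+";
    // exponent = 'e' (sign | empty) integer
    private static final String EXPONENT = "e[+-]?" + INTEGER;
    // number = integer ('.' integer | empty) (exponent | empty) | exponent
    private static final Pattern NUMBER = Pattern.compile(
        INTEGER + "(\\." + INTEGER + ")?(" + EXPONENT + ")?|" + EXPONENT);

    /** Returns true if the whole string is an unsigned number according to the description. */
    public static boolean isNumber(String s) {
        return NUMBER.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"12", "3.14", "2e10", "2.5e-3", "e+7", "3.", ".5", "1e", "+1"};
        for (String s : samples) {
            System.out.println(s + " -> " + isNumber(s));
        }
    }
}
```

Strings such as "3." and ".5" are rejected because the description requires an integer on both sides of the decimal point, while "e+7" is accepted because a bare exponent is a valid number.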

(a) Construct a state diagram for a DFA which accepts identifiers that obey the following rules.

The first character must be alphabetic (a letter); following characters may be alphabetic, numeric, or the underscore character; however, an underscore may not be the final character, and two underscores may not be adjacent.

(b) Express the automaton as a regular expression. Use concatenation, alternation (|), closure (*), and, if needed, parentheses for grouping the items. You may find it helpful to introduce short-hand notation to represent any character that is a member of a small specified set, and another notation for a character that is not a member of a given set.
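To check your automaton and regular expression against sample identifiers, you can compare them with a function that tests the stated rules one by one. The sketch below is such a reference check; it is deliberately neither a DFA nor a regular expression. It assumes that "alphabetic" means the ASCII letters a-z and A-Z and "numeric" means the digits 0-9; the class and method names are hypothetical.

```java
public class IdentifierRules {
    // Assumption: "alphabetic" is restricted to ASCII letters.
    private static boolean isLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    // Assumption: "numeric" is restricted to the digits 0-9.
    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    /** Checks each rule from the exercise text directly. */
    public static boolean isIdentifier(String s) {
        // Rule 1: the first character must be a letter (so the string is non-empty).
        if (s.isEmpty() || !isLetter(s.charAt(0))) {
            return false;
        }
        // Rule 3: an underscore may not be the final character.
        if (s.charAt(s.length() - 1) == '_') {
            return false;
        }
        for (int i = 1; i < s.length(); i++) {
            char c = s.charAt(i);
            // Rule 2: following characters are letters, digits, or underscores.
            if (!isLetter(c) && !isDigit(c) && c != '_') {
                return false;
            }
            // Rule 4: two underscores may not be adjacent.
            if (c == '_' && s.charAt(i - 1) == '_') {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] samples = {"a", "a_b", "a1_b2", "a__b", "a_", "_a", "9a"};
        for (String s : samples) {
            System.out.println(s + " -> " + isIdentifier(s));
        }
    }
}
```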

Download and install JLex.

Try JLex on the sample grammar sample.lex:

http://www.cs.princeton.edu/~appel/modern/java/JLex/current/sample.lex

Download and install JavaCC. Look at the file Calc2i.jj. Copy the file to an empty directory. Run javacc on the file. Look at the generated .java files. Then run javac *.java and java Calc2i.

Try some of the JavaCC examples in the examples directory in the JavaCC distribution.

Look at the files eg1.jjt and eg4.jjt in the examples/JJTreeExamples directory. Copy the files to two new directories. Run jjtree on each of the files and look at the generated .jj files and .java files. Then run javacc on the .jj file and javac *.java. Finally, run java eg1 and java eg4, respectively.