Simple Uses of Flex
Flex is a source generator, like Gperf, Bison, and others. It takes a list of regular expressions and actions, and produces a fast function that triggers the action associated with the recognized pattern. As with Gperf and Bison, the input syntax allows for two optional parts. The prologue contains Flex directives and possibly some user declarations and initializations. The epilogue typically contains additional functions:
```
%{
user-file-prologue
%}
flex-directives
%%
%{
user-yylex-prologue
%}
regular-expression-1  action-1
regular-expression-2  action-2
...
%%
user-epilogue
```
Example 6.11: Structure of a Flex Input File
Each pair of regular-expression and action is listed on a separate line. The regular-expression must start in the first column. Otherwise, the line is treated as code to be copied into the generated function, which makes it possible to leave comments in the input. Actions spanning several lines must be enclosed in braces. The action | stands for "same as the next action".
When run, flex produces a file named lex.yy.c. It contains a C program that includes your user-prologue and user-epilogue, plus one function:
Function: int yylex ()
    Scan the FILE *yyin, which defaults to the standard input, for tokens, and trigger the action associated with the matching regular-expression. Return 0 when it should no longer be called, typically when the end of file is reached. Otherwise, typically return the kind of the token that has just been recognized.
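The contract of yylex can be illustrated without Flex. The C sketch below is a hand-written toy scanner, not Flex output. The names toy_yylex and toy_set_input, the TOK_* token kinds, and the string buffer standing in for FILE *yyin are all hypothetical. Like yylex, it returns the kind of each token it recognizes, and 0 once the input is exhausted.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token kinds; a real scanner would share them with its parser. */
enum { TOK_EOF = 0, TOK_WORD = 1, TOK_NEWLINE = 2 };

/* Hypothetical input buffer standing in for FILE *yyin. */
static const char *toy_input = "";
static size_t toy_pos = 0;

static void toy_set_input (const char *s)
{
  toy_input = s;
  toy_pos = 0;
}

/* Hand-written stand-in for yylex (): skips blanks other than newlines,
   then returns the kind of the next token, or 0 at the end of input. */
static int toy_yylex (void)
{
  while (toy_input[toy_pos] != '\0' && toy_input[toy_pos] != '\n'
         && isspace ((unsigned char) toy_input[toy_pos]))
    toy_pos++;
  if (toy_input[toy_pos] == '\0')
    return TOK_EOF;             /* No longer worth calling.  */
  if (toy_input[toy_pos] == '\n')
    {
      toy_pos++;
      return TOK_NEWLINE;
    }
  /* A word: a maximal run of non-space characters (at least one here). */
  while (toy_input[toy_pos] != '\0'
         && !isspace ((unsigned char) toy_input[toy_pos]))
    toy_pos++;
  return TOK_WORD;
}
```

On the input "dear friend\n", successive calls return TOK_WORD, TOK_WORD, TOK_NEWLINE, and then 0.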
For instance, the following simple Flex input file recognizes rude words, and expresses its surprise at unknown words:
```
%{                                                      /* -*- C -*- */
%}
%%
"sh*t"               |
"f*k"                |
"win*ows"            |
"Huh? What the f*?"  printf ("I don't like you saying `%s'.\n", yytext);
^.*$                 printf ("Huh? What the f* `%s'?\n", yytext);
\n                   /* Ignore. */
%%
```
Example 6.12: rude-3.l -- Recognizing Rude Words With Flex
We can now try it:
```
$ flex -orude-3.c rude-3.l
$ gcc -Wall -o rude-3 rude-3.c -lfl
error-->rude-3.c:1020: warning: `yyunput' defined but not used
$ echo 'dear' | ./rude-3
Huh? What the f* `dear'?
```
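We took care to write the most general rule last. When several rules match equally long text, Flex picks the one listed first. We also provided a rule that prevents newline characters from being echoed to the standard output, which is where Flex copies any input that no rule matches. This rule is needed because the dot (.) does not match newline characters, so ^.*$ does not cover them.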
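A minimal C sketch of Flex's election policy follows: the longest match wins, and among equally long matches, the rule listed first wins. It models the policy only. Flex actually compiles all the patterns into a single automaton. The name elect_rule and its array-of-lengths interface are assumptions made for illustration.

```c
#include <stddef.h>

/* Sketch of Flex's rule election (not Flex's actual code).
   lens[i] is the number of characters rule i can match at the current
   position, 0 if it cannot match.  The longest match wins; on a tie,
   the earliest rule wins.  Returns -1 if no rule matches. */
static int elect_rule (const size_t lens[], size_t n_rules)
{
  int best = -1;
  size_t best_len = 0;
  for (size_t i = 0; i < n_rules; i++)
    if (lens[i] > best_len)     /* Strictly longer: ties stay with earlier rules. */
      {
        best = (int) i;
        best_len = lens[i];
      }
  return best;
}
```

For instance, on the input sh*t, both the "sh*t" rule and a general .* rule match four characters. With the literal listed first, elect_rule returns the literal's index.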
You have probably noticed that we did not provide a main function, yet the program works. The Flex library, libfl, provides a default main that calls yylex until the end of file is reached.
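In essence, libfl's main is a loop around yylex. The C sketch below mimics it. default_main and stub_yylex are hypothetical names. The stub takes the place of a Flex-generated scanner, and the function-pointer parameter exists only so the loop can be exercised without linking against libfl. In libfl, the loop lives in main itself and calls yylex directly.

```c
/* Hypothetical stub standing in for a Flex-generated yylex ():
   it reports three tokens, then returns 0 for end of file. */
static int stub_calls = 0;

static int stub_yylex (void)
{
  stub_calls++;
  return stub_calls <= 3 ? 1 : 0;
}

/* Roughly what the default main in libfl does: keep calling the
   scanner until it returns 0, then exit successfully. */
static int default_main (int (*scan) (void))
{
  while (scan () != 0)
    continue;
  return 0;
}
```

Calling default_main (stub_yylex) invokes the stub four times: three tokens, then the final 0.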
Let's try an actual match:
```
$ echo 'Huh? What the f*?' | ./rude-3
Huh? What the f* `Huh? What the f*?'?
```
Huh? What the f* `Huh? What the f* `Huh? What the f*?'?'? It was supposed to recognize it!
You just ran into so-called right contexts, also known as trailing contexts. The pattern $ is exactly equivalent to /\n, which stands for "if followed by a newline". The bad news is that when /right-context is used, the characters matched by right-context count when electing the longest match. Of course, they do not end up in yytext or yyleng. In other words, ^.*$ matched the whole line plus the newline, so it is longer than the dedicated pattern and wins over it.
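This accounting can be sketched in C. In the hypothetical struct candidate below, text_len counts the characters that end up in yytext and yyleng. context_len counts those matched by a trailing context, such as the newline implied by $, which are then returned to the input. elect_with_context is an illustrative name, not part of Flex. Two parts of it do reflect Flex's behavior: the comparison on text_len + context_len, and ties going to the earlier rule.

```c
#include <stddef.h>

/* Hypothetical model of one candidate match. */
struct candidate
{
  size_t text_len;     /* Characters that go into yytext/yyleng.  */
  size_t context_len;  /* Trailing-context characters, given back to the input.  */
};

/* Elect a rule the way Flex treats trailing context: the length used
   for the longest-match comparison includes the context, and ties go
   to the earliest rule.  When a rule wins, *leng receives the resulting
   yyleng, which excludes the context.  Returns the winning index, or -1. */
static int elect_with_context (const struct candidate c[], size_t n,
                               size_t *leng)
{
  int best = -1;
  size_t best_len = 0;
  for (size_t i = 0; i < n; i++)
    {
      size_t len = c[i].text_len + c[i].context_len;
      if (len > best_len)
        {
          best = (int) i;
          best_len = len;
          *leng = c[i].text_len;  /* Context never reaches yyleng.  */
        }
    }
  return best;
}
```

Modeling rude-3, the literal "Huh? What the f*?" is { 17, 0 } and ^.*$ is { 17, 1 }. The latter counts 18 and wins, yet yyleng is still 17.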
As much as possible, avoid depending on contexts: if you ever design a language, make sure it can be scanned without them. If we simply replace ^.*$ with .* in Example 6.12, we get:
```
$ echo 'Huh? What the f*?' | ./rude-4
I don't like you saying `Huh? What the f*?'.
```
Both rules now match the same text with no trailing context, so Flex picks the one listed first: the dedicated rule.