[xyz]
Any one of the characters x, y, or z. You may use a
dash for character ranges: [a-z] denotes any letter from
a through z. You may use a leading hat to negate the
class: [^0-9] stands for any character which is not a decimal
digit, including new-line.
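You can try bracket expressions without running flex. POSIX extended regular expressions, available in C through `regcomp` and `regexec` from `<regex.h>`, use the same [a-z] and [^0-9] notation. The sketch below uses that library as a stand-in for flex. This is an assumption: the two agree on these simple classes, but not on every construct. `full_match` is a hypothetical helper that succeeds only when the pattern covers the whole string. `REG_NEWLINE` is not passed, so new-line is an ordinary character and [^0-9] matches it, as described above.

```c
#include <regex.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: true if the POSIX extended regex `ere` matches all of
   `s`.  POSIX <regex.h> is used here as a stand-in for flex. */
static bool full_match(const char *ere, const char *s)
{
    char anchored[256];
    regex_t re;
    bool ok;
    int n;

    /* Wrap the pattern as ^(...)$ so that it must cover the entire string. */
    n = snprintf(anchored, sizeof anchored, "^(%s)$", ere);
    if (n < 0 || (size_t) n >= sizeof anchored)
        return false;                    /* pattern too long for the buffer */
    /* No REG_NEWLINE: new-line is an ordinary character, so [^0-9] matches it. */
    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
        return false;                    /* invalid pattern */
    ok = regexec(&re, s, 0, NULL, 0) == 0;
    regfree(&re);
    return ok;
}
```

For example, full_match("[a-z]", "q") and full_match("[^0-9]", "\n") are true, while full_match("[^0-9]", "7") is false.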
\x
If x is a, b, f, n, r,
t, or v, the ANSI C interpretation of
\x (for instance, \n is a new-line). Otherwise, a literal x (used to escape
operators such as *).
\0
a NUL character.
\num
the character with octal value num.
\xnum
the character with hexadecimal value num.
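The escape rules above can be summarized as a small decoder. The following C sketch is not flex's own code. `decode_escape` and `digit_value` are hypothetical names, `s` is assumed to point at the backslash, and the limits of three octal digits and two hexadecimal digits are an assumption chosen so that each escape yields a single 8-bit character.

```c
#include <ctype.h>
#include <stddef.h>

/* Value of a decimal or hexadecimal digit character. */
static int digit_value(char c)
{
    return isdigit((unsigned char) c) ? c - '0'
                                      : tolower((unsigned char) c) - 'a' + 10;
}

/* Hypothetical helper: decode the escape sequence at `s` (which points at the
   backslash), store the number of characters consumed in *len, and return the
   character it denotes. */
static int decode_escape(const char *s, size_t *len)
{
    const char *p = s + 1;              /* character after the backslash */
    int v = 0, i;

    if (p[0] == 'x' && isxdigit((unsigned char) p[1])) {
        /* \xnum: hexadecimal, at most two digits (assumed limit) */
        for (i = 1; i <= 2 && isxdigit((unsigned char) p[i]); i++)
            v = v * 16 + digit_value(p[i]);
        *len = 1 + (size_t) i;
        return v;
    }
    if (p[0] >= '0' && p[0] <= '7') {
        /* \0 and \num: octal, at most three digits (assumed limit) */
        for (i = 0; i < 3 && p[i] >= '0' && p[i] <= '7'; i++)
            v = v * 8 + (p[i] - '0');
        *len = 1 + (size_t) i;
        return v;
    }
    *len = 2;
    switch (p[0]) {                     /* ANSI C escapes */
    case 'a': return '\a';
    case 'b': return '\b';
    case 'f': return '\f';
    case 'n': return '\n';
    case 'r': return '\r';
    case 't': return '\t';
    case 'v': return '\v';
    default:  return (unsigned char) p[0];  /* any other x: a literal x */
    }
}
```

For instance, \101 and \x41 both decode to A (65) and consume four characters, while \* yields a literal *.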
"string"
Match the literal string. For instance, "/*" denotes the
character / followed by the character *, as opposed to
/*, which denotes any number of slashes.
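Quoting saves you from escaping operators one by one. POSIX extended regular expressions have no quoting syntax, so the same effect requires a backslash before each special character: the literal text /* becomes /\*. The sketch below performs that rewriting. `ere_quote` is a hypothetical helper, and the assumed target is POSIX ERE syntax, not flex syntax.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: rewrite the literal string `lit` as a POSIX extended
   regex matching exactly that text, storing the result in `out`. */
static void ere_quote(const char *lit, char *out, size_t outsize)
{
    /* Characters special in a POSIX ERE outside brackets.  Other characters
       are copied unchanged, since escaping them is undefined in POSIX. */
    const char *meta = ".[\\()*+?{|^$";
    size_t n = 0;

    for (; *lit != '\0'; lit++) {
        if (strchr(meta, *lit) != NULL) {
            if (n + 2 >= outsize)
                break;                  /* no room for backslash + char + NUL */
            out[n++] = '\\';
        } else if (n + 1 >= outsize) {
            break;                      /* no room for char + NUL */
        }
        out[n++] = *lit;
    }
    out[n] = '\0';
}
```

Calling ere_quote("/*", buf, sizeof buf) stores /\* in buf.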
<<EOF>>
Match the end-of-file.
The basic operators for building more complex regular expressions, with
r and s being two regular expressions, are:
(r)
Match an r; parentheses are used to override precedence.
rs
Match the regular expression r followed by the regular expression
s. This is called concatenation.
r|s
Match either an r or an s. This is called
alternation.
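You can check how parentheses interact with alternation using POSIX extended regular expressions from `<regex.h>` as a stand-in for flex. The two are assumed to agree on these operators. Concatenation binds tighter than |, so ab|cd means (ab)|(cd), whereas a(b|c)d requires both the a and the d. `full_match` is a hypothetical helper that succeeds only when the pattern covers the whole string.

```c
#include <regex.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: true if the POSIX extended regex `ere` matches all of
   `s` (POSIX <regex.h> as a stand-in for flex). */
static bool full_match(const char *ere, const char *s)
{
    char anchored[256];
    regex_t re;
    bool ok;
    int n;

    /* The extra parentheses keep a top-level | inside the anchors. */
    n = snprintf(anchored, sizeof anchored, "^(%s)$", ere);
    if (n < 0 || (size_t) n >= sizeof anchored)
        return false;
    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
        return false;
    ok = regexec(&re, s, 0, NULL, 0) == 0;
    regfree(&re);
    return ok;
}
```

Thus full_match("ab|cd", "cd") is true, while full_match("a(b|c)d", "ab") is false.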
{abbreviation}
Match the expansion of the abbreviation definition. Instead of
writing
%%
[a-zA-Z_][a-zA-Z0-9_]* return IDENTIFIER;
%%
you may write
id [a-zA-Z_][a-zA-Z0-9_]*
%%
{id} return IDENTIFIER;
%%
Quantifiers specify the number of times a pattern
must be repeated:
r*
zero or more r's.
r+
one or more r's.
r?
zero or one r.
r{num}
exactly num r's.
r{min,[max]}
anywhere from min to max r's (when max is omitted, there is no upper bound).
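POSIX extended regular expressions accept the same quantifiers, including the {num} and {min,max} forms. The C sketch below uses `<regex.h>` to check their behavior. POSIX serves as a stand-in for flex here, and `full_match` is a hypothetical helper that requires the pattern to cover the whole string.

```c
#include <regex.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: true if the POSIX extended regex `ere` matches all of
   `s` (POSIX <regex.h> as a stand-in for flex). */
static bool full_match(const char *ere, const char *s)
{
    char anchored[256];
    regex_t re;
    bool ok;
    int n;

    /* Anchor both ends so that a quantifier must account for every character. */
    n = snprintf(anchored, sizeof anchored, "^(%s)$", ere);
    if (n < 0 || (size_t) n >= sizeof anchored)
        return false;
    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
        return false;
    ok = regexec(&re, s, 0, NULL, 0) == 0;
    regfree(&re);
    return ok;
}
```

For example, full_match("a{3}", "aaa") is true, but full_match("a{3}", "aa") is false. full_match("a{2,}", "aaaaa") is true because the upper bound is omitted.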
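For instance -?([0-9]+|[0-9]*\.[0-9]+([eE][-+]?[0-9]+)?) matches
optionally negated decimal integers and the usual C floating-point
notations such as 3.14 or .5e-3, although it misses forms such as 1. or 1e10.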
A pattern may also depend on its context:
r/s
Match an r but only if it is followed by an s. This type of
pattern is called trailing context. The text matched by s
is included when determining whether this rule is the "longest match",
but is then returned to the input before the action is executed, so the
action only sees the text matched by r. Trailing context can have a
negative impact on the scanner; in particular, the input buffer can no
longer grow on demand. In addition, it can produce results that are
correct but surprising. Fortunately it is seldom needed, essentially only
to process pathological languages such as Fortran. For instance, to
recognize its loop keyword, DO, one needs:
DO/[A-Z0-9]*=[A-Z0-9]*,
to distinguish DO1I=1,5, a DO loop in which I runs from 1 to
5, from DO1I=1.5, which defines the floating-point
variable DO1I and assigns it 1.5. See Fortran and Satellites for more on
Fortran loops and their traps.
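POSIX subexpressions can mimic how trailing context hands s back to the input. The whole of r followed by s must match, but only the length of r is reported, so the text matched by s would be scanned again. This is a hypothetical sketch of the Fortran rule above, applied at the start of the input. `match_do_keyword` is not a flex function, and this is not how flex implements trailing context.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical sketch of the rule DO/[A-Z0-9]*=[A-Z0-9]*, at the start of
   `input`.  Returns the length of the text matched by r (the keyword), or 0
   when the rule does not apply; the text matched by s stays in the input. */
static size_t match_do_keyword(const char *input)
{
    regex_t re;
    regmatch_t m[3];
    size_t r_len = 0;

    /* Group 1 is r (the keyword DO), group 2 is the trailing context s. */
    if (regcomp(&re, "^(DO)([A-Z0-9]*=[A-Z0-9]*,)", REG_EXTENDED) != 0)
        return 0;
    if (regexec(&re, input, 3, m, 0) == 0)
        r_len = (size_t) (m[1].rm_eo - m[1].rm_so);  /* only r is consumed */
    regfree(&re);
    return r_len;
}
```

match_do_keyword("DO1I=1,5") returns 2, the length of DO. match_do_keyword("DO1I=1.5") returns 0 because no comma follows.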
^r
Match an r at the beginning of a line.
r$
Match an r at the end of a line. This is strictly equivalent to
r/\n, and therefore suffers from the same problems; see
Simple Uses of Flex.
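Both anchors depend only on neighboring characters. The following hand-written C sketch makes this explicit for a fixed word. ^ succeeds at the start of the buffer or right after a new-line. $ is treated as the trailing context /\n, so a new-line must follow but is not consumed. `match_line_word` is a hypothetical helper, not part of flex. Following the r/\n equivalence, it assumes that $ does not match at end of input when no new-line follows.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch of a rule ^word$ tried at offset `pos` of `buf`.
   Returns the number of characters matched (0 if the rule does not apply). */
static size_t match_line_word(const char *buf, size_t pos, const char *word)
{
    size_t n = strlen(word);

    if (pos != 0 && buf[pos - 1] != '\n')
        return 0;                   /* ^ fails: not at the beginning of a line */
    if (strncmp(buf + pos, word, n) != 0)
        return 0;                   /* r itself does not match */
    if (buf[pos + n] != '\n')
        return 0;                   /* $ fails: no new-line follows */
    return n;                       /* the new-line is left in the input */
}
```

match_line_word("x\nend\n", 2, "end") returns 3. match_line_word("end", 0, "end") returns 0 because the final new-line is missing.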