Flex Regular Expressions


Characters and literals are written as follows:

x
the character x.
.
any character except newline.
[xyz]
Any one of the characters x, y, or z. You may use a dash for character ranges: [a-z] denotes any letter from a through z. You may use a leading caret to negate the class: [^0-9] stands for any character that is not a decimal digit, including newline.
\x
if x is a, b, f, n, r, t, or v, the ANSI C interpretation of \x. Otherwise, a literal x (used to escape operators such as *).
\0
a NUL character.
\num
the character with octal value num.
\xnum
the character with hexadecimal value num.
"string"
Match the literal string. For instance "/*" denotes the character / followed by the character *; without the quotes, both / and * would be read as operators.
<<EOF>>
Match the end-of-file.
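
The character and literal forms above can be tried out quickly outside of Flex. The following is a minimal sketch in Python, assuming that the standard re module is a close enough stand-in for these particular forms. It is not Flex: the helper matches is a name made up for this illustration, "string" and <<EOF>> have no direct equivalent in re, and re.escape is used here to play the role of the double quotes.

```python
import re

def matches(pattern, text):
    """Return True when the whole of text is matched by pattern."""
    return re.fullmatch(pattern, text) is not None

# . matches any character except newline.
dot_matches_a = matches(r".", "a")
dot_matches_newline = matches(r".", "\n")

# [a-z] is a range; [^0-9] is a negated class, which does include newline.
range_matches_q = matches(r"[a-z]", "q")
negated_matches_digit = matches(r"[^0-9]", "7")
negated_matches_newline = matches(r"[^0-9]", "\n")

# \* escapes the operator, \101 is octal for A, \x41 is hexadecimal for A.
escaped_star = matches(r"\*", "*")
octal_a = matches(r"\101", "A")
hex_a = matches(r"\x41", "A")

# Flex's "/*" is a literal string; re.escape stands in for the quotes
# (an assumption of this sketch, not a Flex feature).
literal_comment_start = matches(re.escape("/*"), "/*")

print(dot_matches_a, dot_matches_newline)
print(range_matches_q, negated_matches_digit, negated_matches_newline)
print(escaped_star, octal_a, hex_a, literal_comment_start)
```

The point worth remembering is the asymmetry between the first two groups: . rejects a newline, while a negated class such as [^0-9] accepts it.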

The basic operators for building more complex regular expressions are as follows, where r and s are regular expressions:

(r)
Match an r; parentheses are used to override precedence.
rs
Match the regular expression r followed by the regular expression s. This is called concatenation.
r|s
Match either an r or an s. This is called alternation.
{abbreviation}
Match the expansion of the abbreviation definition. Instead of writing
%%
[a-zA-Z_][a-zA-Z0-9_]*   return IDENTIFIER;
%%

you may write

id  [a-zA-Z_][a-zA-Z0-9_]*
%%
{id}   return IDENTIFIER;
%%
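
The effect of an abbreviation can be pictured as a textual substitution. The sketch below is written in Python rather than Flex, and everything in it (the DEFINITIONS table, the expand function) is hypothetical and only meant as an illustration. It assumes that an expansion behaves as if it were wrapped in parentheses, so that a following quantifier applies to the whole definition, and it handles a single level of expansion only.

```python
import re

# Hypothetical table standing in for the definitions section of a Flex file.
DEFINITIONS = {
    "digit": r"[0-9]",
    "id": r"[a-zA-Z_][a-zA-Z0-9_]*",
}

# A name in braces; {2,3} is left alone because it does not start with a letter.
NAME_IN_BRACES = re.compile(r"\{([A-Za-z_][A-Za-z0-9_-]*)\}")

def expand(pattern, definitions=DEFINITIONS):
    """Replace every {name} by its definition, wrapped in parentheses."""
    def substitute(match):
        return "(" + definitions[match.group(1)] + ")"
    return NAME_IN_BRACES.sub(substitute, pattern)

identifier = expand(r"{id}")
two_or_three_digits = expand(r"{digit}{2,3}")

print(identifier)           # ([a-zA-Z_][a-zA-Z0-9_]*)
print(two_or_three_digits)  # ([0-9]){2,3}
```

With these definitions, the expanded identifier pattern accepts foo_1 and rejects 1foo, exactly as the pattern written out in full would.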

Quantifiers specify the number of times a pattern must be repeated:

r*
zero or more r's.
r+
one or more r's.
r?
zero or one r's.
r{num}
exactly num r's.
r{min,[max]}
anywhere from min to max r's; if max is omitted, there is no upper bound.

For instance, -?([0-9]+|[0-9]*\.[0-9]+([eE][-+]?[0-9]+)?) matches simple C-style integer and floating point numbers, such as 42, -3.14, and .5e-3.
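
To see what the number pattern accepts and what it leaves out, it can be run against a few sample strings. This is a sketch in Python, assuming that the re module treats this particular pattern the same way Flex does; NUMBER and is_number are names chosen for the illustration.

```python
import re

NUMBER = r"-?([0-9]+|[0-9]*\.[0-9]+([eE][-+]?[0-9]+)?)"

def is_number(text):
    """Return True when the whole of text is matched by NUMBER."""
    return re.fullmatch(NUMBER, text) is not None

samples = ["42", "-3.14", ".5e-3", "1e5", "1.", "+7"]
accepted = [s for s in samples if is_number(s)]
print(accepted)  # ['42', '-3.14', '.5e-3']

# Bounded repetition: exactly 3, between 2 and 3, and at least 2.
exactly_three = re.fullmatch(r"a{3}", "aaa") is not None
too_many = re.fullmatch(r"a{2,3}", "aaaa") is not None
no_upper_bound = re.fullmatch(r"a{2,}", "aaaaa") is not None
print(exactly_three, too_many, no_upper_bound)  # True False True
```

Note that 1e5 and 1. are rejected although both are valid C floating constants: the exponent is only allowed after a fractional part, and at least one digit must follow the decimal point.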

A pattern may also depend on its context:

r/s
Match an r, but only if it is followed by an s. This type of pattern is called trailing context. The text matched by s is included when determining whether this rule is the "longest match", but it is then returned to the input before the action is executed, so the action only sees the text matched by r. Trailing context can have a negative impact on the scanner; in particular, the input buffer can no longer grow on demand. It can also produce correct but surprising errors. Fortunately it is seldom needed, and then only to process pathological languages such as Fortran. For instance, to recognize its loop keyword, DO, one needs:
DO/[A-Z0-9]*=[A-Z0-9]*,

to distinguish DO1I=1,5, a loop where I runs from 1 to 5, from DO1I=1.5, an assignment of 1.5 to the floating point variable DO1I. See Fortran and Satellites for more on Fortran loops and traps.

^r
Match an r at the beginning of a line.
r$
Match an r at the end of a line. This is strictly equivalent to r/\n, and therefore suffers from the same problems; see Simple Uses of Flex.
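
The context operators can be approximated outside of Flex with a lookahead. The following Python sketch assumes that (?=s) is an acceptable stand-in for /s: like trailing context, it checks the following text without consuming it. Unlike Flex, Python has no competition between rules for the longest match, so that part of the behaviour is not modelled. DO_LOOP is a name chosen for the illustration.

```python
import re

# DO/[A-Z0-9]*=[A-Z0-9]*, written with a lookahead in place of the slash.
DO_LOOP = re.compile(r"DO(?=[A-Z0-9]*=[A-Z0-9]*,)")

loop = DO_LOOP.match("DO1I=1,5")
assignment = DO_LOOP.match("DO1I=1.5")

print(loop.group(), loop.end())  # DO 2
print(assignment)                # None

# ^r, using re.MULTILINE: r only at the beginning of a line.
at_line_start = re.findall(r"^ab", "ab\ncab\nab", re.MULTILINE)
print(at_line_start)  # ['ab', 'ab']

# r/\n: r only when a newline follows, so the unterminated last line is skipped.
before_newline = re.findall(r"ab(?=\n)", "ab\nabc\nab")
print(before_newline)  # ['ab']
```

In the loop case the match ends at offset 2: only DO is consumed, and 1I=1,5 remains available for the following tokens, which is what the action of a trailing context rule would see.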