Regular expressions in Lex


A regular expression is a pattern description using a meta language. An expression is made up of symbols. Normal symbols are characters and numbers, but there are other symbols that have special meaning in Lex. The following two tables define some of the symbols used in Lex and give a few typical examples.

Defining regular expressions in Lex
Character
Meaning
A-Z, 0-9, a-z Characters and numbers that form part of the pattern.
. Matches any character except \n.
- Used to denote range. Example: A-Z implies all characters from A to Z.
[ ] A character class. Matches any character in the brackets. If the first character is ^ then it indicates a negation pattern. Example: [abC] matches either of a, b, and C.
* Match zero or more occurrences of the preceding pattern.
+ Matches one or more occurrences of the preceding pattern.
? Matches zero or one occurrences of the preceding pattern.
$ Matches end of line as the last character of the pattern.
{ } Indicates how many times a pattern can be present. Example: A{1,3} implies one or three occurrences of A may be present.
\ Used to escape meta characters. Also used to remove the special meaning of characters as defined in this table.
^ Negation.
| Logical OR between expressions.
"<some symbols>" Literal meanings of characters. Meta characters hold.
/ Look ahead. Matches the preceding pattern only if followed by the succeeding expression. Example: A0/1 matches A0 only if A01 is the input.
( ) Groups a series of regular expressions.

Examples of regular expressions
Regular expression
Meaning
joke[rs] Matches either jokes or joker.
A{1,2}shis+ Matches AAshis, Ashis, AAshi, Ashi.
(A[b-e])+ Matches zero or one occurrences of A followed by any character from b to e.

Tokens in Lex are declared like variable names in C. Every token has an associated expression. (Examples of tokens and expression are given in the following table.) Using the examples in our tables, we'll build a word-counting program. Our first task will be to show how tokens are declared.

Examples of token declarations
Token
Associated expression
Meaning
number ([0-9])+ 1 or more occurrences of a digit
chars [A-Za-z] Any character
blank " " A blank space
word (chars)+ 1 or more occurrences of chars
variable (chars)+(number)*(chars)*( number)*