A regular expression is a pattern description written in a metalanguage. An expression
is made up of symbols. Normal symbols are letters and digits, which match themselves,
but other symbols have special meaning in Lex. The following two tables
define some of the symbols used in Lex and give a few typical examples.
Defining regular expressions in Lex
Character | Meaning |
A-Z, 0-9, a-z | Characters and numbers that form part of the pattern. |
. | Matches any character except \n. |
- | Used to denote range. Example: A-Z implies all characters from A to Z. |
[ ] | A character class. Matches any one character in the brackets. If the first character is ^, the class is negated and matches any character not listed. Example: [abC] matches any one of a, b, or C. |
* | Matches zero or more occurrences of the preceding pattern. |
+ | Matches one or more occurrences of the preceding pattern. |
? | Matches zero or one occurrences of the preceding pattern. |
$ | Matches end of line as the last character of the pattern. |
{ } | Indicates how many times a pattern can be present. Example: A{1,3} implies that one to three occurrences of A may be present. |
\ | Used to escape meta characters. Also used to remove the special meaning of characters as defined in this table. |
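^ | Negation when it is the first character inside [ ]. As the first character of a pattern, it matches the beginning of a line. |
| | Logical OR between expressions. |
"<some symbols>" | Literal meaning of the characters. Meta characters inside the quotes lose their special meaning, apart from \ escapes. |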
/ | Look ahead. Matches the preceding pattern only if followed by the succeeding expression. Example: A0/1 matches A0 only if A01 is the input. |
( ) | Groups a series of regular expressions. |
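As a rough illustration of the symbols in the table, the sketch below tries several of them with Python's `re` module. Using Python here is an assumption made for convenience: its pattern syntax is not Lex's, although the symbols shown behave the same way in both. Lex's look-ahead operator `/` has no direct Python spelling, so the lookahead group `(?=...)` is used as a stand-in, and the helper name `matches` is hypothetical.

```python
import re

def matches(pattern, text):
    """Return True if the whole of text matches pattern (hypothetical helper)."""
    return re.fullmatch(pattern, text) is not None

# . matches any character except a newline
assert matches(r"a.c", "abc")
assert not matches(r"a.c", "a\nc")

# [ ] is a character class; a leading ^ negates it
assert matches(r"[abC]", "C")
assert not matches(r"[^abC]", "C")

# *, + and ? control repetition of the preceding pattern
assert matches(r"ab*", "a")        # zero or more b
assert not matches(r"ab+", "a")    # one or more b
assert matches(r"ab?", "ab")       # zero or one b

# { } bounds the repetition count
assert matches(r"A{1,3}", "AAA")
assert not matches(r"A{1,3}", "AAAA")

# \ removes the special meaning of a meta character
assert matches(r"1\+1", "1+1")

# | is a logical OR, ( ) groups
assert matches(r"(ab|cd)e", "cde")

# Lex writes look-ahead as A0/1; Python spells it A0(?=1)
m = re.match(r"A0(?=1)", "A01")
assert m is not None and m.group(0) == "A0"
assert re.match(r"A0(?=1)", "A02") is None
```

In each case the assertion passes silently, so running the file without an error confirms the behavior described in the table.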
Examples of regular expressions
Regular expression | Meaning |
joke[rs] | Matches either jokes or joker. |
A{1,2}shis+ | Matches Ashis, AAshis, Ashiss, AAshiss, and so on: one or two occurrences of A, then shi, then one or more occurrences of s. |
(A[b-e])+ | Matches one or more occurrences of A followed by any character from b to e. |
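The three example expressions can be checked against a handful of candidate strings. The sketch below again assumes Python's `re` module as a stand-in for Lex (these particular patterns are written identically in both); the helper `full_matches` and the candidate strings are made up for illustration.

```python
import re

def full_matches(pattern, candidates):
    """Return the candidates that the pattern matches in full (hypothetical helper)."""
    return [c for c in candidates if re.fullmatch(pattern, c)]

# joke[rs]: exactly one of r or s must follow "joke"
print(full_matches(r"joke[rs]", ["joker", "jokes", "joke", "jokers"]))
# prints ['joker', 'jokes']

# A{1,2}shis+: s+ needs at least one trailing s, and at most two leading A's
print(full_matches(r"A{1,2}shis+", ["Ashis", "AAshis", "Ashiss", "Ashi", "AAAshis"]))
# prints ['Ashis', 'AAshis', 'Ashiss']

# (A[b-e])+: the group must occur at least once
print(full_matches(r"(A[b-e])+", ["Ab", "AbAe", "", "Af", "A"]))
# prints ['Ab', 'AbAe']
```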
Tokens in Lex are declared like variable names in C. Every token has an associated expression. (Examples of tokens and their expressions are given in the following table.) Using the examples in our tables, we'll build a word-counting program. Our first task will be to show how tokens are declared.
Examples of token declarations
Token | Associated expression | Meaning |
number | ([0-9])+ | 1 or more occurrences of a digit |
chars | [A-Za-z] | Any single letter, uppercase or lowercase |
blank | " " | A blank space |
word | (chars)+ | 1 or more occurrences of chars |
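variable | (chars)+(number)*(chars)*(number)* | 1 or more occurrences of chars, followed by 0 or more occurrences each of number, chars, and number, in that order |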
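To see how the token expressions build on one another, the sketch below composes them as Python strings and classifies a few inputs. This is only an approximation under stated assumptions: Python's `re` module stands in for Lex, the function `classify` and the order in which tokens are tried are hypothetical, and in an actual Lex specification a previously defined name is referenced inside braces (for example `{chars}+`) rather than by string interpolation.

```python
import re

# Sub-patterns named after the tokens in the table.
number = r"([0-9])+"
chars = r"[A-Za-z]"
blank = r" "
word = rf"({chars})+"
variable = rf"({chars})+({number})*({chars})*({number})*"

def classify(text):
    """Return the first token name whose pattern matches all of text, else None."""
    for name, pattern in [("number", number), ("blank", blank),
                          ("word", word), ("variable", variable)]:
        if re.fullmatch(pattern, text):
            return name
    return None

print(classify("42"))     # prints number
print(classify("hello"))  # prints word
print(classify("x1y2"))   # prints variable
print(classify("1x"))     # prints None
```

Because `word` is tried before `variable`, a purely alphabetic string is reported as a word even though the `variable` expression would also match it; a real scanner needs a similar rule for choosing between overlapping patterns.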