Lexical analysis boils down to matching patterns.
There are several ways to do this.
Over the alphabet {a,b} give a regular expression for
Finite automata that obey certain rules
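To make the automaton idea concrete, here is a minimal table-driven DFA sketch in C. The language it recognizes, strings over {a,b} that end in ab (regular expression (a|b)*ab), is an assumed illustration and is not one of the exercises in these notes; dfa_accepts, the state numbering, and the table layout are likewise hypothetical choices.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example language (not taken from the exercises):
   strings over {a,b} that end in "ab", i.e. (a|b)*ab. */
enum { NUM_STATES = 3, NUM_SYMBOLS = 2, START_STATE = 0, ACCEPT_STATE = 2 };

/* delta[state][symbol]: column 0 is 'a', column 1 is 'b'. */
static const int delta[NUM_STATES][NUM_SYMBOLS] = {
    /* state 0: no useful suffix seen yet */ {1, 0},
    /* state 1: input so far ends in "a"  */ {1, 2},
    /* state 2: input so far ends in "ab" */ {1, 0},
};

/* Returns 1 if the DFA accepts input, 0 otherwise.
   Any character outside {a,b} rejects the whole string. */
int dfa_accepts(const char *input)
{
    assert(input != NULL);
    int state = START_STATE;
    for (const char *p = input; *p != '\0'; ++p) {
        int symbol;
        if (*p == 'a')
            symbol = 0;
        else if (*p == 'b')
            symbol = 1;
        else
            return 0;
        state = delta[state][symbol]; /* exactly one move per input symbol */
    }
    return state == ACCEPT_STATE;
}
```

The "certain rules" show up in the table: every state has exactly one successor for each symbol, so the machine never has to guess or back up. For instance, dfa_accepts("aab") walks through states 1, 1, 2 and accepts, while dfa_accepts("aba") ends in state 1 and rejects.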
Over the alphabet {a,b} give a DFA that accepts:
/* scanner for a toy Pascal-like language */
%{
#include <stdio.h>  /* printf() */
#include <stdlib.h> /* atoi() and atof() are declared here, not in math.h */
%}
%option noyywrap
DIG [0-9]
ID [a-z][a-z0-9]*
%%
{DIG}+ printf("Integer: %s (%d)\n", yytext, atoi(yytext));
{DIG}+"."{DIG}* printf("Float: %s (%g)\n", yytext, atof(yytext));
if|then|begin|end printf("Keyword: %s\n",yytext);
{ID} printf("Identifier: %s\n",yytext);
"+"|"-"|"*"|"/" printf("Operator: %s\n",yytext);
"{"[^}\n]*"}" /* skip one-line comments */
[ \t\n]+ /* skip whitespace */
. printf("Unrecognized: %s\n",yytext);
%%
int main(void) { yylex(); return 0; }
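The keyword rule is listed before the {ID} rule for a reason: flex takes the longest match, and when two rules match the same length it picks the one listed first. So "if" is a keyword, while "iffy" is an identifier because {ID} matches more characters. The following is a rough hand-written C sketch of that decision for a single, already-delimited word. It is not what flex generates, and classify_word, is_identifier, and the WordKind enum are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { TOK_KEYWORD, TOK_IDENTIFIER, TOK_OTHER } WordKind;

/* Returns 1 if the whole word matches [a-z][a-z0-9]*.
   The range comparisons assume an ASCII-compatible character set. */
static int is_identifier(const char *word)
{
    if (word[0] < 'a' || word[0] > 'z')
        return 0; /* also rejects the empty string */
    for (const char *p = word + 1; *p != '\0'; ++p) {
        int lower = (*p >= 'a' && *p <= 'z');
        int digit = (*p >= '0' && *p <= '9');
        if (!lower && !digit)
            return 0;
    }
    return 1;
}

/* Classifies one complete word the way the keyword and {ID} rules would. */
WordKind classify_word(const char *word)
{
    static const char *const keywords[] = { "if", "then", "begin", "end" };
    assert(word != NULL);
    /* Keywords are checked first, mirroring the order of the rules:
       an exact keyword ties with {ID} on length and the earlier rule wins. */
    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; ++i) {
        if (strcmp(word, keywords[i]) == 0)
            return TOK_KEYWORD;
    }
    return is_identifier(word) ? TOK_IDENTIFIER : TOK_OTHER;
}
```

Swapping the two checks would make every keyword come back as an identifier, which is what would happen in the scanner if {ID} were listed above the keyword rule.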