Returning First Tokens From Lexer

When we set up the regular expressions for the Lexer, the C code to be executed on a match (the code within braces) consisted only of print statements:

"return" {printf("RETURN\n")};

For the Lexer to return tokens to the Parser, we need to modify this C code. For example, when the pattern “return” is matched, a token representing the return keyword should be returned to the Parser:

"return" { return RETURN; }

In the above code, RETURN is the token returned to the Parser. Returning just the token is sufficient for keywords, but sometimes more information is needed. Say a number is encountered: we want to return not only a number token but also the value of the number that was matched. The global variable yylval is used for this.

yylval is a global variable used to pass values from the Lexer to the Parser. Because yylval is a union, it can carry values of several different types, one at a time (note that the union is defined in the Parser file):

%union {
	int intval;
	double doubleval;
	char charval;
	char* str;
	struct AST_Node* node;
};
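
To make the role of the union concrete, here is a minimal C sketch of how the receiver of a token might pick the matching member. DemoValue, the DEMO_ token codes, and format_token are hypothetical stand-ins for illustration; in a real build, Bison generates the union type and the token codes from the grammar file.

```c
#include <stdio.h>

/* Hypothetical stand-in for the union that Bison generates from %union. */
typedef union {
    int intval;
    double doubleval;
    char charval;
    char *str;
} DemoValue;

/* Hypothetical token codes; real codes are assigned by Bison. */
enum DemoToken { DEMO_ICONST = 258, DEMO_ID, DEMO_RETURN };

/* Write a printable form of the token into out.
 * Only the union member that matches the token kind is read,
 * because a union holds one value at a time.
 * Returns the number of characters snprintf would have written. */
int format_token(int token, DemoValue value, char *out, size_t n) {
    switch (token) {
    case DEMO_ICONST:
        return snprintf(out, n, "ICONST(%d)", value.intval);
    case DEMO_ID:
        return snprintf(out, n, "ID(%s)", value.str);
    default:
        /* Keywords and operators carry no extra value. */
        return snprintf(out, n, "TOKEN(%d)", token);
    }
}
```

The token code acts as the tag that says which member is valid: reading str after an ICONST was stored would yield garbage.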

Now that we have yylval, numbers and other desired values (e.g., variable names in the form of strings) can be passed to the Parser:

[0-9]+ {
    int value = atoi(yytext);
    yylval.intval = value;
    return ICONST;
}
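
The contract this rule follows (store the value in yylval, then return the token code) can be illustrated without Flex. The sketch below is a hand-written stand-in that only recognizes integers; sketch_yylex, sketch_yylval, and SK_ICONST are hypothetical names, not part of Flex or Bison. It returns 0 at the end of the input, which is the same convention yylex uses to signal that no tokens remain.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical token code and value holder, mirroring ICONST and yylval. */
enum { SK_ICONST = 258 };
typedef union { int intval; char *str; } SketchValue;
SketchValue sketch_yylval;

/* Scan forward from *cursor, skipping anything that is not a digit.
 * On a run of digits: store its value, advance the cursor past it,
 * and return SK_ICONST. At the end of the string: return 0. */
int sketch_yylex(const char **cursor) {
    const char *p = *cursor;
    while (*p != '\0' && !isdigit((unsigned char)*p)) {
        p++;
    }
    if (*p == '\0') {
        *cursor = p;
        return 0;                      /* end of input */
    }
    char *end;
    sketch_yylval.intval = (int)strtol(p, &end, 10);
    *cursor = end;                     /* resume after the number next time */
    return SK_ICONST;
}
```

A caller would invoke sketch_yylex in a loop and read sketch_yylval.intval after each SK_ICONST, which is roughly what the generated Parser does with yylex and yylval.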

The following tokens can now be passed to the Parser from the Lexer (many more will be needed):

%%

"int" { return INT; }
"return" { return RETURN; }

[a-zA-Z_][a-zA-Z0-9_]* {
    yylval.str = _strdup(yytext);
    return ID;
}

[0-9]+ {
    int value = atoi(yytext);
    yylval.intval = value;
    return ICONST;
}

"+" {return PLUS;}
"-" {return MINUS;}
"/" {return DIVIDE;}
"*" {return MULT;}
"=" {return ASSIGN;}

";" { return SEMI; }
"(" { return OP_PAR; }
")" { return CL_PAR; }
"{" { return OP_BRACE; }
"}" { return CL_BRACE; }

%%
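
The ID rule above copies yytext before storing it because Flex reuses the yytext buffer for the next match, so a pointer straight into it would not stay valid. Note that _strdup is the Microsoft C runtime name; on POSIX systems the equivalent is strdup. As a sketch of what such a copy does, and assuming a portable helper is wanted, a hypothetical copy_lexeme function could look like this:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical portable replacement for strdup/_strdup.
 * Returns a heap-allocated copy of text, or NULL if allocation fails.
 * The caller owns the copy and must free() it when done. */
char *copy_lexeme(const char *text) {
    size_t n = strlen(text) + 1;       /* include the terminating '\0' */
    char *copy = malloc(n);
    if (copy != NULL) {
        memcpy(copy, text, n);
    }
    return copy;
}
```

Since every ID token allocates a new string, the Parser (or whatever structure ends up holding the name) is responsible for eventually freeing it.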