The initial Parser’s grammar must be expanded to account for many types of statements, beginning with declaration statements. The most basic C declaration statement consists of a variable type followed by a variable name:
int var_name;
First, the grammar was updated by adding a nonterminal for variables. Reusing the previously created term rule won’t work, because it also matches constants and parenthesized expressions, which must not appear on the left side of a declaration statement. Note that for now only integer variables are handled:
term:
    ICONST { $$ = new_ast_const_node(INTtype, $1); }
    | variable
    | OP_PAR expression CL_PAR { $$ = $2; }
;
variable:
    ID { $$ = new_ast_ID_node(INTtype, $1, $1); }
;
Using the previously created var_type token, the starting grammar for a declaration statement that accounts for a single variable with no value being assigned is:
declaration_st:
    var_type variable SEMI { ; }
;
However, sometimes declarations assign a value, such as the following:
int var_name = 1;
This means that another nonterminal is needed, one that matches either a bare variable or a variable assigned a value. This nonterminal is called declaration:
declaration_st:
    var_type declaration SEMI { ; }
;
declaration:
    variable { ; }
    | variable ASSIGN expression { ; }
;
Now our parser can recognize a declaration statement for a single variable. Yet it cannot handle a declaration statement that declares multiple variables, such as the example below:
int var_name1, var_name2 = 100, var_name3;
To handle multiple declarations in a single line of C code, another new nonterminal called declarations is used. The declarations nonterminal allows multiple comma-separated declaration items to be handled:
declaration_st:
    var_type declarations SEMI { ; }
;
declarations:
    declarations declaration { ; }
    | declaration { ; }
;
declaration:
    variable { ; }
    | variable COMMA { ; }
    | variable ASSIGN expression { ; }
    | variable ASSIGN expression COMMA { ; }
;
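To see which inputs these rules accept, the sketch below hand-codes a recognizer that mirrors them. It is an illustration only and not part of the parser: the Token enum and the accepts_declaration_st function are hypothetical names, and an expression is simplified to a single constant or identifier. Because the comma is folded into declaration as an optional suffix, the rules as written also accept a trailing comma (int a,;) and a missing comma (int a b;), both of which a C compiler would reject.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token codes; a real lexer would define its own. */
typedef enum {
    TOK_INT, TOK_ID, TOK_COMMA, TOK_ASSIGN, TOK_ICONST, TOK_SEMI, TOK_END
} Token;

/* Returns 1 if the TOK_END-terminated sequence matches
 *   declaration_st: var_type declarations SEMI
 * with "expression" simplified to one ICONST or ID (an assumption). */
int accepts_declaration_st(const Token* t)
{
    assert(t != NULL);
    size_t i = 0;
    int count = 0;

    if (t[i] != TOK_INT)                 /* var_type */
        return 0;
    i++;

    while (t[i] == TOK_ID) {             /* declaration: variable ... */
        i++;
        if (t[i] == TOK_ASSIGN) {        /* ... ASSIGN expression */
            i++;
            if (t[i] != TOK_ICONST && t[i] != TOK_ID)
                return 0;
            i++;
        }
        if (t[i] == TOK_COMMA)           /* ... optional COMMA suffix */
            i++;
        count++;
    }

    /* declarations needs at least one declaration, then SEMI. */
    return count > 0 && t[i] == TOK_SEMI;
}
```

A stricter grammar would treat the comma as a separator between declarations rather than as part of each one, but that is a design change beyond what this section covers.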
The parser can now recognize declaration statements, but the code to be executed when each declaration rule is matched must still be added. Since a single line of C code can declare multiple variables, the number of AST nodes needed for one declaration statement is not known in advance. To handle this, an AST (Abstract Syntax Tree) node structure is created to store a list of declarations. Its fields are the variable type and a pointer to the list of individual declarations:
// Addition to ast.h
typedef struct AST_Node_Declaration_List_t {
    Node_Type type;   // node tag, set to DECLS_NODE
    int vartype;      // type shared by every variable in the statement
    AST_Node* decls;  // list of individual declarations
} AST_Node_Decl_List;
// Addition to ast.c
AST_Node* new_ast_decl_list_node(int vtype, AST_Node* decls)
{
    AST_Node_Decl_List* node = malloc(sizeof(AST_Node_Decl_List));
    node->type = DECLS_NODE;
    node->vartype = vtype;
    node->decls = decls;
    return (AST_Node*)node;
}
Also, a structure must be created for the individual declarations stored in the declaration list. The fields for this structure are the variable type, a node pointing to the variable, and a value (NULL unless the variable is assigned a value in the declaration):
// Additions to ast.h
typedef struct AST_Node_Declaration_t {
    Node_Type type;  // node tag, set to DECL_NODE
    int vartype;     // declared type of the variable
    AST_Node* var;   // node for the declared variable
    AST_Node* val;   // initializer expression, or NULL if none
} AST_Node_Decl;
// Additions to ast.c
AST_Node* new_ast_decl_node(int vtype, AST_Node* var, AST_Node* val)
{
    AST_Node_Decl* node = malloc(sizeof(AST_Node_Decl));
    node->type = DECL_NODE;
    node->vartype = vtype;
    node->var = var;
    node->val = val;
    return (AST_Node*)node;
}
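As a usage sketch, the two constructors can be combined by hand to build the node a parser action would produce for int var_name = 1;. The Node_Type values, the AST_Node base struct, the value of INTtype, and the new_stub_node helper below are stand-ins assumed for illustration; the real definitions live elsewhere in the project. For brevity, the list node here points directly at a single declaration rather than at a statement list.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-ins (assumptions) for definitions that live elsewhere in ast.h. */
typedef enum { CONST_NODE, ID_NODE, DECL_NODE, DECLS_NODE } Node_Type;
typedef struct AST_Node_t { Node_Type type; } AST_Node;
enum { INTtype = 1 };  /* hypothetical type code */

typedef struct AST_Node_Declaration_t {
    Node_Type type;
    int vartype;
    AST_Node* var;
    AST_Node* val;
} AST_Node_Decl;

typedef struct AST_Node_Declaration_List_t {
    Node_Type type;
    int vartype;
    AST_Node* decls;
} AST_Node_Decl_List;

/* Hypothetical helper: allocate a bare node that carries only its tag. */
static AST_Node* new_stub_node(Node_Type type)
{
    AST_Node* node = malloc(sizeof(AST_Node));
    assert(node != NULL);
    node->type = type;
    return node;
}

AST_Node* new_ast_decl_node(int vtype, AST_Node* var, AST_Node* val)
{
    AST_Node_Decl* node = malloc(sizeof(AST_Node_Decl));
    assert(node != NULL);
    node->type = DECL_NODE;
    node->vartype = vtype;
    node->var = var;
    node->val = val;
    return (AST_Node*)node;
}

AST_Node* new_ast_decl_list_node(int vtype, AST_Node* decls)
{
    AST_Node_Decl_List* node = malloc(sizeof(AST_Node_Decl_List));
    assert(node != NULL);
    node->type = DECLS_NODE;
    node->vartype = vtype;
    node->decls = decls;
    return (AST_Node*)node;
}

/* Build the tree for `int var_name = 1;` by hand. */
AST_Node* build_example_declaration(void)
{
    AST_Node* var = new_stub_node(ID_NODE);     /* stands in for var_name */
    AST_Node* val = new_stub_node(CONST_NODE);  /* stands in for the 1 */
    AST_Node* decl = new_ast_decl_node(INTtype, var, val);
    return new_ast_decl_list_node(INTtype, decl);
}
```

Every node begins with a Node_Type field, so code that receives a plain AST_Node pointer can inspect the tag before casting to the specific struct.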
Adding the C code to the Parser leads to the following:
declaration_st:
    var_type declarations SEMI { $$ = new_ast_decl_list_node($1, $2); }
;
declarations:
    declarations declaration {
        AST_Node_StateList* temp_decl_list = (AST_Node_StateList*) $1;
        $$ = new_ast_statelist_node($1, $2, temp_decl_list->statement_cnt + 1);
    }
    | declaration { $$ = new_ast_statelist_node(NULL, $1, 1); }
;
/* $1 -> declared variable, $3 -> initializer expression (when present) */
declaration:
    variable
    {
        AST_Node_ID* temp_ID = (AST_Node_ID*) $1;
        insert(temp_ID->varname, SYMTAB_LOCAL, decl_type, 0);
        $$ = new_ast_decl_node(decl_type, $1, NULL);
    }
    | variable COMMA
    {
        AST_Node_ID* temp_ID = (AST_Node_ID*) $1;
        insert(temp_ID->varname, SYMTAB_LOCAL, decl_type, 0);
        $$ = new_ast_decl_node(decl_type, $1, NULL);
    }
    | variable ASSIGN expression
    {
        AST_Node_ID* temp_ID = (AST_Node_ID*) $1;
        insert(temp_ID->varname, SYMTAB_LOCAL, decl_type, 0);
        $$ = new_ast_decl_node(decl_type, $1, $3);
    }
    | variable ASSIGN expression COMMA
    {
        AST_Node_ID* temp_ID = (AST_Node_ID*) $1;
        insert(temp_ID->varname, SYMTAB_LOCAL, decl_type, 0);
        $$ = new_ast_decl_node(decl_type, $1, $3);
    }
;
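The left-recursive declarations rule grows the list by one entry per reduction. The sketch below replays those reductions for int a, b = 100, c; outside the parser. Only the statement_cnt field and the call signature of new_ast_statelist_node come from the actions above; the statements array, the STATEMENTS_NODE tag, the function body, and the replay_three_declarations helper are hypothetical and may differ from the project's real statement list.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in tags and base node (assumptions, not the project's ast.h). */
typedef enum { DECL_NODE, STATEMENTS_NODE } Node_Type;
typedef struct AST_Node_t { Node_Type type; } AST_Node;

/* Hypothetical layout: only statement_cnt appears in the grammar actions. */
typedef struct AST_Node_StateList_t {
    Node_Type type;
    AST_Node** statements;
    int statement_cnt;
} AST_Node_StateList;

/* Hypothetical implementation: copy the old entries, then append stmt. */
AST_Node* new_ast_statelist_node(AST_Node* prev, AST_Node* stmt, int cnt)
{
    AST_Node_StateList* old = (AST_Node_StateList*)prev;
    assert(cnt >= 1);
    assert(old != NULL ? old->statement_cnt == cnt - 1 : cnt == 1);

    AST_Node_StateList* node = malloc(sizeof *node);
    assert(node != NULL);
    node->type = STATEMENTS_NODE;
    node->statement_cnt = cnt;
    node->statements = malloc((size_t)cnt * sizeof *node->statements);
    assert(node->statements != NULL);

    for (int i = 0; i < cnt - 1; i++)
        node->statements[i] = old->statements[i];
    node->statements[cnt - 1] = stmt;

    if (old != NULL) {  /* the previous list is superseded by the new one */
        free(old->statements);
        free(old);
    }
    return (AST_Node*)node;
}

/* Replay the three reductions of the declarations rule. */
AST_Node_StateList* replay_three_declarations(AST_Node* a, AST_Node* b, AST_Node* c)
{
    /* declarations: declaration */
    AST_Node* list = new_ast_statelist_node(NULL, a, 1);

    /* declarations: declarations declaration */
    AST_Node_StateList* tmp = (AST_Node_StateList*)list;
    list = new_ast_statelist_node(list, b, tmp->statement_cnt + 1);

    /* declarations: declarations declaration */
    tmp = (AST_Node_StateList*)list;
    list = new_ast_statelist_node(list, c, tmp->statement_cnt + 1);

    return (AST_Node_StateList*)list;
}
```

Because the recursion is on the left, the declarations end up in source order, which keeps later passes over the list straightforward.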
Now the grammar of the Compiler can handle C declaration statements.