Declaration Parser Grammar/AST Code

The initial Parser’s grammar must be expanded to account for many types of statements. The first will be declaration statements. Starting with the most basic C declaration statement, there is a variable type followed by a variable name:

int var_name;

First, the grammar was updated by adding a nonterminal for variables. Reusing the previously created term rule won’t work, because term also matches constants and parenthesized expressions, neither of which belongs on the left side of a declaration statement. Note that for now only integer variables are handled:

term:
ICONST { $$ = new_ast_const_node(INTtype, $1); }
| variable
| OP_PAR expression CL_PAR { $$ = $2; }
;

variable:
ID  { $$ = new_ast_ID_node(INTtype, $1, $1); }
;

Using the previously created var_type nonterminal, the starting rule for a declaration statement that declares a single variable without assigning a value is:

declaration_st:
var_type variable SEMI { ; }
;

However, sometimes declarations assign a value, such as the following:

int var_name = 1;

This means that another nonterminal is needed, one that matches either a variable or a variable assigned a value. This nonterminal is called declaration:

declaration_st:
var_type declaration SEMI { ; }
;

declaration:
variable { ; }
| variable ASSIGN expression { ; }
;

Now our parser can recognize a declaration statement for a single variable. However, it cannot yet handle a declaration statement that declares multiple variables, such as the example below:

int var_name1, var_name2 = 100, var_name3;

To handle multiple declarations in a single line of C code, another new nonterminal called declarations is used. Each declaration may now end with a comma, and the declarations rule chains one or more of these comma-separated declaration items together:

declaration_st:
var_type declarations SEMI { ; }
;

declarations:
declarations declaration { ; }
| declaration { ; }
;

declaration:
variable { ; }
| variable COMMA { ; }
| variable ASSIGN expression { ; }
| variable ASSIGN expression COMMA { ; }
;
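
To make it easier to see which token sequences these rules accept, here is a minimal hand-written sketch in C that mirrors them. It is not what Bison generates: the Tok enum, match_declaration, and count_declarations are hypothetical names invented for this illustration, the expression is simplified to a single constant or identifier, and the token array is assumed to end with T_END.

```c
#include <stddef.h>

/* Hypothetical token kinds; the real parser receives these from the lexer. */
typedef enum { T_INT, T_ID, T_ICONST, T_ASSIGN, T_COMMA, T_SEMI, T_END } Tok;

/* declaration: variable
 *            | variable COMMA
 *            | variable ASSIGN expression
 *            | variable ASSIGN expression COMMA
 * The expression is simplified here to one ICONST or ID token.
 * Returns the number of tokens consumed, or 0 if no alternative matches. */
static size_t match_declaration(const Tok *t)
{
    size_t n = 0;

    if (t[n] != T_ID)
        return 0;
    n++;
    if (t[n] == T_ASSIGN) {
        n++;
        if (t[n] != T_ICONST && t[n] != T_ID)
            return 0;
        n++;
    }
    if (t[n] == T_COMMA)
        n++;
    return n;
}

/* declaration_st: var_type declarations SEMI
 * Returns the number of declarations found, or -1 on a syntax error.
 * The array must be terminated by T_END. */
int count_declarations(const Tok *t)
{
    int count = 0;
    size_t used;

    if (*t != T_INT)
        return -1;
    t++;
    while ((used = match_declaration(t)) > 0) {
        t += used;
        count++;
    }
    if (count == 0 || *t != T_SEMI)
        return -1;
    return count;
}
```

Feeding it the tokens for int var_name1, var_name2 = 100, var_name3; yields three declarations. Because the comma belongs to the declaration rule, the rules as written also accept a trailing comma before the semicolon (int a, ;) and two declarations with no comma between them (int a b;). A stricter grammar would probably move COMMA into the declarations rule instead.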

The parser can now recognize declaration statements, but the code to be executed when a declaration rule is matched must still be added. Since a single line of C code can declare multiple variables, the number of AST (Abstract Syntax Tree) nodes needed for one declaration statement is not known in advance. To handle this, an AST structure is created to store a list of declarations. Its fields are the variable type and a pointer to a list of individual declarations:

// Addition to ast.h
typedef struct AST_Node_Declaration_List_t {
	Node_Type type;   // always DECLS_NODE
	int vartype;      // type shared by every variable in the statement
	AST_Node* decls;  // list of individual declarations
} AST_Node_Decl_List;

// Addition to ast.c
AST_Node* new_ast_decl_list_node(int vtype, AST_Node* decls)
{
	AST_Node_Decl_List* node = malloc(sizeof(AST_Node_Decl_List));
	if (node == NULL) return NULL;  // allocation failed

	node->type = DECLS_NODE;
	node->vartype = vtype;
	node->decls = decls;

	return (AST_Node*)node;
}

A structure must also be created for the individual declarations stored in the declaration list. Its fields are the variable type, a pointer to the variable node, and a pointer to the value node (NULL unless the variable is assigned a value in the declaration):

// Additions to ast.h
typedef struct AST_Node_Declaration_t {
	Node_Type type;  // always DECL_NODE
	int vartype;     // declared type of the variable
	AST_Node* var;   // the variable being declared
	AST_Node* val;   // initial value, or NULL if none is assigned
} AST_Node_Decl;

// Additions to ast.c
AST_Node* new_ast_decl_node(int vtype, AST_Node* var, AST_Node* val)
{
	AST_Node_Decl* node = malloc(sizeof(AST_Node_Decl));
	if (node == NULL) return NULL;  // allocation failed

	node->type = DECL_NODE;
	node->vartype = vtype;
	node->var = var;
	node->val = val;

	return (AST_Node*)node;
}
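
As a rough illustration of how the val field distinguishes int var_name; from int var_name = 1;, the sketch below builds both kinds of declaration node and inspects them. The Node_Type values, the one-field AST_Node, the value given to INTtype, and the decl_has_initializer helper are all assumptions made so the example compiles on its own; the real definitions live elsewhere in ast.h.

```c
#include <stdlib.h>

/* Assumed stand-ins for definitions that live elsewhere in ast.h. */
typedef enum { CONST_NODE, ID_NODE, DECL_NODE } Node_Type;
typedef struct AST_Node_t { Node_Type type; } AST_Node;
#define INTtype 1  /* assumed value, for illustration only */

/* Same layout as the AST_Node_Decl structure described above. */
typedef struct AST_Node_Declaration_t {
    Node_Type type;
    int vartype;
    AST_Node *var;
    AST_Node *val;  /* NULL when the declaration assigns no value */
} AST_Node_Decl;

AST_Node *new_ast_decl_node(int vtype, AST_Node *var, AST_Node *val)
{
    AST_Node_Decl *node = malloc(sizeof *node);
    if (node == NULL)
        return NULL;

    node->type = DECL_NODE;
    node->vartype = vtype;
    node->var = var;
    node->val = val;

    return (AST_Node *)node;
}

/* Hypothetical helper: returns 1 if the node is a declaration that
 * assigns a value, 0 otherwise. It reads the shared leading type field
 * before casting to the more specific structure. */
int decl_has_initializer(const AST_Node *n)
{
    if (n == NULL || n->type != DECL_NODE)
        return 0;
    return ((const AST_Node_Decl *)n)->val != NULL;
}
```

A later compiler stage could use a check like this to decide whether it needs to emit code for an initial value. Every node structure starts with the same type field, which is what makes the cast from AST_Node* to the specific structure workable.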

Adding the C actions to the Parser’s grammar rules leads to the following:

declaration_st:
var_type declarations SEMI { $$ = new_ast_decl_list_node($1, $2); }
;

declarations:
declarations declaration {
	AST_Node_StateList* temp_decl_list = (AST_Node_StateList*) $1;
	$$ = new_ast_statelist_node($1, $2, temp_decl_list->statement_cnt + 1); 
}
| declaration { $$ = new_ast_statelist_node(NULL, $1, 1); }
;

/* $1 -> variable node, $3 -> assigned expression (when present).
   decl_type is the type of the variables being declared (set elsewhere in the parser). */
declaration:
variable
{
	AST_Node_ID* temp_ID = (AST_Node_ID*) $1;
	insert(temp_ID->varname, SYMTAB_LOCAL, decl_type, 0);
	$$ = new_ast_decl_node(decl_type, $1, NULL);
}
| variable COMMA
{
	AST_Node_ID* temp_ID = (AST_Node_ID*) $1;
	insert(temp_ID->varname, SYMTAB_LOCAL, decl_type, 0);
	$$ = new_ast_decl_node(decl_type, $1, NULL);
}
| variable ASSIGN expression
{
	AST_Node_ID* temp_ID = (AST_Node_ID*) $1;
	insert(temp_ID->varname, SYMTAB_LOCAL, decl_type, 0);
	$$ = new_ast_decl_node(decl_type, $1, $3);
}
| variable ASSIGN expression COMMA
{
	AST_Node_ID* temp_ID = (AST_Node_ID*) $1;
	insert(temp_ID->varname, SYMTAB_LOCAL, decl_type, 0);
	$$ = new_ast_decl_node(decl_type, $1, $3);
}
;
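
The left-recursive declarations rule builds its list one item at a time: the first declaration creates a list of length 1, and each later one wraps the list built so far and increments the count. The sketch below imitates those two actions outside the parser. The internals of new_ast_statelist_node are not shown in this section, so Sketch_StateList and the sketch_ functions are hypothetical stand-ins that assume each list node stores the previous list, the newest item, and a running count.

```c
#include <stdlib.h>

/* Assumed minimal base node; the real AST_Node is defined in ast.h. */
typedef struct AST_Node_t { int type; } AST_Node;

/* Hypothetical stand-in for AST_Node_StateList. */
typedef struct Sketch_StateList_t {
    int type;
    AST_Node *prev;       /* list built so far, or NULL for the first item */
    AST_Node *statement;  /* newest item in the list */
    int statement_cnt;    /* number of items up to and including this one */
} Sketch_StateList;

AST_Node *sketch_statelist_node(AST_Node *prev, AST_Node *statement, int cnt)
{
    Sketch_StateList *node = malloc(sizeof *node);
    if (node == NULL)
        return NULL;

    node->type = 0;
    node->prev = prev;
    node->statement = statement;
    node->statement_cnt = cnt;
    return (AST_Node *)node;
}

/* Frees the list nodes only; the items they point to are left alone. */
void sketch_free_list(AST_Node *list)
{
    while (list != NULL) {
        AST_Node *prev = ((Sketch_StateList *)list)->prev;
        free(list);
        list = prev;
    }
}

/* Applies the two actions of the declarations rule to n items in order.
 * Returns NULL if n is 0 or an allocation fails. */
AST_Node *sketch_build_list(AST_Node **items, int n)
{
    AST_Node *list = NULL;

    for (int i = 0; i < n; i++) {
        AST_Node *next;

        if (list == NULL) {
            /* declarations: declaration */
            next = sketch_statelist_node(NULL, items[i], 1);
        } else {
            /* declarations: declarations declaration */
            Sketch_StateList *so_far = (Sketch_StateList *)list;
            next = sketch_statelist_node(list, items[i],
                                         so_far->statement_cnt + 1);
        }
        if (next == NULL) {
            sketch_free_list(list);
            return NULL;
        }
        list = next;
    }
    return list;
}
```

Under these assumptions, the node handed to new_ast_decl_list_node for int var_name1, var_name2 = 100, var_name3; would carry a count of 3, with the most recent declaration at the head and earlier ones reachable through the previous-list pointer.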

Now the grammar of the Compiler can handle C declaration statements.