Semantic Analysis
5.4 TYPE-CHECKING DECLARATIONS
Environments are constructed and augmented by declarations. In Tiger, declarations appear only in a let expression. Type-checking a let is easy enough, using transDec to translate declarations:
struct expty transExp(S_table venv, S_table tenv, A_exp a) {
  switch(a->kind) {
    ...
    case A_letExp: {
      struct expty exp;
      A_decList d;
      S_beginScope(venv);
      S_beginScope(tenv);
      for (d = a->u.let.decs; d; d=d->tail)
        transDec(venv,tenv,d->head);
      exp = transExp(venv,tenv,a->u.let.body);
      S_endScope(tenv);
      S_endScope(venv);
      return exp;
    }
    ...
Here transExp marks the current “state” of the environments by calling beginScope(); calls transDec to augment the environments (venv, tenv) with new declarations; translates the body expression; then reverts to the original state of the environments using endScope().
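The mark-and-revert discipline can be illustrated with a much smaller table than the real one. The following is only a sketch under stated assumptions: struct mini_env and the mini_* functions are hypothetical names invented for this illustration, the table is a linear stack rather than the hash table behind S_table, and values are plain integers rather than environment entries.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical miniature environment: a stack of bindings plus scope marks.
   It illustrates only the begin/end-scope discipline, not the real S_table. */
#define MAX_BINDINGS 64
#define MAX_SCOPES   16

struct mini_env {
    const char *names[MAX_BINDINGS];
    int values[MAX_BINDINGS];
    int count;                 /* number of live bindings */
    int marks[MAX_SCOPES];     /* value of count when each open scope began */
    int depth;                 /* number of open scopes */
};

void mini_init(struct mini_env *e) { e->count = 0; e->depth = 0; }

/* Remember the current state so that it can be restored later. */
void mini_begin_scope(struct mini_env *e) {
    assert(e->depth < MAX_SCOPES);
    e->marks[e->depth++] = e->count;
}

/* Discard every binding entered since the matching mini_begin_scope. */
void mini_end_scope(struct mini_env *e) {
    assert(e->depth > 0);
    e->count = e->marks[--e->depth];
}

void mini_enter(struct mini_env *e, const char *name, int value) {
    assert(e->count < MAX_BINDINGS);
    e->names[e->count] = name;
    e->values[e->count] = value;
    e->count++;
}

/* Search newest-first so that inner bindings shadow outer ones.
   Returns 1 and stores the value if found, 0 otherwise. */
int mini_look(const struct mini_env *e, const char *name, int *out) {
    for (int i = e->count - 1; i >= 0; i--) {
        if (strcmp(e->names[i], name) == 0) {
            *out = e->values[i];
            return 1;
        }
    }
    return 0;
}
```

Entering x, beginning a scope, entering a second x and a y, and then ending the scope leaves only the first x visible, which is exactly what the let case relies on.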
VARIABLE DECLARATIONS
In principle, processing a declaration is quite simple: a declaration augments an environment by a new binding, and the augmented environment is used in the processing of subsequent declarations and expressions.
The only problem is with (mutually) recursive type and function declarations. So we will begin with the special case of nonrecursive declarations.
For example, it is quite simple to process a variable declaration without a type constraint, such as var x := exp.
void transDec(S_table venv, S_table tenv, A_dec d) {
  switch(d->kind) {
    case A_varDec: {
      struct expty e = transExp(venv,tenv,d->u.var.init);
      S_enter(venv, d->u.var.var, E_VarEntry(e.ty));
    }
    ...
}
What could be simpler? In practice, if d->typ is present, as in
var x : type-id := exp
it will be necessary to check that the constraint and the initializing expression are compatible. Also, initializing expressions of type Ty_Nil must be constrained by a Ty_Record type.
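The two checks just described might be organized as follows. This is a minimal sketch, not the book's code: struct ty, compatible, and var_dec_type are hypothetical names, the type representation is cut down to four kinds, and record types are assumed to be compared by identity (pointer equality).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down type representation; the real Ty_ty in types.h
   has more kinds and carries field lists for records. */
enum ty_kind { TY_INT, TY_STRING, TY_NIL, TY_RECORD };
struct ty { enum ty_kind kind; };

/* Can a value of type init be stored in a variable declared as decl?
   Assumption: record types are equal only if they are the same object,
   and nil is accepted wherever a record is expected. */
int compatible(const struct ty *decl, const struct ty *init) {
    if (decl->kind == TY_RECORD && init->kind == TY_NIL) return 1;
    if (decl->kind == TY_RECORD || init->kind == TY_RECORD) return decl == init;
    return decl->kind == init->kind;
}

/* Decide the type of a declared variable.
   decl is NULL when the declaration has no type constraint.
   Returns NULL to signal a type error. */
const struct ty *var_dec_type(const struct ty *decl, const struct ty *init) {
    assert(init != NULL);
    if (decl == NULL) {
        /* "var x := nil" gives no way to tell which record type is meant. */
        return init->kind == TY_NIL ? NULL : init;
    }
    return compatible(decl, init) ? decl : NULL;
}
```

A real transDec would report an error message at the point where this sketch returns NULL, and would then enter the resulting type in a VarEntry.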
TYPE DECLARATIONS
Nonrecursive type declarations are not too hard:
void transDec(S_table venv, S_table tenv, A_dec d) {
  ...
    case A_typeDec: {
      S_enter(tenv, d->u.type->head->name,
              transTy(d->u.type->head->ty));
    }
The transTy function translates type expressions as found in the abstract syntax (A_ty) to the digested type descriptions that we will put into environments (Ty_ty). This translation is done by recurring over the structure of an A_ty, turning A_recordTy into Ty_Record, etc. While translating, transTy just looks up any symbols it finds in the type environment tenv.
The program fragment shown is not very general, since it handles only a type-declaration list of length 1, that is, a singleton list of mutually recursive type declarations. The reader is invited to generalize this to lists of arbitrary length.
FUNCTION DECLARATIONS
Function declarations are a bit more tedious:
void transDec(S_table venv, S_table tenv, A_dec d) {
  switch(d->kind) {
    ...
    case A_functionDec: {
      A_fundec f = d->u.function->head;
      Ty_ty resultTy = S_look(tenv,f->result);
      Ty_tyList formalTys = makeFormalTyList(tenv,f->params);
      S_enter(venv,f->name,E_FunEntry(formalTys,resultTy));
      S_beginScope(venv);
      {A_fieldList l; Ty_tyList t;
       for(l=f->params, t=formalTys; l; l=l->tail, t=t->tail)
         S_enter(venv,l->head->name,E_VarEntry(t->head));
      }
      /* the body is a field of the A_fundec f, not of the list d->u.function */
      transExp(venv, tenv, f->body);
      S_endScope(venv);
      break;
    }
    ...
This is a very stripped-down implementation: it handles only the case of a single function; it does not handle recursive functions; it handles only a function with a result (a function, not a procedure); it doesn’t handle program errors such as undeclared type identifiers, etc; and it doesn’t check that the type of the body expression matches the declared result type.
So what does it do? Consider the Tiger declaration
function f(a: ta, b: tb) : rt = body.
First, transDec looks up the result-type identifier rt in the type environment. Then it calls the local function makeFormalTyList, which traverses the list of formal parameters and returns a list of their types (by looking up each parameter type-id in the tenv). Now transDec has enough information to construct the FunEntry for this function and enter it in the value environment.
Next, the formal parameters are entered (as VarEntrys) into the value environment; this environment is used to process the body (with the transExp function). Finally, endScope() discards the formal parameters (but not the FunEntry) from the environment; the resulting environment is used for processing expressions that are allowed to call the function f.
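The text names makeFormalTyList but does not show it. One way such a helper could look is sketched below, under loud assumptions: struct field, struct ty_list, and struct ty_binding are hypothetical stand-ins for A_fieldList, Ty_tyList, and the type environment, resolved types are shown as strings, and an undeclared type-id simply yields NULL instead of an error message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for A_fieldList, Ty_tyList, and the type environment. */
struct field      { const char *name; const char *type_id; struct field *tail; };
struct ty_list    { const char *ty; struct ty_list *tail; };  /* ty: resolved type, shown as a string */
struct ty_binding { const char *type_id; const char *ty; };

/* Look up a type identifier in an array of n bindings; NULL if absent. */
const char *look_type(const struct ty_binding *tenv, size_t n, const char *id) {
    assert(tenv != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (strcmp(tenv[i].type_id, id) == 0) return tenv[i].ty;
    return NULL;
}

/* Build the list of formal-parameter types, in declaration order.
   Parameter names are ignored here; only the type-ids matter. */
struct ty_list *make_formal_ty_list(const struct ty_binding *tenv, size_t n,
                                    const struct field *params) {
    struct ty_list *head = NULL, **last = &head;
    for (const struct field *p = params; p; p = p->tail) {
        struct ty_list *cell = malloc(sizeof *cell);
        if (!cell) abort();
        cell->ty = look_type(tenv, n, p->type_id);  /* NULL if undeclared */
        cell->tail = NULL;
        *last = cell;          /* append at the end to preserve order */
        last = &cell->tail;
    }
    return head;
}
```

For function f(a: ta, b: tb) with ta bound to int and tb bound to string, the sketch returns the two-element list (int, string), which is the information a FunEntry needs about the formals.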
RECURSIVE DECLARATIONS
The implementations above will not work on recursive type or function declarations, because they will encounter undefined type or function identifiers (in transTy for recursive record types or transExp(body) for recursive functions).
The solution for a set of mutually recursive things (types or functions) t1, ..., tn is to put all the “headers” in the environment first, resulting in an environment e1. Then process all the “bodies” in the environment e1. During processing of the bodies it will be necessary to look up some of the newly defined names, but they will in fact be there, though some of them may be empty headers without bodies.
What is a header? For a type declaration such as
type list = {first: int, rest: list}
the header is approximately type list =.
To enter this header into an environment tenv we can use a Ty_Name type with an empty binding:
S_enter(tenv, name, Ty_Name(name,NULL));
Now, we can call transTy on the “body” of the type declaration, that is, on the record expression {first: int, rest: list}.
It’s important that transTy stop as soon as it gets to any Ty_Name type. If, for example, transTy behaved like actual_ty and tried to look “through” the Ty_Name type bound to the identifier list, all it would find (in this case) would be NULL, which it is certainly not prepared for. This NULL can be replaced by a valid type only after the entire {first: int, rest: list} is translated.
The type that transTy returns can then be assigned into the ty field within the Ty_Name struct. Now we have a fully complete type environment, on which actual_ty will not have a problem.
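The header, body, and patch steps for type list = {first: int, rest: list} can be sketched as follows. The representation is hypothetical and deliberately tiny: struct ty here is not the real Ty_ty, a record carries only its rest field, and declare_list_type is an illustrative name; only actual_ty corresponds to a function mentioned in the text, and its body is an assumption about how it behaves.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cut-down types; the real Ty_ty has more kinds. */
enum ty_kind { TY_INT, TY_RECORD, TY_NAME };
struct ty {
    enum ty_kind kind;
    const char *name;       /* TY_NAME: the identifier */
    struct ty *binding;     /* TY_NAME: what it stands for; NULL while only a header */
    struct ty *rest_field;  /* TY_RECORD: type of the one field kept in this sketch */
};

/* Look "through" name types to the underlying type.
   Only meaningful once every header has been given a body. */
struct ty *actual_ty(struct ty *t) {
    while (t && t->kind == TY_NAME) t = t->binding;
    return t;
}

/* Process "type list = {first: int, rest: list}".
   header and body are caller-provided storage. */
void declare_list_type(struct ty *header, struct ty *body) {
    assert(header != NULL && body != NULL);

    /* Step 1: enter the header, a name type with an empty binding. */
    header->kind = TY_NAME;
    header->name = "list";
    header->binding = NULL;
    header->rest_field = NULL;

    /* Step 2: translate the body. The field "rest" refers to the header
       itself; translation stops at the name type and does not follow
       the (still NULL) binding. */
    body->kind = TY_RECORD;
    body->name = NULL;
    body->binding = NULL;
    body->rest_field = header;

    /* Step 3: patch the header now that the body exists. */
    header->binding = body;
}
```

After the patch, actual_ty applied to the header, or to the rest field of the record, arrives at the same record type, so the recursion is represented by a cycle through the name type.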
Every cycle in a set of mutually recursive type declarations must pass through a record or array declaration; the declaration
type a = b
type b = d
type c = a
type d = a
contains an illegal cycle a → b → d → a. Illegal cycles should be detected by the type-checker.
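One simple way to detect such cycles, offered here only as a sketch and not as the book's method, is to follow name bindings from each declared type and count steps. The names struct ty and reaches_illegal_cycle are hypothetical, and the argument n is assumed to be an upper bound on the number of distinct name types that can be reached.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cut-down types: only the binding of a name type matters here. */
enum ty_kind { TY_INT, TY_RECORD, TY_NAME };
struct ty { enum ty_kind kind; struct ty *binding; /* used by TY_NAME only */ };

/* Returns 1 if following name bindings from t never reaches a non-name type.
   Assumption: at most n distinct name types are reachable, so a chain of
   more than n pure aliases must revisit some name, which means a cycle
   that passes through no record or array. */
int reaches_illegal_cycle(const struct ty *t, size_t n) {
    assert(n > 0);
    for (size_t steps = 0; t && t->kind == TY_NAME; steps++) {
        if (steps > n) return 1;
        t = t->binding;
    }
    return 0;
}
```

On the four declarations above, the check reports a problem starting from a and also from c, since c leads into the cycle even though it is not on it. A chain of aliases that ends in a record stops at the record and is accepted.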
Mutually recursive functions are handled similarly. The first pass gathers information about the header of each function (function name, formal parameter list, return type) but leaves the bodies of the functions untouched. In this pass, the types of the formal parameters are needed, but not their names (which cannot be seen from outside the function).
The second pass processes the bodies of all functions in the mutually recursive declaration, taking advantage of the environment augmented with all the function headers. For each body, the formal parameter list is processed again, this time entering the parameters as VarEntrys in the value environment.
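The shape of the two passes can be seen in a toy model. Everything in this sketch is an assumption made for illustration: struct fun_dec reduces a function body to the name of the single function it calls, struct fun_env holds only function names, and check_fun_group is a hypothetical name; formal parameters and result types are left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical miniature of a function declaration: the "body" is reduced
   to the name of the one function it calls (or NULL if it calls none). */
struct fun_dec { const char *name; const char *callee; };

#define MAX_FUNS 32
struct fun_env { const char *names[MAX_FUNS]; size_t count; };

static int fun_known(const struct fun_env *env, const char *name) {
    for (size_t i = 0; i < env->count; i++)
        if (strcmp(env->names[i], name) == 0) return 1;
    return 0;
}

/* Returns the number of bodies that call an undeclared function. */
size_t check_fun_group(struct fun_env *env, const struct fun_dec *decs, size_t n) {
    size_t errors = 0;

    /* Pass 1: enter every header before looking at any body. */
    for (size_t i = 0; i < n; i++) {
        assert(env->count < MAX_FUNS);
        env->names[env->count++] = decs[i].name;
    }

    /* Pass 2: process each body in the environment that holds all headers. */
    for (size_t i = 0; i < n; i++)
        if (decs[i].callee && !fun_known(env, decs[i].callee)) errors++;

    return errors;
}
```

With a group in which even calls odd and odd calls even, both calls resolve because both headers were entered before either body was examined; a call to a function outside the group and absent from the environment is counted as an error.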
PROGRAM TYPE-CHECKING
Write a type-checking phase for your compiler, a module semant.c matching the following header file:
/* semant.h */
void SEM_transProg(A_exp exp);
that type-checks an abstract syntax tree and produces any appropriate error messages about mismatching types or undeclared identifiers.
Also provide the implementation of the Env module described in this chapter. Make a module Main that calls the parser, yielding an A_exp, and then calls SEM_transProg on this expression.
You must use precisely the Absyn interface described in Figure 4.7, but you are free to follow or ignore any advice given in this chapter about the internal organization of the Semant module.
You’ll need your parser that produces abstract syntax trees. In addition, supporting files available in $TIGER/chap5 include:
types.h, types.c Describes data types of the Tiger language.
and other files as before. Modify the makefile from the previous exercise as necessary.
Part a. Implement a simple type-checker and declaration processor that does not handle recursive functions or recursive data types (forward references to functions or types need not be handled). Also don’t bother to check that each break statement is within a for or while statement.
Part b. Augment your simple type-checker to handle recursive (and mutually recursive) functions; (mutually) recursive type declarations; and correct nesting of break statements.
EXERCISES
5.1 Improve the hash table implementation of Program 5.2:
a. Double the size of the array when the average bucket length grows larger than 2 (so table is now a pointer to a dynamically allocated array). To double an array, allocate a bigger one and rehash the contents of the old array; then discard the old array.
b. Allow for more than one table to be in use by making the table a parameter to insert and lookup.
c. Hide the representation of the table type inside an abstraction module, so that clients are not tempted to manipulate the data structure directly (only through the insert, lookup, and pop operations).
***5.2 In many applications, we want a + operator for environments that does more than add one new binding; instead of σ′ = σ + {a ↦ τ}, we want σ′ = σ1 + σ2, where σ1 and σ2 are arbitrary environments (perhaps overlapping, in which case bindings in σ2 take precedence).
We want an efficient algorithm and data structure for environment “adding.”
Balanced trees can implement σ + {a ↦ τ} efficiently (in log(N) time, where N is the size of σ), but take O(N) to compute σ1 + σ2, if σ1 and σ2 are both about size N.
To abstract the problem, solve the general nondisjoint integer-set union problem. The input is a set of commands of the form,
s1 = {4}        (define singleton set)
s2 = {7}
s3 = s1 ∪ s2    (nondestructive union)
6 ∈? s3         (membership test)
s4 = s1 ∪ s3
s5 = {9}
s6 = s4 ∪ s5
7 ∈? s2
An efficient algorithm is one that can process an input of N commands, answering all membership queries, in less than o(N²) time.
*a. Implement an algorithm that is efficient when a typical set union a ← b ∪ c has b much smaller than c [Brown and Tarjan 1979].
***b. Design an algorithm that is efficient even in the worst case, or prove that this can’t be done (see Lipton et al. [1997] for a lower bound in a restricted model).
*5.3 The Tiger language definition states that every cycle of type definitions must go through a record or array. But if the compiler forgets to check for this error, nothing terrible will happen. Explain why.