Basic Blocks and Traces
8.1 CANONICAL TREES
are so troublesome? Because they make it much more convenient for the Translate (translation to intermediate code) phase of the compiler.
We can take any tree and rewrite it into an equivalent tree without any of the cases listed above. Without these cases, the only possible parent of a SEQ
node is another SEQ; all the SEQ nodes will be clustered at the top of the tree.
This makes the SEQs entirely uninteresting; we might as well get rid of them and make a linear list of T_stms.
The transformation is done in three stages: First, a tree is rewritten into a list of canonical trees without SEQ or ESEQ nodes; then this list is grouped into a set of basic blocks, which contain no internal jumps or labels; then the basic blocks are ordered into a set of traces in which every CJUMP is immediately followed by its false label.
Thus the module Canon has these tree-rearrangement functions:
/* canon.h */
typedef struct C_stmListList_ *C_stmListList;
struct C_block { C_stmListList stmLists; Temp_label label;};
struct C_stmListList_ { T_stmList head; C_stmListList tail;};
T_stmList C_linearize(T_stm stm);
struct C_block C_basicBlocks(T_stmList stmList);
T_stmList C_traceSchedule(struct C_block b);
Linearize removes the ESEQs and moves the CALLs to top level. Then BasicBlocks groups statements into sequences of straight-line code. Finally, traceSchedule orders the blocks so that every CJUMP is followed by its false label.
8.1 CANONICAL TREES
Let us define canonical trees as having these properties:
1. No SEQ or ESEQ.
2. The parent of each CALL is either EXP(. . .) or MOVE(TEMP t, . . .).
TRANSFORMATIONS ON ESEQ
How can the ESEQ nodes be eliminated? The idea is to lift them higher and higher in the tree, until they can become SEQ nodes.
Figure 8.1 gives some useful identities on trees.
(3) BINOP(op, e1, ESEQ(s, e2)) = ESEQ(MOVE(TEMP t, e1), ESEQ(s, BINOP(op, TEMP t, e2))), where t is a new temporary
FIGURE 8.1. Identities on trees (see also Exercise 8.1).
Identity (1) is obvious. So is identity (2): Statement s is to be evaluated;
then e1; then e2; then the sum of the expressions is returned. If s has side effects that affect e1 or e2, then either the left-hand side or the right-hand side of the first equation will execute those side effects before the expressions are evaluated.
Identity (3) is more complicated, because of the need not to interchange the evaluations of s and e1. For example, if s is MOVE(MEM(x), y) and e1 is
BINOP(PLUS, MEM(x), z), then the program will compute a different result if s is evaluated before e1 instead of after. Our goal is simply to pull s out of the
BINOP expression; but now (to preserve the order of evaluation) we must pull e1 out of the BINOP with it. To do so, we assign e1 into a new temporary t, and put t inside the BINOP.
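The ordering problem in this example can be checked with a small simulation. The sketch below is only an illustration: the array mem and the functions run_s, eval_e1, e1_then_s, and s_then_e1 are hypothetical stand-ins for the tree nodes, not part of the Tree or Canon modules, and the initial value 10 is an arbitrary assumption.

```c
#include <assert.h>

/* Toy memory; the index x plays the role of the address in MEM(x). */
static int mem[4];

/* s = MOVE(MEM(x), y): store y at address x. */
static void run_s(int x, int y) { mem[x] = y; }

/* e1 = BINOP(PLUS, MEM(x), z): load from address x and add z. */
static int eval_e1(int x, int z) { return mem[x] + z; }

/* Order required by BINOP(op, e1, ESEQ(s, e2)): e1 first, then s.
   Saving e1 in t mirrors MOVE(TEMP t, e1) in identity (3). */
static int e1_then_s(int x, int y, int z) {
    mem[x] = 10;              /* assumed initial contents of MEM(x) */
    int t = eval_e1(x, z);
    run_s(x, y);
    return t;
}

/* Incorrect order: s is hoisted past e1 without saving e1 first. */
static int s_then_e1(int x, int y, int z) {
    mem[x] = 10;
    run_s(x, y);
    return eval_e1(x, z);
}
```

With x = 1, y = 99, and z = 5, the two functions return different values for e1, which is why s cannot simply be pulled out in front of e1.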
It may happen that s causes no side effects that can alter the result produced by e1. This will happen if the temporaries and memory locations assigned by s are not referenced by e1 (and s and e1 don’t both perform external I/O). In this case, identity (4) can be used.
We cannot always tell if two expressions commute. For example, whether
MOVE(MEM(x), y) commutes with MEM(z) depends on whether x = z, which we cannot always determine at compile time. So we conservatively approximate whether statements commute, saying either “they definitely do commute” or “perhaps they don’t commute.” For example, we know that any statement “definitely commutes” with the expression CONST(n), so we can use identity (4) to justify special cases like
BINOP(op, CONST(n), ESEQ(s, e)) = ESEQ(s, BINOP(op, CONST(n), e)).
The commute function estimates (very naively) whether a statement and an expression commute:
static bool isNop(T_stm x) {
  return x->kind == T_EXP && x->u.EXP->kind == T_CONST;
}

static bool commute(T_stm x, T_exp y) {
  return isNop(x) || y->kind == T_NAME || y->kind == T_CONST;
}
A constant commutes with any statement, and the empty statement commutes with any expression. Anything else is assumed not to commute.
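To see how conservative this test is, it can be tried on a cut-down tree type. The following is a sketch under stated assumptions: the types Exp and Stm, the enum names, and the sample nodes are hypothetical simplifications, not the definitions from the Tree module.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, cut-down node kinds; the real Tree module has more. */
typedef enum { E_CONST, E_NAME, E_TEMP, E_MEM } ExpKind;
typedef enum { S_EXP, S_MOVE } StmKind;

typedef struct Exp { ExpKind kind; struct Exp *child; } Exp;
typedef struct Stm { StmKind kind; Exp *dst, *src; } Stm;

/* EXP(CONST n) evaluates a constant and discards it: a no-op. */
static bool is_nop(const Stm *s) {
    return s->kind == S_EXP && s->src->kind == E_CONST;
}

/* Conservative test: true means "definitely commute",
   false means only "perhaps they don't commute". */
static bool commute(const Stm *s, const Exp *e) {
    return is_nop(s) || e->kind == E_NAME || e->kind == E_CONST;
}

/* Sample nodes for the cases discussed in the text. */
static Exp c7 = { E_CONST, NULL };
static Exp x = { E_TEMP, NULL }, y = { E_TEMP, NULL }, z = { E_TEMP, NULL };
static Exp mem_x = { E_MEM, &x }, mem_z = { E_MEM, &z };
static Stm store = { S_MOVE, &mem_x, &y };   /* MOVE(MEM(x), y) */
static Stm nop   = { S_EXP, NULL, &c7 };     /* EXP(CONST 7)    */
```

Here commute(&store, &mem_z) is false even though x and z might be different addresses at run time; the test never tries to prove that they differ.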
GENERAL REWRITING RULES
In general, for each kind of Tree statement or expression we can identify the subexpressions. Then we can make rewriting rules, similar to the ones in Figure 8.1, to pull the ESEQs out of the statement or expression.
For example, in [e1, e2, ESEQ(s, e3)], the statement s must be pulled leftward past e2 and e1. If they commute, we have (s; [e1, e2, e3]). But suppose e2 does not commute with s; then we must have
(SEQ(MOVE(t1, e1), SEQ(MOVE(t2, e2), s)); [TEMP(t1), TEMP(t2), e3])
Or if e2 commutes with s but e1 does not, we have
(SEQ(MOVE(t1, e1), s); [TEMP(t1), e2, e3])
The reorder function takes a list of expressions and returns a pair of (statement, expression-list). The statement contains all the things that must be executed before the expression-list. As shown in these examples, this includes all the statement-parts of the ESEQs, as well as any expressions to their left with which they did not commute. When there are no ESEQs at all we will use EXP(CONST(0)), which does nothing, as the statement.
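One way to realize this behavior is sketched below on a toy tree. Everything here is an assumption made for illustration: the types Exp and Stm, the helpers mk_exp, mk_stm, and demo, and the temporary counter next_temp are hypothetical, and the sketch only handles an ESEQ that appears directly in the list (it does not recurse into subexpressions).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

typedef enum { E_CONST, E_TEMP, E_MEM, E_ESEQ } ExpKind;
typedef enum { S_NOP, S_OTHER, S_MOVE, S_SEQ } StmKind;
typedef struct Stm Stm;
typedef struct Exp Exp;
struct Exp { ExpKind kind; int n; Stm *stm; Exp *exp; };       /* ESEQ(stm, exp) */
struct Stm { StmKind kind; Exp *dst, *src; Stm *left, *right; };

static Stm nop = { S_NOP, NULL, NULL, NULL, NULL };  /* plays EXP(CONST(0)) */
static int next_temp = 100;                          /* fresh-temporary counter */

static Exp *mk_exp(ExpKind k, int n, Stm *s, Exp *e) {
    Exp *p = malloc(sizeof *p);
    if (!p) abort();
    p->kind = k; p->n = n; p->stm = s; p->exp = e;
    return p;
}
static Stm *mk_stm(StmKind k, Exp *dst, Exp *src, Stm *l, Stm *r) {
    Stm *p = malloc(sizeof *p);
    if (!p) abort();
    p->kind = k; p->dst = dst; p->src = src; p->left = l; p->right = r;
    return p;
}
/* Naive test: only a no-op statement or a constant expression commutes. */
static bool commute(const Stm *s, const Exp *e) {
    return s->kind == S_NOP || e->kind == E_CONST;
}
/* SEQ(a, b), simplified when either side is a no-op. */
static Stm *seq(Stm *a, Stm *b) {
    if (a->kind == S_NOP) return b;
    if (b->kind == S_NOP) return a;
    return mk_stm(S_SEQ, NULL, NULL, a, b);
}
/* Walk the list right to left, collecting statements and saving any
   expression that does not commute with them into a fresh temporary. */
static Stm *reorder(Exp **refs[], int n) {
    Stm *stm = &nop;
    for (int i = n - 1; i >= 0; i--) {
        Exp *e = *refs[i];
        Stm *si = &nop;
        if (e->kind == E_ESEQ) { si = e->stm; e = e->exp; }
        if (commute(stm, e)) {
            *refs[i] = e;
            stm = seq(si, stm);
        } else {
            Exp *t = mk_exp(E_TEMP, next_temp++, NULL, NULL);
            *refs[i] = t;
            stm = seq(si, seq(mk_stm(S_MOVE, t, e, NULL, NULL), stm));
        }
    }
    return stm;
}
/* Builds [e1, e2, ESEQ(s, e3)] with e1, e2 of the given kinds. */
static Stm *demo(ExpKind k1, ExpKind k2) {
    Exp *e1 = mk_exp(k1, 1, NULL, NULL);
    Exp *e2 = mk_exp(k2, 2, NULL, NULL);
    Stm *s = mk_stm(S_OTHER, NULL, NULL, NULL, NULL);
    Exp *e3 = mk_exp(E_ESEQ, 0, s, mk_exp(E_CONST, 3, NULL, NULL));
    Exp **refs[] = { &e1, &e2, &e3 };
    return reorder(refs, 3);
}
```

Calling demo with two constants returns s alone; with e1 a MEM and e2 a constant it returns SEQ(MOVE(t1, e1), s); with two MEM nodes it returns SEQ(MOVE(t1, e1), SEQ(MOVE(t2, e2), s)), matching the three cases in the text. The sketch never frees what it allocates.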
Algorithm. Step one is to make a “subexpression-extraction” method for each kind. Step two is to make a “subexpression-insertion” method: given an ESEQ-clean version of each subexpression, this builds a new version of the expression or statement.
typedef struct expRefList_ *expRefList;
struct expRefList_ {T_exp *head; expRefList tail;};
struct stmExp {T_stm s; T_exp e;};
static T_stm reorder(expRefList rlist);
static T_stm do_stm(T_stm stm);
static struct stmExp do_exp(T_exp exp);
The reorder function is supposed to pull all the ESEQs out of a list of expressions and combine the statement-parts of these ESEQs into one big T_stm. The argument to reorder is a list of references to the immediate subexpressions of that statement. Figure 8.2 illustrates the use of a pointer to a pointer.
If we call reorder(l2), we are saying, “please pull any ESEQs out of the children and grandchildren of this BINOP node e2. For your convenience, the
places where it points to its children are at the locations pointed to by the list l2. For each child that is an ESEQ(sk, ek), you should update the child-pointer to point to ek instead and put sk on the big sequence of statements that you will return as a result.”
Reorder(l2) calls upon an auxiliary function do_exp on each expression in the list l2, that is, the expressions e1 and e3. Do_exp(e1) returns a statement s1 and an expression e1′, where e1′ contains no ESEQs, such that ESEQ(s1, e1′) would be equivalent to the original expression e1. In this case, since e1 is so trivial, s1 will be a no-op statement EXP(CONST(0)) and e1′ = e1. But if expression e3’s MEM node pointed to ESEQ(sx, TEMP a), then do_exp(e3) will yield s3 = sx and e3′ = MEM(TEMP a).
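The pointer-to-pointer idea can be shown in isolation. In this sketch the type Exp, the function strip_eseq, and the sample nodes are hypothetical; the statement sx is represented only by its name, since the point is how the parent's child slot gets updated in place.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { K_CONST, K_TEMP, K_MEM, K_BINOP, K_ESEQ } Kind;

typedef struct Exp Exp;
struct Exp {
    Kind kind;
    const char *stm;    /* ESEQ only: a name standing in for the statement */
    Exp *left, *right;  /* children; MEM and ESEQ use only left */
};

/* ref points at a child slot inside some parent node. If that child is
   ESEQ(sk, ek), overwrite the slot with ek and return sk; otherwise leave
   the slot alone and return NULL. The parent itself is never needed. */
static const char *strip_eseq(Exp **ref) {
    Exp *child = *ref;
    if (child->kind != K_ESEQ) return NULL;
    *ref = child->left;
    return child->stm;
}

/* e2 = BINOP(e1, e3), where e3 = MEM(ESEQ(sx, TEMP a)). */
static Exp a    = { K_TEMP,  NULL, NULL,  NULL };
static Exp eseq = { K_ESEQ,  "sx", &a,    NULL };
static Exp e3   = { K_MEM,   NULL, &eseq, NULL };
static Exp e1   = { K_CONST, NULL, NULL,  NULL };
static Exp e2   = { K_BINOP, NULL, &e1,   &e3  };
```

After strip_eseq(&e3.left), the MEM node points directly at TEMP a and the caller holds sx, which is the effect described for do_exp(e3). Passing &e2.left changes nothing, because e1 is not an ESEQ.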
The implementation of do_exp is rather simple. For any kind of expression except ESEQ, do_exp just makes a list of the subexpression references and calls reorder:
static struct stmExp do_exp(T_exp exp) {
  switch (exp->kind) {
  case T_BINOP:
    return StmExp(reorder(ExpRefList(&exp->u.BINOP.left,
                          ExpRefList(&exp->u.BINOP.right, NULL))),
                  exp);
  case T_MEM:
    return StmExp(reorder(ExpRefList(&exp->u.MEM, NULL)), exp);
  case T_ESEQ: {
    struct stmExp x = do_exp(exp->u.ESEQ.exp);
    return StmExp(seq(do_stm(exp->u.ESEQ.stm), x.s), x.e);
  }
  case T_CALL:
    return StmExp(reorder(get_call_rlist(exp)), exp);
  default:
    return StmExp(reorder(NULL), exp);
  }
}
The function seq(s1, s2) just returns a statement equivalent to SEQ(s1, s2), but in the very common case that s1 or s2 is a no-op, we can do something simpler:
static T_stm seq(T_stm x, T_stm y) {
  if (isNop(x)) return y;
  if (isNop(y)) return x;
  return T_Seq(x, y);
}
[Diagram: a tree with CONST, BINOP (+), and MEM nodes labeled e1, e2, e3, and their reference lists l1, l2, l3.]
FIGURE 8.2. List-of-refs argument passed to reorder.
The ESEQ case of do_exp must call do_stm, which pulls all the ESEQs out of a statement. It also works by making a list of all the subexpression references and calling reorder:
static T_stm do_stm(T_stm stm) {
  switch (stm->kind) {
  case T_SEQ:
    return seq(do_stm(stm->u.SEQ.left), do_stm(stm->u.SEQ.right));
  case T_JUMP:
    return seq(reorder(ExpRefList(&stm->u.JUMP.exp, NULL)), stm);
  case T_CJUMP:
    return seq(reorder(ExpRefList(&stm->u.CJUMP.left,
                       ExpRefList(&stm->u.CJUMP.right, NULL))), stm);
  case T_MOVE:
    /* ... see below ... */
  case T_EXP:
    if (stm->u.EXP->kind == T_CALL)
      return seq(reorder(get_call_rlist(stm->u.EXP)), stm);
    else
      return seq(reorder(ExpRefList(&stm->u.EXP, NULL)), stm);
  default:
    return stm;
  }
}
The left-hand operand of the MOVE statement is not considered a subexpression, because it is the destination of the statement – its value is not used
by the statement. However, if the destination is a memory location, then the address acts like a source. Thus we have:
static T_stm do_stm(T_stm stm) {
  /* ... */
  case T_MOVE:
    if (stm->u.MOVE.dst->kind == T_TEMP &&
        stm->u.MOVE.src->kind == T_CALL)
      return seq(reorder(get_call_rlist(stm->u.MOVE.src)), stm);
    else if (stm->u.MOVE.dst->kind == T_TEMP)
      return seq(reorder(ExpRefList(&stm->u.MOVE.src, NULL)), stm);
    else if (stm->u.MOVE.dst->kind == T_MEM)
      return seq(reorder(ExpRefList(&stm->u.MOVE.dst->u.MEM,
                         ExpRefList(&stm->u.MOVE.src, NULL))), stm);
    else if (stm->u.MOVE.dst->kind == T_ESEQ) {
      T_stm s = stm->u.MOVE.dst->u.ESEQ.stm;
      stm->u.MOVE.dst = stm->u.MOVE.dst->u.ESEQ.exp;
      return do_stm(T_Seq(s, stm));
    }
  /* ... */
With the assistance of do_exp and do_stm, the reorder function can pull the statement si out of each expression ei on its list of references, going from right to left.
MOVING CALLS TO TOP LEVEL
The Tree language permits CALL nodes to be used as subexpressions. However, the actual implementation of CALL will be that each function returns its result in the same dedicated return-value register TEMP(RV). Thus, if we have
BINOP(PLUS, CALL(. . .), CALL(. . .))
the second call will overwrite the RV register before the PLUS can be executed.
We can solve this problem with a rewriting rule. The idea is to assign each return value immediately into a fresh temporary register, that is
CALL(fun, args) → ESEQ(MOVE(TEMP t, CALL(fun, args)), TEMP t)
Now the ESEQ-eliminator will percolate the MOVE up outside of its containing BINOP (etc.) expressions.
This technique will generate a few extra MOVE instructions, which the register allocator (Chapter 11) can clean up.
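The clobbering problem and its fix can be imitated in ordinary C. In the sketch below the global RV stands in for TEMP(RV), and call_f and call_g are hypothetical functions that leave their results there (the values 3 and 4 are arbitrary); none of these names come from the compiler itself.

```c
#include <assert.h>

static int RV;                        /* stands in for the register TEMP(RV) */

static void call_f(void) { RV = 3; }  /* hypothetical callee returning 3 */
static void call_g(void) { RV = 4; }  /* hypothetical callee returning 4 */

/* BINOP(PLUS, CALL(f), CALL(g)) with both operands read from RV only
   after both calls have run: the first result is already lost. */
static int naive_sum(void) {
    call_f();
    call_g();
    return RV + RV;
}

/* After rewriting each CALL(...) to ESEQ(MOVE(TEMP t, CALL(...)), TEMP t):
   every result is copied out of RV immediately after its call. */
static int rewritten_sum(void) {
    call_f();
    int t1 = RV;
    call_g();
    int t2 = RV;
    return t1 + t2;
}
```

The copies into t1 and t2 correspond to the extra MOVE instructions mentioned above.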
The rewriting rule is implemented as follows: reorder replaces any occurrence of CALL(f, args) by
ESEQ(MOVE(TEMP tnew, CALL(f, args)), TEMP tnew)
and calls itself again on the ESEQ. But do_stm recognizes the pattern
MOVE(TEMP tnew, CALL(f, args)),
and does not call reorder on the CALL node in that case, but treats the f and args as the children of the MOVE node. Thus, reorder never “sees” any CALL that is already the immediate child of a MOVE. Occurrences of the pattern EXP(CALL(f, args)) are treated similarly.
A LINEAR LIST OF STATEMENTS
Once an entire function body s0 is processed with do_stm, the result is a tree s0′ where all the SEQ nodes are near the top (never underneath any other kind of node). The linearize function repeatedly applies the rule
SEQ(SEQ(a, b), c) = SEQ(a, SEQ(b, c))
The result is that s0′ is linearized into an expression of the form
SEQ(s1,SEQ(s2, . . . ,SEQ(sn−1,sn) . . .))
Here the SEQ nodes provide no structuring information at all, and we can just consider this to be a simple list of statements,
s1,s2, . . . ,sn−1,sn
where none of the si contain SEQ or ESEQ nodes.
These rewrite rules are implemented by linearize, with an auxiliary function linear:
static T_stmList linear(T_stm stm, T_stmList right) {
  if (stm->kind == T_SEQ)
    return linear(stm->u.SEQ.left, linear(stm->u.SEQ.right, right));
  else
    return T_StmList(stm, right);
}

T_stmList C_linearize(T_stm stm) {
  return linear(do_stm(stm), NULL);
}
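The flattening step can be tried on its own with a toy statement type. This is a sketch under simplifying assumptions: Stm, StmList, cons, and length are hypothetical stand-ins for T_stm, T_stmList, and T_StmList, and the ESEQ-removal pass is left out, so the input is assumed to contain SEQ nodes only at the top.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Stm Stm;
struct Stm { int is_seq; Stm *left, *right; };   /* is_seq != 0 means SEQ */

typedef struct StmList StmList;
struct StmList { Stm *head; StmList *tail; };

static StmList *cons(Stm *head, StmList *tail) {
    StmList *l = malloc(sizeof *l);
    if (!l) abort();
    l->head = head;
    l->tail = tail;
    return l;
}

/* Prepend the statements of s, in order, to the list `right`. */
static StmList *linear(Stm *s, StmList *right) {
    if (s->is_seq)
        return linear(s->left, linear(s->right, right));
    return cons(s, right);
}

static int length(const StmList *l) {
    int n = 0;
    for (; l != NULL; l = l->tail) n++;
    return n;
}

/* SEQ(SEQ(a, b), c): a left-nested tree of three leaf statements. */
static Stm a = { 0, NULL, NULL }, b = { 0, NULL, NULL }, c = { 0, NULL, NULL };
static Stm ab  = { 1, &a,  &b };
static Stm abc = { 1, &ab, &c };
```

Calling linear(&abc, NULL) yields the three-element list a, b, c, the same order that the rule SEQ(SEQ(a, b), c) = SEQ(a, SEQ(b, c)) would produce. Because the right subtree is flattened first and passed down as the accumulated tail, no list concatenation is needed.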