
Basic Blocks and Traces


are so troublesome? Because they make it much more convenient for the Translate (translation to intermediate code) phase of the compiler.

We can take any tree and rewrite it into an equivalent tree without any of the cases listed above. Without these cases, the only possible parent of a SEQ node is another SEQ; all the SEQ nodes will be clustered at the top of the tree. This makes the SEQs entirely uninteresting; we might as well get rid of them and make a linear list of T_stms.

The transformation is done in three stages: First, a tree is rewritten into a list of canonical trees without SEQ or ESEQ nodes; then this list is grouped into a set of basic blocks, which contain no internal jumps or labels; then the basic blocks are ordered into a set of traces in which every CJUMP is immediately followed by its false label.

Thus the module Canon has these tree-rearrangement functions:

/* canon.h */

typedef struct C_stmListList_ *C_stmListList;

struct C_block { C_stmListList stmLists; Temp_label label;};

struct C_stmListList_ { T_stmList head; C_stmListList tail;};

T_stmList C_linearize(T_stm stm);

struct C_block C_basicBlocks(T_stmList stmList);

T_stmList C_traceSchedule(struct C_block b);

Linearize removes the ESEQs and moves the CALLs to top level. Then BasicBlocks groups statements into sequences of straight-line code. Finally, traceSchedule orders the blocks so that every CJUMP is followed by its false label.

8.1 CANONICAL TREES

Let us define canonical trees as having these properties:

1. No SEQ or ESEQ.

2. The parent of each CALL is either EXP(. . .) or MOVE(TEMP t, . . .).

TRANSFORMATIONS ON ESEQ

How can the ESEQ nodes be eliminated? The idea is to lift them higher and higher in the tree, until they can become SEQ nodes.

Figure 8.1 gives some useful identities on trees.

(3) BINOP(op, e₁, ESEQ(s, e₂)) = ESEQ(MOVE(TEMP t, e₁), ESEQ(s, BINOP(op, TEMP t, e₂))), where t is a new temporary

FIGURE 8.1. Identities on trees (see also Exercise 8.1).


Identity (1) is obvious. So is identity (2): Statement s is to be evaluated; then e₁; then e₂; then the sum of the expressions is returned. If s has side effects that affect e₁ or e₂, then either the left-hand side or the right-hand side of the first equation will execute those side effects before the expressions are evaluated.

Identity (3) is more complicated, because of the need not to interchange the evaluations of s and e₁. For example, if s is MOVE(MEM(x), y) and e₁ is BINOP(PLUS, MEM(x), z), then the program will compute a different result if s is evaluated before e₁ instead of after. Our goal is simply to pull s out of the BINOP expression; but now (to preserve the order of evaluation) we must pull e₁ out of the BINOP with it. To do so, we assign e₁ into a new temporary t, and put t inside the BINOP.
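To see concretely why s and e₁ must not be interchanged, here is a minimal sketch that simulates the example above on a toy memory. The array mem and the functions run_s, eval_e1, e1_then_s, and s_then_e1 are hypothetical names made up for this illustration; they are not part of the compiler.

```c
#include <assert.h>

/* Hypothetical toy memory; not part of the compiler's Tree module. */
static int mem[4];

/* s = MOVE(MEM(x), y): store y at address x. */
void run_s(int x, int y) {
    assert(x >= 0 && x < 4);
    mem[x] = y;
}

/* e1 = BINOP(PLUS, MEM(x), z): load from address x and add z. */
int eval_e1(int x, int z) {
    assert(x >= 0 && x < 4);
    return mem[x] + z;
}

/* Evaluate e1 first, then s: the order the original tree demands. */
int e1_then_s(int x, int y, int z) {
    mem[x] = 10;             /* assumed initial contents of MEM(x) */
    int t = eval_e1(x, z);   /* t plays the role of the new temporary */
    run_s(x, y);
    return t;
}

/* Evaluate s first, then e1: what a careless rewrite would do. */
int s_then_e1(int x, int y, int z) {
    mem[x] = 10;
    run_s(x, y);
    return eval_e1(x, z);
}
```

With x = 1, y = 99, z = 5, and MEM(x) initially 10, e1_then_s yields 15 while s_then_e1 yields 104. Saving e₁ in a temporary before running s is what preserves the first result.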

It may happen that s causes no side effects that can alter the result produced by e₁. This will happen if the temporaries and memory locations assigned by s are not referenced by e₁ (and s and e₁ don’t both perform external I/O). In this case, identity (4) can be used.

We cannot always tell whether a statement and an expression commute. For example, whether MOVE(MEM(x), y) commutes with MEM(z) depends on whether x = z, which we cannot always determine at compile time. So we conservatively approximate whether statements commute, saying either “they definitely do commute” or “perhaps they don’t commute.” For example, we know that any statement “definitely commutes” with the expression CONST(n), so we can use identity (4) to justify special cases like

BINOP(op, CONST(n), ESEQ(s, e)) = ESEQ(s, BINOP(op, CONST(n), e)).

The commute function estimates (very naively) whether a statement and an expression commute:

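/* True if x is the no-op statement EXP(CONST(n)). */
static bool isNop(T_stm x) {
    return x->kind == T_EXP && x->u.EXP->kind == T_CONST;
}

/* Conservative test: true only when x and y definitely commute. */
static bool commute(T_stm x, T_exp y) {
    return isNop(x) || y->kind == T_NAME || y->kind == T_CONST;
}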

A constant (CONST or NAME) commutes with any statement, and the empty statement commutes with any expression. Anything else is assumed not to commute.
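The behavior of this test can be tried in isolation. The following is only a sketch under simplifying assumptions: the node types are cut down to the fields that commute inspects, the field name exp stands in for the real u.EXP member, and <stdbool.h> replaces the compiler's own bool definition.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, cut-down node types: only what commute needs to look at. */
typedef enum { T_CONST, T_NAME, T_TEMP, T_MEM } expKind;
typedef enum { T_EXP, T_MOVE } stmKind;

typedef struct T_exp_ { expKind kind; } *T_exp;
typedef struct T_stm_ {
    stmKind kind;
    T_exp exp;      /* operand of a T_EXP statement (simplified) */
} *T_stm;

/* The no-op statement is EXP(CONST(n)). */
bool isNop(T_stm x) {
    return x->kind == T_EXP && x->exp->kind == T_CONST;
}

/* Answers "definitely commute" (true) or "perhaps not" (false). */
bool commute(T_stm x, T_exp y) {
    return isNop(x) || y->kind == T_NAME || y->kind == T_CONST;
}
```

Under these assumptions, a no-op statement commutes with a MEM expression, a MOVE commutes with a CONST, and a MOVE paired with a MEM gets the conservative answer false, even in cases where the two would in fact commute.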

GENERAL REWRITING RULES

In general, for each kind of Tree statement or expression we can identify the subexpressions. Then we can make rewriting rules, similar to the ones in Figure 8.1, to pull the ESEQs out of the statement or expression.

For example, in [e₁, e₂, ESEQ(s, e₃)], the statement s must be pulled leftward past e₂ and e₁. If they commute, we have (s; [e₁, e₂, e₃]). But suppose e₂ does not commute with s; then we must have

(SEQ(MOVE(t₁, e₁), SEQ(MOVE(t₂, e₂), s)); [TEMP(t₁), TEMP(t₂), e₃])

Or if e₂ commutes with s but e₁ does not, we have

(SEQ(MOVE(t₁, e₁), s); [TEMP(t₁), e₂, e₃])

The reorder function takes a list of expressions and returns a pair of (statement, expression-list). The statement contains all the things that must be executed before the expression-list. As shown in these examples, this includes all the statement-parts of the ESEQs, as well as any expressions to their left with which they did not commute. When there are no ESEQs at all we will use EXP(CONST(0)), which does nothing, as the statement.

Algorithm. Step one is to make a “subexpression-extraction” method for each kind. Step two is to make a “subexpression-insertion” method: given an ESEQ-clean version of each subexpression, this builds a new version of the expression or statement.

typedef struct expRefList_ *expRefList;

struct expRefList_ {T_exp *head; expRefList tail;};

struct stmExp {T_stm s; T_exp e;};

static T_stm reorder(expRefList rlist);

static T_stm do_stm(T_stm stm);

static struct stmExp do_exp(T_exp exp);

The reorder function is supposed to pull all the ESEQs out of a list of expressions and combine the statement-parts of these ESEQs into one big T_stm. The argument to reorder is a list of references to the immediate subexpressions of that statement. Figure 8.2 illustrates the use of a pointer to a pointer.

If we call reorder(l₂), we are saying, “please pull any ESEQs out of the children and grandchildren of this BINOP node e₂. For your convenience, the places where it points to its children are at the locations pointed to by the list l₂. For each child that is an ESEQ(s_k, e_k), you should update the child-pointer to point to e_k instead and put s_k on the big sequence of statements that you will return as a result.”

Reorder(l₂) calls upon an auxiliary function do_exp on each expression in the list l₂, that is, the expressions e₁ and e₃. Do_exp(e₁) returns a statement s₁ and an expression e′₁, where e′₁ contains no ESEQs, such that ESEQ(s₁, e′₁) would be equivalent to the original expression e₁. In this case, since e₁ is so trivial, s₁ will be a no-op statement EXP(CONST(0)) and e′₁ = e₁. But if expression e₃’s MEM node pointed to ESEQ(s_x, TEMP a), then do_exp(e₃) will yield s₃ = s_x and e′₃ = MEM(TEMP a).

The implementation of do_exp is rather simple. For any kind of expression except ESEQ, do_exp just makes a list of the subexpression references and calls reorder:

static struct stmExp do_exp(T_exp exp) {
    switch (exp->kind) {
    case T_BINOP:
        /* subexpressions: the left and right operands */
        return StmExp(reorder(ExpRefList(&exp->u.BINOP.left,
                              ExpRefList(&exp->u.BINOP.right, NULL))), exp);
    case T_MEM:
        /* subexpression: the address */
        return StmExp(reorder(ExpRefList(&exp->u.MEM, NULL)), exp);
    case T_ESEQ: {
        /* clean both parts, then hoist the statement part */
        struct stmExp x = do_exp(exp->u.ESEQ.exp);
        return StmExp(seq(do_stm(exp->u.ESEQ.stm), x.s), x.e);
    }
    case T_CALL:
        return StmExp(reorder(get_call_rlist(exp)), exp);
    default:
        /* no subexpressions */
        return StmExp(reorder(NULL), exp);
    }
}

The function seq(s₁, s₂) just returns a statement equivalent to SEQ(s₁, s₂), but in the very common case that s₁ or s₂ is a no-op, we can do something simpler:

static T_stm seq(T_stm x, T_stm y) {
    if (isNop(x)) return y;   /* drop a no-op on the left */
    if (isNop(y)) return x;   /* drop a no-op on the right */
    return T_Seq(x, y);
}

FIGURE 8.2. List-of-refs argument passed to reorder. [Diagram not reproduced: it shows three tree nodes, e₁ (CONST), e₂ (BINOP), and e₃ (MEM), together with the reference lists l₁, l₂, and l₃.]

The ESEQ case of do_exp must call do_stm, which pulls all the ESEQs out of a statement. It also works by making a list of all the subexpression references and calling reorder:

static T_stm do_stm(T_stm stm) {
    switch (stm->kind) {
    case T_SEQ:
        return seq(do_stm(stm->u.SEQ.left), do_stm(stm->u.SEQ.right));
    case T_JUMP:
        return seq(reorder(ExpRefList(&stm->u.JUMP.exp, NULL)), stm);
    case T_CJUMP:
        return seq(reorder(ExpRefList(&stm->u.CJUMP.left,
                           ExpRefList(&stm->u.CJUMP.right, NULL))), stm);
    case T_MOVE:
        . . . see below
    case T_EXP:
        if (stm->u.EXP->kind == T_CALL)
            return seq(reorder(get_call_rlist(stm->u.EXP)), stm);
        else
            return seq(reorder(ExpRefList(&stm->u.EXP, NULL)), stm);
    default:
        return stm;
    }
}

The left-hand operand of the MOVE statement is not considered a subexpression, because it is the destination of the statement; its value is not used by the statement. However, if the destination is a memory location, then the address acts like a source. Thus we have,

static T_stm do_stm(T_stm stm) {
    . . .
    case T_MOVE:
        if (stm->u.MOVE.dst->kind == T_TEMP &&
            stm->u.MOVE.src->kind == T_CALL)
            /* MOVE(TEMP t, CALL(...)): the call stays where it is */
            return seq(reorder(get_call_rlist(stm->u.MOVE.src)), stm);
        else if (stm->u.MOVE.dst->kind == T_TEMP)
            return seq(reorder(ExpRefList(&stm->u.MOVE.src, NULL)), stm);
        else if (stm->u.MOVE.dst->kind == T_MEM)
            /* the address inside MEM acts like a source */
            return seq(reorder(ExpRefList(&stm->u.MOVE.dst->u.MEM,
                               ExpRefList(&stm->u.MOVE.src, NULL))), stm);
        else if (stm->u.MOVE.dst->kind == T_ESEQ) {
            /* hoist the statement part of the destination, then retry */
            T_stm s = stm->u.MOVE.dst->u.ESEQ.stm;
            stm->u.MOVE.dst = stm->u.MOVE.dst->u.ESEQ.exp;
            return do_stm(T_Seq(s, stm));
        }
    . . .

With the assistance of do_exp and do_stm, the reorder function can pull the statement s_i out of each expression e_i on its list of references, going from right to left.
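The text describes reorder without listing it, so the following is only a sketch of how such a function might look, not the module's actual code. It rests on several assumptions: the tree types are cut down to four expression kinds and three statement kinds, a dedicated S_NOP kind stands in for EXP(CONST(0)), the reference list is an array of slot pointers rather than an expRefList, statements inside an ESEQ are taken to be clean already, and next_temp is a made-up source of fresh temporaries. All names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical cut-down tree types; the real Tree module has more kinds. */
typedef struct exp_ *Exp;
typedef struct stm_ *Stm;
struct exp_ { enum { E_CONST, E_TEMP, E_MEM, E_ESEQ } kind; int n; Exp e; Stm s; };
struct stm_ { enum { S_NOP, S_MOVE, S_SEQ } kind; Exp dst, src; Stm left, right; };

static int next_temp = 100;   /* assumed supply of fresh temporary numbers */

Exp mk_exp(int kind, int n, Exp e, Stm s) {
    Exp p = malloc(sizeof *p);
    assert(p != NULL);
    p->kind = kind; p->n = n; p->e = e; p->s = s;
    return p;
}

Stm mk_stm(int kind, Exp dst, Exp src, Stm left, Stm right) {
    Stm p = malloc(sizeof *p);
    assert(p != NULL);
    p->kind = kind; p->dst = dst; p->src = src; p->left = left; p->right = right;
    return p;
}

/* Very naive, as in the text: only a no-op or a constant commutes. */
bool commute(Stm s, Exp e) { return s->kind == S_NOP || e->kind == E_CONST; }

/* SEQ(a, b), dropping no-ops. */
Stm seq(Stm a, Stm b) {
    if (a->kind == S_NOP) return b;
    if (b->kind == S_NOP) return a;
    return mk_stm(S_SEQ, NULL, NULL, a, b);
}

Stm reorder(Exp **refs, int n);

/* Make *ref ESEQ-free; return the statement that must run before it. */
Stm do_exp(Exp *ref) {
    Exp e = *ref;
    if (e->kind == E_ESEQ) {
        Stm inner = do_exp(&e->e);   /* clean the expression part first */
        *ref = e->e;                 /* the slot now holds the clean expression */
        return seq(e->s, inner);
    }
    if (e->kind == E_MEM) {
        Exp *kids[1] = { &e->e };
        return reorder(kids, 1);
    }
    return mk_stm(S_NOP, NULL, NULL, NULL, NULL);
}

/* refs[i] points at the i-th subexpression slot; slots are updated in place. */
Stm reorder(Exp **refs, int n) {
    if (n == 0) return mk_stm(S_NOP, NULL, NULL, NULL, NULL);
    Stm first = do_exp(refs[0]);
    Stm rest = reorder(refs + 1, n - 1);   /* statements from the slots to the right */
    if (commute(rest, *refs[0])) return seq(first, rest);
    /* rest might change the value of *refs[0]: save it in a fresh temporary */
    Exp t = mk_exp(E_TEMP, next_temp++, NULL, NULL);
    Stm save = mk_stm(S_MOVE, t, *refs[0], NULL, NULL);
    *refs[0] = t;
    return seq(first, seq(save, rest));
}
```

Applied to the three slots [MEM(TEMP 1), MEM(TEMP 2), ESEQ(s, e₃)], where s is a MOVE, this sketch returns SEQ(MOVE(t₁, e₁), SEQ(MOVE(t₂, e₂), s)) and leaves [TEMP t₁, TEMP t₂, e₃] in the slots, the same shape as the earlier example in which e₂ does not commute with s. Because the recursion handles the tail of the list before deciding what to do with the head, the statements are effectively collected from right to left.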

MOVING CALLS TO TOP LEVEL

The Tree language permits CALL nodes to be used as subexpressions. However, the actual implementation of CALL will be that each function returns its result in the same dedicated return-value register TEMP(RV). Thus, if we have

BINOP(PLUS,CALL(. . .),CALL(. . .))

the second call will overwrite the RV register before the PLUS can be executed.

We can solve this problem with a rewriting rule. The idea is to assign each return value immediately into a fresh temporary register, that is

CALL(fun, args) → ESEQ(MOVE(TEMP t, CALL(fun, args)), TEMP t)

Now the ESEQ-eliminator will percolate the MOVE up outside of its containing BINOP (etc.) expressions.

This technique will generate a few extra MOVE instructions, which the register allocator (Chapter 11) can clean up.
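The effect of the shared return-value register can be imitated in a few lines. This is a minimal sketch, not generated code: the global RV models TEMP(RV), and call_f, call_g, plus_naive, and plus_rewritten are hypothetical names chosen for the illustration.

```c
#include <assert.h>

/* Hypothetical model: every call leaves its result in one shared register. */
static int RV;

void call_f(void) { RV = 3; }   /* stands for CALL(f, ...) returning 3 */
void call_g(void) { RV = 4; }   /* stands for CALL(g, ...) returning 4 */

/* BINOP(PLUS, CALL(f), CALL(g)) without the rewrite: both operands are
   read from RV only after both calls have run, so f's result is lost. */
int plus_naive(void) {
    call_f();
    call_g();
    return RV + RV;
}

/* With the rewrite, each result is copied to a fresh temporary at once:
   MOVE(TEMP t1, CALL(f)); MOVE(TEMP t2, CALL(g)); then PLUS on t1, t2. */
int plus_rewritten(void) {
    call_f();
    int t1 = RV;
    call_g();
    int t2 = RV;
    return t1 + t2;
}
```

In this model plus_naive returns 8 and plus_rewritten returns the intended 7. The two copies into t1 and t2 correspond to the extra MOVE instructions mentioned above.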

The rewriting rule is implemented as follows: reorder replaces any occurrence of CALL(f, args) by

ESEQ(MOVE(TEMP t_new, CALL(f, args)), TEMP t_new)

and calls itself again on the ESEQ. But do_stm recognizes the pattern

MOVE(TEMP t_new, CALL(f, args)),

and does not call reorder on the CALL node in that case, but treats the f and args as the children of the MOVE node. Thus, reorder never “sees” any CALL that is already the immediate child of a MOVE. Occurrences of the pattern EXP(CALL(f, args)) are treated similarly.

A LINEAR LIST OF STATEMENTS

Once an entire function body s₀ is processed with do_stm, the result is a tree s′₀ where all the SEQ nodes are near the top (never underneath any other kind of node). The linearize function repeatedly applies the rule

SEQ(SEQ(a, b), c) = SEQ(a, seq(b, c))

The result is that s′₀ is linearized into a statement of the form

SEQ(s₁,SEQ(s₂, . . . ,SEQ(s_n−1,s_n) . . .))

Here the SEQ nodes provide no structuring information at all, and we can just consider this to be a simple list of statements,

s₁,s₂, . . . ,s_n−1,s_n

where none of the s_i contain SEQ or ESEQ nodes.

These rewrite rules are implemented by linearize, with an auxiliary function linear:

/* Flatten stm and put its statements in front of the list right. */
static T_stmList linear(T_stm stm, T_stmList right) {
    if (stm->kind == T_SEQ)
        return linear(stm->u.SEQ.left,
                      linear(stm->u.SEQ.right, right));
    else
        return T_StmList(stm, right);
}

T_stmList C_linearize(T_stm stm) {
    return linear(do_stm(stm), NULL);
}
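The recursion in linear can be exercised on its own with stand-in types. In this sketch, which is an illustration and not the module's code, a statement is either a SEQ or a numbered leaf; the names Stm, StmList, leaf, seq_node, and cons are hypothetical replacements for T_stm, T_stmList, and their constructors.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical cut-down types: a statement is a SEQ or a numbered leaf. */
typedef struct stm_ *Stm;
struct stm_ { int is_seq; int id; Stm left, right; };

typedef struct stmList_ *StmList;
struct stmList_ { Stm head; StmList tail; };

Stm leaf(int id) {
    Stm p = malloc(sizeof *p);
    assert(p != NULL);
    p->is_seq = 0; p->id = id; p->left = p->right = NULL;
    return p;
}

Stm seq_node(Stm left, Stm right) {
    Stm p = malloc(sizeof *p);
    assert(p != NULL);
    p->is_seq = 1; p->id = -1; p->left = left; p->right = right;
    return p;
}

StmList cons(Stm head, StmList tail) {
    StmList p = malloc(sizeof *p);
    assert(p != NULL);
    p->head = head; p->tail = tail;
    return p;
}

/* Same recursion as linear in the text: the leaves of stm, in order,
   are placed in front of the list right. */
StmList linear(Stm stm, StmList right) {
    if (stm->is_seq)
        return linear(stm->left, linear(stm->right, right));
    return cons(stm, right);
}
```

For the left-nested tree SEQ(SEQ(1, 2), 3), calling linear with an empty list yields the statements 1, 2, 3 in that order. The right operand is flattened first only so that its result can serve as the tail for the left operand; execution order is unchanged.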

From Modern Compiler Implementation in C (pages 188-196)