A Simple Tree Pattern Matching Algorithm for Code Generator
Tzer-Shyong Chen, Feipei Lai, and Rung-Ji Shang
Dept. of Electrical Engineering
National Taiwan University, Taipei, Taiwan, R.O.C.
E-mail: flai@cc.ee.ntu.edu.tw
Abstract
This paper describes a simple tree pattern matching algorithm for the code generator of compilers. The intermediate code (Register Transfer Language) is matched with the tree-rewriting rules of the instruction description, which describe the target architecture, to generate the assembly code. A hashing function is used in our system to transform the tree pattern matching problem into a simple number comparison. Compared with the GNU C compiler (gcc), the tree pattern matching time can be reduced by 69% and the compile time by 6%, and the space of the instruction descriptions can be reduced by a factor of 4.10 on DLX and 2.14 on SPARC. The size of the table needed by the code generator is quite small in our method.
1 Introduction
Code selection can be done by tree pattern matching. Target instructions are first described through instruction patterns; the tree pattern matcher then searches for a cover [5] of the input tree. However, if a given input tree has several possible covers, this process becomes indeterminate. The cost of an instruction pattern indicates the quality of the code, such as execution time or code size. By choosing the code according to the cost, the ambiguity of code selection is resolved.
There are several methods to select the cover of minimum cost. Graham and Glanville [12] propose the use of LR parsing. In this method, instruction patterns are written in prefix order and interpreted as a context-free grammar. A modified LALR(1) parser constructed from this grammar then finds the cover by parsing the input tree. Because such grammars are inherently ambiguous, some heuristics and simplifications are needed to resolve the ambiguity. General tree pattern matching methods avoid these ambiguities and are guaranteed to select the cover of minimum cost. For instance, TWIG [1] uses a general method based on a dynamic programming algorithm.
Our experience with tree-rewriting rules has shown that such a method is easy to use and that the specification of the instruction patterns is independent of the actual tree pattern matching algorithm. However, in the early stages of architecture development, the architecture changes from time to time. Thus, a flexible compiler is necessary for the architect designing a new architecture, and how to retarget a compiler to different machines is an important issue.
The goal of our research, the Arden (Architecture development environment) compiler, is to design a flexible compiler that helps the architect retarget the compiler to a new instruction set. The Arden compiler uses a simple and efficient tree pattern matching method to produce an efficient code generator. By traversing the template of a tree-rewriting rule from the bottom up, the template can be hashed to a characteristic number which represents the rule. The tree pattern matcher can then generate the cover sets by comparing the characteristic number of a subtree in an RTL tree with those of the tree-rewriting rules from the bottom up. In order to output efficient code, the action phase chooses a cover set of minimum-cost instructions for each RTL tree. Thus, the tree pattern matching problem becomes a simple number comparison. By changing the instruction descriptions, we can retarget the code generator to different instruction sets more easily.
In the next section, we illustrate the flowchart of the Arden compiler. The instruction description is discussed in Section 3. Section 4 describes a simple tree pattern matching algorithm. Section 5 shows the experimental results, and conclusions are given in the final section.
2 Flowchart of Arden Compiler
The Arden compiler consists of the gcc front-end, the tree pattern matcher, the instruction description, and the action phase. The gcc front-end takes a C program as input and outputs the intermediate code (RTL tree). The template of a tree-rewriting rule in the instruction description, which describes the target instructions, is hashed to a characteristic number, and this characteristic number represents the tree-rewriting rule that includes the template. The cover sets of an RTL tree are generated by the tree pattern matcher, which compares the characteristic number of a subtree in the RTL tree with those of the tree-rewriting rules from the bottom up. A subtree matched with a tree-rewriting rule is replaced by the corresponding replacement node, and the replacing process continues until the root of the RTL tree is encountered. The action phase outputs the assembly code of the cover set with minimum cost for the target machine.
Figure 1: Flowchart of Arden compiler (box labels: C program, cover sets, assembly code)
3 Instruction Description
3.1 Instruction Description
In order to generate the machine assembly code, the instruction description of a target machine is represented by tree-rewriting rules, each of which contains a macro expression, a replacement node, a template, and sets of condition expressions, costs, and actions.
A tree template, composed of a replacement node and a template, represents a computation which is performed by one or more machine instructions. A set of condition expressions is used to select the proper action. After a template has been matched with a subtree in an RTL tree, the condition expressions must be checked. The syntax of a tree-rewriting rule in an instruction description is as follows:
    %defineinsn @ Macro expression @
    { Replacement ← Template }
    @ Condition expression1 @ @ Cost1 @ { Action1 }
    @ Condition expression2 @ @ Cost2 @ { Action2 }
    ...
    @ Condition expressionN @ @ CostN @ { ActionN } %
The entry between two @s is optional.
Macro expression defines the macro strings which will be expanded in the template, condition expressions, or actions.
Replacement is a replacement node, and Template is the representation of an RTL tree.
Condition expression places constraints on the operands in the template and is checked by the tree pattern matcher when the template is matched with a subtree in an RTL tree.
Cost is the execution cycle time of the action code.
Action returns the assembly code for the rule.
For example, the add and sub instructions of the SPARC architecture can be defined in one rule by the following macro expression.
    %defineinsn
    @ VAR macro-operator = {"plus","minus"} & macro-opcode = {"add","sub"} @
    { (r SI 0) ← (macro-operator:SI (r SI 1) (i SI 2)) }
    @ %i2 < 4096 and %i2 ≥ -4096 @ @cost=1@
    { macro-opcode %r1, %i2, %r0 }
    @ %i2 ≥ 4096 or %i2 < -4096 @ @cost=3@
    { "sethi hi(%i2), g1; or lo(%i2), g1, g1; macro-opcode %r1, g1, %r0;" } %
In this macro expression, macro-operator is either plus or minus, and macro-opcode is either add or sub. The applicability of the rule is settled by the set of condition expressions. The operands contain strings like %rn and %in, where n is the order of the operand in the tree template. For example, the target register operand is represented by the string %r0, and the immediate value of the second operand by %i2. When the template is matched with a subtree in an RTL tree, the tree pattern matcher checks the second operand %i2. If the value of %i2 is between -4096 and 4095, the code of action1 is output; otherwise, the code of action2 is output. The action of a tree-rewriting rule is output in the action phase and consists of statements which are assembly code or assembler modules. For example, if macro-operator is replaced by plus in the template, macro-opcode is replaced by add. The rule then indicates that the target register is equal to the result of the first source register plus the immediate value. For example, if the RTL tree is "reg3 ← reg4 + 30", the template "(r SI 0) ← (plus:SI (r SI 1) (i SI 2))" is matched in the tree-rewriting rules, and thus the instruction "add r4, 30, r3" is output.
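The way the two condition expressions of the add/sub rule select an action can be sketched as follows. This is only an illustrative model, not the Arden implementation: the function name `select_action`, the `MACROS` table, and the representation of an action as a list of strings are assumptions made for the example.

```python
# Hypothetical model of the add/sub rule; all names here are illustrative.
MACROS = {"plus": "add", "minus": "sub"}  # macro-operator -> macro-opcode


def select_action(operator, r0, r1, i2):
    """Return (cost, assembly lines) for reg[r0] <- reg[r1] <operator> i2."""
    opcode = MACROS[operator]  # macro expansion of macro-opcode
    if -4096 <= i2 < 4096:
        # Condition expression 1: the immediate fits, cost = 1.
        return 1, [f"{opcode} r{r1}, {i2}, r{r0}"]
    # Condition expression 2: build the immediate in g1 first, cost = 3.
    return 3, [
        f"sethi hi({i2}), g1",
        f"or lo({i2}), g1, g1",
        f"{opcode} r{r1}, g1, r{r0}",
    ]


print(select_action("plus", 3, 4, 30))
```

For the RTL tree "reg3 ← reg4 + 30" the first condition holds, so the sketch yields the single instruction quoted in the text.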
3.2 Tree-rewriting Rules
The front-end of Arden translates source programs into an intermediate representation (RTL). The RTL program is a series of expression trees which are then transformed into postfix order for the bottom-up comparison. Fig. 2 shows the RTL of the assignment a:=b+1, in which both a and b are local variables; one is stored at offset 4 and the other at offset 8 from the stack pointer, which is stored in register sp. The mem operator returns the content of a memory location. Translating an RTL tree takes two steps: (1) traversing the tree in postfix order and (2) producing code for each individual node. Each nonterminal node in the template represents an intermediate result and will be replaced by a replacement node in the tree-rewriting rules. Fig. 3 shows the tree-rewriting rules needed to translate the RTL tree of Fig. 2. The instruction add of rule1 in Fig. 3 adds the content of a memory location (addressed by the sum of the stack pointer and an offset) and a register, and returns the result in a register. By repeated searching, a subtree in an RTL tree can be reduced to a replacement node, and the RTL tree can be rewritten by the tree-rewriting rules. After tree pattern matching, the nodes of an RTL tree are marked with cover sets, which include all the possible matching combinations of the replacement rules, and the subtrees are replaced by replacement nodes. The labeled tree is called a cover tree [5]. Fig. 4 shows how to transform an RTL tree into a cover tree.
Figure 2: Intermediate representation of a:=b+1
Figure 3: Tree-rewriting rules for a:=b+1 (columns: rewrite rule, cost, action; for example, Rule 2: reg_i ← const, cost 1, action "mov const, ri")
Figure 4: The cover tree for a:=b+1 (cover set = {2, 1, 3})
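The first translation step, the postfix traversal, can be sketched on the tree of a:=b+1. The nested-tuple encoding and the function name `postfix` are choices made for this example, and placing a at offset 4 and b at offset 8 is an assumption, since the text does not say which variable has which offset.

```python
# Each node is a tuple (label, child, ...); a leaf is a one-element tuple.
# Assumption for illustration: a lives at sp+4 and b at sp+8.
ADDR_A = ("+", ("sp",), ("4",))
ADDR_B = ("+", ("sp",), ("8",))
ASSIGN = (":=", ("mem", ADDR_A), ("+", ("mem", ADDR_B), ("1",)))


def postfix(node):
    """Return the node labels in postfix (children before parent) order."""
    label, *children = node
    order = []
    for child in children:
        order.extend(postfix(child))
    order.append(label)
    return order


print(" ".join(postfix(ASSIGN)))
```

Every operator appears after its operands in this order, which is what allows a matcher to work from the bottom up.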
4 A Simple Tree Pattern Matching Algorithm
The target assembly code is generated by tree pattern matching, in which an RTL tree is reduced to a replacement node by repeatedly searching for matching subtrees in the RTL tree. A subtree matched with a template is replaced by the corresponding replacement node. By applying the hashing function from the bottom up, each node of a template gets its own characteristic number, and the tree-rewriting rule is marked with the characteristic number of the root of its template. The cover sets of an RTL tree are generated by the tree pattern matcher, which compares the characteristic number of a subtree in the RTL tree with those of the tree-rewriting rules from the bottom up. Then, the action phase chooses the minimum-cost instructions for the output assembly code. The hashing function is defined as
$$F(\mathit{root}, \mathit{left}, \mathit{right}) = (\mathit{root} + \mathit{left} \times \mathit{right}) \bmod \mathit{prime}$$
where root is the number of the root of the tree, left is the characteristic number of the left subtree of the root, right is the characteristic number of the right subtree of the root, and the prime number is 17041. Hence, we represent each operator and terminal node with a different number. Below is an example of operator/terminal node representation.
    operator/terminal    number
    reg                  203
    +                    302
    mem                  204
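The hashing function and the operator/terminal numbers above can be sketched as below. The tuple encoding of templates and the names `F` and `characteristic_number` are assumptions of this example; the value 1 for an absent subtree is taken from the paper's own computation of F(mem, reg, null).

```python
PRIME = 17041
CODE = {"reg": 203, "+": 302, "mem": 204}  # numbers from the table above
NULL = 1  # value used for a missing right subtree


def F(root, left, right):
    """Hashing function: (root + left * right) mod prime."""
    return (root + left * right) % PRIME


def characteristic_number(node):
    """Hash a template bottom-up; a node is a terminal name or (op, left[, right])."""
    if isinstance(node, str):
        return CODE[node]
    op, left, *rest = node
    right = characteristic_number(rest[0]) if rest else NULL
    return F(CODE[op], characteristic_number(left), right)


print(characteristic_number(("+", "reg", "reg")))
```

Because every register hashes to the same number, a template such as reg_j + reg_k matches any pair of registers; which registers were matched has to be recovered separately when the action is emitted.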
    Rewrite rule                           Cost   Action                          Characteristic number
    Rule 1: reg_i ← reg_j + reg_k          1      add rj, rk, ri                  7429
    Rule 2: reg_i ← mem[reg_j] + reg_k     3      ld m[rj], rt; add rt, rk, ri    14759
    Rule 3: reg_i ← mem[reg_j + reg_k]                                            7633
Figure 5: Tree-rewriting rules represent some instructions of a target machine
After applying the hashing function F to the tree-rewriting rules, we get the characteristic number of each tree-rewriting rule in Fig. 5. The characteristic numbers are calculated as follows:
The first rule:
$$F(+, \mathit{reg}_j, \mathit{reg}_k) = (302 + 203 \times 203) \bmod 17041 = 7429$$
The second rule:
$$F(\mathit{mem}, \mathit{reg}_j, \mathit{null}) = (204 + 203 \times 1) \bmod 17041 = 407$$
$$F(+, \mathit{mem}[\mathit{reg}_j], \mathit{reg}_k) = (302 + 407 \times 203) \bmod 17041 = 14759$$
The third rule:
$$F(+, \mathit{reg}_j, \mathit{reg}_k) = (302 + 203 \times 203) \bmod 17041 = 7429$$
$$F(\mathit{mem}, \mathit{reg}_j + \mathit{reg}_k, \mathit{null}) = (204 + 7429 \times 1) \bmod 17041 = 7633$$
Figure 6: Simple tree pattern matching by the hashing function (panels: RTL tree; tree-rewriting rules with characteristic numbers 7429, 14759, and 7633)
The tree-rewriting rules in Fig. 5 represent some instructions of a target machine. Each rule is marked with a characteristic number computed by the hashing function. In tree pattern matching, the assembly instruction of the target machine is generated by the action of a tree-rewriting rule. To illustrate, let us traverse the RTL tree by tree pattern matching. The process is shown in Fig. 6. After the characteristic numbers of the three rules have been generated, the tree pattern matcher traverses the RTL tree from the bottom up. The tree pattern matcher first computes the characteristic number of a subtree in the RTL tree, F(+, reg4, reg5) = 7429. Comparing the characteristic number 7429 with those of the tree-rewriting rules, we find that rule1 matches this subtree of the RTL tree. The template of the first rule, with j=4 and k=5, matches the leftmost subtree of the RTL tree. If we use this rule, the subtree "reg4+reg5" of the RTL tree is replaced by reg7, and the instruction "add r4, r5, r7" is generated. The second rule, with i=3, j=7, and k=6, then matches the root of the RTL tree. If we choose the second rule, the RTL tree is replaced by a single node reg3, and both instructions "ld m[r7], r8" and "add r8, r6, r3" are generated. The code generated by the tree pattern matcher to translate the RTL tree is as follows:
    add r4, r5, r7
    ld m[r7], r8
    add r8, r6, r3
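The walk-through above can be replayed with a small sketch. It is a simplification under stated assumptions: it applies a rule as soon as a subtree's characteristic number matches (greedy, one cover only), it knows only rule 1 and rule 2, it ignores condition expressions and hash collisions, and the temporaries r7 and r8 and the target r3 are passed in by hand. The names `number` and `translate` are hypothetical.

```python
PRIME = 17041
CODE = {"reg": 203, "+": 302, "mem": 204}
NULL = 1
RULES = {7429: 1, 14759: 2}  # characteristic number -> rule number


def number(node):
    """Characteristic number of a subtree; every leaf is a register here."""
    if isinstance(node, str):
        return CODE["reg"]
    op, left, *rest = node
    right = number(rest[0]) if rest else NULL
    return (CODE[op] + number(left) * right) % PRIME


def translate(node, temps, out, dest=None):
    """Reduce node bottom-up, appending instructions to out.

    Returns the register holding the result, or the unreduced node.
    """
    if isinstance(node, str):
        return node
    op, *kids = node
    node = (op, *[translate(kid, temps, out) for kid in kids])
    rule = RULES.get(number(node))
    if rule is None:
        return node  # no rule matches yet; the parent may match later
    ri = dest or next(temps)
    if rule == 1:  # reg_i <- reg_j + reg_k
        _, rj, rk = node
        out.append(f"add {rj}, {rk}, {ri}")
    else:  # rule 2: reg_i <- mem[reg_j] + reg_k
        _, (_, rj), rk = node
        rt = next(temps)
        out.append(f"ld m[{rj}], {rt}")
        out.append(f"add {rt}, {rk}, {ri}")
    return ri


rtl = ("+", ("mem", ("+", "r4", "r5")), "r6")
code = []
translate(rtl, iter(["r7", "r8"]), code, dest="r3")
print("\n".join(code))
```

The node mem[r7] matches no rule on its own (its number is 407), so it is left in place until the root is hashed to 14759 and rule 2 covers both nodes at once.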
The tree pattern matching algorithm includes two phases: (1) the preprocessing phase and (2) the tree pattern matching phase. The preprocessing phase parses the instruction description and calculates the characteristic number of each tree-rewriting rule. For a specific target instruction set, the preprocessing phase only needs to be done once. In the tree pattern matching phase, an RTL tree is parsed by consulting the characteristic numbers of the tree-rewriting rules to produce the cover sets.
4.1 Preprocessing Phase
Macro strings reduce the number of tree-rewriting rules needed to describe the target instructions. In the preprocessing phase, the tree pattern matcher first expands the macro strings of each tree-rewriting rule. Next, it computes a characteristic number for each template, and then sorts the templates according to their characteristic numbers.
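A minimal sketch of the three preprocessing steps (macro expansion, hashing, sorting) is given below. Several details are assumptions made for illustration only: the placeholders `OP` and `OPCODE`, the number 303 for a minus operator (the paper lists numbers only for reg, + and mem), and the binary-search lookup, which is one natural use of a sorted table but is not stated in the text.

```python
import bisect

PRIME = 17041
# "-": 303 is a made-up number for illustration; the others come from the paper.
CODE = {"reg": 203, "+": 302, "mem": 204, "-": 303}
NULL = 1


def number(node):
    """Characteristic number of a template, computed bottom-up."""
    if isinstance(node, str):
        return CODE[node]
    op, left, *rest = node
    right = number(rest[0]) if rest else NULL
    return (CODE[op] + number(left) * right) % PRIME


def substitute(template, operator):
    """Replace the OP placeholder in a template by a concrete operator."""
    if isinstance(template, str):
        return operator if template == "OP" else template
    return tuple(substitute(part, operator) for part in template)


def preprocess(macro_rules):
    """Expand macros, hash each template, and sort by characteristic number."""
    table = []
    for macros, template, action in macro_rules:
        for operator, opcode in macros:
            key = number(substitute(template, operator))
            table.append((key, action.replace("OPCODE", opcode)))
    table.sort()
    return table


def lookup(table, key):
    """Return the actions of all rules whose characteristic number is key."""
    i = bisect.bisect_left(table, (key,))
    found = []
    while i < len(table) and table[i][0] == key:
        found.append(table[i][1])
        i += 1
    return found


rules = [([("+", "add"), ("-", "sub")], ("OP", "reg", "reg"), "OPCODE rj, rk, ri")]
table = preprocess(rules)
print(table)
```

One macro rule expands into two table entries here, which is the saving in description size that macro strings are meant to provide.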
4.2 Tree Pattern Matching Phase
The tree pattern matching phase traverses an RTL tree in two steps. The first step traverses the RTL tree in postfix order. Each node in the RTL tree gets a characteristic number, and the tree pattern matcher compares this characteristic number with those of the tree-rewriting rules. If the characteristic number of a subtree in the RTL tree is equal to that of a tree-rewriting rule and one of the condition expressions of this rule is satisfied, the tree pattern matcher records the information of the match node in the match parsing stack. The match node is the root of a subtree in the RTL tree which can be replaced by a replacement node. The second step replaces each match node in the match parsing stack by a replacement node until the root is encountered. After that, the tree pattern matcher outputs the cover sets. If there are multiple cover sets, the action phase chooses the one of minimum cost. The cover sets generated from the traversal of an RTL tree are shown in Fig. 7.
For the match node "mem", the subtree is matched by the template of rule3, so we can rewrite this subtree as a single replacement node reg7. Next, "reg7+reg6" is matched by the template of rule1, and cover set[1]={3, 1} is output. For the match node "+", the subtree is matched by the template of rule1 and can be rewritten as a single replacement node reg7. Then, the rest of the RTL tree is matched by the template of rule2, and cover set[2]={1, 2} is output. Several different combinations of rules may thus match an RTL tree. If several different cover sets match the root of an RTL tree, the one of minimum cost is selected.
Figure 7: The cover sets of an RTL tree (match node = mem gives cover set[1]={3}, then match node = + gives cover set[1]={3, 1}; match node = + gives cover set[2]={1}, then match node = + gives cover set[2]={1, 2})
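How the cover sets of the example arise, and how one of minimum cost could be picked, can be sketched as below. This is an exhaustive search written for clarity, not the match-parsing-stack procedure of the paper. The costs of rule 1 and rule 2 follow Fig. 5; the cost 2 for rule 3 is purely an assumption for the example, and the function names are hypothetical.

```python
PRIME = 17041
CODE = {"reg": 203, "+": 302, "mem": 204}
NULL = 1
RULES = {7429: 1, 14759: 2, 7633: 3}  # characteristic number -> rule number
COST = {1: 1, 2: 3, 3: 2}  # the cost of rule 3 is an assumption


def number(node):
    """Characteristic number of a subtree, computed bottom-up."""
    if isinstance(node, str):
        return CODE[node]
    op, left, *rest = node
    right = number(rest[0]) if rest else NULL
    return (CODE[op] + number(left) * right) % PRIME


def rewrites(node):
    """Yield (rule, new_tree) for every single rule application inside node."""
    if isinstance(node, str):
        return
    rule = RULES.get(number(node))
    if rule is not None:
        yield rule, "reg"  # every rule here rewrites to a register
    op, *kids = node
    for i, kid in enumerate(kids):
        for rule, new_kid in rewrites(kid):
            yield rule, (op, *kids[:i], new_kid, *kids[i + 1:])


def cover_sets(node):
    """All rule sequences that reduce node to a single register."""
    if node == "reg":
        return [[]]
    return [[rule] + rest
            for rule, new in rewrites(node)
            for rest in cover_sets(new)]


def cheapest(node):
    """Cover set with the smallest total cost."""
    return min(cover_sets(node), key=lambda cover: sum(COST[r] for r in cover))


rtl = ("+", ("mem", ("+", "reg", "reg")), "reg")
print(cover_sets(rtl), cheapest(rtl))
```

The search finds the same two cover sets as the text, {3, 1} and {1, 2}; which one is cheaper depends entirely on the assumed cost of rule 3.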
5 Experimental Results
We have implemented two code generators, for DLX and SPARC. The comparison of the number of rules and the size of the instruction descriptions between gcc and Arden is shown in Table 1. Compared with gcc, Arden uses fewer rules to describe a target architecture. In addition, the preprocessing phase takes 0.2 second for DLX and 0.6 second for SPARC. The tables generated by the preprocessing phase occupy 29 KB for DLX and 65 KB for SPARC. The program size of the tree pattern matching phase is 87 KB. Table 2 summarizes the tree pattern matching time of the SPEC [13] benchmarks compiled by gcc and Arden. Compared with gcc, the average matching time is reduced by 69%. Table 3 compares the manufacturer's C compiler (cc), gcc, and Arden; only the compile time is included. The compile time of Arden is less than that of gcc and cc. Arden runs 1.06 times faster than gcc on average. All the above measurements were carried out on a SPARC 10 workstation.
6 Conclusions
In this paper, we have presented a simple number comparison method for tree pattern matching to produce a code generator. Our experiment shows that this method can reduce the tree pattern matching time by 69%, and that the instruction descriptions of gcc are 3.92 times larger than those of Arden on average. Moreover, this method can find optimal instructions for an RTL tree. Because the table generated by the preprocessing phase is very small, the space which the code generator needs is greatly reduced. In other words, both the tree pattern matching time and the space complexity are significantly reduced. Furthermore, if we want to retarget the code generator to different machines, we only need to change the instruction descriptions. Such a characteristic can help the architect to design a new architecture more easily.
Table 1: Instruction descriptions size for gcc and Arden
Table 2: Tree pattern matching time for C SPEC Benchmarks in seconds
Table 3: Compiler time for C SPEC Benchmarks in seconds
References
[1] Aho, A. V., Ganapathi, M., and Tjiang, S. W. K., "Code Generation Using Tree Matching and Dynamic Programming," ACM Trans. Program. Lang. Syst., Vol. 2, No. 4, Oct. 1989, pp. 491-561.
[2] Cattell, R. G. G., "Automatic Derivation of Code Generators from Machine Descriptions," ACM Trans. Program. Lang. Syst., Vol. 2, No. 2, Apr. 1980, pp. 173-190.
[3] Chase, David R., "An Improvement to Bottom Up Tree Pattern Matching," Proceedings of the 14th Annual Symposium on Principles of Programming Languages, 1987, pp. 168-177.
[4] Davison, J. W., and Fraser, C. W., "The Design and Application of a Retargetable Peephole Optimizer," ACM Trans. Program. Lang. Syst., Vol. 2, No. 2, Apr. 1980, pp. 173-190.
[5] Emmelmann, H. S., and Landwehr, F. W., "BEG - A Generator for Efficient Back Ends," Proceeding of ACM Conference on Programming Language Design and Implementation, Vol. 24, No. 7, June 1989, pp. 227-237.
[6] Fraser, Christopher W., and Hanson, David R., "A Code Generation Interface for ANSI C," Software-Practice and Experience, Vol. 21, No. 9, Sep. 1991, pp. 963-988.
[7] Hennessy, J. L., and Patterson, D. A., "Computer Architecture: A Quantitative Approach," Morgan Kaufmann Publishers Inc., San Mateo, 1990.
[8] Hoffman, C. W., and O'Donnell, M. J., "Pattern Matching in Trees," Journal of the ACM, Vol. 29, No. 1, January 1982, pp. 68-95.
[9] Lai, Feipei, Tsaur, F., and Shang, R., "ARDEN - ARchitecture Development ENvironment," IEEE TENCON 92, Nov., 1992, pp. 181-185.
[10] Proebsting, Todd A., "Simple and Efficient BURS Table Generation," Proceeding of ACM SIGPLAN'92 Conference on Programming Language Design and Implementation, June 1992, pp. 331-340.
[11] Stallman, R. M., "Using and Porting GNU CC (for version 2.2)," Free Software Foundation, Inc., Cambridge, Massachusetts, U.S.A, May 1992.
[12] Glanville, R. S., "A Machine Independent Algorithm for Code Generation and its Use in Retargetable Compilers," Ph.D. Thesis, University of California, Berkeley, 1978.
[13] SPEC, Standards Performance Evaluation Corp., Benchmark Suite Release 2.0, Jan. 1992.