Chapter 4 Experiment
4.2 ReassignCost
Registers that hold specific values, such as ArgRegs and RetRegs, are given a ReassignCost. This section describes how ReassignCost is calculated. MIPS has four ArgRegs ($4, $5, $6, and $7) and two RetRegs ($2 and $3). The first four arguments are passed in the ArgRegs in order from $4 to $7 (i.e., the first argument is passed in $4, the second in $5, and so on). If there are more than four arguments, the rest are stored in the stack frame. Normally, the return value is stored in $2, but if it is larger than one register can hold, $3 is used as well.
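The argument-passing rule described above can be sketched as a small lookup. This is only an illustration, assuming every argument fits in a single register; the function name `arg_location` is hypothetical and not part of the thesis implementation.

```python
# Sketch of the argument-passing rule described above.
# Assumption: each argument occupies exactly one register-sized slot.
ARG_REGS = ["$4", "$5", "$6", "$7"]


def arg_location(index):
    """Return where the index-th argument (0-based) is passed."""
    if index < 0:
        raise ValueError("argument index must be non-negative")
    if index < len(ARG_REGS):
        return ARG_REGS[index]
    return "stack"


for i in range(6):
    print(i, arg_location(i))
```

The first four indices map to $4 through $7, and every later argument falls back to the stack frame.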
A 32-bit instruction occupies 4 bytes and a 16-bit instruction occupies 2 bytes. Handling the calling convention requires four additional instructions per call (i.e., sw, lw, addu used as a move instruction, and a NOP after the lw instruction). Because three of these additional instructions can be converted to 16-bit equivalents, InstrSize is 10 bytes. MovArgIS and MovRetValIS are the sizes of move instructions, each 4 bytes. Note that ArgRegs are not used for returning values, so their MovRetValISi is zero. Similarly, the MovArgISi of RetRegs is zero.
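As a quick check of the 10-byte figure, assuming the one instruction that is not converted keeps its 32-bit size:

```latex
\[
\text{InstrSize}
  = \underbrace{3 \times 2}_{\text{16-bit instructions}}
  + \underbrace{1 \times 4}_{\text{32-bit instruction}}
  = 10 \text{ bytes}
\]
```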
The factors used to calculate ReassignCost are listed in Table 4-2.
Table 4-2 Factors for calculating ReassignCost
Register i   InstrSize (bytes)   MovArgISi (bytes)   MovRetValISi (bytes)
$2-$3        10                  0                   4
$4-$7        10                  4                   0
Next, we obtain NCi from the analysis phase and calculate each ReassignCosti from it. For example, Table 4-3 shows the analysis result for the main function of the CRC32 benchmark. CRC32's main function takes two arguments, so if both $4 and $5 have been reassigned, we need to insert two move instructions after the prolog to move the values from $4 and $5 to their new registers. The ReassignCost of each ArgReg is shown in Table 4-4. CRC32's main function makes eight function calls, indexed from 0 to 7. ReassignCost4 is 84 bytes: InstrSize × NC4 = 80 bytes, plus MovArgIS4 (4 bytes).
Table 4-3 Analysis result of main function in CRC32 benchmark
Name of Calls   Index   # of Arguments   Registers used
fopen           0       2                $4, $5
_IO_getc        1       1                $4
perror          2       1                $4
_IO_getc        3       1                $4
ferror          4       1                $4
perror          5       1                $4
fclose          6       1                $4
printf          7       4                All ArgRegs
Table 4-4 The ArgRegs' ReassignCost of main function in CRC32
Register i   NCi   Used in arguments of CurrentFunction?   ReassignCosti (bytes)
$4           8     Yes                                      10×8+4 = 84
$5           2     Yes                                      10×2+4 = 24
$6           1     No                                       10×1 = 10
$7           1     No                                       10×1 = 10
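The ArgRegs calculation can be reproduced with a short script. The sketch below assumes the cost model implied by this section: ReassignCosti = InstrSize × NCi, plus MovArgISi when register i carries an argument of the CurrentFunction. The data structures and function names are illustrative only.

```python
# Sketch: derive NC_i from Table 4-3 and apply the cost model of this section.
# Assumption: ReassignCost_i = InstrSize * NC_i (+ MovArgIS_i if register i
# holds an argument of the CurrentFunction). Names here are illustrative.
INSTR_SIZE = 10   # bytes per call (Table 4-2)
MOV_ARG_IS = 4    # bytes for one move instruction (Table 4-2)
ARG_REGS = ["$4", "$5", "$6", "$7"]

# (callee, number of arguments) for the calls in CRC32's main (Table 4-3)
CALLS = [
    ("fopen", 2), ("_IO_getc", 1), ("perror", 1), ("_IO_getc", 1),
    ("ferror", 1), ("perror", 1), ("fclose", 1), ("printf", 4),
]
MAIN_ARG_COUNT = 2  # main receives two arguments, in $4 and $5


def call_counts(calls):
    """NC_i: number of calls that pass an argument in register i."""
    return {
        reg: sum(1 for _, nargs in calls if nargs > pos)
        for pos, reg in enumerate(ARG_REGS)
    }


def arg_reassign_costs(calls, own_arg_count):
    """ReassignCost_i in bytes for every ArgReg."""
    costs = {}
    for pos, (reg, nc) in enumerate(call_counts(calls).items()):
        move = MOV_ARG_IS if pos < own_arg_count else 0
        costs[reg] = INSTR_SIZE * nc + move
    return costs


print(call_counts(CALLS))
print(arg_reassign_costs(CALLS, MAIN_ARG_COUNT))
```

For $4 this gives 10 × 8 + 4 = 84 bytes, matching the worked example in the text.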
By observing the mapping pairs of all benchmarks, we found that RetRegs were selected in most cases. That means RetRegs are usually not reassigned to new registers, so we seldom insert additional instructions for them. Besides, we cannot determine precisely how many RetRegs the CurrentFunction and its function calls require.
Accordingly, we do not analyze the return values of function calls or of the CurrentFunction. Instead, if any RetReg has been reassigned, we assume it is used to store the return value of all function calls (i.e., eight function calls in CRC32's main function) and of the CurrentFunction.
We then have to insert additional instructions for it to conform to the calling convention.
Table 4-5 shows the ReassignCost of RetRegs. In this table, $2 has been reassigned, so ReassignCost2 is the sum of InstrSize × NC2 and MovRetValIS2.
We obtain the TotalCost by summing all ReassignCost values. The TotalCost of CRC32's main function is larger than the code size reduction we could get, so we leave this function unchanged.
Table 4-5 The RetRegs' ReassignCost of main function in CRC32
Register i   NCi   Has been reassigned?   ReassignCosti (bytes)
$2           8     Yes                    10×8+4 = 84
$3           8     No                     0
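The conservative treatment of RetRegs and the final profitability check might be sketched as follows. The function names and the `reduction` value are hypothetical inputs used only for illustration. The sketch assumes, as stated above, that a reassigned RetReg is charged for every function call plus one move for the CurrentFunction's own return value.

```python
# Sketch: conservative RetRegs cost and the TotalCost profitability test.
# Assumption: a reassigned RetReg is charged for all calls in the function.
INSTR_SIZE = 10      # bytes per call (Table 4-2)
MOV_RET_VAL_IS = 4   # bytes for one move instruction (Table 4-2)


def ret_reassign_cost(num_calls, reassigned):
    """ReassignCost_i in bytes for one RetReg."""
    if not reassigned:
        return 0
    return INSTR_SIZE * num_calls + MOV_RET_VAL_IS


def is_profitable(reassign_costs, reduction):
    """True if the total cost is smaller than the code size reduction.

    A tie is treated as not profitable; the text does not specify this case.
    """
    return sum(reassign_costs) < reduction


cost_2 = ret_reassign_cost(num_calls=8, reassigned=True)    # $2
cost_3 = ret_reassign_cost(num_calls=8, reassigned=False)   # $3
print(cost_2, cost_3)

# Hypothetical reduction of 60 bytes: the function would be left unchanged.
print(is_profitable([cost_2, cost_3], reduction=60))
```

In a full calculation the list passed to `is_profitable` would also contain the ReassignCost of every reassigned ArgReg.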
4.3 Experimental Result
This section presents the performance of our register reassignment methods. We use direct translation as the baseline for comparison. In direct translation, we examine each instruction in turn, converting it to a 16-bit version if possible. We calculate the code size and the ratio of additional instructions under the two register re-assignment methods, respectively.
At the end of this section, we compare the two methods and summarize the results.
Table 4-6 Code size reduction and additional instructions of Method I
Table 4-7 Code size reduction and additional instructions of Method II
... rawdaudio, and CRC32 the cost is larger than the profit. We could get more code reduction from larger programs, such as gzip, mcf, blowfish, and rijndael, and from functions that make few function calls.
Figure 4-1 Code Size Reduction of Register Reassignment Methods
Compared with Method I, Method II has no significant improvement in reducing code size.
The reason is that the weight used in Method II, which is simply the number of times two registers are used in the same instruction, should be biased toward instructions with fewer register operands. Different instructions might have different numbers of register operands.
For converting a 32-bit instruction to a 16-bit equivalent, all the register operands used in the instruction must be mapped to registers in RegS.
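To illustrate why operand count matters, the sketch below computes the pairwise weight described above and checks the conversion condition. The instruction encoding (a tuple of register operands per instruction), the contents of `REG_S`, and the helper names are assumptions made for this example. Opcode and immediate-range restrictions on 16-bit encodings are ignored.

```python
# Sketch: Method II-style pair weights and the 16-bit conversion condition.
# Assumptions: an instruction is just a tuple of register operands, and
# REG_S is a hypothetical set of registers reachable by 16-bit instructions.
from collections import Counter
from itertools import combinations

REG_S = {"$2", "$3", "$4", "$5", "$6", "$7", "$16", "$17"}


def pair_weights(instructions):
    """Count how often two distinct registers appear in one instruction."""
    weights = Counter()
    for operands in instructions:
        for pair in combinations(sorted(set(operands)), 2):
            weights[pair] += 1
    return weights


def is_convertible(operands, mapping):
    """All operands, after reassignment, must lie in REG_S."""
    return all(mapping.get(reg, reg) in REG_S for reg in operands)


code = [("$8", "$9"), ("$8", "$9", "$10"), ("$8", "$10")]
mapping = {"$8": "$2", "$9": "$3"}   # $10 is left outside REG_S

print(pair_weights(code)[("$8", "$9")])
print([is_convertible(ops, mapping) for ops in code])
```

Here the pair ($8, $9) earns the same weight from the two-operand instruction as from the three-operand one, yet only the two-operand instruction becomes convertible once the pair is mapped into `REG_S`.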
Figure 4-2 shows the additional-instruction rate of each benchmark program. The average rate is 0.99% for Method I and 0.77% for Method II. As a rule of thumb, the more ArgRegs and RetRegs are reassigned to other registers, the more additional instructions are inserted. In blowfish, the additional-instruction rate of Method I is much higher than that of Method II, because the largest function in blowfish is profitable to convert under Method I but not under Method II. Hence, Method II leaves the function unchanged, while Method I performs register reassignment on it, which inserts numerous additional instructions.
Figure 4-2 Additional Instruction Rate
There are no additional instructions in rawcaudio, rawdaudio, and CRC32, since register reassignment is not performed on them. Because the stringsearch benchmark is small, its overhead is relatively high. The bitcount benchmark has no additional instructions because, in many of its functions, all the registers are mapped to themselves. The rijndael benchmark has a lower cost than the others since the mapping pairs in most of its functions do not cause calling-convention problems.
Chapter 5 Conclusion and Future Work
In this thesis, we present two register re-assignment methods for mixed-width ISAs. On average, the two methods reduce the code size by 28%, whereas direct translation reduces it by 26.7%. If an ArgReg or RetReg has been reassigned and is also used for an argument or the return value, the reassignment comes with a cost. We could get more code reduction from larger programs, such as gzip, mcf, blowfish, and rijndael, and from functions that make few function calls.
From the experimental results, we observed that the effects of Method I and Method II are not much different. The main reason might be that the weights do not consider the number of register operands in an instruction. We plan to modify the weights by taking the number of operands into consideration in the future.