Chapter 3 Optimizations
3.2 Local Code Analyzer
Before discussing the local code analyzer, we define two terms. If the value of a register is updated by executing an instruction, we call that register a “producer register”. Conversely, if the value of a register is used but not updated, we call it a “consumer register”.
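The two definitions can be illustrated with a minimal sketch. The instruction model used here (an opcode string plus a list of operands, with registers as strings and immediates as integers) and the function name `classify_registers` are assumptions made for illustration only; they are not part of the analyzer described in this chapter.

```python
# Hypothetical instruction model (not from the source): an opcode string plus
# a list of operands, where registers are strings and immediates are ints.
def classify_registers(opcode, operands):
    """Return (producer_registers, consumer_registers) for one instruction."""
    regs = [op for op in operands if isinstance(op, str)]
    if opcode == "swi":
        # A store only reads its value register and its base register.
        written, read = set(), set(regs)
    else:
        # Assumption: every other opcode writes its first operand.
        written, read = set(regs[:1]), set(regs[1:])
    # A register that is both read and updated counts as a producer only.
    return written, read - written


print(classify_registers("lwi", ["gp", "s1", 0]))
print(classify_registers("addi", ["s0", "s0", -4]))
```

Under these assumptions, gp is the producer register and s1 the consumer register of the load, while the “addi” that updates s0 in place has a producer register but no consumer register.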
The optimization techniques implemented in the local code analyzer fall into three categories. The first is the elimination of redundant instructions, which reduces code size; these techniques are introduced in 3.2.1-3.2.2. The second replaces instructions with more efficient ones; it does not reduce code size but improves execution speed, and it is introduced in 3.2.3. The last category consists of supporting optimizations, which increase the opportunities for eliminating redundant code; these techniques are introduced in 3.2.4-3.2.6.
3.2.1 Dead Code Elimination
Dead code elimination is a common compiler optimization. It reduces code size by removing instructions that do not affect the result of the program. In our low-level optimization, we eliminate instructions that define a register value which is never used.
In the example in Table 4, the first instruction loads the gp register with the value at memory address “s1+0”, and the fourth instruction then redefines gp. Because gp is not used by the second or third instruction, the first instruction can be removed.
Table 4 Dead code elimination with a redefined register value.
Before Dead Code Elimination
0xf77626f8 152: lwi $gp, [$s1+0]
0xf77626fc 156: sethi $a0, 33623
0xf7762700 160: ori $a0, $a0, 484
0xf7762704 164: lwi $gp, [$a0+0]
After Dead Code Elimination
0xf77626fc 156: sethi $a0, 33623
0xf7762700 160: ori $a0, $a0, 484
0xf7762704 164: lwi $gp, [$a0+0]
Because the “jal” and “jral” instructions update the lp register, the dead code elimination algorithm must also treat them as redefinitions of lp. Table 5 demonstrates this situation.
Table 5 Dead code elimination with link instruction.
Before Dead Code Elimination
0xf778c770 184: ori $lp, $lp, 2232
0xf778c774 188: sethi $a2, 33442
0xf778c778 192: ori $a2, $a2, 2464
0xf778c77c 196: jal 0xf777800c
After Dead Code Elimination
0xf778c774 188: sethi $a2, 33442
0xf778c778 192: ori $a2, $a2, 2464
0xf778c77c 196: jal 0xf777800c
Besides register redefinition, we also regard null-operation instructions as dead code. As Table 6 shows, when the target register and the source register are the same and the instruction is a move or an addition of a zero immediate, the instruction can be removed.
Table 6 Dead code elimination with null sequence.
Before Dead Code Elimination
0xf77634ec 1060: addi $a0, $a0, 0
After Dead Code Elimination
DELETE
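The three cases of Tables 4 to 6 can be sketched in a few lines. This is only an illustration under stated assumptions, not the analyzer's actual implementation: instructions are modelled as hypothetical `(opcode, operands)` pairs, only the opcodes appearing in the tables are handled, and a link instruction is conservatively assumed to read every register except lp, which it overwrites.

```python
# Hypothetical model: each instruction is (opcode, operands); registers are
# strings, immediates are ints. Only opcodes used in Tables 4-6 are handled.
LINK_OPS = ("jal", "jral")

def defs_uses(op, args):
    """Return (registers written, registers read) by one instruction."""
    regs = [a for a in args if isinstance(a, str)]
    if op == "swi":
        return set(), set(regs)
    if op in LINK_OPS:
        return {"lp"}, set(regs)      # link instructions overwrite lp
    return set(regs[:1]), set(regs[1:])

def is_null_op(op, args):
    # addi rX, rX, 0 leaves the register unchanged.
    return op == "addi" and args[0] == args[1] and args[2] == 0

def is_dead(block, i):
    op, args = block[i]
    if is_null_op(op, args):
        return True
    defs, _ = defs_uses(op, args)
    if op in LINK_OPS or not defs:
        return False                  # calls and stores have side effects
    (reg,) = defs
    for op2, args2 in block[i + 1:]:
        defs2, uses2 = defs_uses(op2, args2)
        if reg in uses2:
            return False              # value is read before being replaced
        if reg in defs2:
            return True               # value is overwritten without a read
        if op2 in LINK_OPS:
            return False              # assumption: a callee may read any register
    return False                      # treated as live at the end of the block

def eliminate_dead_code(block):
    return [ins for i, ins in enumerate(block) if not is_dead(block, i)]

table5 = [("ori", ["lp", "lp", 2232]), ("sethi", ["a2", 33442]),
          ("ori", ["a2", "a2", 2464]), ("jal", [0xF777800C])]
print(eliminate_dead_code(table5))
```

The sketch makes a single pass and treats a register as live when the block ends, so it removes less than an analysis with full liveness information would.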
3.2.2 Redundant Load/Store Elimination
Redundant load/store elimination tries to find useless load or store instructions. For each load or store, we record the target register, base register, and offset value as a node in a table.
Once the target register or base register of a node is modified, we remove that node from the table. If a later instruction has the same target register, base register, and immediate offset as a node in the table, we consider it redundant. Table 7 shows a redundant load/store elimination example.
Table 7 The redundant load/store elimination example.
0xf77620b8 48: swi $s1, [$fp-8]
0xf77620bc 52: lwi $s1, [$fp-8]
In the example above, the first and second instructions have the same target register, base register, and offset, so the second instruction can be removed safely.
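The node table described above can be sketched as follows. The tuple layout `(opcode, target, base, offset)` and the function name are hypothetical. The sketch also adds one conservative assumption that the text does not state: a store that is not itself redundant may alias any recorded address, so it clears the table before being recorded.

```python
# Hypothetical model: memory instructions are (opcode, target, base, offset);
# every other instruction is assumed to write its first operand.
def eliminate_redundant_memops(block):
    table = set()                     # nodes: (target, base, offset)
    out = []

    def invalidate(reg):
        # Drop every node whose target or base register is modified.
        for node in [n for n in table if reg in n[:2]]:
            table.discard(node)

    for ins in block:
        op = ins[0]
        if op in ("lwi", "swi"):
            node = ins[1:]
            if node in table:
                continue              # same target, base and offset: redundant
            if op == "lwi":
                invalidate(node[0])   # the load redefines its target register
                if node[0] != node[1]:
                    table.add(node)
            else:
                table.clear()         # assumption: the store may alias other nodes
                table.add(node)
        elif op in ("jal", "jral"):
            table.clear()             # assumption: a call may change anything
        else:
            invalidate(ins[1])
        out.append(ins)
    return out

table7 = [("swi", "s1", "fp", -8), ("lwi", "s1", "fp", -8)]
print(eliminate_redundant_memops(table7))
```

For the two instructions of Table 7, only the store remains, because the load would reload a value that s1 already holds.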
3.2.3 Load Copy Optimization
This optimization is combined with the redundant load/store elimination process, but it considers only the base register and offset value. If two instructions have the same base register and offset value, the second one can be rewritten as a move instruction. Table 8 shows an example of load copy optimization.
Table 8 The load copy optimization example.
Before Load Copy Optimization
0xf7789124 156: swi $s6, [$fp-4]
0xf7789128 160: lwi $s7, [$fp-4]
After Load Copy Optimization
0xf7789124 156: swi $s6, [$fp-4]
0xf7789128 160: addi $s7, $s6, 0
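A minimal sketch of this rewrite is given below. It assumes the same hypothetical `(opcode, target, base, offset)` tuples as before and keeps, for each `(base, offset)` pair, one register known to hold that memory word. As an added conservative assumption, every store clears the previously recorded pairs, since it might alias them.

```python
# Hypothetical model: memory instructions are (opcode, target, base, offset);
# every other instruction is assumed to write its first operand.
def load_copy_optimize(block):
    holder = {}                       # (base, offset) -> register holding that word
    out = []

    def invalidate(reg):
        # Forget entries whose holder or base register is modified.
        for key in [k for k, r in holder.items() if reg == r or reg == k[0]]:
            del holder[key]

    for ins in block:
        op = ins[0]
        if op == "swi":
            _, rt, base, off = ins
            holder.clear()            # assumption: the store may alias other entries
            holder[(base, off)] = rt
        elif op == "lwi":
            _, rt, base, off = ins
            src = holder.get((base, off))
            if src is not None and src != rt:
                ins = ("addi", rt, src, 0)   # load becomes a register move
            invalidate(rt)
            if rt != base:
                holder[(base, off)] = rt
        elif op in ("jal", "jral"):
            holder.clear()            # assumption: a call may change anything
        else:
            invalidate(ins[1])
        out.append(ins)
    return out

table8 = [("swi", "s6", "fp", -4), ("lwi", "s7", "fp", -4)]
print(load_copy_optimize(table8))
```

A load whose target already equals the recorded register is left unchanged here, since that case belongs to redundant load/store elimination.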
3.2.4 Common Sub-expression Replacement
Common sub-expression replacement can reduce the number of consumer registers and improve the opportunities for dead code elimination. When we encounter the opcode “addi”, we first check whether the target register and the source register are the same. If they differ, we record the target register, source register, and immediate value.
As shown in Table 9, the fp register equals the sum of the s0 register and the immediate value 4. In the second instruction, fp can therefore be replaced by “s0+4”, which together with the offset 12 gives “s0+16”. The replacement reduces the number of consumer registers and indirectly enhances the opportunities for dead code elimination.
Table 9 The common Sub-expression example.
Before Common Sub-expression
0xf7762220 24: addi $fp, $s0, 4
0xf7762224 28: swi $a0, [$fp+12]
After Common Sub-expression
0xf7762220 24: addi $fp, $s0, 4
0xf7762224 28: swi $a0, [$s0+16]
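The recording and replacement steps can be sketched as follows, again with hypothetical `(opcode, target, source_or_base, immediate)` tuples. The sketch does not record an “addi” whose target and source registers are the same, because the relation "target = source + immediate" would no longer hold once the source itself has been updated. The offset range limits of a real instruction encoding are not checked.

```python
# Hypothetical model: (opcode, target, source_or_base, immediate); every
# instruction other than swi is assumed to write its first operand.
def replace_common_subexpressions(block):
    exprs = {}                        # target -> (source, imm): target == source + imm
    out = []

    def invalidate(reg):
        # Forget relations that mention a modified register on either side.
        for key in [k for k, (src, _) in exprs.items() if reg in (k, src)]:
            del exprs[key]

    for ins in block:
        op = ins[0]
        if op in ("lwi", "swi"):
            _, rt, base, off = ins
            if base in exprs:
                src, imm = exprs[base]
                ins = (op, rt, src, off + imm)   # fold the addi into the offset
            if op == "lwi":
                invalidate(rt)
        elif op == "addi":
            _, rt, rs, imm = ins
            invalidate(rt)
            if rt != rs:              # same-register addi is never recorded
                exprs[rt] = (rs, imm)
        elif op in ("jal", "jral"):
            exprs.clear()             # assumption: a call may change anything
        else:
            invalidate(ins[1])
        out.append(ins)
    return out

table9 = [("addi", "fp", "s0", 4), ("swi", "a0", "fp", 12)]
print(replace_common_subexpressions(table9))
```

For the input of Table 9, the store is rewritten to use base s0 with offset 16, after which fp is no longer a consumer register of the store.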
If the target register and source register are the same, we discard the node, since using it may cause errors, as the example in Table 10 shows.
Table 10 The target register and source register are the same.
Before Common Sub-expression
0xf7762350 136: addi $s0, $s0, -4
0xf7762354 140: lwi $gp, [$s0+0]
After Common Sub-expression
0xf7762350 136: addi $s0, $s0, -4
0xf7762354 140: lwi $gp, [$s0-4] (error)
3.2.5 Copy Propagation
Copy propagation also reduces the number of consumer registers, but it differs from common sub-expression replacement in that it records only the target register and source register, and it applies to more cases. When we encounter the opcode “addi” with an immediate value of zero, we record it in the table, since such an instruction is actually a move. Then, within a reasonable live range, when we encounter an instruction whose source register is the same as a target register in our table, we can replace that source register with the recorded one.
The following example (Table 11) demonstrates how copy propagation reduces the use of consumer registers and improves the chance of elimination. For the first instruction, we record s1 and s8 as the target and source registers. In the second instruction, we can then replace s1 with s8. The third instruction redefines s1, so the first instruction can be eliminated safely.
Table 11 The copy propagation example with DCE.
Before copy propagation
0xf77c73c8 1288: addi $s1, $s8, 0
0xf77c73cc 1292: swi $s1, [$fp+40]
0xf77c73d0 1296: lwi $s1, [$fp-16]
After copy propagation
0xf77c73c8 1288: addi $s1, $s8, 0
0xf77c73cc 1292: swi $s8, [$fp+40]
0xf77c73d0 1296: lwi $s1, [$fp-16]
After dead code elimination
0xf77c73cc 1292: swi $s8, [$fp+40]
0xf77c73d0 1296: lwi $s1, [$fp-16]
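The propagation step of Table 11 can be sketched as below. The tuple layout and function name are hypothetical, and the "reasonable live range" is approximated by one simple rule that is our assumption: a recorded copy stays valid until either of its two registers is redefined or a link instruction is reached.

```python
# Hypothetical model: (opcode, operand1, operand2, operand3); for swi all
# registers are sources, otherwise operand1 is the destination.
def propagate_copies(block):
    copies = {}                       # target -> source of a recorded move
    out = []

    def invalidate(reg):
        # A copy is dropped as soon as either of its registers is redefined.
        for key in [k for k, src in copies.items() if reg in (k, src)]:
            del copies[key]

    def sub(x):
        # Replace a register by the recorded source; leave immediates alone.
        return copies.get(x, x) if isinstance(x, str) else x

    for ins in block:
        op = ins[0]
        if op == "swi":
            ins = (op,) + tuple(sub(x) for x in ins[1:])
        elif op in ("jal", "jral"):
            copies.clear()            # assumption: a call may change anything
        else:
            rt = ins[1]
            ins = (op, rt) + tuple(sub(x) for x in ins[2:])
            invalidate(rt)
            if op == "addi" and ins[3] == 0 and ins[2] != rt:
                copies[rt] = ins[2]   # addi rt, rs, 0 is a move
        out.append(ins)
    return out

table11 = [("addi", "s1", "s8", 0), ("swi", "s1", "fp", 40), ("lwi", "s1", "fp", -16)]
print(propagate_copies(table11))
```

After this pass, the first instruction defines s1 without any later use before the redefinition in the third instruction, which is exactly the situation that dead code elimination removes.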
3.2.6 Constant Propagation and Constant Folding
The constant propagation process records registers whose values are known. If the opcode is “movi”, we push the target register and its value into the table and then apply them to the following instructions of the kinds shown in Table 12.
The first column is the type of ALU instruction with an immediate value. We check whether its source register is the same as a target register in our table. If so, we compute the actual result and rewrite the instruction as a “movi” instruction.
The second column is the type of ALU instruction without an immediate value. If the second source register is the same as a target register in the table, we replace the instruction with its immediate-format counterpart.
The last column is a set of memory access instructions whose offset is a register. If the offset register is the same as a target register in the table, we replace the instruction with the corresponding form that supports an immediate offset.
Table 12 The applicable instructions for constant propagation and constant folding.
Constant Folding      Arithmetic Propagation    Memory Address Propagation
Before    After       Before    After           Before    After
ADDI      MOVI        ADD       ADDI            LB        LBI
SUBRI     MOVI        SUB       SUBRI           LBS       LBSI
XORI      MOVI        XOR       XORI            LH        LHI
ORI       MOVI        OR        ORI             LHS       LHSI
ANDI      MOVI        AND       ANDI            LW        LWI
                      SLL       SLLI            SB        SBI
                      SRL       SRLI            SH        SHI
                                                SW        SWI
All three types above remove one consumer register and therefore indirectly increase the opportunities to eliminate redundant code. The following example (Table 13) demonstrates how constant propagation works. From the first instruction, we record the value of the gp register as 0. In the fourth instruction, we can replace the gp register with 0 and change the opcode from “addi” to “movi”.
Table 13 The constant propagation example.
Before Constant Propagation
0xf77f38cc 76: movi $gp, 0
0xf77f38d0 80: swi $gp, [JFP_$fp-44]
0xf77f38d4 84: lwi $s1, [JFP_$fp+36]
0xf77f38d8 88: addi $s4, $gp, 0
After Constant Propagation
0xf77f38cc 76: movi $gp, 0
0xf77f38d0 80: swi $gp, [JFP_$fp-44]
0xf77f38d4 84: lwi $s1, [JFP_$fp+36]
0xf77f38d8 88: movi $s4, 0
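The first two columns of Table 12 can be sketched as follows. The tuple layout, the names `FOLD` and `TO_IMMEDIATE`, and the register name `fp` used in place of the frame pointer of Table 13 are assumptions for illustration. The sketch leaves out SUB and the memory address column, computes with plain Python integers, and does not model 32-bit wrap-around or the immediate ranges of a real encoding.

```python
# Hypothetical model: (opcode, target, operand, operand); movi is
# (opcode, target, value). Only part of Table 12 is covered.
FOLD = {
    "addi": lambda v, imm: v + imm,
    "subri": lambda v, imm: imm - v,   # reverse subtract: immediate minus register
    "xori": lambda v, imm: v ^ imm,
    "ori": lambda v, imm: v | imm,
    "andi": lambda v, imm: v & imm,
}
TO_IMMEDIATE = {"add": "addi", "xor": "xori", "or": "ori",
                "and": "andi", "sll": "slli", "srl": "srli"}

def propagate_constants(block):
    consts = {}                        # register -> known constant value
    out = []
    for ins in block:
        op, rt = ins[0], ins[1]
        if op == "movi":
            consts[rt] = ins[2]
        elif op in FOLD and ins[2] in consts:
            # Constant folding: the source is known, so compute the result.
            value = FOLD[op](consts[ins[2]], ins[3])
            ins = ("movi", rt, value)
            consts[rt] = value
        elif op in TO_IMMEDIATE and ins[3] in consts:
            # Arithmetic propagation: second source becomes an immediate.
            ins = (TO_IMMEDIATE[op], rt, ins[2], consts[ins[3]])
            consts.pop(rt, None)
        elif op in ("jal", "jral"):
            consts.clear()             # assumption: a call may change anything
        elif op != "swi":
            consts.pop(rt, None)       # target now holds an unknown value
        out.append(ins)
    return out

table13 = [("movi", "gp", 0), ("swi", "gp", "fp", -44),
           ("lwi", "s1", "fp", 36), ("addi", "s4", "gp", 0)]
print(propagate_constants(table13))
```

For the input of Table 13, the last instruction is rewritten as a “movi” of 0 into s4, so gp is no longer a consumer register there.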