• 沒有找到結果。

Chapter 3 Transformations

3.3 Loop Transformations

Loop transformations operate on the statements which comprise a loop (i.e. these transformations modify either the body or control structure of the loop). Because a large percentage of the execution time of programs is spent in loops, these transformations can have a very remarkable impact on energy consumption. Hence, there are a number of researches and approaches based on loop transformations because of the importance of loop transformations.

3.3.1 Loop Fusion

This transformation combines one or more loops with the same bounds into a single loop [21]. It reduces loop overhead; as a result, the number of the instructions executed is reduced.

Besides, it can also be used for improving data cache locality by bringing the statements that access the same set of data to the same loop [20]. But it is noted that if the increased loop body becomes larger than instruction cache, it will increase instruction cache misses; as a result, energy consumption will be increased.

3.3.2 Loop Fission

Loop fission does the opposite operation to loop fusion [21]. The goal of this transformation is to break down larger loop body into smaller ones to reduce the size of loop body to fit into instruction cache, and then it reduces instruction cache misses. It is noted that computation energy is increased due to the new loop overheads. Therefore, we need to be careful to decide that loop fusion or loop fission should be applied to loops in order to reduce energy consumption effectively.

3.3.3 Loop Reversal

This transformation reverses the order in which a specific loop’s iterations are performed [13]. In some loops of which loop body exists dependence, loop reversal may eliminate the dependence; as a result, it allows other transformations to be applied. Besides, a special case of loop reversal which is useful in ARM architecture was presented in [25]. To apply loop reversal transformation makes an incrementing loop to a decrementing loop which becomes a count-down-to-zero loop. It causes the original ADD/CMP instruction pair to be replaced by a single SUBS instruction; because of this, it saves compares in critical loops, leading to reduced code size, increased performance and reduced energy consumption.

3.3.4 Loop Inversion

Loop inversion transforms a while loop to a repeat loop (i.e. it moves the loop conditional test from before the loop body to after it) [13]. It results in only one branch instruction needed to be executed to leave the loop, rather than one needed to return to the beginning and another needed to leave after the loop conditional test at beginning. Hence, it is expected that energy consumption is reduced due to the reduced number of the instructions executed. It is noted that this transformation is only safe when the loop body is executed at least once.

3.3.5 Loop Interchange

This transformation reverses the order of two adjacent loops in a loop nest to change the access paths to arrays [14]. It improves the chances that consecutive references are in the same cache line, leading to reduced data cache misses. Hence, reduced energy consumption can be expected.

3.3.6 Loop Unrolling

Loop unrolling replaces the body of a loop by U (the unrolling factor) times copies of the body and modifies the iteration step from 1 to U [13]. The original loop is called the rolled loop. Loop unrolling reduces the overhead of a loop by performing less compare and branch instructions (i.e. better performance) and may improve the effectiveness of other transformations, such as common-sub-expression elimination and software pipelining, etc. It also allows the compiler to get a better register usage of the larger loop body.

On the other hand, the unrolled loop is larger than the rolled loop, so it increases code size and may impact the effectiveness of the instruction cache, leading to increased instruction cache misses. So deciding which loops to unroll and by what unrolling factors is very important.

Besides, there is another form of loop unrolling that applies to the loop which is not a

counting loop [14]. It can unroll the loop and leave the termination conditions in place. This technique has benefits when dealing with a while loop in which later transformations can be used.

3.3.7 Loop Unswitching

This transformation moves loop-invariant conditional branches to the outside of loops [13]. It reduces the number of the instructions executed due to the reduced number of codes executed in the loop body. If the conditional only has if part, loop unswitching has little impact on code size. But if the conditional has else parts, it will need to copy the loop into every else parts of conditional. Hence, it increases code size and instruction cache misses.

3.3.8 Miscellany

In addition to the above loop transformations, other energy-efficiency strategies on loop transformations can be envisioned.

Loop permutation is a general version of loop interchange [13]. It allows more than two loops to be reordered to reduce data cache misses; as a result, it is expected to reduce energy consumption.

Loop tiling achieves the goal of reduction of capacity and conflict misses which are resulted from cache size limitations [21]. It improves cache performance by dividing the loop iteration space into smaller tiles. This also results in a logical division of arrays into tiles, so it may increase reuse of array elements within each tile. By the correct selection of tile sizes, conflict misses which occur when several data elements compete for the same cache line can be eliminated. But it also increases the number of instructions executed and code size because of the increased nesting loops.

Software pipelining can improve the execution performance of loops [16]. It eliminates the dependences between adjacent statements by breaking the operations of single loop

iteration into S stages, and arranges the code in such a way that stage 1 is executed on the instructions originally belonging to iteration i, stage 2 on those of iteration i-1, etc. It makes pipeline performance better through pipeline stalls reduction. Hence, CPU cycles consumed are expected to reduce. But it may increase the number of instructions executed due to the calculation of iteration i. It also increases code size because startup code is generated before the loop to initialize and cleanup code is generated after the loop to finish operations.

相關文件