

Chapter 4 Experiments and Results

4.4 Comparison with 16KB ICache Only Environment

Finally, our further design is compared with the execution environment containing only a 16KB instruction cache (no SPM). First, the further design with an 8KB ICache and an 8KB SPM is compared. Although the total capacity of the 8KB instruction cache and the 8KB SPM equals the capacity of the 16KB instruction cache, their total area cost is only 84.5% of the area cost of the 16KB instruction cache. The comparison result is presented in Figure 4-33, where the baseline is the execution environment containing only a 16KB instruction cache. The figure shows that the average execution time with the further design is 3.3% less than the baseline for CLDC HI and 0.94% more than the baseline for GB. Over all benchmarks, the further design has on average 7.64% fewer instruction cache miss stall cycles and 1.18% less execution time than the 16KB instruction cache only environment. In short, our further design (8KB ICache + 8KB SPM), at 15.5% less area cost, performs better (1.18% less execution time) than the 16KB instruction cache only environment. If the two benchmarks with few original instruction cache miss stall cycles, Image Manipulation and Queen, are excluded, the execution time with the further design is on average 3.05% less than with the 16KB instruction cache only environment. For Richards, the benchmark that performs best with the further design, the execution time is 11.54% less than with the 16KB instruction cache only environment.
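The percentages in this comparison are all taken relative to the 16KB instruction cache only baseline. The following minimal Python sketch illustrates that arithmetic for the area cost. The helper name `percent_change` is hypothetical and not part of our tools; only the 84.5% area ratio comes from the text.

```python
def percent_change(value: float, baseline: float) -> float:
    """Signed percentage difference of value relative to baseline.

    A negative result means value is smaller than the baseline.
    """
    return (value - baseline) / baseline * 100.0


# Area of 8KB ICache + 8KB SPM, normalized so that the 16KB ICache is 1.0.
# The 0.845 ratio is the figure quoted in the text.
area_ratio = 0.845

# Negate the signed change to express it as a saving.
area_saving = -percent_change(area_ratio, 1.0)
print(f"area saving: {area_saving:.1f}%")
```

The same helper applies to any normalized metric in Figure 4-33: a normalized execution time below 1.0 yields a negative change, that is, less execution time than the baseline.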

Figure 4-33 Comparison of Execution Time between Further Design (8KB ICache + 8KB SPM) and 16KB Instruction Cache Only Environment

Figure 4-34 shows the comparison of instruction cache miss rate between the further design (8KB ICache + 8KB SPM) and the 16KB instruction cache only environment. For CLDC HI, the instruction cache miss rate with the further design is on average 0.14% lower than with the 16KB instruction cache only environment. For GB, it is on average 0.02% higher. Overall, the average instruction cache miss rate with the further design is 0.06% lower than with the 16KB instruction cache only environment.


Figure 4-34 Comparison of Instruction Cache Miss Rate between Further Design (8KB ICache + 8KB SPM) and 16KB Instruction Cache Only Environment

Next, our further design (8KB ICache + 11.6KB SPM) is compared with the execution environment containing only a 16KB instruction cache. The total area cost of the 8KB instruction cache and the 11.6KB SPM is equal to the area cost of the 16KB instruction cache.
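The two area figures quoted so far (84.5% for an 8KB SPM and 100% for an 11.6KB SPM, each paired with the 8KB instruction cache) can be related by a rough back-of-the-envelope model. The sketch below assumes that SPM area grows linearly with capacity; this linearity is an assumption made here for illustration and is not stated in the text, and the variable names are hypothetical.

```python
# Normalized area (16KB ICache = 1.0) of the two configurations, from the text.
AREA_WITH_8KB_SPM = 0.845     # 8KB ICache + 8KB SPM
AREA_WITH_11_6KB_SPM = 1.0    # 8KB ICache + 11.6KB SPM

# ASSUMPTION (not from the text): SPM area is linear in SPM capacity,
# so the area difference is entirely due to the extra 3.6KB of SPM.
spm_area_per_kb = (AREA_WITH_11_6KB_SPM - AREA_WITH_8KB_SPM) / (11.6 - 8.0)

# Under the same assumption, what remains is the area of the 8KB ICache.
icache_8kb_area = AREA_WITH_8KB_SPM - 8.0 * spm_area_per_kb

print(f"SPM area per KB (normalized): {spm_area_per_kb:.4f}")
print(f"8KB ICache area (normalized): {icache_8kb_area:.4f}")
```

Under this assumption, each KB of SPM costs roughly 4.3% of the 16KB instruction cache area, and the 8KB instruction cache accounts for roughly half of it. These are estimates derived from the two quoted ratios, not measured values.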

Figure 4-35 shows the comparison result, where the baseline is likewise the 16KB instruction cache only environment. The execution time with the further design is on average 5.5% less than the baseline for CLDC HI and 0.5% less for GB. Over all benchmarks, the further design has on average 18.04% fewer instruction cache miss stall cycles and 3% less execution time than the 16KB instruction cache only environment. In short, our further design (8KB ICache + 11.6KB SPM), at the same area cost, outperforms the 16KB instruction cache only environment (3% less execution time). Similarly, if the two benchmarks Image Manipulation and Queen are excluded, the average execution time with the further design is 5.96% less than with the 16KB instruction cache only environment. For Richards, the benchmark that performs best with the further design, the execution time is 15.44% less than with the 16KB instruction cache only environment.

Figure 4-35 Comparison of Execution Time between Further Design (8KB ICache + 11.6KB SPM) and 16KB Instruction Cache Only Environment

Figure 4-36 shows the comparison of instruction cache miss rate between the further design (8KB ICache + 11.6KB SPM) and the 16KB instruction cache only environment. The average instruction cache miss rate with the further design is 0.24% lower than with the 16KB instruction cache only environment for CLDC HI and 0.03% lower for GB. Overall, the instruction cache miss rate with the further design (8KB ICache + 11.6KB SPM) is on average 0.14% lower than with the 16KB instruction cache only environment.


Figure 4-36 Comparison of Instruction Cache Miss Rate between Further Design (8KB ICache + 11.6KB SPM) and 16KB Instruction Cache Only Environment

In principle, as long as the original instruction cache miss stall cycles of a benchmark are not small, our further design (8KB ICache + 11.6KB SPM) performs better than the 16KB instruction cache only environment. For example, Richards, Delta Blue, and Chess all perform better with our further design. Only one benchmark, kXML, violates this principle: its performance with our further design is worse than with the 16KB instruction cache only environment. There are two causes. First, the instruction cache in our design is only 8KB, which is too small to serve the other (not JIT-compiled) code of kXML well. Second, our design chiefly aims to reduce instruction cache misses caused by JIT-compiled code, so only JIT-compiled code is a candidate for SPM allocation. As a result, the instruction cache miss stall cycles caused by the other (not JIT-compiled) code are much greater with our further design than with the 16KB instruction cache only environment.

Nevertheless, our further design still effectively reduces instruction cache miss stall cycles caused by JIT-compiled code for kXML, and the stall cycles caused by JIT-compiled code are much fewer than with the 16KB instruction cache only environment.

