
CHAPTER 2. RELATED WORKS

2.3 Loop Detection

As mentioned previously, we can exploit the characteristics of loops in embedded systems. These loops are small and frequently executed. It is widely known that loops exhibit temporal and spatial locality. In other words, the same instructions of a loop are accessed repeatedly during a certain period of program execution. Once that period is over, the loop may be accessed infrequently or never again. Thus, we must determine whether the current loop should be placed into the loop buffer. As soon as a loop becomes hot, we should place it into the loop buffer. Because the instructions of frequently executed loops are then already in the buffer, they can be supplied from the loop buffer, and repeated instruction fetch and decode operations can be avoided. Conversely, if the instructions in the loop buffer are not accessed often when a new loop is detected, they should be replaced by the instructions of the new loop. In this section, we introduce two basic methods for deciding whether a loop should be placed into the buffer, and we explain the problems that each method brings.

2.3.1 Counter-Based Loop Detection

The first method is counter-based loop detection [1]. The method takes advantage of the fact that a loop typically contains a backward branch instruction. It relies on a special class of branch instructions, called the short backward branch instruction (sbb).

The short backward branch instruction is shown in Fig 2-12.

Because the branch is short and backward, the upper displacement bits are all ones (indicating a negative branch displacement). The lower portion of the displacement is w bits wide. By definition, an sbb has a maximum backward branch distance of 2^w instructions. The size of the loop buffer is also 2^w instructions. When an sbb instruction is detected and found to be taken, the hardware assumes that the program is executing a loop and initiates all the appropriate control actions to utilize the loop buffer. This sbb is called the triggering sbb.
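To make the encoding concrete, the following Python sketch tests whether a raw displacement field encodes an sbb. The 16-bit field width, the choice w = 5, and the function names are assumptions made for illustration only; they are not taken from [1].

```python
def is_sbb(disp_field: int, disp_bits: int = 16, w: int = 5) -> bool:
    """Return True if the raw (unsigned) displacement field encodes an sbb.

    Assumed encoding: a two's-complement field of disp_bits bits whose
    upper (disp_bits - w) bits must all be ones.
    """
    upper_mask = ((1 << (disp_bits - w)) - 1) << w
    return (disp_field & upper_mask) == upper_mask


def lower_displacement(disp_field: int, w: int = 5) -> int:
    """Extract the w-bit lower displacement that is loaded into the counter."""
    return disp_field & ((1 << w) - 1)


def backward_distance(disp_field: int, disp_bits: int = 16) -> int:
    """Magnitude of the negative displacement, in instructions."""
    return (1 << disp_bits) - disp_field


# A displacement of -4 (0xFFFC in 16 bits) is an sbb when w = 5.
print(is_sbb(0xFFFC), backward_distance(0xFFFC))
# A displacement of -33 (0xFFDF) is backward but too long for w = 5.
print(is_sbb(0xFFDF))
```

Under these assumptions the longest sbb is the field 0xFFE0, a distance of 32 = 2^5 instructions, which matches the loop buffer size stated above.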

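[Figure: instruction format with three fields: opcode, upper displacement (111…111), and lower displacement (xx...xx, w bits); the upper and lower displacements together form the branch displacement.]

Fig 2-12 Short Backward Branch Instruction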


The counter-based scheme for monitoring sbb executions is shown in Fig 2-13. When an sbb is encountered and taken, its lower displacement is loaded into a w-bit increment counter called Count_Register. The sbb becomes the triggering sbb, and the loop buffer controller enters the FILL state. In this state, the controller fills the loop buffer with the instructions being fetched from the instruction cache. The hardware increments the negative displacement in Count_Register by one each time an instruction is executed sequentially. When the negative value in Count_Register reaches zero, the controller knows that the instruction currently being executed is the triggering sbb. If the triggering sbb is not taken, the controller returns to the IDLE state. Otherwise, it enters the ACTIVE state. In the ACTIVE state, the controller directs instruction requests that would originally go to the instruction cache to the loop buffer instead.
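The controller behaviour described above can be sketched as a small state machine. This is a simplified model under several assumptions that are not stated in the text: Count_Register is kept as a signed Python integer rather than a w-bit value, the loop body is straight-line code, and the controller leaves the ACTIVE state for IDLE when the triggering sbb is not taken. The class and function names are hypothetical.

```python
from enum import Enum


class State(Enum):
    IDLE = 0
    FILL = 1
    ACTIVE = 2


class CounterLoopController:
    def __init__(self):
        self.state = State.IDLE
        self.count = 0  # models Count_Register as a signed value (assumption)

    def step(self, is_sbb=False, taken=False, disp=0):
        """Process one executed instruction and return where it was fetched from."""
        source = "loop_buffer" if self.state is State.ACTIVE else "icache"
        if self.state is State.IDLE:
            if is_sbb and taken:
                self.count = disp        # load the negative displacement
                self.state = State.FILL  # this sbb becomes the triggering sbb
        elif self.count != 0:
            self.count += 1              # sequential instruction inside the loop
        elif taken:
            self.count = disp            # count == 0: triggering sbb, taken again
            self.state = State.ACTIVE
        else:
            self.state = State.IDLE      # triggering sbb not taken: loop exits
        return source


def run_loop(body_len, iterations):
    """Run a loop of body_len instructions plus one sbb; return fetch sources."""
    ctrl = CounterLoopController()
    sources = []
    for i in range(iterations):
        for _ in range(body_len):
            sources.append(ctrl.step())
        last = i == iterations - 1
        sources.append(ctrl.step(is_sbb=True, taken=not last, disp=-body_len))
    return sources


sources = run_loop(body_len=3, iterations=3)
print(sources.count("icache"), sources.count("loop_buffer"))
```

In this model the first iteration only detects the loop, the second fills the buffer, and the buffer serves instructions from the third iteration onward. It also shows that a single taken sbb is enough to start filling.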

Although this method is very simple and adds only a little overhead, it has some critical problems. First, a loop that has a triggering sbb cannot contain any nested loop. Second, a loop is placed into the loop buffer as soon as its sbb has been taken just once. Thus, a frequently executed loop might be replaced by a loop that has been taken only once. Third, the loop buffer stores the instructions of only one loop, so the unused space of the loop buffer is wasted.

2.3.2 Basic Block Loop Detection

Another way of detecting loops is basic block loop detection [3]. Compared with counter-based loop detection, this method is more flexible and simpler. Basic block loop detection regards a program as being composed of many basic blocks. A basic block is a straight-line code sequence composed of non-branch instructions and one branch instruction, which determines the direction of the following instruction stream. The basic blocks in a program are shown in Fig 2-14.

[Figure: three basic blocks (Basic block 1, Basic block 2, Basic block 3), each consisting of non-branch instructions followed by one branch instruction, together with the PBAR and SC registers.]

Fig 2-14 Basic Blocks in Instruction Stream

In Fig 2-14, every line starts from a branch instruction of basic block to a

would jump to a non-branch instruction when those branch instructions are taken.

Basic block loop detection can be described as follows:

(1) When a branch instruction is fetched, the hardware compares the address of the current branch instruction with the value in the Previous Branch Address Register (PBAR). If the two values are equal, the same branch instruction has been fetched again, which means that a loop has been executed. Thus, if the two values match and the current branch instruction is predicted taken, the same instructions within the loop will be fetched again. At the same time, these instructions are transferred to the loop buffer.

Otherwise, if the two values do not match, a new basic block will be executed. In this case, the PBAR stores the address of the current branch instruction and the value in the Size Counter (SC) is reset.

(2) When a non-branch instruction is fetched, the value in the Size Counter is increased by 1. The Size Counter keeps incrementing for every non-branch instruction fetched until a branch instruction is encountered. Thus, the number of instructions in the previously executed loop can be obtained from the Size Counter.
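The two steps above can be sketched as follows. This is a minimal model, not the hardware described in [3]: the class and function names are hypothetical, addresses are plain integers, and resetting the Size Counter after a match (as well as after a mismatch) is an assumption, since the text only specifies the reset for the mismatch case.

```python
class BasicBlockLoopDetector:
    def __init__(self):
        self.pbar = None  # Previous Branch Address Register
        self.sc = 0       # Size Counter

    def fetch(self, pc, is_branch, predicted_taken=False):
        """Return the loop body size if a loop is detected at this fetch, else None."""
        if not is_branch:
            self.sc += 1          # step (2): count non-branch instructions
            return None
        detected = None
        if pc == self.pbar and predicted_taken:
            detected = self.sc    # step (1): same branch fetched again
        else:
            self.pbar = pc        # mismatch: remember the new branch address
        self.sc = 0               # reset on match as well (assumption)
        return detected


def trace_loop(detector, body_pcs, branch_pc, iterations):
    """Feed a simple loop to the detector; return the result at each branch."""
    results = []
    for _ in range(iterations):
        for pc in body_pcs:
            detector.fetch(pc, is_branch=False)
        results.append(
            detector.fetch(branch_pc, is_branch=True, predicted_taken=True)
        )
    return results


# Three non-branch instructions at addresses 0..2 and a loop branch at 3.
print(trace_loop(BasicBlockLoopDetector(), [0, 1, 2], 3, iterations=3))
```

In this model the first pass through the branch only records its address in the PBAR; the loop is detected from the second pass onward, with the Size Counter giving the number of non-branch instructions in the loop body.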

However, this method also has some problems. Unlike counter-based loop detection, basic block loop detection does not check the displacement of the branch instruction. This reduces some hardware overhead, but the method still suffers from the major problem of counter-based loop detection: a frequent loop may be replaced by an infrequent loop. In other words, the frequent loop must then be fetched and transferred into the loop buffer again because it was replaced by an infrequent loop.
