Address Generation Units in DSPs - Optimization of Address Offset Assignment for Address Generation Units in Digital Signal Processors

Chapter 2 Background

2.1 Address Generation Units in DSPs

Address generation units (AGUs) are dedicated hardware units for memory address computation. DSPs equipped with AGUs can perform indirect address computations in parallel with the execution of other machine instructions. AGUs for indirect addressing are present in many DSP architectures (e.g., TI TMS320C25, Motorola DSP56k, ADSP-210x) and differ mainly in the following parameters [1,2]:

• The number k of address registers (ARs). ARs store the effective addresses of variables in memory and can be updated by load and modify (i.e., adding or subtracting a constant) operations.

• The number m of modify registers (MRs). MRs can be loaded with constants and are generally used to store frequently required AR modify values.

Figure 2-1 Generic address generation unit (AGU) model for DSPs

Further differences in the detailed AGU architectures of DSPs are whether MR values are interpreted as signed or unsigned numbers, and whether ARs and MRs are orthogonal, i.e., whether each MR can be used to modify each AR.

We consider indirect addressing based on the generic AGU model depicted in Figure 2-1. The AGU contains a file of k address registers and a file of m modify registers. The indices for ARs and MRs are provided by two AGU inputs: the AR pointer (ARP) and the MR pointer (MRP). The third AGU input is an immediate value, originating from the instruction word, which can be used to load AR[ARP] or MR[MRP], or to immediately modify AR[ARP]. Further, AR[ARP] can also be modified by adding or subtracting the contents of MR[MRP], or by adding the value +1 or -1.
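The register files and pointers of this model can be sketched in a few lines of Python. This is only an illustrative toy model, not part of the original work: the class name `AGU`, its method names, and the `subtract` flag are hypothetical, and word widths and address wrap-around are ignored.

```python
class AGU:
    """Toy model of the generic AGU: k address registers, m modify registers."""

    def __init__(self, k, m):
        self.ar = [0] * k   # address register file AR[0..k-1]
        self.mr = [0] * m   # modify register file MR[0..m-1]
        self.arp = 0        # AR pointer: index of the active AR
        self.mrp = 0        # MR pointer: index of the active MR

    def ar_load(self, imm):
        self.ar[self.arp] = imm            # AR[ARP] = imm

    def mr_load(self, imm):
        self.mr[self.mrp] = imm            # MR[MRP] = imm

    def ar_imm_modify(self, imm):
        self.ar[self.arp] += imm           # AR[ARP] += imm

    def ar_auto_inc(self):
        self.ar[self.arp] += 1             # AR[ARP]++

    def ar_auto_dec(self):
        self.ar[self.arp] -= 1             # AR[ARP]--

    def ar_auto_modify(self, subtract=False):
        # Add (or, if requested, subtract) the contents of MR[MRP].
        delta = self.mr[self.mrp]
        self.ar[self.arp] += -delta if subtract else delta


agu = AGU(k=2, m=1)
agu.ar_load(1)        # AR[0] = 1
agu.mr_load(2)        # MR[0] = 2
agu.ar_auto_modify()  # AR[0] += MR[0]
print(agu.ar[agu.arp])  # 3
```

Loading ARP or MRP corresponds to assigning `agu.arp` or `agu.mrp` directly.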

Table 2-1 AGU operations and cost values

Operation              Functionality          Cost value
AR load                AR[ARP] = imm          1
MR load                MR[MRP] = imm          1
AR immediate modify    AR[ARP] += imm         1
AR auto-increment      AR[ARP]++              0
AR auto-decrement      AR[ARP]--              0
AR auto-modify         AR[ARP] += MR[MRP]     0
ARP load               ARP = imm              0
MRP load               MRP = imm              0

Table 2-1 shows the AGU operations and the cost value of each operation. The functionality is given in C-like notation, where “imm” denotes an immediate value.

Further, an immediate value occupies a large portion of the instruction word, so operations that use one usually prevent other machine instructions from executing in parallel. Like all other register transfer (RT) patterns, AGU operations are assumed to execute in a single machine cycle, so that their results are valid in the following cycle.

The operations “AR load”, “MR load”, and “AR immediate modify” involve an immediate value in the instruction word. They cannot be performed in parallel with other operations and therefore introduce an extra machine instruction, so we assign them the cost value 1. On the other hand, “AR auto-increment”, “AR auto-decrement”, and “AR auto-modify” only use AGU resources and can be regarded as zero-cost operations: they execute without any overhead in code size or speed. The same holds for “ARP load” and “MRP load”: these require only “short” immediate values (of length 2 to 3 bits), which are (in direct form) instruction word fields or (in indirect form) originate from registers that can be loaded in parallel (e.g., TMS320C2x). In indirect form, the required ARP contents must be prepared one machine cycle earlier than in direct form, but this has no impact on the cost metric [2].
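As a small illustration of this cost metric, the cost values of Table 2-1 can be written as a lookup table and summed over a sequence of AGU operations. The names `AGU_COST` and `sequence_cost` are hypothetical and introduced only for this sketch.

```python
# Cost values from Table 2-1: 1 = needs an extra machine instruction,
# 0 = executes in parallel with other instructions.
AGU_COST = {
    "AR load": 1,
    "MR load": 1,
    "AR immediate modify": 1,
    "AR auto-increment": 0,
    "AR auto-decrement": 0,
    "AR auto-modify": 0,
    "ARP load": 0,
    "MRP load": 0,
}


def sequence_cost(ops):
    """Total cost (extra instructions) of a sequence of AGU operations."""
    return sum(AGU_COST[op] for op in ops)


ops = ["AR load", "AR auto-increment", "AR immediate modify", "AR auto-modify"]
print(sequence_cost(ops))  # 2
```

Only the two operations that carry a full immediate value contribute to the total in this example.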

Example

To simplify the exposition of address offset assignment, we use a simple processor model that reflects the indirect addressing arithmetic of most DSPs. The model is an accumulator-based machine in which, for each instruction, one operand resides in the accumulator and the other resides in memory. The memory operand is referenced through one of the address registers (AR0, AR1, ...).

ARi can be made to point to the desired position by adding or subtracting an immediate value, using the instructions “ADAR” and “SBAR”, respectively. We also use the instructions “LDAR” and “LDMR” to load ARi and MRi.

We use *(ARi), *(ARi)+, *(ARi)-, *(ARi)+MRi to denote indirect addressing through ARi, indirect addressing with post-increment, indirect addressing with post-decrement and indirect addressing with post-modify, respectively.
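The four addressing notations differ only in how the address register is updated after the access. A minimal sketch, assuming post-update semantics as described above (the function name `post_update` and the mode strings are hypothetical):

```python
def post_update(ar, mode, mr=0):
    """Return (effective_address, new_ar) for one indirect access.

    The operand is always fetched from the current AR value; only the
    update applied afterwards depends on the addressing mode.
    """
    if mode == "*(AR)":        # plain indirect addressing
        return ar, ar
    if mode == "*(AR)+":       # post-increment
        return ar, ar + 1
    if mode == "*(AR)-":       # post-decrement
        return ar, ar - 1
    if mode == "*(AR)+MR":     # post-modify by the modify register
        return ar, ar + mr
    raise ValueError("unknown addressing mode: " + mode)


print(post_update(1, "*(AR)+MR", mr=2))  # (1, 3)
```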

Consider the C code sequence shown in Figure 2-2(a). Assume that the address offsets assigned to the variables are as shown in Figure 2-2(b). The assembly code for the C program is shown in Figure 2-2(c). In the assembly code, the comment after an instruction indicates which variable AR0 points to after the instruction is executed. The instructions SBAR and ADAR are used to change AR0 so that it points to the frame location accessed by the next instruction.

Figure 2-2 Example of address arithmetic with AGU

Assume that AR0 initially points to position 1 of the frame, i.e., variable b, and that MR0 is initialized with the constant 2. In the first “LOAD” instruction, the value of variable b is loaded into the accumulator and AR0 is modified by the value of MR0. In the fourth instruction, “ADD”, the values of b and d are summed and the result is kept in the accumulator. Next, the contents of the accumulator must be stored in the location corresponding to variable a, but AR0 points to d. Therefore, we have to subtract 3 from the contents of AR0 using the explicit instruction “SBAR AR0, 3”.

Then, the instruction “STOR” stores the contents of the accumulator to the location of a; further, AR0 is incremented and points to the location of b. When the assembly instructions corresponding to “d = b + c” are executed, the variables are accessed in the order b, c, d. Figure 2-2(b) shows that the locations of these variables are contiguous, so the address arithmetic operations can be subsumed in the “LOAD”, “ADD”, and “STOR” instructions. The objective of address offset assignment is to place the variables in memory so that the number of explicit address pointer arithmetic instructions is minimized.
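The counting argument in this example can be sketched as follows. The sketch rests on several assumptions that are not taken from the figure itself: the access sequence b, d, a, b, c, d and the layout a=0, b=1, c=2, d=3 are inferred from the prose above, a single address register is used, a step of 0, +1, or -1 (or plus or minus the MR value, if an MR is available) is treated as free, and the initial LDAR and LDMR instructions are not counted. The function names are hypothetical.

```python
from itertools import permutations


def explicit_modifies(accesses, offset, mr=None):
    """Count accesses that need an explicit ADAR/SBAR before them."""
    free = {0, 1, -1}              # same address, auto-increment, auto-decrement
    if mr is not None:
        free |= {mr, -mr}          # auto-modify by adding or subtracting MR
    cost = 0
    for prev, nxt in zip(accesses, accesses[1:]):
        if offset[nxt] - offset[prev] not in free:
            cost += 1
    return cost


def best_layout(accesses, mr=None):
    """Exhaustively try every placement of the variables (tiny inputs only)."""
    variables = sorted(set(accesses))
    best = None
    for perm in permutations(variables):
        offset = {v: i for i, v in enumerate(perm)}
        cost = explicit_modifies(accesses, offset, mr)
        if best is None or cost < best[0]:
            best = (cost, offset)
    return best


accesses = ["b", "d", "a", "b", "c", "d"]      # inferred access sequence
layout = {"a": 0, "b": 1, "c": 2, "d": 3}      # inferred offset assignment
print(explicit_modifies(accesses, layout, mr=2))  # 1
```

The single counted instruction corresponds to “SBAR AR0, 3” in the example. The exhaustive search in `best_layout` grows factorially with the number of variables, so it is only meant to make the objective concrete for a handful of variables.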

LDAR AR0, 1 ; b
