
Overview of PACDSP

3.5 VLIW Datapath

As shown in Fig. 3.4, the VLIW data path of PACDSP is built around a distributed register file: ping-pong registers, accumulator registers, address registers, constant registers, and some control flags.

If an instruction must write to two consecutive destination registers (for example, DLW and FMUL.D), the destination register number must be even because of the banked structure.

The VLIW data path of PACDSP is constructed in two clusters, and each contains an arithmetic unit (AU) and a load/store unit (L/S) as shown in Fig. 3.5. Therefore, it can execute four instructions simultaneously, and is thus called a four-way VLIW data path. The VLIW data path supports SIMD (single instruction multiple data) operation. It executes in three modes: single (32-bit or 40-bit), dual (16-bit) and quad (8-bit). There are also three types of precision in the data path of PACDSP: full, integer and fractional.
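The three SIMD modes can be pictured as splitting one 32-bit word into one, two, or four independent lanes. The following is a minimal sketch, not the PACDSP implementation: the function name `simd_add` is hypothetical, and wrap-around on lane overflow is an assumption made for illustration (the hardware may saturate instead).

```python
def simd_add(a: int, b: int, lane_bits: int) -> int:
    """Add two 32-bit words lane by lane.

    lane_bits = 32 models single mode, 16 dual mode, 8 quad mode.
    Each lane wraps on overflow (an assumption, not documented behavior).
    """
    if lane_bits not in (32, 16, 8):
        raise ValueError("lane_bits must be 32, 16 or 8")
    lane_mask = (1 << lane_bits) - 1
    result = 0
    for shift in range(0, 32, lane_bits):
        # Extract one lane from each operand and add them in isolation,
        # so that a carry never crosses into the neighboring lane.
        lane_sum = ((a >> shift) & lane_mask) + ((b >> shift) & lane_mask)
        result |= (lane_sum & lane_mask) << shift
    return result


a, b = 0x0001FFFF, 0x00000001
print(hex(simd_add(a, b, 32)))  # 0x20000: the carry crosses bit 16
print(hex(simd_add(a, b, 16)))  # 0x10000: the low 16-bit lane wraps to 0
print(hex(simd_add(a, b, 8)))   # 0x1ff00: only the lowest byte wraps
```

The same operands give three different results because the lane width decides where carries stop.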

Arithmetic Unit (AU)

The arithmetic unit comprises 40-bit modules which are divided according to functions.

The function types supported by the AU are shown below:

• Arithmetic and comparison instructions.

• Data transfer instructions.

• Bit manipulation instructions.

• Multiplication and accumulation instructions.

• Special instructions.

Table 3.2: System Register File [1]

No     Name           Size (bits)   Note
SR0    PREDN          16            Predication information
SR1    EN INT         1             Interrupt enable flag
SR2    MSK EXC        16            Mask inside exception
SR3    SWI EXC        16            Software exception
SR4    CF0            32            Custom function register 0
SR5    CF1            32            Custom function register 1
SR6    CF2            32            Custom function register 2
SR7    CF3            32            Custom function register 3
SR8    SD Status      8             Mix information 0’s shadow register
SR9    SD CPC         32            CPC’s shadow register (ISR return address)
SR10   SD BCTG        32            Branch target’s shadow register
SR11   SD R0          32            R0’s shadow register
SR12   Mode           4             Power mode register
SR13   CFU Info Sel   4             CFU Info select register
SR14   EXC Cause      16            Exception cause
SR15   Reserved       32            N.A.

All data processing instructions in the AU begin at the same pipeline stage but do not finish at the same time, because their computing complexity differs.

Load/Store Unit (L/S)

Figure 3.4: The VLIW datapath register organization [1].

The load/store unit (L/S) comprises 32-bit modules except for one 16-bit address generation unit (AGU), which supports the different addressing modes. The functional types supported by the L/S unit are as follows:

• Arithmetic and comparison instructions.

• Data transfer instructions.

• Bit manipulation instructions.

• Load and store instructions.

• Special instructions.

As in the AU, all instructions in the L/S unit begin at the same pipeline stage but do not finish at the same time, because their computing complexity differs.

The L/S unit supports double load/store instructions, which load or store two operands in one instruction. It also supports instructions that load and store bytes or half-words. These instructions simplify memory access.

Figure 3.5: The four-way VLIW datapath of PACDSP [1].

3.5.1 Ping-Pong Register File

The ping-pong register file contains sixteen 32-bit registers, which are divided into two groups: D0–D7 and D8–D15. The AU and the L/S unit can access the ping-pong register file at the same time, but the registers they access must be in different groups. In other words, the two units cannot read the same group or write the same group simultaneously. All possible access conditions are as follows:

• L/S reads D0–D7 and writes D0–D7, and AU reads D8–D15 and writes D8–D15.

• L/S reads D0–D7 and writes D8–D15, and AU reads D8–D15 and writes D0–D7.

• L/S reads D8–D15 and writes D0–D7, and AU reads D0–D7 and writes D8–D15.

• L/S reads D8–D15 and writes D8–D15, and AU reads D0–D7 and writes D0–D7.
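The four access conditions listed above reduce to one rule: in the same cycle the two units must read from different groups and must write to different groups. A minimal sketch of that check follows; the function names are hypothetical, and each unit is assumed to touch one read group and one write group per cycle.

```python
def group(reg: int) -> int:
    """Return 0 for D0-D7 and 1 for D8-D15."""
    if not 0 <= reg <= 15:
        raise ValueError("ping-pong registers are D0-D15")
    return reg // 8


def access_is_legal(ls_read: int, ls_write: int,
                    au_read: int, au_write: int) -> bool:
    """True when the L/S unit and the AU use different groups
    for their reads and different groups for their writes."""
    return (group(ls_read) != group(au_read)
            and group(ls_write) != group(au_write))


# L/S reads D0 and writes D9 while the AU reads D8 and writes D1:
# this matches the second listed condition.
print(access_is_legal(0, 9, 8, 1))   # True
# Both units read from D0-D7, which none of the listed conditions allows.
print(access_is_legal(0, 1, 2, 9))   # False
```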

3.5.2 Address/Accumulator Registers

As shown in Fig. 3.4, the address registers (A0–A7) are all 32-bit and are dedicated to the load/store (L/S) unit for memory accesses. PACDSP supports several addressing modes. In modulo addressing mode, A0 and A2 are treated as pointers, A1 and A3 contain base addresses, A4 and A6 contain the values of the end address plus one, and A5 and A7 are treated as displacements. PACDSP can therefore support two groups of modulo addressing: (A0, A1, A4, A5) and (A2, A3, A6, A7). In other addressing modes, the registers can be used as address storage or data processing storage, as the user's design requires.

The accumulator registers (AC0–AC7) are 40-bit registers which are dedicated to the arithmetic unit (AU) for data manipulations. The most significant eight bits are guard bits for accumulation operations.
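The value of the eight guard bits can be seen by accumulating worst-case 32-bit operands. The sketch below models a two's-complement accumulator of a given width; wrap-around on overflow is an assumption made for illustration, and the helper names are hypothetical.

```python
ACC_BITS = 40  # accumulator width: 32 data bits plus 8 guard bits


def wrap_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def accumulate(samples, bits: int = ACC_BITS) -> int:
    """Sum the samples in an accumulator that is `bits` wide."""
    acc = 0
    for sample in samples:
        acc = wrap_signed(acc + sample, bits)
    return acc


INT32_MAX = 2**31 - 1

# 256 worst-case additions still fit in 40 bits.
print(accumulate([INT32_MAX] * 256) == 256 * INT32_MAX)  # True
# Without guard bits, the second addition already overflows.
print(accumulate([INT32_MAX] * 2, bits=32))              # -2
```

With eight guard bits the accumulator can absorb up to 2^8 = 256 maximum-magnitude 32-bit terms before overflow becomes possible.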

3.5.3 Constant Registers

To avoid frequent data movement in the register file, PACDSP provides a small constant register file to hold fixed data. The constant register file has eight 32-bit registers (C0–C7). A constant register can be read as either the first or the second operand of an instruction. However, a single instruction cannot use the constant register file for both of its source operands.

The constant register file can be read by both the AU and the L/S unit but can only be written by the L/S unit. All accesses to the constant register file must go through the control flags CF0 and CF1, which are pointers to the constant registers. These pointers are calculated from the values contained in CF2 and CF3, which are the contents of the pointers.

3.5.4 Status and Control Registers

A status register and a control register are provided to monitor the status of the DSP kernel and to control its operation mode. The program status register records the operation status of each cluster and of the scalar unit. It includes Overflow, Negative, and Carry bits; instructions can read the status register but cannot set it.

The addressing mode control register (AMCR) is a 16-bit register that sets the addressing mode for each address register. The addressing mode determines where the operands are found and how the address calculations are made. The definitions are shown in Table 3.3.
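To make the role of the AMCR concrete, the sketch below decodes a 16-bit AMCR value into one mode per address register, using the two-bit encoding of Table 3.3. The bit layout is an assumption: eight 2-bit fields fit the 16-bit width, but placing the field for A0 in bits 1:0 and the field for A7 in bits 15:14 is a guess made for illustration, as are the function names.

```python
# AM[1:0] encoding taken from Table 3.3.
MODES = {0b00: "linear", 0b01: "bit-reversed", 0b10: "modulo", 0b11: "reserved"}


def decode_amcr(amcr: int) -> dict:
    """Map each address register A0-A7 to its addressing mode.

    Assumes register An uses bits (2n+1):(2n) of the AMCR.
    """
    if not 0 <= amcr < (1 << 16):
        raise ValueError("AMCR is a 16-bit register")
    return {f"A{n}": MODES[(amcr >> (2 * n)) & 0b11] for n in range(8)}


def set_mode(amcr: int, reg: int, mode_bits: int) -> int:
    """Return a new AMCR value with the 2-bit field of A<reg> replaced."""
    if not 0 <= reg <= 7 or mode_bits not in MODES:
        raise ValueError("invalid register number or mode")
    cleared = amcr & ~(0b11 << (2 * reg))
    return (cleared | (mode_bits << (2 * reg))) & 0xFFFF


amcr = set_mode(0, 0, 0b10)     # A0 uses modulo addressing
amcr = set_mode(amcr, 2, 0b01)  # A2 uses bit-reversed addressing
print(decode_amcr(amcr)["A0"], decode_amcr(amcr)["A2"], decode_amcr(amcr)["A1"])
# modulo bit-reversed linear
```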

3.5.5 Addressing Modes

PACDSP supports three addressing modes for memory access: linear addressing mode, bit-reversed addressing mode, and modulo addressing mode. The mode is selected by setting the AMCR. Table 3.4 shows the syntax of the addressing modes that can be used and the units that support each one.

Fig. 3.6 shows that the address register file A0–A7 is classified into even and odd banks in linear and bit-reversed addressing modes. Some addressing modes use two address registers, RsA and RsB, at the same time. They must be consecutive registers with RsA in the even bank and RsB in the odd bank.

Table 3.3: Definitions of AMCR (from [1])

AM[1]   AM[0]   Addressing Mode
0       0       Linear
0       1       Bit-reversed
1       0       Modulo
1       1       Reserved

Table 3.4: Syntax of Address Modes and Supporting Units [2]

                                                     Support Unit
Addressing Mode               Syntax                 Scalar   Cluster
1. Linear
Offset by Immediate           RsA, displacement      V        V
Offset by Register            RsA, RsB               V        V
Post-increment by Immediate   RsA, displacement+     V        V
Post-increment by Register    RsA, RsB+              V        V
2. Modulo
Post-increment by Register    RsA, RsB+              -        V
Post-increment by Immediate   RsA, displacement+     -        V
3. Bit-Reversed
Post-increment by Immediate   RsA, displacement+     -        V
Post-increment by Register    RsA, RsB+              -        V

Figure 3.6: Address register file [1].

Linear Addressing Mode

• Offset by immediate (RsA, displacement)

The operand address is the sum of the content of the address register RsA and the displacement (up to a 24-bit signed integer, but the value range depends on the implementation of data memory).

• Offset by register (RsA, RsB)

The operand address is the sum of the contents of the address register RsA and the contents of the address register RsB.

• Post-increment by immediate (RsA, displacement+)

The operand address is in the address register RsA. After the operand address is used, it is incremented by the displacement (up to a 24-bit signed integer, but the value range depends on the implementation of data memory) and stored in the same address register.

• Post-increment by register (RsA, RsB+)

The operand address is in the address register RsA. After the operand address is used, it is incremented by the content of the address register RsB and stored back in RsA.
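The four linear address calculations can be sketched as follows. This is an illustrative model only: the address registers are held in a plain list, the function names are hypothetical, and 32-bit wrap-around of the result is an assumption. The pair check encodes the rule that RsA lies in the even bank and RsB is the next register in the odd bank.

```python
MASK32 = 0xFFFFFFFF  # address registers are 32 bits wide


def check_pair(rs_a: int, rs_b: int) -> None:
    """RsA must be even and RsB must be the following odd register."""
    if rs_a % 2 != 0 or rs_b != rs_a + 1 or not 0 <= rs_a <= 6:
        raise ValueError("RsA/RsB must be a consecutive even/odd pair")


def offset_by_immediate(regs, rs_a, disp):
    """Operand address = RsA + displacement; RsA is unchanged."""
    return (regs[rs_a] + disp) & MASK32


def offset_by_register(regs, rs_a, rs_b):
    """Operand address = RsA + RsB; both registers are unchanged."""
    check_pair(rs_a, rs_b)
    return (regs[rs_a] + regs[rs_b]) & MASK32


def post_increment_by_immediate(regs, rs_a, disp):
    """Use RsA as the address, then add the displacement to RsA."""
    addr = regs[rs_a]
    regs[rs_a] = (addr + disp) & MASK32
    return addr


def post_increment_by_register(regs, rs_a, rs_b):
    """Use RsA as the address, then add RsB to RsA."""
    check_pair(rs_a, rs_b)
    addr = regs[rs_a]
    regs[rs_a] = (addr + regs[rs_b]) & MASK32
    return addr


regs = [0x1000, 0x10, 0, 0, 0, 0, 0, 0]          # A0..A7
print(hex(offset_by_immediate(regs, 0, -4)))     # 0xffc
print(hex(post_increment_by_register(regs, 0, 1)), hex(regs[0]))  # 0x1000 0x1010
```

The offset forms leave RsA untouched, while the post-increment forms return the old address and update RsA as a side effect.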

Bit-Reversed Addressing Mode

Bit-reversed addressing mode is also called reverse-carry addressing mode. This mode is selected by setting the corresponding bits in the AMCR, and address modification is performed in hardware by propagating the carry from each pair of added bits in the reverse direction (from the MSB end toward the LSB end). It supports only post-increment by immediate and post-increment by register.

This form of address modification is useful for addressing the twiddle factors in a 2^k-point FFT as well as for unscrambling 2^k-point FFT data.
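Reverse-carry addition can be emulated in software by reversing the bits of both operands, adding them normally, and reversing the result. The sketch below uses this equivalence to generate the index order for a 2^k-point FFT. It works on buffer indices only (a base address of zero is assumed), and the function names are hypothetical.

```python
def reverse_bits(value: int, width: int) -> int:
    """Reverse the lowest `width` bits of value."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def reverse_carry_add(addr: int, step: int, width: int) -> int:
    """Add with the carry propagating from the MSB end toward the LSB end."""
    mask = (1 << width) - 1
    total = reverse_bits(addr & mask, width) + reverse_bits(step & mask, width)
    return reverse_bits(total & mask, width)


def bit_reversed_sequence(n_points: int) -> list:
    """Indices visited by post-incrementing with step N/2 in reverse-carry mode."""
    if n_points < 2 or n_points & (n_points - 1):
        raise ValueError("n_points must be a power of two, at least 2")
    width = n_points.bit_length() - 1
    step = n_points >> 1
    addr, sequence = 0, []
    for _ in range(n_points):
        sequence.append(addr)
        addr = reverse_carry_add(addr, step, width)
    return sequence


print(bit_reversed_sequence(8))  # [0, 4, 2, 6, 1, 5, 3, 7]
```

Stepping by N/2 with reverse carry visits the indices in bit-reversed order, which is the order needed to unscramble the output of an in-place radix-2 FFT.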

Modulo Addressing Mode

Modulo address modification is useful for creating circular buffers for FIFO queues, delay lines, and sample buffers. This addressing mode supports only post-increment by immediate and post-increment by register. The definition of modulo addressing, using a base register (Bn) and an end register (En), enables the programmer to locate the modulo buffer at any address. The current address register, An, can initially point anywhere (aligned to its access width) within the defined modulo address range, Bn ≤ An < En.

Modulo addressing is selected by configuring the corresponding bits in the AMCR. The range of values in the modulo registers is from 1 to 2^16 − 1.
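A circular buffer built on modulo addressing can be sketched as follows. The arguments correspond to one register group, for example pointer A0, base A1, end address plus one A4, and displacement A5. The wrap rule used here (reduce the offset modulo the buffer length) is an assumption about the hardware behavior, and the function name is hypothetical.

```python
def modulo_post_increment(pointer: int, base: int, end: int, step: int):
    """Return (address used, updated pointer) for one modulo access.

    `end` is the end address plus one, so the valid range is
    base <= pointer < end.
    """
    if not base <= pointer < end:
        raise ValueError("pointer must lie inside the modulo range")
    length = end - base
    # Python's % always yields a non-negative result for a positive
    # modulus, so negative steps wrap to the top of the buffer.
    next_pointer = base + (pointer - base + step) % length
    return pointer, next_pointer


# Walk an 8-entry delay line that starts at address 0x100.
base, end, pointer = 0x100, 0x108, 0x106
visited = []
for _ in range(4):
    used, pointer = modulo_post_increment(pointer, base, end, 1)
    visited.append(hex(used))
print(visited)  # ['0x106', '0x107', '0x100', '0x101']
```

Because the pointer wraps inside the range, the loop body needs no explicit bounds test, which is the main benefit for delay lines and FIFO queues.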

3.5.6 Data Communication

PACDSP provides a fast data communication mechanism among the scalar unit and the two clusters. As shown in Fig. 3.7, it provides a data exchange mechanism between any two of the scalar unit and the two clusters. Fig. 3.8 shows that it also provides data broadcast, which lets one of them broadcast its data to the others. Both are accomplished through the ports of the memory interface unit (MIU), because the MIU is connected to all register files of the scalar unit and the two clusters. The latency is only one instruction.

Data Exchanges

The instruction DEX exchanges 32-bit data between any two units, and the instruction DDEX exchanges 64-bit data between the L/S units in the two clusters.

Data Broadcast

The instruction pair BDT and BDR broadcasts 32-bit data from one unit to the others, and the instruction pair DBDT and DBDR transfers 64-bit data between the two clusters.