• 沒有找到結果。

The TMS320C6416T DSP is the a fixed-point DSP in the TMS320C64x series of the TMS320C6000 DSP platform family. It is based on the advanced VelociTI very-long-instruction-word (VLIW) architecture developed by TI. The functional block and DSP core diagram of TMS320C64x series is shown in Fig. 3.2.

The C6000 core CPU consists of 64 general-purpose 32-bits registers and eight func-tion units. Features of C6000 device include the following.

• VLIW CPU with eight functional units, including two multipliers and six arith-metic:

– Executes up to eight instructions per cycle.

– Allows designers to develop highly effective RISC-like code for fast develop-ment time.

• Instruction packing:

– Gives code size equivalence for eight instructions executed serially or in par-allel.

– Reduces code size, program fetches, and power consumption.

• Conditional execution of all instructions:

– Reduces costly branching.

– Increases parallelism for higher sustained performance.

• Efficient code execution on independent functional units:

– Efficient C complier on DSP benchmark suite.

– Assembly optimizer for fast development and improved parallelization.

• 8/16/32-bit data support, providing efficient memory support for a variety of appli-cations.

• 40-bit arithmetic options add extra precision for vocoders.

• 32x32-bit integer multiply with 32- or 64-bit result.

In the following subsections, three major parts of the TMS320C64x DSP are intro-duced respectively. They are the central processing unit, memory, and peripherals.

Figure 3.2: Functional block and CPU (DSP core) diagram [13].

3.2.1 Central Processing Unit

The C64x DSP core contains 64 32-bit general purpose registers, program fetch unit, instruction decode unit, two data paths each with four function units, control register, control logic, advanced instruction packing, test unit, emulation logic and interrupt logic.

The program fetch, instruction fetch, and instruction decode units can arrange eight 32-bit instructions to the eight function units every CPU clock cycle. The processing of instruc-tions occurs in each of the two data paths (A and B) shown in Fig. 3.2, each of which contains four functional units and one register file. The four functional units are as fol-lows. The first unit is for multiplier operations (.M). The second unit is for arithmetic and logic operations (.L). The third is for branch, byte shifts, and arithmetic operations (.S).

And the last is for linear and circular address calculation to load and store with external memory operations (.D). The details of the functional units are described in Table 3.1.

Table 3.1: Functional Units and Operations Performed [12]

Parameter Value

.L unit(.L1, .L2) 32/40-bit arithmetic and compare operations 32-bit logical operations

Leftmost 1 or 0 counting for 32 bits Normalization count for 32 and 40 bits Byte shifts

Data packing/unpacking 5-bit constant generation

Dual 16-bit and Quad 8-bit arithmetic operations Dual 16-bit and Quad 8-bit min/max operations .S unit (.S1, .S2) 32-bit arithmetic operations

32/40-bit shifts and 32-bit bit-field operations 32-bit logical operations

Branches

Constant generation

Register transfers to/from control register file (.S2 only) Byte shifts

Data packing/unpacking

Dual 16-bit and Quad 8-bit compare operations

Dual 16-bit and Quad 8-bit saturated arithmetic operations .M unit (.M1, .M2) 16 x 16 multiply operations

16 x 32 multiply operations

Dual 16 x 16 and Quad 8 x 8 multiply operations Dual 16 x 16 multiply with add/subtract operations Quad 8 x 8 multiply with add operations

Bit expansion

Bit interleaving/de-interleaving Variable shift operations Rotation

Galois Field Multiply

.D unit (.D1, .D2) 32-bit add, subtract, linear and circular address calculation Loads and stores with 5-bit constant offset

Loads and stores with 15-bit constant offset(.D2 only) Loads and stores doubles words with 5-bit constant Loads and store non-aligned words and double words 5-bit constant generation

32-bit logical operations

Each register file consists of 32 32-bit registers for each four functional units reads and writes directly within its own data path. That is, the functional units .L1, .S1, .M1, .D1 can only write to register file A. The same condition occurs in register file B. However, two cross-paths (1X and 2X) allow functional units from one data path to access a 32-operand from the opposite side register file. The cross path 1X allows data path A to read their source from register file B. The cross path 2X allows data path B to read their source from register file A. In the C64x, CPU pipelines data-cross-path accesses over multiple clock cycles. This allows the same register to be used as a data-cross-path operand by multiply functional units in the same execute packet.

3.2.2 Memory Architecture and Peripherals

The C64x is a two-level cache-based architecture. The level 1 cache is separated into program and data spaces. The level 1 program cache (L1P) is a 128 Kbit direct mapped cache and the level 1 data cache (L1D) is a 128 Kbit 2-way set-associative mapped cache.

The level 2 (L2) memory consists of 8 Mbit memory space for cache (up to 256 Kbytes) and unified mapped memory.

The external memory interface (EMIF) provides interfaces for the DSP core and external memory, such as synchronous-burst SRAM (SBSRAM), synchronous DRAM (SRAM), SDRAM, FIFO and asynchronous memories (SRAM and EPROM). The EMIF also provides 64-bit-wide (EMIFA) and 16-bit-wide (EMIFB) memory read capability.

The C64x contains some peripherals such as enhanced direct-memory-access (EDMA), host-port interface (HPI), PCI, three multichannel buffered serial ports (McBSPs), three 32-bit general-purpose timers and sixteen general-purpose I/O pins. The EDMA con-troller handles all data transfers between the level-two (L2) cache/memory and the device peripheral. The C64x has 64 independent channels. The HPI is a 32-/16-bit wide parallel port through which a host processor can directly access the CPUs memory space. The PCI port supports connection of the DSP to a PCI host via the integrated PCI master/slave bus interface.

相關文件