
ICME 2006, 1-4244-0367-7/06/$20.00 ©2006 IEEE


PAC DSP CORE AND APPLICATION PROCESSORS

David Chih-Wei Chang, I-Tao Liao, Jenq-Kuen Lee, Wen-Feng Chen, Shau-Yin Tseng, Chein-Wei Jen

SoC Technology Center

Industrial Technology Research Institute Hsinchu, Taiwan 310, R.O.C.


ABSTRACT

This paper gives an overview of the Parallel Architecture Core (PAC) project led by the SoC Technology Center of the Industrial Technology Research Institute (STC/ITRI) in Taiwan. It presents the background of the PAC project, a brief introduction to the PAC core technologies, the PAC SoC development suite, PAC benchmarks, and applications. The main objective of the PAC development plan is to strengthen industrial competitiveness in the core technologies of key components, especially for portable multimedia applications.

1. INTRODUCTION

In recent years, the markets for communication systems and consumer electronics have grown dramatically, which has also driven demand for digital signal processor (DSP) solutions. To meet increasing requirements for high performance, multiple functions, and real-time multimedia processing, DSP solutions have been embedded in a wide variety of consumer electronics and home entertainment products, such as cellular phones, MP3 players, GPS receivers, digital cameras, DVD players, set-top boxes, and DTV consoles. DSP functionality can be implemented either as ASIC chips or as DSP cores. Given the flexibility needed for system design and leading-edge algorithm design, a programmable DSP core (as a DSP chip or a DSP/MPU SoC) is the ideal choice for supporting the multiple applications, high bandwidth, and multiple communication standards required by emerging mobile multimedia devices.

Because the DSP core is regarded as the key component of modern communication and consumer electronics appliances, the PAC project was initiated in early 2004. It aims to develop a solution based on a 32-bit programmable DSP core that enables richer multimedia capabilities, reduces development effort, and shortens time to market. The highly integrated PAC SoC platform features a dual-core architecture that combines the command and control capabilities of a RISC MPU with a high-performance, low-power DSP core capable of parallel processing. The PAC application processor is developed mainly for next-generation media-rich, multi-function portable devices such as PMPs, PDAs, and smart phones.

2. PAC CORE TECHNOLOGIES

PAC DSP is a 32-bit fixed-point, low-power, high-performance DSP with a 5-way VLIW (Very Long Instruction Word) architecture targeted at mobile applications. It has one scalar unit and two data stream clusters. Each data stream cluster contains two functional units and a distinct, partitioned, low-power register file structure. PAC DSP has a rich but optimized instruction set that supports 8-bit and 16-bit SIMD operations. It is targeted to run at a maximum frequency of 250-300 MHz.
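The 8-bit and 16-bit SIMD operations mentioned above pack several narrow values into one 32-bit register and process all of them in a single operation. The Python sketch below emulates a generic lane-wise wrap-around addition to show what "SIMD" means at the bit level; it is an illustrative model of packed arithmetic, not the PAC DSP instruction encoding or semantics, and the function name `simd_add` is hypothetical.

```python
def simd_add(a: int, b: int, lane_bits: int) -> int:
    """Add two 32-bit words lane by lane, wrapping each lane independently.

    lane_bits=16 treats each word as two 16-bit lanes; lane_bits=8 as four
    8-bit lanes. A carry out of one lane is discarded instead of rippling
    into the next lane, which is what distinguishes packed SIMD addition
    from an ordinary 32-bit add. Operands are assumed to be in [0, 2**32).
    """
    if lane_bits not in (8, 16):
        raise ValueError("lane_bits must be 8 or 16")
    mask = (1 << lane_bits) - 1
    result = 0
    for shift in range(0, 32, lane_bits):
        lane_sum = ((a >> shift) & mask) + ((b >> shift) & mask)
        result |= (lane_sum & mask) << shift
    return result


print(hex(simd_add(0x0001FFFF, 0x00010001, 16)))   # 0x20000 (low lane wrapped to 0)
print(hex((0x0001FFFF + 0x00010001) & 0xFFFFFFFF))  # 0x30000 (plain add: carry rippled)
print(hex(simd_add(0x01020304, 0x10FF1010, 8)))    # 0x11011314
```

Real DSP instruction sets often also offer saturating variants; the wrap-around form is used here only because it is the simplest to state.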

The PAC DSP core can be used as a co-processor in a dual-core processor platform (e.g., the PAC SoC Platform) or as a standalone unit in a single-processor DSP platform. Alongside the PAC DSP processor, a complete tool chain of compiler, assembler, and linker has also been developed, and a high-performance assembly code library will be provided for multimedia applications. PAC DSP targets, but is not limited to, the following application domains:

Video and Image processing (H.264, MPEG-4, JPEG, Color space transform, etc.)

Audio and Speech processing (MP3, AAC, and GSM speech processing, etc.)

Voice processing and enhancement (digital hearing aids, voice-controlled gadgets, VoIP telephony, etc.)

2.1. PAC DSP Core

PAC DSP is a silicon-proven IP core developed by STC. It employs a VLIW architecture and a SIMD instruction set to provide high parallel computing capability. The PAC DSP kernel contains the instruction pipeline and is the computation engine of PAC DSP. The application-specific Customized Function Unit (CFU) is used to enhance the computational power of the PAC DSP kernel; one example of such a CFU is a motion-estimation engine for video encoding. The CFU executes in parallel with the PAC DSP kernel and interfaces with the kernel through either the PAC DSP data memory or the CFU interface. Fig. 1 shows the PAC DSP core architecture. Fig. 2 illustrates a block diagram of the PAC DSP kernel, which has three main components: a program sequence control unit, a scalar unit, and two clusters of VLIW data path.
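A motion-estimation engine typically accelerates block matching: for each block of the current frame it searches a window of the reference frame for the displacement with the lowest sum of absolute differences (SAD). The sketch below is a plain full-search SAD model written only to show the kind of workload such a CFU offloads; the paper does not describe the PAC CFU's search strategy, block size, or interface, and the names `sad` and `full_search` are hypothetical.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]  # luma samples, indexed [row][column]


def sad(cur: Frame, ref: Frame, bx: int, by: int, dx: int, dy: int, n: int) -> int:
    """Sum of absolute differences between the n x n block at (bx, by) in the
    current frame and the block displaced by (dx, dy) in the reference frame."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur: Frame, ref: Frame, bx: int, by: int, n: int,
                search_range: int) -> Optional[Tuple[int, int, int]]:
    """Test every displacement within +/-search_range and return (dx, dy, sad)
    of the best match, or None if no candidate block fits in the frame."""
    height, width = len(ref), len(ref[0])
    best: Optional[Tuple[int, int, int]] = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Skip candidate blocks that would fall outside the reference frame.
            if not (0 <= bx + dx and bx + dx + n <= width
                    and 0 <= by + dy and by + dy + n <= height):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best


# cur[y][x] = ref[y + 1][x + 2], so the block at (2, 2) matches exactly at dx=2, dy=1.
ref = [[8 * y + x for x in range(8)] for y in range(8)]
cur = [[ref[min(y + 1, 7)][min(x + 2, 7)] for x in range(8)] for y in range(8)]
print(full_search(cur, ref, bx=2, by=2, n=2, search_range=2))  # (2, 1, 0)
```

The doubly nested SAD loop is highly regular and data-parallel, which is why it is a common candidate for a dedicated hardware unit running beside the main kernel.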

[Fig. 1 blocks: PAC DSP kernel; memory subsystem (MIU and local data memory, IMIU and instruction cache); accelerators; power control unit; embedded ICE; BIU with AHB master and slave interfaces; kernel interfaces for instruction memory, data memory, debug, customized function unit, power control, interrupts, and execution control; external signals nReset, PACDSP CLK, JTAG, and interrupts.]

Fig. 1 PAC DSP Core architecture

[Fig. 2 blocks: program sequence control unit (dispatch unit, interrupt handler); scalar unit with RF (8); two VLIW clusters (Cluster1, Cluster2), each with a load/store unit, an arithmetic unit, customized FUs, a public ping-pong RF (16), private RFs (16) and (8), and a coefficient RF (32); memory interface unit (MIU); customized functional unit; accelerators; bus interface unit (BIU).]

Fig. 2 The Architecture of PAC DSP Kernel

The program sequence control unit dispatches instructions to the scalar unit and the VLIW data path, and it also handles interrupt and exception events. The scalar unit executes scalar instructions and has 8 local registers; most of the program sequence control instructions are defined in this unit.

The VLIW data path is composed of two clusters that execute the data operations of the program. The number of clusters in the VLIW data path can be scaled up or down according to the target application's performance requirements. Each cluster contains a load/store unit (L/S) and an arithmetic unit (AU), and both units can execute instructions concurrently. Thus, two instruction slots in the instruction packet are allocated to each cluster.
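The slot allocation described above can be modelled in a few lines. This is a minimal sketch under the assumption that a packet holds one scalar slot followed by an L/S slot and an AU slot per cluster, which yields the 5 slots of the 5-way VLIW with two clusters; the slot names, slot order, and helper functions are hypothetical and do not reflect the actual PAC instruction format.

```python
from typing import Dict, List, Optional


def slot_names(num_clusters: int = 2) -> List[str]:
    """Issue slots of a packet: one scalar slot, then an L/S and an AU slot per cluster."""
    names = ["scalar"]
    for c in range(1, num_clusters + 1):
        names += [f"c{c}.ls", f"c{c}.au"]
    return names


def build_packet(ops: Dict[str, str], num_clusters: int = 2) -> List[Optional[str]]:
    """Place each operation in its slot; empty slots become None (a NOP).

    Keying the operations by slot name guarantees at most one operation per slot.
    """
    names = slot_names(num_clusters)
    unknown = sorted(set(ops) - set(names))
    if unknown:
        raise ValueError(f"no such slot(s): {unknown}")
    return [ops.get(name) for name in names]


# Two clusters give the 5-way packet; four clusters would give 9 slots.
print(len(slot_names(2)), len(slot_names(4)))  # 5 9
print(build_packet({"c1.ls": "load", "c1.au": "mac", "c2.au": "mac"}))
# [None, 'load', 'mac', None, 'mac']
```

The `num_clusters` parameter mirrors the scalability claim: adding a cluster widens the packet by exactly two slots without changing the scalar slot.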

Each cluster has its own register file structure, with private register files for the L/S unit and the arithmetic unit: the private register file of the L/S unit is the address register file, and that of the arithmetic unit is the AC register file.

The two units communicate through the ping-pong register file, whose specific mode of operation reduces power consumption. Data communication between clusters is achieved with explicit “data broadcast” and “receive” instructions.

This well-organized register file structure ensures efficient data communication among the register files.

The register file partitioning scheme and the ping-pong register file structure reduce the number of register file ports, which greatly reduces area and power consumption.
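The paper does not spell out how the ping-pong register file operates, so the sketch below shows only the generic double-buffering idea the name refers to, under the assumption that the L/S unit produces data and the arithmetic unit consumes it: the producer fills one bank while the consumer reads the other, and a swap exchanges their roles. The class and method names are hypothetical.

```python
from typing import List


class PingPongRegisterFile:
    """Two register banks: the producer (e.g. the L/S unit) writes one bank
    while the consumer (e.g. the arithmetic unit) reads the other; swap()
    exchanges their roles."""

    def __init__(self, size: int) -> None:
        self.banks: List[List[int]] = [[0] * size, [0] * size]
        self.write_bank = 0  # index of the bank currently owned by the producer

    @property
    def read_bank(self) -> int:
        return 1 - self.write_bank

    def write(self, index: int, value: int) -> None:
        self.banks[self.write_bank][index] = value

    def read(self, index: int) -> int:
        return self.banks[self.read_bank][index]

    def swap(self) -> None:
        """Hand the freshly written bank to the consumer."""
        self.write_bank = self.read_bank


rf = PingPongRegisterFile(size=4)
rf.write(0, 42)      # L/S loads a value into its bank
print(rf.read(0))    # 0: the AU still sees the other bank
rf.swap()
print(rf.read(0))    # 42: after the swap the AU sees the loaded value
rf.write(0, 7)       # the next load goes to the bank the AU is not reading
print(rf.read(0))    # 42
```

Because at any moment each bank is written by only one unit and read by only the other, each bank plausibly needs fewer ports than a single shared register file would, which is consistent with the port-reduction argument above.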

Because instructions are scheduled statically, a VLIW architecture consumes less power than a superscalar architecture, which suits the low-power requirements of portable applications.

PAC DSP defines both static and dynamic power management methodologies. Static power management provides a control register for turning off sub-blocks of PAC DSP, while dynamic power management turns off unused processing elements in the data path at run time.
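The two mechanisms can be contrasted in a small model: static management is a software-written enable register, and dynamic management gates any unit that has no operation in the current packet. This is a minimal sketch; the register bit layout, the mapping of the five issue slots to sub-blocks, and all names are hypothetical, since the paper does not document the PAC control register.

```python
from typing import List, Optional

# Hypothetical bit layout of a sub-block power-control register.
SCALAR, CLUSTER1, CLUSTER2, CFU = 0x1, 0x2, 0x4, 0x8
ALL_BLOCKS = SCALAR | CLUSTER1 | CLUSTER2 | CFU

# Assumed owner of each issue slot (scalar, c1.ls, c1.au, c2.ls, c2.au).
SLOT_BLOCK = [SCALAR, CLUSTER1, CLUSTER1, CLUSTER2, CLUSTER2]


def set_block_power(reg: int, block: int, on: bool) -> int:
    """Static management: software sets or clears a sub-block's enable bit."""
    return (reg | block) if on else (reg & ~block & ALL_BLOCKS)


def active_units(reg: int, packet: List[Optional[str]]) -> List[bool]:
    """Dynamic management: a unit is clocked only if its sub-block is enabled
    and the current packet gives it an operation (None = NOP)."""
    return [bool(reg & block) and op is not None
            for block, op in zip(SLOT_BLOCK, packet)]


reg = set_block_power(ALL_BLOCKS, CLUSTER2, on=False)  # e.g. a one-cluster workload
print(hex(reg))                                          # 0xb
print(active_units(reg, ["br", "ld", None, None, None]))
# [True, True, False, False, False]
```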

In addition, PAC DSP uses variable instruction/packet lengths to solve the low code density problem, and its built-in hierarchical encoding/decoding feature eliminates the impact of complex dispatch.
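Low code density in VLIW processors comes from fixed-length packets, where every unused slot must be filled with an explicit NOP. The sketch below compares the size of a short program under a fixed 5-slot encoding and under a variable-length encoding that stores a per-packet slot-occupancy mask followed by only the useful instructions. The 32-bit instruction width and the mask-based packet format are assumptions for illustration; they are not PAC's hierarchical encoding.

```python
from typing import List, Optional

Packet = List[Optional[str]]  # one entry per issue slot; None is a NOP

NUM_SLOTS = 5    # 5-way VLIW
INSTR_BITS = 32  # assumed width of one instruction


def fixed_length_bits(program: List[Packet]) -> int:
    """Every packet occupies all slots, so NOPs are stored explicitly."""
    return len(program) * NUM_SLOTS * INSTR_BITS


def variable_length_bits(program: List[Packet], header_bits: int = NUM_SLOTS) -> int:
    """Hypothetical scheme: a per-packet slot-occupancy mask, then only the
    non-NOP instructions; the decoder uses the mask to route each
    instruction to its slot."""
    return sum(header_bits + INSTR_BITS * sum(op is not None for op in packet)
               for packet in program)


program = [
    ["br", None, None, None, None],    # control-only packet
    [None, "ld", "mac", "ld", "mac"],  # both clusters busy
    [None, None, "add", None, None],   # single arithmetic op
]
print(fixed_length_bits(program), variable_length_bits(program))  # 480 207
```

The saving depends entirely on how many slots the compiler leaves empty; a fully packed program would gain nothing and pay only the header overhead.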

Embedded systems require sufficient performance with minimal power consumption. To meet the requirements of different applications in multi-function portable devices, the computing power of PAC DSP can be redefined at design time, and a well-designed power management methodology reduces power consumption.

2.2. PAC DSP Software Development Suite

PAC DSP Software Development Suite offers a common user interface in a Linux environment that makes learning and cross-platform development easy. This cross-platform functionality, from PC to PAC-based platforms, lets developers reuse applications they have already developed and gives them a head start in PAC-based product development. The suite provides ever-expanding support for the features of PAC's latest DSP processors, including dual-core technology, the cluster-wise ping-pong architecture, and the joint VLIW-SIMD ISA.

PAC DSP Software Development Suite includes a C compiler, an assembler/linker, a debugger, libraries, and other supporting utilities that help system developers deliver applications with good code quality. For example, the PAC DSP C compiler, which is ported from the ORC compiler, ensures that PAC DSP applications can be developed in a programmer-friendly environment, thus reducing time to market and development cost for the end products.

2.3. PAC SoC Platform

The PAC SoC Platform is designed to provide an application processor SoC for next-generation mobile devices such as PMPs, smart phones, and PDAs. The PAC SoC platform features a dual-core architecture that combines the command and control capabilities of the MPU with the high-performance, low-power capabilities of the DSP core. The dual-core architecture uses both RISC MPU and VLIW DSP technologies.

Fig. 3 shows the basic SoC platform architecture, which can be scaled up or down to meet the performance requirements of different applications. The basic PAC SoC platform consists of a dual-core processor (MPU + DSP), a memory subsystem, system DMA, I/O peripherals, and an on-chip system bus network through which all of these components communicate.

The PAC platform uses an ESL (Electronic System Level) design methodology, which provides co-verification of hardware and software designs. For hardware RTL design, unlike traditional verification, the ESL platform can supply real data such as H.264 stream data; for software design, it provides a verification environment for compilers and debuggers, including step-by-step debugging tools and memory and register analysis.

[Fig. 3 blocks: PAC core (MPU and DSP); MPU, DSP, and DMA AHB buses with AHB/APB bridges to MPU, DSP, and DMA peripheral APBs; MPU VIC, DSP VIC, and mailbox; ROM, Flash, and SDRAM behind the SDRAM controller and SMI; on-chip SRAM; peripherals including timers, watchdogs, PWM, I2C, I2S, UARTs, RTC, GPIOs, SPI, SSP, smart card, USB OTG, and LCD controller.]

Fig. 3 PAC SoC Platform Architecture

In addition, the PAC platform uses DVFS (Dynamic Voltage and Frequency Scaling) to address the power gap, one of the major challenges of IC design. Multiple supply voltages (mVdd, i.e., voltage scaling) are among the most important and effective low-power design methodologies. By combining mVdd with power-aware management technology, PAC can save 5-70% of the original power.
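The reason voltage scaling is so effective is that dynamic CMOS power is approximately proportional to V²·f, so lowering the supply voltage together with the clock reduces power faster than lowering the clock alone. The sketch below picks the slowest operating point that still meets a workload's demand and reports its dynamic power relative to full speed. The operating-point table is hypothetical (its top entry reuses the 300 MHz and 1.2 V figures listed for PAC in Fig. 7), and the model ignores leakage and the longer run time at lower clocks, so it estimates power, not energy.

```python
from typing import List, Tuple

# Hypothetical (frequency in MHz, supply voltage in V) operating points, slowest first.
OPERATING_POINTS: List[Tuple[float, float]] = [(100.0, 0.9), (200.0, 1.0), (300.0, 1.2)]


def relative_dynamic_power(f: float, v: float, f_ref: float, v_ref: float) -> float:
    """Dynamic CMOS power is roughly alpha*C*V^2*f; alpha*C cancels in the ratio."""
    return (v / v_ref) ** 2 * (f / f_ref)


def pick_operating_point(required_mhz: float) -> Tuple[float, float]:
    """Choose the slowest operating point that still meets the workload's demand."""
    for f, v in OPERATING_POINTS:
        if f >= required_mhz:
            return f, v
    raise ValueError("workload exceeds the highest operating point")


f_max, v_max = OPERATING_POINTS[-1]
f, v = pick_operating_point(150.0)                  # e.g. a decoder needing 150 MHz
ratio = relative_dynamic_power(f, v, f_max, v_max)
print(f, v, round(ratio, 3))                         # 200.0 1.0 0.463
```

In this example, running at 200 MHz and 1.0 V instead of 300 MHz and 1.2 V cuts dynamic power to about 46% of the full-speed value, of which the voltage reduction alone accounts for a factor of about 0.69.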

2.4. PAC SoC Embedded Software

The PAC platform provides an embedded Linux software solution. Compared with the standard kernel, many features have been added to meet the requirements of consumer electronics products, including fast boot, XIP (execute in place), hard real-time support, and power management. The Inter-Processor Communication (IPC) software framework makes communication between the two cores easy. With embedded Linux technology, PAC will be a stable, flexible, and extensible platform for developers of dual-core architectures. Fig. 4 shows the reference embedded software structure for a PMP. The embedded software for the PAC platform includes the following components: HAL library and boot monitor, embedded Linux, middleware, codec engine and applications, and DSP microkernel.
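Fig. 3 includes a mailbox between the two cores, a common hardware basis for IPC frameworks: one core deposits a message and interrupts the other, which then retrieves it. The paper does not describe the PAC mailbox interface or IPC API, so the sketch below is a generic model only; the class name, the queue depth, and the callback standing in for the inter-processor interrupt are all hypothetical.

```python
from collections import deque
from typing import Callable, Deque, List, Optional


class Mailbox:
    """One-directional mailbox between two cores (e.g. MPU to DSP).

    post() queues a message and fires `notify`, standing in for the
    inter-processor interrupt; fetch() lets the receiver drain messages
    in FIFO order.
    """

    def __init__(self, depth: int, notify: Callable[[], None]) -> None:
        self.depth = depth
        self.notify = notify
        self.slots: Deque[int] = deque()

    def post(self, msg: int) -> bool:
        if len(self.slots) >= self.depth:
            return False  # mailbox full: the sender must retry later
        self.slots.append(msg)
        self.notify()
        return True

    def fetch(self) -> Optional[int]:
        return self.slots.popleft() if self.slots else None


irqs: List[str] = []
mb = Mailbox(depth=2, notify=lambda: irqs.append("dsp_irq"))
print([mb.post(m) for m in (0x10, 0x11, 0x12)])  # [True, True, False]
print(len(irqs), mb.fetch(), mb.fetch(), mb.fetch())  # 2 16 17 None
```

A full-duplex link would use two such mailboxes, one per direction, so that neither core ever writes into a queue the other is also writing.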

[Fig. 4 layers: application layer (H.264 player, MP3 player, photo viewer and extractor, user interface); middleware (multimedia framework, graphics libraries/window system, audio/video codecs, power-aware IPC); embedded Linux kernel (fast boot, power management, preemption, XIP, flash file system, network management, DPM) with USB 2.0, IDE, WLAN, A/V, LCD, and other I/O drivers; DSP micro kernel/BIOS with DSP application layer, resource management, DPM, and drivers; firmware and bootloaders on both cores; portable media player platform hardware.]

Fig. 4 PAC PMP Reference Software Structure

2.5. PAC Benchmarks

PAC DSP achieves a high performance-to-power ratio, as shown in Fig. 5.

Vendor   | Core            | Architecture        | Freq (MHz) | Process     | Performance (MIPS) | Power (mW/MIPS, w/o memory) | Power mgmt | Area
CEVA     | CEVA-X 1620     | 8-way VLIW          | 450        | 0.13µm      | 3600               | 0.08                        | Yes        | 1.6 mm²
StarCore | SC1000 (SC1400) | 6-way VLIW          | 305        | 0.13µm      | 1830               | 0.098                       | Yes        | -
StarCore | SC2000 (SC2400) | 6-way VLIW          | 250~350    | 0.13µm~90nm | 1500~2100          | -                           | Yes        | -
ITRI/STC | PAC DSP v2.0    | 5-way VLIW          | 250        | 0.13µm      | 1250               | 0.08                        | Yes        | 1.2 mm²
LSI      | ZSP500          | 4-issue superscalar | 400        | 0.13µm~?    | 1600               | 0.107                       | Yes        | -
3DSP     | SP5             | 2-way superscalar   | 320        | 0.13µm      | 640                | 0.125                       | Yes        | 0.16 mm²

Fig. 5 Benchmarks of DSP Cores (1)

The signal processing performance of PAC DSP was pre-evaluated using a suite of DSP benchmarks developed by Berkeley Design Technology, Inc. (BDTI). Fig. 6 shows the execution cycle counts of each kernel for PAC DSP and its competitors. With the same MAC resources, 30% of PAC DSP's benchmark results are better than its competitors'; the optimized ISA and the special architecture of PAC DSP are the main reasons. Fig. 7 compares the PAC SoC processor with well-known low-power application processors offered by TI, Freescale, and Intel.
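Fig. 5 normalizes power per MIPS, so multiplying by the MIPS rating gives an approximate core power at the rated load, excluding memory. The sketch below does this for the cores with a power figure and also checks a pattern visible in the table: every MIPS value equals clock frequency times issue width, which suggests these are peak ratings. The values are transcribed from Fig. 5; treating mW/MIPS as constant up to the peak rating is an assumption, and SC2000 (SC2400) is left out because no power figure is given for it.

```python
from typing import Dict, Tuple

# (clock MHz, issue width, peak MIPS, mW/MIPS without memory), transcribed from Fig. 5.
FIG5: Dict[str, Tuple[int, int, int, float]] = {
    "CEVA-X 1620":     (450, 8, 3600, 0.08),
    "SC1000 (SC1400)": (305, 6, 1830, 0.098),
    "PAC DSP v2.0":    (250, 5, 1250, 0.08),
    "ZSP500":          (400, 4, 1600, 0.107),
    "SP5":             (320, 2, 640, 0.125),
}


def core_power_mw(mips: float, mw_per_mips: float) -> float:
    """Approximate core power at the rated MIPS (memory excluded)."""
    return mips * mw_per_mips


for name, (mhz, width, mips, mw_per_mips) in FIG5.items():
    # The MIPS column matches clock frequency times issue width for every core.
    assert mips == mhz * width
    print(f"{name:16s} {core_power_mw(mips, mw_per_mips):6.1f} mW")
```

On this reading, PAC DSP v2.0 draws roughly 100 mW at its rated 1250 MIPS, against roughly 288 mW for CEVA-X 1620 at 3600 MIPS, with the same energy per instruction.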

Benchmark (cycles)       | PAC DSP | CEVA-X 1620 | CEVA-X 1640 | StarCore SC1200 | StarCore SC1400 | TI C6414
Clock frequency (MHz)    | 250     | 450         | 340         | 305             | 300             | 1000
Vector Add               | 21      | 33          | 18          | 19              | 19              | 27
Vector Dot               | 23      | 26          | 19          | 25              | 16              | 25
Vector Max               | 43      | 29          | 22          | 44              | 27              | 36
Control                  | 444     | 639         | 639         | 425             | 425             | 475
Bit Unpack               | 146     | 106         | 61          | 164             | 124             | 97
Real-valued Block FIR    | 317     | 351         | 182         | 354             | 185             | 194
Complex-valued Block FIR | 993     | 1330        | 690         | 1333            | 675             | 674
Single-sample FIR        | 18      | 21          | 19          | 16              | 14              | 26
IIR                      | 19      | 9           | 8           | 10              | 9               | 16
LMS                      | 34      | 29          | 24          | 26              | 19              | 37
Viterbi                  | 3505    | 2304        | 1925        | 2880            | 1935            | 1740
FFT                      | 1684    | 2207        | 1248        | 3230            | 1631            | 1246

DSP platform architectures: PAC DSP is a 4-way VLIW plus scalar unit with 2 MACs; the competing platforms range from 4-way VLIW with 2 MACs to 8-way VLIW with 4 MACs.

Fig. 6 Benchmarks of DSP Cores (2)

Note: PAC DSP has been submitted to BDTI for official certification.
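The Fig. 6 results are cycle counts, which are independent of clock frequency. To compare execution time, each count must be divided by the core's clock: cycles divided by MHz gives microseconds. The sketch below applies this to the FFT row; the values are transcribed from Fig. 6, and the choice of kernel is only an example.

```python
# Clock frequencies (MHz) and FFT cycle counts transcribed from Fig. 6.
CLOCK_MHZ = {"PAC DSP": 250, "CEVA-X 1620": 450, "CEVA-X 1640": 340,
             "StarCore SC1200": 305, "StarCore SC1400": 300, "TI C6414": 1000}
FFT_CYCLES = {"PAC DSP": 1684, "CEVA-X 1620": 2207, "CEVA-X 1640": 1248,
              "StarCore SC1200": 3230, "StarCore SC1400": 1631, "TI C6414": 1246}


def exec_time_us(cycles: int, clock_mhz: float) -> float:
    """One MHz is one cycle per microsecond, so cycles / MHz = microseconds."""
    return cycles / clock_mhz


for core, cycles in FFT_CYCLES.items():
    print(f"{core:16s} {cycles:5d} cycles  {exec_time_us(cycles, CLOCK_MHZ[core]):6.2f} us")
```

Cycle counts measure architectural efficiency, while execution time also reflects the clock rate each core reaches in its process; both views are useful when reading the table.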

Item                  | PAC         | TI OMAP2410/20        | TI OMAP1610 (1611/1612) | Freescale MXC275-30 | Intel PXA800F
Processor core I      | ARM9/S+Core | ARM1136JF-S           | ARM926EJ-S              | ARM1136JF-S         | XScale
Core I freq (MHz)     | 244         | 330                   | 204                     | 532                 | 312
Processor core II     | PAC DSP 2.0 | TMS320C55x            | TMS320C55x              | StarCore SC140e DSP | MSA DSP (Frio)
Core II freq (MHz)    | 300         | 220                   | 204                     | 208                 | 104
Accelerator(s)        | Custom core | 2D/3D graphics, video | Video, security         | Security (HW/SW)    | 16-bit SIMD, Viterbi, voice
Power (mW@MHz)        | 450@300     | 650@330               | 240@204                 | 650@532             | 350@312
IC process            | 0.13µm      | 0.09µm                | 0.13µm                  | 0.09µm              | 0.13µm
Core voltage          | 1.2V        | N/A                   | 1.1~1.5V                | Not disclosed       | 1.2V
Peripheral voltage    | 2.5/3.3V    | N/A                   | 1.8V/3.0V               | 1.8V~3.3V           | 1.8V~3.3V
Package               | 288 BGA     | 289 BGA               | 289 BGA                 | Not disclosed       | 294 TPBGA

Fig. 7 Benchmarks of Low-Power Application Processors

3. PAC APPLICATION PROCESSORS

STC cooperates with several fabless IC design companies in Taiwan to develop applications based on the PAC design. These applications primarily target low-power, high-performance portable multimedia devices that must process enormous amounts of digital audio and video streams, such as PDAs, smart phones, PMPs, DSCs, and DVRs, as well as VoIP handsets/gateways that require real-time signal processing.

PMPs and PDAs/smart phones are two key potential implementations. With PAC as the system foundation, a PMP will provide multiple multimedia functions, such as MP3/AAC audio encoding/decoding, MPEG-4 encoding/decoding at D1 resolution, H.264 D1 decoding and QCIF encoding, and signal equalizer/amplifier control. It also has various peripheral controls, including monitor, audio/video I/O, and external memory, to meet the hardware requirements of next-generation PMPs. PDAs/smart phones are regarded as the biggest market segment for PAC applications. The PAC Media Processor comprises a multimedia application processor (connected externally to a baseband processor) and standard peripherals for high-performance PDAs/smart phones. The PAC Media Processor will first be introduced and promoted to Taiwan-based companies, with the goal of entering a market currently dominated by foreign media processor providers.

4. CONCLUSION

The PAC SoC Platform consists of a 32-bit PAC DSP core and an MPU, a memory subsystem, DMA, I/O peripherals, and an on-chip system bus network. In addition, low-power methodology, performance evaluation, and hardware/software co-verification techniques were developed during the design process. The complete software tools and hardware development environment further reduce development risk and shorten time to market. Featuring high-performance operation at optimized low power consumption, the dual-core PAC platform provides an ideal application processor solution for implementing more robust SoC designs for next-generation multimedia mobile devices.

ACKNOWLEDGEMENT

We wish to thank the experts who offered valuable advice on the PAC project, especially Dr. HT Kung, William H. Gates Professor of Computer Science and Electrical Engineering at Harvard University, and Dr. Paul Lin, General Director of the Information and Communications Research Laboratories of ITRI. We are also very grateful to all PAC team members who made this work possible. We would also like to express our appreciation for the assistance given by Alan Kang and Winnie Chu, planners in the Planning & Promotion Division of STC/ITRI, in compiling information for this paper.

