A DESIGN METHODOLOGY FOR INTEGRATING IP INTO SOC SYSTEMS


Philippe Coussy, Adel Baganne, Eric Martin

LESTER - Université de Bretagne Sud


BP 92116 - 56321 Lorient Cedex, France, {firstname.lastname}@univ-ubs.fr

ABSTRACT

Successful integration of IP/VC blocks requires a set of views that provides the appropriate information for each IP block throughout the design flow of an IP-integration system. In this paper, we present a methodology of IP integration in a System-on-a-Chip (SOC) design that exploits both IP designer and SOC integrator constraints. First, we describe a method to extract and specify IP functional and timing constraints (I/O sequence transfer constraints) from the IP core. Second, we propose a modeling style for the integration constraints and a technique for merging them with the IP constraints. This technique allows the specification and design of the optimized IP interface unit required for IP-Socketization. The synthesis output is a synthesizable VHDL RT description of the interface and a detailed Bus-Functional Model of the IP core for cosimulation.

I. INTRODUCTION

The complexity of modern embedded systems design requires designers to leverage the reuse of both software and hardware modules.

Reuse is done at the chip level through components called cores, VCs (Virtual Components) or IPs (Intellectual Properties), available in various forms ranging from soft cores to hard cores [1]. These components implement functions of specific domains such as signal processing (DCT, FFT), telecommunications (Viterbi, turbo codes) and multimedia applications (MPEG2, MPEG4, JPEG). IP cores are integrated into a System-on-a-Chip (SOC), a typical architecture of which is depicted in Fig. 1. Such an architecture includes digital signal processors (DSP), shared memory, a bus controller and a set of hardware IP blocks connected to the system bus through specific interfaces or wrappers. IP cores can be created internally by the SOC design team, beforehand or not, but they can also be bought from an external source. Despite efforts devoted to IP core exchange and IP core catalog development [1], communication problems and timing issues can cause a SOC design to fail. A successful IP core integration requires the designer to take into account the following main tasks:

1. Synchronization: the components have to be synchronized on different aspects such as global execution, data exchanges and protocols.

2. Protocol conversion: the protocol conversion between blocks that use incompatible protocols must be ensured. Wrappers can be used for this purpose, but they introduce an overhead that has to be taken into account along with the timing constraints.

3. I/O buffer synthesis: data may have to be buffered to ensure correct system behavior and to meet timing constraints.

In practice, the vision of easily and quickly assembling a SOC from cores has not yet become reality, for many reasons. Even if cores are pre-verified, this does not mean the whole system will work when they are put together. The integration of cores into a SOC is largely a manual and error-prone process because it requires the designers to fully understand the functionality and interface features of complex cores. Moreover, protecting the internal architecture of an IP block can lead its designer to hide information that may be essential for the IP integration.

Different approaches attempt to ease IP integration today by defining design methodologies or techniques that solve specific problems. The Virtual Socket Interface Alliance (VSIA) [2] focused on defining a standard on-chip bus, but this soon proved to be difficult [3]. In [4], the authors proposed an interface-based design methodology that attempts to ease integration by separating communication from behavior. VSIA [2] provides a Virtual Component Interface (VCI) standard that defines a generic, cycle-based, address-mapped, point-to-point communication protocol. The use of this kind of standard interface can add communication overhead [5]. Some EDA companies provide sets of tools that allow IP cores to be incorporated for high-level specification and system cosimulation. Sonics Inc. defines a bus-independent configurable protocol, the Open Core Protocol (OCP); the IP cores communicate through OCP over the μNetwork [6]. This environment is suited to rapid system-level performance analysis. CoWare N2C provides a Virtual Bus [7] to connect the system blocks and allows HW/SW cosimulation at the conceptual and architectural levels. VCC (Virtual Component Co-design) [8], proposed by Cadence, is a system-level environment for HW/SW co-design and IP reuse. This tool allows the designer to specify the system functionality, define the system architecture, perform the partitioning, refine the communications between blocks and analyze system performance. However, such tools require the system designer to have an efficient IP core model adapted to the cosimulation and system-level performance analysis steps. Furthermore, they cannot handle the low-level details of IP interface synthesis (computation latency, I/O timing constraints, etc.).

Fig. 1: A Typical SOC Architecture

Few works have addressed the problem of IP integration and interface synthesis in a global way. Some of them addressed the problem of interface synthesis between standard components that have incompatible protocols [9]. In [10], the authors describe the problem of IP wrapper synthesis and the overhead delays to be considered for integration. Others addressed the problem of interface synthesis from hardware I/O transfer sequences in a co-design approach [11].

In our view, a global methodology of integration, going from system-level performance analysis down to the synthesis step, is the best way to solve the problem of IP core reuse. In this paper, we present a methodology of IP integration that exploits both IP designer and SOC integrator constraints. The paper is organized as follows. Section 2 formulates the IP integration problem. Section 3 presents the proposed integration flow. As an illustration, Section 4 describes the integration of an FFT core and the synthesis results obtained.

II. PROBLEM FORMULATION

Let us consider a SOC architecture composed of an IP core and a DSP (see Fig. 2). The IP core receives data X, Y, Z from the DSP and sends its result W to the DSP over a single bus. The IP core is composed of two functional units, a memory management unit and a processing unit, which exchange data over two busses. All the data used in the processing unit are read from the memory management unit in a fixed order sequence $S_{IP} = (X, Y, Z)$, i.e. $t_X < t_Y < t_Z$. The output W is also stored in the memory management unit. The memory management unit includes a fixed address generator, so the order of the data transfer sequences is completely deterministic. Now let the I/O sequence constraint imposed by the DSP be $S_{SYS} = (X, Z, Y)$. The result will then be wrong, because the data are presented to the IP core interface in the wrong order.

0-7803-7250-6/02/$10.00 © 2002 IEEE. IEEE 2002 CUSTOM INTEGRATED CIRCUITS CONFERENCE
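The order mismatch described above can be detected mechanically. The Python sketch below is a hypothetical helper, not part of the authors' flow: it compares the order in which the system delivers data with the fixed order in which the IP core reads them, and lists the data that arrive too early and would therefore have to be buffered by an interface unit. It assumes one datum per transfer and ignores timing.

```python
from typing import List, Tuple


def early_arrivals(ip_order: List[str], sys_order: List[str]) -> Tuple[bool, List[str]]:
    """Compare the IP read order with the system delivery order.

    Returns (compatible, early): compatible is True when the system delivers
    the data exactly in the order the IP core reads them; early lists the data
    that arrive before some datum the IP core needs first, i.e. the data an
    interface unit would have to buffer (hypothetical analysis, timing ignored).
    """
    if sorted(ip_order) != sorted(sys_order):
        raise ValueError("both sequences must contain the same data")
    rank = {d: i for i, d in enumerate(ip_order)}  # position in the IP read order
    early = []
    for i, d in enumerate(sys_order):
        # d must wait if a datum that the IP core reads earlier is still to come
        if any(rank[e] < rank[d] for e in sys_order[i + 1:]):
            early.append(d)
    return len(early) == 0, early


print(early_arrivals(["X", "Y", "Z"], ["X", "Z", "Y"]))  # (False, ['Z'])
print(early_arrivals(["X", "Y", "Z"], ["X", "Y", "Z"]))  # (True, [])
```

With the DSP order (X, Z, Y), only Z must be held until Y has been delivered; with a matching order no buffering is needed, and the remaining question is whether the timing requirements are met.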

Fig. 2: IP core integration problem

Let us now consider the DSP data transfer sequence $S'_{SYS} = (X, Y, Z)$. Although the order now matches, the integration process will still fail if the timing requirements imposed by the IP core are not respected. Successful integration of IP blocks therefore requires a set of views that provides the appropriate information for each IP block throughout the design flow of an IP-integration system.

Hence, a methodology of IP integration has to exploit both IP provider and SOC integrator constraints. In our work, we consider real-time implementations of computation-intensive applications such as image and signal processing, so the functions processed by the IP cores are assumed to be deterministic.

III. DESIGN FLOW

An overview of our design methodology is shown in Fig. 4. The design flow covers IP design tasks on the one hand and system design and integration tasks on the other. The two meet through an IP Execution Requirements Model (IPERM) and an IP Delay Model, which describe the low-level details needed for IP core integration.

These models should be provided by the IP designer and are a key element of successful integration, from the performance analysis task to the synthesis step. As will be seen in the next sections, these models offer the IP designer efficient protection of the internal description by hiding architectural details while keeping the description of the functional requirements clear.

A. IP Design

The design of an IP core begins with a functional specification that describes the behavior of the component. The IP core is then described in a hardware description language better suited to implementation. Usually, an IP core architecture is based on four main functional units:

- Processing Unit (PU): performs all the arithmetic operations.
- Memory Management Unit (MMU): stores data during execution.
- Control Unit (CU): drives all the units described above.
- Interface Unit (IU): manages and controls the communications between the internal architecture and the external environment.

The functional units previously described can be designed by means of manual RTL description or high-level (behavioral) synthesis tools such as SystemC Compiler from Synopsys. Based on these descriptions, we can extract the IPERM model for the IP integration. This design step is discussed in the next section.

B. IPERM Model Generation

At this stage of the design flow, the functional units of the IP core are described at the RT level. The processing unit is modeled with the Finite State Machine with Data-path (FSMD) model described in [12].

An FSMD differs from an FSM in that it may include variables with [...] $M_{PU}$ in the rest of the paper. The memory management unit is modeled with a set of FSMDs $M_{MMU} = \{M_{MMU_1}, \ldots, M_{MMU_{N_{IP}}}\}$, where $M_{MMU_i}$ represents the $i$-th storage element, with $1 \le i \le N_{IP}$, and $N_{IP}$ represents the number of busses that connect the processing unit to the memory management unit. The memory management unit and the processing unit are therefore modeled by a set of communicating FSMDs.
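As an illustration of the FSMD notion, the following minimal interpreter attaches a datapath action (updates of variables) and a next-state function to each state, and executes one state per cycle. It is a generic sketch: the `State` and `FSMD` classes, the accumulator example and the storage contents are hypothetical, and the memory is modeled as a plain Python list rather than as a separate communicating FSMD.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

Vars = Dict[str, int]


@dataclass
class State:
    action: Callable[[Vars], None]     # datapath operations of this state
    next_state: Callable[[Vars], str]  # successor chosen from variable values


@dataclass
class FSMD:
    states: Dict[str, State]
    initial: str
    variables: Vars = field(default_factory=dict)

    def run(self, final: str, max_cycles: int = 1000) -> int:
        """Execute one state per cycle until `final`; return the cycle count."""
        current, cycles = self.initial, 0
        while current != final:
            if cycles >= max_cycles:
                raise RuntimeError("final state not reached")
            state = self.states[current]
            state.action(self.variables)
            current = state.next_state(self.variables)
            cycles += 1
        return cycles


mem = [3, 1, 4, 1]  # hypothetical contents of one storage element


def init(v: Vars) -> None:
    v["acc"], v["i"] = 0, 0


def accumulate(v: Vars) -> None:
    v["acc"] += mem[v["i"]]  # read one word and add it
    v["i"] += 1


adder = FSMD(
    states={
        "S0": State(init, lambda v: "S1"),
        "S1": State(accumulate, lambda v: "S1" if v["i"] < len(mem) else "DONE"),
    },
    initial="S0",
)
cycles = adder.run(final="DONE")
print(adder.variables["acc"], cycles)  # 9 5
```

The variables (`acc`, `i`) are what distinguish this machine from a plain FSM: the same state `S1` is revisited with different data until its next-state condition changes.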

In the first step of the IPERM generation, we merge the $M_{MMU}$ states with the $M_{PU}$ states in order to obtain a single FSMD $M_{IP}$. The second step merges sequential $M_{IP}$ states without I/O data dependencies into a new state called a Super State (see Fig. 3). A super state thus represents a set of computations and memory accesses that are performed between two I/O data transfers.

First phase
  For all the states in M_PU
    For each data dependency of the current state
      Merge the data-dependent M_MMU states with the current state into a new M_IP state
    End for
  End for
Second phase
  While M_IP has states without I/O data dependency
    For all the states in M_IP
      If (the next state has no I/O data dependency)
        Merge it with the current state
      End if
    End for
  End while

Fig. 3: Pseudo code of our IPERM design algorithm
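The two phases of the IPERM algorithm can be sketched in Python as follows, under two simplifying assumptions: the PU states form a single linear sequence (no branches), and the data dependencies between PU and MMU states are given as a precomputed map. The names `IPState`, `phase1` and `phase2` are illustrative. The second phase is written as a single pass that folds each state without I/O dependency into its predecessor, so every state with an I/O transfer opens a new super state.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class IPState:
    names: List[str]  # original PU/MMU states folded into this state
    io: bool          # True if the state performs an I/O data transfer


def phase1(pu_states: List[Tuple[str, bool]],
           mmu_deps: Dict[str, List[str]]) -> List[IPState]:
    """First phase: merge each PU state with the MMU states it depends on."""
    return [IPState([name] + mmu_deps.get(name, []), io) for name, io in pu_states]


def phase2(ip_states: List[IPState]) -> List[IPState]:
    """Second phase: fold every state without I/O dependency into the
    preceding super state; each I/O state starts a new super state."""
    supers: List[IPState] = []
    for s in ip_states:
        if supers and not s.io:
            supers[-1].names.extend(s.names)
        else:
            supers.append(IPState(list(s.names), s.io))
    return supers


# Hypothetical five-state PU (not the actual states of Figs. 9 and 10).
pu = [("P1", True), ("P2", False), ("P3", True), ("P4", False), ("P5", True)]
supers = phase2(phase1(pu, {"P2": ["M1"], "P4": ["M2"]}))
print([s.names for s in supers])  # [['P1', 'P2', 'M1'], ['P3', 'P4', 'M2'], ['P5']]
```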

Since the IP core is described at the RT level and all the PU I/O transfer sequences are fully specified, timing information can be extracted and added to the generated model of the IP core, such as the data lifetime $L(d)$, the latest arrival date of each input datum and the earliest emission date of each output datum. The transfer delay due to the data exchange protocol between the processing unit and the interface unit is expressed as $\Delta$ cycles (see Fig. 5).
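The effect of the protocol delay on the PU-side dates can be illustrated with a small sketch. It assumes, only for this sketch and not as a rule stated in the paper, that an input must leave the interface unit `delta` cycles before the PU's latest arrival date, and that an output reaches the interface unit `delta` cycles after the PU's earliest emission date. All dates are hypothetical cycle numbers.

```python
from typing import Dict, Tuple


def interface_deadlines(latest_arrival: Dict[str, int],
                        earliest_emission: Dict[str, int],
                        delta: int) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Translate PU-side dates (cycles) into interface-side dates.

    Assumed model: a fixed protocol delay of `delta` cycles separates the
    interface unit from the processing unit in both directions.
    """
    inputs = {d: t - delta for d, t in latest_arrival.items()}      # send-by dates
    outputs = {d: t + delta for d, t in earliest_emission.items()}  # available-from dates
    if any(t < 0 for t in inputs.values()):
        raise ValueError("an input deadline falls before cycle 0")
    return inputs, outputs


ins, outs = interface_deadlines({"X": 2, "Y": 3, "Z": 5}, {"W": 9}, delta=1)
print(ins, outs)  # {'X': 1, 'Y': 2, 'Z': 4} {'W': 10}
```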

The final IPERM model is an annotated FSMD made of:

- a set of super states produced by the state-merging steps;
- timing frames in which the data transfers can occur.

Fig. 9 depicts the set of communicating FSMDs that represents the IP core described in Section 2. The MMU is composed of two memories. The PU reads X and Y from the first one, and reads Z and writes W in the second one. The PU FSMD includes five states. Figs. 9 and 10 show the results of the first and second phases, respectively.

Finally, the resulting IPERM model $M_{IP}$ is composed of three super states.

C. IP Delay Model

Embedding intellectual property models into high-level system description allows the system designer to simulate and evaluate appropriate virtual components during the performance analysis phase.

For this purpose, the functional description of the IP core is associated with a delay performance model that describes its timing requirements.

This enables the system designer to anticipate synchronization problems between the different components of the system and the IP core. Taking the timing requirements of the IP core into account early in the system design flow allows an optimized integration. The IP Delay Model is generated from the IPERM model, since the latter describes the functional and timing requirements of the IP core. For instance, in [8] an IP core can be integrated at the system level for performance simulation. For this purpose, the IP functional model is associated with a DSL performance model written in the Delay Script Language.
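Since the IP Delay Model is derived from the IPERM model, a delay script can in principle be emitted from the super states. The hypothetical generator below renders one super state at a time using only the statements visible in the Fig. 7 fragment (`input`, `run`, `delay`, `output`); it is not checked against the actual DSL grammar of VCC, and the clock period and cycle count in the usage line are assumed values.

```python
from typing import List, NamedTuple


class SuperState(NamedTuple):
    inputs: List[str]   # data read when the super state starts
    cycles: int         # computation cycles before the outputs are posted
    outputs: List[str]  # data posted when the super state ends


def delay_script(supers: List[SuperState], clock_period_s: float) -> str:
    """Render a delay-model script textually modeled on the Fig. 7 fragment."""
    lines = ["delay_model() {"]
    for s in supers:
        lines += [f"    input({d});" for d in s.inputs]
        if s.cycles > 0:
            lines.append("    run();")
            # computation time = cycles x clock period, in seconds
            lines.append(f"    delay('{s.cycles * clock_period_s:.1e}');")
        lines += [f"    output({d});" for d in s.outputs]
    lines.append("}")
    return "\n".join(lines)


# Hypothetical super state: 3 computation cycles at an assumed 3 ns clock.
script = delay_script([SuperState(["ai", "ar", "bi", "br"], 3, ["wi", "wr"])], 3e-9)
print(script)
```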


Fig. 4: Design Flow for IP integration (IP designer and IP integrator sides)

D. System Design and IP Integration

The system design begins with a specification capture of the desired application. The system designer selects IP cores from a database according to constraint criteria such as speed, area or power. An architecture exploration then follows, together with a set of co-design techniques [13] (HW/SW partitioning, system performance analysis, communication synthesis, HW/SW and interface generation). The performance analysis task allows the designer to explore independent dimensions of behavior and architecture to reach optimal design performance within the given constraints. The hardware and software design tasks respectively generate an RTL description of the ASIC/FPGA blocks and C/C++ code. To satisfy the integration constraints and carry out the IP-Socketization, the system designer can incorporate the low-level details of the IP provided by the IPERM model (latency, I/O sequence transfers, I/O timing constraints, etc.).

1. Integration Constraints Definition:

All the following parameters specify the communication features between the system and the IP core.

- $\delta_a$: probability of access to the communication medium when the communication is done via a shared on-chip bus
- $\delta_p$: constant that depends on the transfer protocol used
- $\gamma_w$: overhead introduced by the bus wrapper
- $\delta_t$: data transfer delay

Integration constraints can be of three major types: (1) fully specified, by precise data transfer dates and data sequence order; (2) partially specified, by timing frames of data transfer and data sequence order, or by partially ordered or unordered data transfers; (3) unspecified. These constraints are specified for the $N_S$ busses that connect the IP core to the rest of the system. Each bus is modeled with an FSMD that describes its bus transactions. Hence the set $M_S = \{M_{S_1}, \ldots, M_{S_{N_S}}\}$ models the integration constraints for each external bus, where $M_{S_i}$ represents the $i$-th bus, with $1 \le i \le N_S$. The dependencies between $M_{IP}$ and $M_S$ are represented by a set of hierarchical links. Each link can be decomposed into two subsets: data links and control links. The control links model a data exchange protocol, for example a handshake. The timing frame of a datum associated with the hierarchical links takes into account the data transfer latency and the data exchange protocol delay between the system and the interface unit of the IP core.
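The three kinds of integration constraints can be captured by a single timing-frame record. In the sketch below (hypothetical names, dates in cycles), a fully specified constraint is a frame reduced to one date, a partially specified one is a frame with distinct or missing bounds, and an unspecified one has no bounds at all. Sequence-order constraints are not modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TransferConstraint:
    """Constraint on the date (in cycles) at which one datum is transferred."""
    earliest: Optional[int] = None
    latest: Optional[int] = None

    def kind(self) -> str:
        if self.earliest is None and self.latest is None:
            return "unspecified"
        if self.earliest == self.latest:
            return "fully specified"      # a precise transfer date
        return "partially specified"      # a timing frame, possibly open-ended

    def allows(self, date: int) -> bool:
        """True if a transfer at `date` satisfies the constraint."""
        lo = self.earliest if self.earliest is not None else float("-inf")
        hi = self.latest if self.latest is not None else float("inf")
        return lo <= date <= hi


print(TransferConstraint(4, 4).kind(),
      TransferConstraint(2, 6).allows(7),
      TransferConstraint().allows(100))  # fully specified False True
```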

For each datum, the transfer delay between the system and the interface unit (see Fig. 5) is expressed as

$$\delta = \delta_a + \delta_p + \gamma_w + \delta_t$$

Fig. 11 depicts the constraints imposed on the IP core of our example. The DSP and the IP core exchange data over a single bus modeled by the $M_S$ FSMD.

Fig. 5: Integration Constraints Specification (DSP, interface and IP core; I/O data buffering, DSP → IP and IP → DSP transfers, IP call, computation, transfer delay δ)
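Numerically, the transfer delay is the plain sum of the four terms listed above. The sketch below evaluates it and derives the latest date at which the system may start a transfer so that the datum reaches the interface unit by a given deadline. Treating every term, including the access term, as a cycle count, rounding up to whole cycles, and the values in the usage line (no bus contention, no wrapper overhead) are assumptions made for illustration.

```python
import math


def transfer_delay(delta_a: float, delta_p: float, gamma_w: float, delta_t: float) -> int:
    """delta = delta_a + delta_p + gamma_w + delta_t, rounded up to whole cycles."""
    return math.ceil(delta_a + delta_p + gamma_w + delta_t)


def latest_send_date(interface_deadline: int, delta: int) -> int:
    """Latest cycle at which the system may start sending a datum so that it
    reaches the interface unit by `interface_deadline` (assumed model)."""
    return interface_deadline - delta


# Hypothetical values: no contention term, no wrapper overhead.
d = transfer_delay(delta_a=0, delta_p=1, gamma_w=0, delta_t=1)
print(d, latest_send_date(4, d))  # 2 2
```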

The overhead added by the bus wrapper is currently not supported and is left for future work.

2. IP Interface Synthesis

Merging the integration constraints with the IP constraints allows the design of the optimized IP interface unit required for IP-Socketization. Each I/O datum is characterized by two timing frames: the interval in which its transfer can occur, and its lifetime in the interface unit. This information is generated by merging (1) the IP functional and timing constraints provided by the IPERM model, (2) the system integration constraints (the $N_S$ busses and their data transfer sequences), and (3) the transfer delays $\Delta$ and $\delta$. The generic interface unit targeted by the synthesis is composed of buffers storing the I/O data and an FSM-based controller. The hardware synthesis step uses algorithms that work from timing requirements and data ordering information.
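One standard way to size such buffers from the data lifetimes in the interface unit is left-edge style register binding from high-level synthesis (interval partitioning): data sorted by arrival date reuse any register freed by an earlier datum, which yields as many registers as the maximum number of simultaneously live data. The paper does not state that its synthesis step uses this algorithm; the sketch and its lifetimes are illustrative only.

```python
import heapq
from typing import Dict, List, Tuple


def allocate_registers(lifetimes: Dict[str, Tuple[int, int]]) -> Dict[str, int]:
    """Bind buffered I/O data to interface registers.

    `lifetimes` maps each datum to a half-open interval [start, end) in cycles
    during which it occupies a register. Returns datum -> register index,
    using the minimum number of registers for interval lifetimes.
    """
    binding: Dict[str, int] = {}
    free: List[int] = []              # min-heap of released register indices
    busy: List[Tuple[int, int]] = []  # min-heap of (end, register) in use
    next_reg = 0
    for datum, (start, end) in sorted(lifetimes.items(), key=lambda kv: kv[1]):
        while busy and busy[0][0] <= start:  # registers whose data has left
            _, reg = heapq.heappop(busy)
            heapq.heappush(free, reg)
        if free:
            reg = heapq.heappop(free)
        else:
            reg, next_reg = next_reg, next_reg + 1
        binding[datum] = reg
        heapq.heappush(busy, (end, reg))
    return binding


binding = allocate_registers({"X": (0, 2), "Z": (1, 4), "Y": (2, 3)})
print(binding)  # {'X': 0, 'Z': 1, 'Y': 0}
```

Here Z and X overlap, so two registers are needed; Y reuses X's register once X has been handed to the processing unit.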

Interface hardware synthesis generates a synthesizable VHDL RT description. A BFM [14] can be generated manually from the system bus specification; it is written in VHDL and drives the simulation with the core's bus response. A new IP delay model could be generated at this stage, taking into account the effect of the interface unit on the timing constraints of the interfaced IP core. This is not supported in the current design flow and is left for future work.

IV. IP INTEGRATION EXAMPLE: FFT CORE

The presented method has been applied to an IP core that implements an 8-point complex FFT optimized for area. The system is composed of one DSP and one IP core that communicate through a point-to-point link (see Fig. 6). Real and imaginary parts are sent in parallel over the single data bus that connects the DSP and the FFT IP core.

...
delay_model() {
    input(ai); input(ar); input(bi); input(br);  /* Read the inputs */
    run();                                        /* Computing part */
    delay('9.0e-9');                              /* Wait before posting the outputs */
    output(wi); output(wr);                       /* Post the outputs */
    input(ai); input(ar); input(bi); input(br);  /* Read the next inputs */
    ...
}

Fig. 7: Part of the IP delay model script of the FFT core (DSL)

$S = ((x_0, x_2, x_4, x_6), (x_1, x_3, x_5, x_7), (s_0, s_1, s_2, s_3), (s_4, s_5, s_6, s_7))$

Fig. 6: Integration Constraints of the FFT core

The memory management unit implements eight 16-bit registers, each containing the real and imaginary parts of one datum, each part coded on 8 bits. The PU exchanges data with the MMU over four 16-bit busses for intermediate results. The PU reads input data from, and writes final results to, the system through its I/O ports. The data are exchanged serially between the DSP and the FFT core, and in parallel between the interface unit or the memory management unit and the processing unit. A simple handshake (enable) protocol synchronizes the DSP and the FFT core (the two δ parameters are both equal to 1). The processing unit and the memory management unit are synchronous: the PU reads and writes data on the internal busses at fixed dates. Fig. 7 shows part of the IP delay model script of the FFT core, written in the Delay Script Language.
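Each 16-bit register holds one complex datum whose real and imaginary parts are 8 bits each. The following minimal sketch packs and unpacks such a word, assuming two's-complement parts with the real part in the high byte; the paper does not give the bit layout, so both choices are assumptions.

```python
from typing import Tuple


def pack(real: int, imag: int) -> int:
    """Pack two signed 8-bit parts into one 16-bit word (real in the high byte)."""
    for part in (real, imag):
        if not -128 <= part <= 127:
            raise ValueError("part does not fit in 8 bits")
    return ((real & 0xFF) << 8) | (imag & 0xFF)


def unpack(word: int) -> Tuple[int, int]:
    """Inverse of pack: recover the signed real and imaginary parts."""
    def signed8(b: int) -> int:
        return b - 256 if b & 0x80 else b
    return signed8((word >> 8) & 0xFF), signed8(word & 0xFF)


w = pack(-3, 5)
print(hex(w), unpack(w))  # 0xfd05 (-3, 5)
```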

DSL is a C-like language used in the VCC tool [8] to describe the DSL performance model of hardware components. This IP delay model, associated with the system-level description of the FFT core, can be used for performance analysis.

Table 1: Experiment parameters and results
Constraints: MMU (16x16-bit registers), δ, δ, Δ
Interface: FSM states, registers (4x16 bits), multiplexer (4-to-1), demultiplexer (1-to-4)

Fig. 8: Interface Unit of the FFT example

The synthesized interface unit is optimized for the data transfer sequences imposed by the DSP. A multiplexer and a demultiplexer perform, respectively, the parallel-to-serial and serial-to-parallel conversions on the DSP side. The registers are directly connected to the processing unit ports. This interface unit allows the integration of the IP core into the system design.
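The serial-to-parallel and parallel-to-serial translation can be modeled at transfer level as below. The 2-bit select counter and the hand-off of the four registers after every fourth word are assumptions about how the FSM controller drives the 1-to-4 demultiplexer and 4-to-1 multiplexer; the sketch does not model handshake timing.

```python
from typing import Iterable, List


def demux_1_to_4(words: Iterable[int]) -> List[List[int]]:
    """Serial-to-parallel: route successive 16-bit words into 4 registers.

    A hypothetical 2-bit select counter picks the destination register; after
    every fourth word the four registers are handed to the PU in parallel.
    A trailing incomplete group is not handed off.
    """
    regs = [0] * 4
    batches = []
    for count, word in enumerate(words):
        sel = count % 4
        regs[sel] = word & 0xFFFF  # registers are 16 bits wide
        if sel == 3:
            batches.append(list(regs))
    return batches


def mux_4_to_1(regs: List[int]) -> List[int]:
    """Parallel-to-serial: emit the 4 registers one word per transfer."""
    return [regs[sel] for sel in range(4)]


print(demux_1_to_4(range(8)))  # [[0, 1, 2, 3], [4, 5, 6, 7]]
```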

V. CONCLUSION

In this paper, we presented a design methodology for IP integration in a SOC design that exploits both IP designer and SOC integrator constraints. The integration task is based on the IPERM and IP delay models, which describe low-level details of the IP execution constraints.

These models can be delivered by the IP provider because the internal features of the IP core remain hidden. We also presented a method for IP interface synthesis that can easily be automated. As future work, we plan to refine the method by incorporating the timing overhead added by bus wrappers, and by handling the stochastic nature of applications whose behavior cannot be guaranteed to be predictable.

VI. REFERENCES

[1] Inventra, http://www.mentor.com/inventra/

[2] Virtual Socket Interface Alliance, http://www.vsi.org

[3] A. Cataldo, "VSI abandons plans for system-chip bus", EE Times, 1997.

[4] J. A. Rowson and A. L. Sangiovanni-Vincentelli, "Interface-Based Design", in Proc. of DAC, June 9-13, 1997.

[5] R. L. Lysecky, F. Vahid, T. D. Givargis, "Techniques for Reducing Read Latency of Core Bus Wrappers", in Proc. of DATE, March 2000.

[6] Sonics Inc., "Sonics μNetworks Technical Overview", June 2000.

[7] K. Van Rompaey, D. Verkest, I. Bolsens, H. De Man, "CoWare: A design environment for heterogeneous hw/sw systems", in Proc. of EURO-DAC, 1996.

[8] Cadence VCC 2001, http://www.cadence.com/datasheets/vcc.html

[9] R. Passerone, J. A. Rowson, A. Sangiovanni-Vincentelli, "Automatic Synthesis of Interfaces between Incompatible Protocols", in Proc. of DAC, 1998.

[10] G. Cyr, G. Bois, M. Aboulhamid, "Synthesis of Communication Interfaces for SOC using VSIA recommendation", in Proc. of DATE, 2001.

[11] A. Baganne, J-L. Philippe, E. Martin, "A Formal Technique For Hardware Interface Design", in Proc. of ISCAS, 1997.

[12] D. Gajski, N. Dutt, A. Wu, S. Lin, "High-Level Synthesis: Introduction to Chip and System Design", Kluwer Academic Publishers, Boston, 1992.

[13] J. Staunstrup, W. Wolf, "Hardware/Software Co-Design: Principles and Practice", Kluwer Academic Publishers, 1997.

[14] M. Keating, P. Bricaud, "Reuse Methodology Manual for System-On-A-Chip Designs", Kluwer Academic Publishers, Boston, 1998.

Fig. 11: System Constraints

Fig. 10: 2nd Phase Result
