High Level ASIP Synthesis Flow

Typical high-level synthesis procedures are shown in [21]. Instruction set matching and selection techniques are described in [22][23][24], and [25] introduces critical path and datapath optimization for ASIP design.

High-level synthesis takes a behavioral description of the algorithm, the available hardware resources, and a set of constraints and goals, and generates a hardware architecture at the register-transfer level (RTL). The constraints and goals define the desired performance and characteristics of the final architecture. The most common constraints are on area and performance.

In an area-constrained problem, the designer is given a set of resources (functional units) and must implement the application on those resources with the highest possible performance. This is known as resource-constrained scheduling.

The performance-constrained problem is known as time-constrained scheduling: the designer is given a desired sample rate or iteration period, and the goal is to minimize the total area of the final architecture.

Depending on user requirements, synthesis may also pursue other goals, such as minimizing the number of memory modules, reducing power consumption, minimizing the number of buses, and incorporating reliability and testability into the design.

[Figure 4-1 block diagram. Main flow: language description of algorithms; internal graphical representation (SDFG); algorithmic optimization; scheduling, resource allocation, and binding; module binding and control circuit generation (RTL); RTL description of the final architecture. Side inputs: goals and constraints; functional units library.]

Figure 4-1 High-level synthesis of DSP datapath

Figure 4-1 illustrates the design flow of the high-level DSP synthesis system.

The behavioral description, which may be written in C/C++, is first converted to a graph-based representation such as a data-flow graph (DFG) []. In a DFG, the nodes represent computations (functions or subtasks), the directed edges represent data paths, and each edge has a nonnegative number of delays associated with it. The subsequent tasks in high-level synthesis of a DSP datapath are high-level optimization, scheduling, resource allocation, module binding, and control generation.
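The DFG described above can be captured in a few lines. The following is a minimal Python sketch, not the representation used by any particular synthesis tool; the class name `DFG`, its methods, and the first-order recursive filter used as the example are all illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class DFG:
    """Data-flow graph: nodes are computations, edges are data paths with delays."""
    nodes: dict = field(default_factory=dict)  # node name -> operation type
    edges: list = field(default_factory=list)  # (source, destination, delays)

    def add_node(self, name, op):
        self.nodes[name] = op

    def add_edge(self, src, dst, delays=0):
        if delays < 0:
            raise ValueError("the number of delays on an edge must be nonnegative")
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must be existing nodes")
        self.edges.append((src, dst, delays))

    def predecessors(self, name):
        """Nodes that feed `name` within the same iteration (zero-delay edges)."""
        return [s for s, d, w in self.edges if d == name and w == 0]


# Hypothetical example: y[n] = a * x[n] + y[n-1]
g = DFG()
g.add_node("mul", "*")
g.add_node("add", "+")
g.add_edge("mul", "add")            # same-iteration data dependence
g.add_edge("add", "add", delays=1)  # feedback through one delay element
```

Only zero-delay edges constrain the order of operations inside one iteration, which is why `predecessors` ignores the feedback edge.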

The final architecture produced by high-level synthesis is typically synthesizable RTL. Many high-level synthesis systems have been designed, and a great deal of progress has been made in finding good techniques for optimizing designs and exploring design tradeoffs. The trend towards more automation at higher levels of the design process is expected to continue.

Scheduling

Scheduling and resource allocation are two important tasks in hardware or software synthesis of DSP systems. The two tasks depend on each other and are among the most difficult problems in high-level synthesis.

Scheduling involves assigning every node of the DFG to time steps. Time steps are the fundamental sequencing units in synchronous systems and correspond to clock cycles.
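As an illustration of assigning nodes to time steps, the sketch below places each operation in the earliest step allowed by its data dependences (as-soon-as-possible scheduling). It is only one simple scheduling strategy, and it assumes an acyclic dependence graph, unit-latency operations, and unlimited resources; the function name `asap_schedule` and the four-addition example are hypothetical.

```python
def asap_schedule(ops, deps):
    """Assign each operation to its earliest feasible time step (1-based).

    ops:  iterable of operation names
    deps: dict mapping an operation to the operations it depends on
    Assumes an acyclic graph and one time step per operation.
    """
    step = {}

    def visit(op):
        if op not in step:
            preds = deps.get(op, [])
            # An operation runs one step after its latest predecessor.
            step[op] = 1 + max((visit(p) for p in preds), default=0)
        return step[op]

    for op in ops:
        visit(op)
    return step


# Hypothetical DFG: "+3" consumes the result of "+1", "+4" that of "+2".
ops = ["+1", "+2", "+3", "+4"]
deps = {"+3": ["+1"], "+4": ["+2"]}
schedule = asap_schedule(ops, deps)
print(schedule)  # {'+1': 1, '+2': 1, '+3': 2, '+4': 2}
```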

In general, there are two types of scheduling: one is time-constrained scheduling and the other is resource-constrained scheduling.

Time-constrained scheduling minimizes the hardware cost subject to a specified bound on execution time. For example, in many digital signal processing (DSP) systems, the sampling rate of the input data stream dictates the maximum time allowed for carrying out a DSP algorithm on the present data sample before the next sample arrives.
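The link between sampling rate and hardware cost can be made concrete with a rough lower bound. The sketch below is an assumption-laden illustration, not a scheduling algorithm: it assumes unit-latency operations, ignores data dependences, and uses made-up clock and sample rates and operation counts. The function names are hypothetical.

```python
import math


def available_time_steps(clock_hz, sample_rate_hz):
    """Clock cycles available to process one sample before the next arrives."""
    return clock_hz // sample_rate_hz


def min_units(op_counts, steps):
    """Lower bound on functional units of each type for a given step budget.

    With unit-latency operations, one unit executes at most `steps`
    operations per sample, so at least ceil(count / steps) units are needed.
    """
    if steps < 1:
        raise ValueError("the sample period must allow at least one time step")
    return {op: math.ceil(count / steps) for op, count in op_counts.items()}


# Assumed figures: 100 MHz clock, 25 MHz sample rate, 6 multiplies and 5 adds.
steps = available_time_steps(100_000_000, 25_000_000)
bound = min_units({"mul": 6, "add": 5}, steps)
print(steps, bound)  # 4 {'mul': 2, 'add': 2}
```

A real time-constrained scheduler may need more units than this bound once dependences between operations are taken into account.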

On the other hand, the resource-constrained scheduling problem is encountered in many applications where we are limited by the silicon area. The constraint is usually given in terms of either a number of functional units or the total allocated area.

When total area is given as a constraint, the scheduling algorithm determines the type of functional units used in the design. The goal of such an algorithm is to produce a design with the best possible performance but still meeting the given area constraint.
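For the case where the constraint is a number of functional units, a simple list scheduler gives the flavor of resource-constrained scheduling. The sketch below is a minimal illustration under stated assumptions: unit-latency operations, an acyclic dependence graph, and a priority that is simply the input order (practical list schedulers usually rank operations by a measure such as critical-path length). All names are hypothetical.

```python
def list_schedule(ops, deps, limits):
    """Resource-constrained list scheduling with unit-latency operations.

    ops:    dict mapping operation name -> functional unit type
    deps:   dict mapping operation name -> list of predecessor operations
    limits: dict mapping functional unit type -> number of available units
    Returns a dict mapping operation name -> time step (1-based).
    """
    step = {}
    t = 0
    while len(step) < len(ops):
        t += 1
        used = {}  # units of each type already busy in step t
        for op, kind in ops.items():
            if op in step:
                continue
            # Ready only if every predecessor finished in an earlier step.
            ready = all(step.get(p, t) < t for p in deps.get(op, []))
            if ready and used.get(kind, 0) < limits.get(kind, 0):
                step[op] = t
                used[kind] = used.get(kind, 0) + 1
        if not used:
            raise ValueError("no progress: cyclic dependences or a missing unit type")
    return step


ops = {"+1": "add", "+2": "add", "+3": "add", "+4": "add"}
deps = {"+3": ["+1"], "+4": ["+2"]}
two_adders = list_schedule(ops, deps, {"add": 2})
one_adder = list_schedule(ops, deps, {"add": 1})
print(max(two_adders.values()), max(one_adder.values()))  # 2 4
```

Halving the number of adders doubles the schedule length in this example, which is the area versus performance tradeoff the constraint expresses.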

Resource allocation & binding

Resource allocation is the process of determining how many hardware resources, and of what types, are required to implement the desired behavior at the lowest cost. The hardware resources consist primarily of functional units, memory modules, multiplexers, and communication datapaths.

Binding maps the variables and operations in the scheduled DFG onto functional, storage, and interconnection units, while ensuring that the design behaves correctly on the selected set of components. For every operation in the DFG, we need a functional unit that is capable of executing the operation. For every variable that is used across several time steps in the scheduled DFG, we need a storage unit to hold the data value during the variable's lifetime.
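One common way to share storage units among variables is the left-edge algorithm, sketched below. It is offered as an illustration of lifetime-based register binding, not as the method used in this work. Lifetimes are half-open intervals `[start, end)` measured at time-step boundaries, and the eight lifetimes in the example are assumed values.

```python
def left_edge(lifetimes):
    """Bind variables to registers using the left-edge algorithm.

    lifetimes: dict mapping variable -> (start, end); the variable occupies
               a register during the half-open interval [start, end).
    Returns a dict mapping variable -> register index (0-based).
    """
    free_at = []     # free_at[i] is the time at which register i becomes free
    assignment = {}
    # Process variables in order of increasing start time (the "left edge").
    for var, (start, end) in sorted(lifetimes.items(), key=lambda kv: kv[1]):
        for idx, t in enumerate(free_at):
            if t <= start:           # register idx is free again: reuse it
                free_at[idx] = end
                assignment[var] = idx
                break
        else:                        # every register is busy: allocate a new one
            free_at.append(end)
            assignment[var] = len(free_at) - 1
    return assignment


# Assumed lifetimes: a and d are read in steps 1 and 2, b and c only in step 1,
# e and f are produced in step 1 and read in step 2, g and h are results.
lifetimes = {
    "a": (0, 2), "b": (0, 1), "c": (0, 1), "d": (0, 2),
    "e": (1, 2), "f": (1, 2), "g": (2, 3), "h": (2, 3),
}
regs = left_edge(lifetimes)
print(len(set(regs.values())))  # 4
```

With these assumed lifetimes, b, e, and g share one register and c, f, and h share another, while a and d each keep a register of their own.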

Finally, for every data transfer in the DFG, we need a set of interconnection units for the transfer. Besides the design constraints imposed on the original behavior and represented in the DFG, additional constraints on the binding process are imposed by the type of hardware units selected. For example, a functional unit can execute only one operation in any given time step. Similarly, the number of simultaneous accesses to a storage unit during a control step is limited by the number of parallel ports on the unit.

Figure 4-2 illustrates the mapping of a DFG into register-transfer components.

Figure 4-2 (a) shows a scheduled DFG to be mapped; we assume that two adders and four registers have been selected. Operations “+1” and “+2” cannot be mapped to the same adder because both must be performed in time step 1. Operation “+1” can, however, share an adder with operation “+3” because they are carried out in different control steps. Thus, operations “+1” and “+3” are both mapped to adder1. Variables a and e must be stored separately because their values are needed concurrently in time step 2. Registers 1 and 2, where variables a and e reside, must be connected to the input ports of adder1; otherwise, operation “+3” cannot execute in adder1. Similarly, operations “+2” and “+4” are mapped to adder2. Note that the binding can be performed in several different ways. For example, we can map “+2” and “+3” to adder1 and “+1” and “+4” to adder2.
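The rule used in this example, that a functional unit executes at most one operation per time step, can be checked mechanically. The sketch below is a small illustration with hypothetical names: it tests a binding against the schedule of the four additions and enumerates every valid assignment to two adders.

```python
from itertools import product


def is_valid_binding(schedule, binding):
    """True if no functional unit is given two operations in one time step.

    schedule: dict mapping operation -> time step
    binding:  dict mapping operation -> functional unit name
    """
    busy = set()
    for op, step in schedule.items():
        slot = (step, binding[op])
        if slot in busy:
            return False
        busy.add(slot)
    return True


# Schedule of the example: "+1" and "+2" in step 1, "+3" and "+4" in step 2.
schedule = {"+1": 1, "+2": 1, "+3": 2, "+4": 2}
adders = ["adder1", "adder2"]
ops = list(schedule)

valid = []
for choice in product(adders, repeat=len(ops)):
    binding = dict(zip(ops, choice))
    if is_valid_binding(schedule, binding):
        valid.append(binding)

print(len(valid))  # 4
```

Both bindings named in the text appear among the four valid ones. The check covers functional units only; register and interconnection constraints would need separate tests.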

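[Figure 4-2 diagram. (a) Scheduled DFG with inputs a, b, c, d; additions 1 and 2 in time step 1; additions 3 and 4 in time step 2; outputs g, h. (b) reg1 holds a and reg2 holds b, e, g, feeding adder1 (+1, +3); reg4 holds d and reg3 holds c, f, h, feeding adder2 (+2, +4).]

Figure 4-2 (a) Scheduled DFG; (b) mapped operations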

4.2 Proposed Application Specific Programmable Processor

From the document: Research and Design of a Low-Power Multithreaded Datapath with Complex Functional Units (pages 53-58)

