Thesis Organization - Design Methodology at the Electronic System Level

Chapter 1 Introduction

1.4 Thesis Organization

The rest of the thesis is organized as follows. Chapter 2 introduces the languages we use at different levels and the OFDM system used in this case study. Chapter 3 presents our platform in detail. Chapter 4 gives and discusses the experiment flow and the experimental results. Finally, Chapter 5 concludes the thesis.

Chapter 2 Preliminary

2.1 Refinement

As shown in Figure 1, our refinement design flow proceeds through the following steps: specification (spec), sequential executable code, process network, timed functional model, cycle-accurate model, and RTL design. The design flow gives engineers a systematic way to find a good design for a system. Following it when designing a complicated system saves engineering time.

Figure 1 Refinement

In the spec step, natural language is used to describe the hardware. This step is the starting point of our design flow: engineers write down the requirements, the inputs, and the outputs of the system. Next, in the sequential executable code step, engineers write sequential executable code in C/C++ to verify the algorithm. After the algorithm is verified, the sequential executable code is refined into a process network for parallel computation. Then, timing information is added to the process network so that execution time can be estimated. Finally, engineers write the cycle-accurate model and the RTL design of the system.

2.2 SystemC Features

In our design flow, the process network and the timed functional model of a system are both written in SystemC. In other words, we use SystemC to complete our main work, because it has several features that make these models convenient to write.

SystemC provides a model of time that is both easy to use and highly readable [6]. We use this model of time to add timing information to a system model.

SystemC provides libraries for parallel computation [6]. We use these libraries to model a system with multiple concurrent blocks.

SystemC provides hardware data types [6]. Instead of spending time designing such data types ourselves, we use them to model the behavior of hardware in the system.

SystemC provides mechanisms for synchronization [6] [7]. We use them to model the behavior of an arbiter, which lets us study how a system with limited resources can work better.

2.3 Process Network

A process network is a model of computation: a collection of processes that are connected by, and communicate over, FIFOs. When a system is modeled as a process network, multiple processes can execute in parallel [5].

A process network is usually represented by a directed graph, in which nodes represent processes and edges represent one-way FIFO queues of data.

Figure 2 is a typical directed graph representing a network with four nodes and three FIFOs. The four nodes are node A, node B, node C, and node D. The three FIFOs are FIFO p, FIFO q, and FIFO r. FIFO q connects node A and node B, FIFO r connects node A and node D, and FIFO p connects node B and node C. Node C and node D each have an input connected to another process network, and node A has an output connected to another process network.

Figure 2 A directed graph representing a process network

The set of firing rules must be sequential when we use process networks to model a system. We therefore adopt the following firing rule: in the directed graph representing a process network, a node fires only after all of its input edges are filled with data. In other words, a process executes when all of its input data have arrived.
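
The firing rule can be illustrated with a minimal sketch in plain C++ (not SystemC). The names Token, Node, can_fire, and fire are hypothetical, and the summation is only a placeholder computation; the point is that a node consumes nothing until every input FIFO holds data.

```cpp
#include <cassert>
#include <queue>
#include <vector>

// Hypothetical token type; the thesis calls its token type "packet".
using Token = int;

// A node with any number of input FIFOs and one output FIFO.
struct Node {
    std::vector<std::queue<Token>*> inputs;
    std::queue<Token>* output = nullptr;

    // Firing rule from the text: fire only when every input FIFO holds data.
    bool can_fire() const {
        for (const std::queue<Token>* in : inputs) {
            if (in->empty()) return false;
        }
        return true;
    }

    // Consume one token per input and produce one token (placeholder: their sum).
    // Returns false without touching any FIFO if the firing rule is not satisfied.
    bool fire() {
        if (!can_fire()) return false;
        assert(output != nullptr);
        Token sum = 0;
        for (std::queue<Token>* in : inputs) {
            sum += in->front();
            in->pop();
        }
        output->push(sum);
        return true;
    }
};
```

With two input FIFOs, a call to fire() is refused while only one of them holds a token, and that token stays in place until the second input arrives.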

2.4 The OFDM System

The OFDM system used in this case study is provided by Meng-Lin Ku and Chia-Chi Huang. Figure 3, Figure 4, and Figure 5 depict the blocks of the OFDM system.

Figure 3 depicts the transmitter architecture of the system. Figure 4 depicts the receiver architecture. Figure 5 depicts the Channel Estimation block of the receiver architecture in more detail [4].

Figure 3 Transmitter architecture

Figure 4 Receiver architecture

Figure 5 Channel estimation block

Chapter 3 Our Platform

In this work, we apply the design flow described in 2.1 to the OFDM system described in 2.4. Accordingly, we refine the model of the OFDM system step by step. Each refinement adds more information and yields a more realistic result.

Note that this work focuses on estimating the execution time of the whole system. Timing information therefore plays an important role.

3.1 Coding Guideline of Sequential Executable Code

We propose five coding guidelines that algorithm designers should follow when writing their C/C++ programs. A C/C++ program written according to these guidelines is easy to refine into a process network. The guidelines are described below, and the code in Figure 6 illustrates them.

//Constant data

Figure 6 An example of coding guidelines

We define communication variables as variables used to exchange data between modules, and local variables as variables used only within a single module.

The first rule is that whoever writes the sequential executable code to verify the algorithm of the system should separate communication variables from local variables. This also implies that global variables should not be used. The rule helps to identify the code range, inputs, and outputs of each process, which is hard to do in a general C/C++ program. If communication variables and local variables are separated, the inputs and outputs of a function block are clear and easy to find, and the code range of the module can then be inferred. The code in Figure 6 gives an example of this rule.

In the code in Figure 6, local variables such as a, b, and c are used only for computation within a block, and communication variables such as the arrays “dat1” and “dat2” are used for communication between blocks. Take block 1 as an example: it is clear that its inputs are the arrays “in1” and “pilot” and that its output is the array “dat1”. The inputs and outputs of the other blocks are just as easy to find.
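
As a minimal sketch of the first rule, the fragment below reuses the variable names mentioned in the text (in1, pilot, dat1, dat2, a, b, c). The function names, the symbol length N, the Symbol type, and the arithmetic are all assumptions made for illustration; this is not the listing of Figure 6.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t N = 4;        // hypothetical symbol length
using Symbol = std::array<int, N>;  // hypothetical data type

// Block 1: inputs in1 and pilot, output dat1 (all communication variables).
void block1(const Symbol& in1, const Symbol& pilot, Symbol& dat1) {
    for (std::size_t i = 0; i < N; ++i) {
        int a = in1[i];    // local variable, used only inside this block
        int b = pilot[i];  // local variable
        dat1[i] = a + b;   // placeholder computation
    }
}

// Block 2: input dat1, output dat2.
void block2(const Symbol& dat1, Symbol& dat2) {
    for (std::size_t i = 0; i < N; ++i) {
        int c = 2 * dat1[i];  // local variable
        dat2[i] = c;          // placeholder computation
    }
}

// Top level: communication variables are declared here and passed explicitly,
// so no global variables are needed.
Symbol run_once(const Symbol& in1, const Symbol& pilot) {
    Symbol dat1{};
    Symbol dat2{};
    block1(in1, pilot, dat1);
    block2(dat1, dat2);
    return dat2;
}
```

Because every block receives its communication variables as parameters, the inputs and outputs of a block can be read directly from its signature when the block is later turned into a process.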

When refining the sequential code into the process network, it is necessary to verify that the two behave the same.

If all random functions in the program draw from the same random sequence, the order in which random numbers are obtained changes after refinement, because the processes may execute in parallel in the process network. Therefore, our second rule is that each random function in the program uses a different random sequence. If this rule is followed, the sequential code and the process network obtain the same random numbers, so their behaviors can be compared easily. Statement (1) and statement (2) in the code in Figure 6 are instances of this rule.
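
The effect of the second rule can be sketched as follows, assuming two hypothetical random sources (a data source and a noise source) with arbitrary seeds 1 and 2. Each source owns its own generator, so the numbers each one sees do not depend on the order in which the two are called. The interleaved order only emulates a different execution order; no real threads are used.

```cpp
#include <cassert>
#include <random>
#include <vector>

using Rand = std::mt19937::result_type;

// Numbers drawn by two hypothetical blocks of the program.
struct Draws {
    std::vector<Rand> data;
    std::vector<Rand> noise;
};

// Draw n numbers per block, either block after block (sequential code)
// or alternating between the blocks (emulating a different schedule).
Draws run(bool interleaved, int n) {
    assert(n >= 0);
    std::mt19937 data_gen(1u);   // separate sequence for the data source (seed is an assumption)
    std::mt19937 noise_gen(2u);  // separate sequence for the noise source (seed is an assumption)
    Draws d;
    if (interleaved) {
        for (int i = 0; i < n; ++i) {
            d.data.push_back(data_gen());
            d.noise.push_back(noise_gen());
        }
    } else {
        for (int i = 0; i < n; ++i) d.data.push_back(data_gen());
        for (int i = 0; i < n; ++i) d.noise.push_back(noise_gen());
    }
    return d;
}
```

Comparing run(false, n) with run(true, n) gives identical per-block sequences, which is what makes the outputs of the sequential code and of the process network directly comparable.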

Complex data is often used in a communication system. If different data types are used to model it within one program, the program is troublesome to modify. Therefore, our third rule is that the C/C++ standard library should be used for complex-number computation, for the sake of readability and ease of data modeling. If it is used, modeling the data exchange between blocks in the process network is straightforward and the program is easy to understand. Otherwise, different modules may use different data types, and modeling the data exchange takes much more time. Statement (3) and the declarations of “comp” in the code show how to follow this rule.
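
A minimal sketch of the third rule, assuming that “comp” is an alias for std::complex<double> (the text names the type but its definition is not shown). The function apply_channel is hypothetical and only demonstrates that one standard complex type can be shared by every block.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Assumed definition of the type the text calls "comp".
using comp = std::complex<double>;

// Hypothetical block: multiply each transmitted sample by a channel coefficient.
std::vector<comp> apply_channel(const std::vector<comp>& tx,
                                const std::vector<comp>& h) {
    assert(tx.size() == h.size());
    std::vector<comp> rx(tx.size());
    for (std::size_t i = 0; i < tx.size(); ++i) {
        rx[i] = h[i] * tx[i];  // complex multiplication from the standard library
    }
    return rx;
}
```

Since both the producer and the consumer of such a vector use the same type, no conversion code is needed when the blocks are later connected by FIFOs.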

The fourth rule is to consider the data grain size used by the hardware blocks. When refining the sequential code into the process network, we want the behavior of the process network to resemble that of the real system, so we exchange data at the real data grain size. If the grain size is considered when the sequential code is written, refinement is easier and the whole work takes less time. In the code in Figure 6, the functions process one symbol of data at a time instead of one packet. This matches the common situation in hardware and serves as an example of the rule.

The fifth rule is to keep the block operation sequence in topological order. This makes it easier to model the inputs and outputs of the whole system, which is important when a system has many blocks. The data exchange in the code in Figure 6 can be modeled by the directed graph in Figure 7, and the block operation sequence in the code follows the topological order of that graph.

Figure 7 The data exchange of the code example
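
Whether a given block call sequence respects the fifth rule can be checked mechanically. The sketch below is an illustration only: blocks are numbered from 0, edges are (producer, consumer) pairs, and the function name is_topological_order is hypothetical. It does not reproduce the graph of Figure 7.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// edges: (producer, consumer) pairs of block indices.
// order: the sequence in which the sequential code calls the blocks.
// Returns true if every producer is called before each of its consumers.
bool is_topological_order(
        std::size_t num_blocks,
        const std::vector<std::pair<std::size_t, std::size_t>>& edges,
        const std::vector<std::size_t>& order) {
    if (order.size() != num_blocks) return false;
    // position[b] is the index of block b in the call order;
    // the value num_blocks marks a block that has not been seen yet.
    std::vector<std::size_t> position(num_blocks, num_blocks);
    for (std::size_t i = 0; i < order.size(); ++i) {
        if (order[i] >= num_blocks) return false;              // unknown block
        if (position[order[i]] != num_blocks) return false;    // block listed twice
        position[order[i]] = i;
    }
    for (const auto& e : edges) {
        if (e.first >= num_blocks || e.second >= num_blocks) return false;
        if (position[e.first] >= position[e.second]) return false;
    }
    return true;
}
```

For a diamond-shaped graph with edges 0→1, 0→2, 1→3, and 2→3, both call orders 0, 1, 2, 3 and 0, 2, 1, 3 are accepted, while 1, 0, 2, 3 is rejected because block 1 would run before its producer.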

3.2 A Process Network Model

We build the process network described in 2.3 to model the OFDM system with parallel computation. This section explains how the process network model is built. Note that, for convenience, we use the TLM library described in [8] to model the system.

First, we design how a node of a process network is modeled in SystemC.

The code in Figure 8 is the pattern for a node of the system written in SystemC. As an example, we model node A of the process network mentioned in 3.1. The SystemC module declaration in the code has three parts: the port declarations, the body function “proc”, and the SystemC module constructor. In the function “proc”, statement (1) is not executed until input_1 has data; it uses the get-interface function of the TLM library. Statement (2) behaves in the same way. That is, both are blocking reads. This guarantees the firing rule of our process network model: the process does not execute until all of its input queues have data. The constructor places the function “proc” in its own thread. The SystemC code of the other nodes can be written in the same way.

SC_MODULE(NODE_A) {
    port<tlm_get_if<packet> > input_1;
    port<tlm_get_if<packet> > input_2;
    port<tlm_put_if<packet> > output;
    void proc() {

Figure 8 SystemC code of a node of the process network
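
The blocking-read behavior of the node can be imitated in plain C++ without SystemC or the TLM library. The sketch below is an analogy under stated assumptions: BlockingFifo and node_a_step are hypothetical names, tokens are plain integers instead of “packet”, and the addition is a placeholder computation.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>

// A FIFO whose get() blocks until data is available.
template <typename T>
class BlockingFifo {
public:
    void put(const T& value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push(value);
        }
        not_empty_.notify_one();
    }

    // Blocks the calling thread until the FIFO holds at least one token.
    T get() {
        std::unique_lock<std::mutex> lock(mutex_);
        not_empty_.wait(lock, [this] { return !queue_.empty(); });
        T value = queue_.front();
        queue_.pop();
        return value;
    }

    bool empty() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return queue_.empty();
    }

private:
    mutable std::mutex mutex_;
    std::condition_variable not_empty_;
    std::queue<T> queue_;
};

// One firing of a node with two inputs and one output.
inline void node_a_step(BlockingFifo<int>& input_1,
                        BlockingFifo<int>& input_2,
                        BlockingFifo<int>& output) {
    int x = input_1.get();  // like statement (1): waits until input_1 has data
    int y = input_2.get();  // like statement (2): waits until input_2 has data
    output.put(x + y);      // placeholder computation
}
```

In a complete model, each node would call its step function in a loop on its own thread, in the same way that the constructor in Figure 8 places “proc” in an individual thread. Because both reads block, the computation cannot start before all inputs have data.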

The code in Figure 9 is the pattern for the top-level view of the system written in SystemC. It illustrates how the connections between nodes and FIFOs are implemented.

The process network described by the code in Figure 9 is the same as the one described by the directed graph in Figure 2. The SystemC module declaration contains the declarations of the nodes and FIFOs and a SystemC module constructor. The statements in the constructor describe how the FIFOs and the nodes are connected. For example, the statement “C.output(p)” connects the output pin of node C to FIFO p.

SC_MODULE(platform) {
    NODE_A A; NODE_B B; NODE_C C; NODE_D D;
    tlm_fifo<packet> p; tlm_fifo<packet> q; tlm_fifo<packet> r;
    SC_CTOR(platform) : A("NODE_A"), B("NODE_B"),

Figure 9 The system top-view written in SystemC
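
The wiring of the four nodes and three FIFOs can also be sketched in plain C++. The edge directions used here (C to p to B, B to q to A, D to r to A) are our reading of the description of Figure 2 and of the statement “C.output(p)”, and should be treated as an assumption. The external FIFOs ext_c, ext_d, and ext_out, the step functions, and the arithmetic are hypothetical.

```cpp
#include <cassert>
#include <queue>

using Fifo = std::queue<int>;

struct Platform {
    Fifo ext_c, ext_d;  // hypothetical external inputs of node C and node D
    Fifo p, q, r;       // FIFOs p, q, and r
    Fifo ext_out;       // hypothetical external output of node A

    // Each step fires its node once if all of its inputs hold data.
    bool step_c() {
        if (ext_c.empty()) return false;
        p.push(ext_c.front() + 1);  // placeholder computation
        ext_c.pop();
        return true;
    }
    bool step_b() {
        if (p.empty()) return false;
        q.push(p.front() * 2);      // placeholder computation
        p.pop();
        return true;
    }
    bool step_d() {
        if (ext_d.empty()) return false;
        r.push(ext_d.front() + 10); // placeholder computation
        ext_d.pop();
        return true;
    }
    bool step_a() {
        if (q.empty() || r.empty()) return false;  // node A needs both inputs
        ext_out.push(q.front() + r.front());       // placeholder computation
        q.pop();
        r.pop();
        return true;
    }

    // Keep firing nodes until no node can fire any more.
    void run() {
        bool progress = true;
        while (progress) {
            progress = false;
            progress = step_c() || progress;
            progress = step_b() || progress;
            progress = step_d() || progress;
            progress = step_a() || progress;
        }
    }
};
```

If only the input of node C is supplied, the token travels through p and q and then waits in q, because node A cannot fire until FIFO r also holds data.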

Therefore, we use a process network to model the OFDM system in this work.

Our procedure for building the process network model is as follows. First, we divide the whole system into processes according to the original data flow of the system. Then, we use FIFOs to connect these processes. In particular, while modeling the OFDM [...] model of a system so that the data flow of the system is easy to observe.

In this work, we use a process network to model an OFDM system. Because we want to observe the operation of the receiver, we divide the receiver into small processes. Figure 10 and Figure 11 depict how the receiver blocks of the OFDM system are divided into processes. We use four processes, “Guard”, “Fine Signal Detection”, “Estimation”, and “Coarse Signal Detection”, together with a delay element and some FIFOs connecting these nodes, to model the receiver of the OFDM system. In the original C code, the blocks “Down Converter” and “A/D LPF” are combined into the channel model, and the blocks “Signal Demapper” and “P/S” are ignored. We therefore ignore these blocks in our receiver model. The process network in this work is built with the method described in this section.

Figure 10 Receiver architecture divided into processes

Figure 11 Receiver architecture divided into processes

Following the method described above, we divide the OFDM system into the process network shown in Figure 12. We also divide the receiver process into four smaller processes and one delay element to reflect the detailed operation of the receiver. Figure 12 depicts our process network model of the OFDM system.

Figure 12 Process network model of the OFDM system

When using the process network to model the OFDM system, the loop including [...] estimation value of the last iteration to estimate the channel at this iteration.

“Estimation” and “Coarse-signal detection” both need the result of “Estimation” from the last iteration. Thus, “Coarse-signal detection” and “Estimation” cannot run at the same time, because “Coarse-signal detection” must wait for the result of “Estimation” in the same iteration.

Consequently, we try adding a second delay element after the original one, as shown in Figure 13. The OFDM system then uses the channel estimation value of the penultimate iteration, so “Coarse-signal detection” and “Estimation” can run at the same time. This may increase the bit error rate of the OFDM system, but it also increases the parallelization efficiency. We use this example to show that changing the architecture at the process network level of our design flow is useful.

Figure 13 Process network model of the OFDM system (with two delay elements)
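
A delay element can be viewed as a FIFO preloaded with initial tokens. The sketch below is a plain C++ illustration with the hypothetical class name Delay; a depth of 1 returns the value of the last iteration, and a depth of 2 behaves like two chained delay elements and returns the value of the penultimate iteration.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// A delay element of the given depth, preloaded with `depth` initial tokens.
template <typename T>
class Delay {
public:
    Delay(std::size_t depth, const T& initial) : buffer_(depth, initial) {
        assert(depth > 0);
    }

    // Store the value of the current iteration and return the value
    // that was stored `depth` iterations ago.
    T step(const T& current) {
        buffer_.push_back(current);
        T old = buffer_.front();
        buffer_.pop_front();
        return old;
    }

private:
    std::deque<T> buffer_;
};
```

With a depth of 2, the consumer of the delayed value never needs the result produced in the immediately preceding iteration, which is why the producer and the consumer can overlap in time at the cost of working with older data.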

3.3 Timed Functional Model

After building the process network of a system, we add timing information to it to obtain the timed functional model of the OFDM system. Hence, we must first obtain the timing information. In doing so, we must consider whether each function is implemented in hardware or in software. We obtain software timing information for a block implemented in software and hardware timing information for a block implemented in hardware.

Before obtaining software timing information, we must make the code realistic. Hence, some things must be done before the sequential executable code is run on an instruction set simulator (ISS). For the timed functional model of the OFDM system, two things must be done. First, fixed-point modification should be considered: if we want to estimate a function that executes in an environment without a floating-point unit, we should convert it into a fixed-point function. Second, we should apply table-lookup acceleration to functions that are used frequently, because we want realistic timing information.
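
The two modifications can be sketched as follows. The Q15 number format, the 256-entry sine table, and all names (q15_t, to_q15, mul_q15, SinTable) are assumptions chosen for illustration; the text does not say which fixed-point format or which functions were tabulated in the actual OFDM code.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Q15 fixed point: the represented value is raw / 32768.
using q15_t = std::int16_t;

// Convert a floating-point value to Q15, saturating at the range limits.
inline q15_t to_q15(double x) {
    double scaled = std::round(x * 32768.0);
    if (scaled > 32767.0) scaled = 32767.0;
    if (scaled < -32768.0) scaled = -32768.0;
    return static_cast<q15_t>(scaled);
}

// Multiply two Q15 values using integer arithmetic only.
inline q15_t mul_q15(q15_t a, q15_t b) {
    std::int32_t product = static_cast<std::int32_t>(a) * b;  // Q30 result
    // Round to nearest and rescale to Q15 without shifting negative values.
    std::int32_t rounded = (product >= 0)
        ? (product + 16384) / 32768
        : -((-product + 16384) / 32768);
    if (rounded > 32767) rounded = 32767;  // saturate (-1 times -1 overflows Q15)
    return static_cast<q15_t>(rounded);
}

// Table lookup: sine values are computed once, then read by index.
class SinTable {
public:
    static constexpr int kSize = 256;  // table entries per period (assumption)

    SinTable() : table_(kSize) {
        const double two_pi = 6.283185307179586;
        for (int i = 0; i < kSize; ++i) {
            table_[i] = to_q15(std::sin(two_pi * i / kSize));
        }
    }

    // phase is an index into one period and wraps around at kSize.
    q15_t sin_q15(unsigned phase) const { return table_[phase % kSize]; }

private:
    std::vector<q15_t> table_;
};
```

The floating-point sine is evaluated only while the table is built; afterwards each call costs one array read, which is closer to what a processor without a floating-point unit would execute.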

Then we run the modified sequential executable code on the ISS and obtain the software timing information.

Moreover, we obtain hardware timing information from the bottom layer or from reasonable assumptions.

Finally, we add the timing information to the process network model of the system and thus obtain a timed functional model of the system.

SC_MODULE(NODE_A) {
    port<tlm_get_if<packet> > input_1;
    port<tlm_get_if<packet> > input_2;
    port<tlm_put_if<packet> > output;
    void proc() {

Figure 14 SystemC code of timed functional model

The code in Figure 14 is the form of a timed functional process model written in SystemC. The difference between the process network model and the timed functional process model is that the latter has timing information added. The boldface statement in the figure is the only difference in this example. It is a “wait” statement: when the simulator counts the execution time, it waits at this statement for 10 microseconds in this example. Thus, we establish the timed functional model: a process does not work until all of its input pins have data, then counts its computation time and puts its result on its output pins.
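
The timing behavior can be imitated in plain C++ by attaching a simulated timestamp to every token, which stands in for the “wait” statement of the SystemC model. The names TimedToken and TimedNode and the placeholder addition are hypothetical; only the 10-microsecond delay is taken from the example in the text.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// A token together with the simulated time at which it becomes available.
struct TimedToken {
    int value;
    std::uint64_t time_us;
};

// A two-input node whose computation takes a fixed simulated delay.
class TimedNode {
public:
    explicit TimedNode(std::uint64_t delay_us) : delay_us_(delay_us) {}

    TimedToken fire(const TimedToken& in1, const TimedToken& in2) {
        // The node starts when both inputs have arrived and
        // its previous firing has finished.
        std::uint64_t start =
            std::max({in1.time_us, in2.time_us, busy_until_us_});
        // Advancing the clock by the delay plays the role of the wait statement.
        busy_until_us_ = start + delay_us_;
        return TimedToken{in1.value + in2.value, busy_until_us_};  // placeholder computation
    }

private:
    std::uint64_t delay_us_;
    std::uint64_t busy_until_us_ = 0;
};
```

For a node with a 10-microsecond delay, inputs arriving at 0 and 5 microseconds produce an output at 15 microseconds, and a second pair arriving at 6 and 7 microseconds is served only after the first firing ends, so its output appears at 25 microseconds.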

3.4 Resource Management and Scheduling

When real design issues are discussed, the problem of limited resources must be addressed, so a resource scheduler must exist. In this case study, we are concerned with memory scheduling and FFT scheduling. When modeling a system whose resources are not infinite, we get more realistic simulation results once this mechanism is in place.

First, we build a model of a resource scheduler to handle resource requests in the system model. In this model, we use mutexes to decide which process gets a resource; each mutex represents one available resource. When a process requests a resource, the scheduler decides whether the process gets it. If a resource is available, the process gets it; otherwise, the process gets none. When a process finishes its work with a resource, it usually releases the resource. However, if we want to hand the resource, together with whatever is attached to it, to another process, the resource is transferred to the budgeted process instead. For example, a process may want to transfer a memory and the data in it to another process. Such transfers are budgeted when the system is designed. An example is given below.
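
The request, release, and transfer operations can be sketched in plain C++. This sketch replaces the mutexes of our model with a simple ownership table so that it stays single-threaded, and it returns immediately when no resource is free instead of making the caller wait. The names MemoryScheduler, ProcessId, and kFree are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using ProcessId = int;
constexpr ProcessId kFree = -1;  // marks a memory block that nobody owns

class MemoryScheduler {
public:
    explicit MemoryScheduler(std::size_t num_blocks)
        : owner_(num_blocks, kFree) {}

    // Grant a free memory block to process p.
    // Returns the block index, or -1 if no block is available.
    int request(ProcessId p) {
        for (std::size_t i = 0; i < owner_.size(); ++i) {
            if (owner_[i] == kFree) {
                owner_[i] = p;
                return static_cast<int>(i);
            }
        }
        return -1;
    }

    // Give the block back to the scheduler.
    void release(ProcessId p, int block) {
        assert(owns(p, block));
        owner_[static_cast<std::size_t>(block)] = kFree;
    }

    // Hand the block, together with the data stored in it, to another
    // process without ever making it available to other requesters.
    void transfer(ProcessId from, ProcessId to, int block) {
        assert(owns(from, block));
        owner_[static_cast<std::size_t>(block)] = to;
    }

    bool owns(ProcessId p, int block) const {
        return block >= 0 &&
               static_cast<std::size_t>(block) < owner_.size() &&
               owner_[static_cast<std::size_t>(block)] == p;
    }

private:
    std::vector<ProcessId> owner_;
};
```

A transferred block never passes through the free state, so a third process that is waiting for memory cannot take it between the producer and the budgeted consumer.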

In Figure 15, a mutex represents a resource and a man represents a process. We use figures in this style to illustrate our memory scheduling mechanism.

Figure 15 Motions of resource scheduler I

As shown in Figure 16, when a process requests a memory, the scheduler checks whether a memory is available. If one is available, the scheduler grants the process the use of a memory.

Figure 16 Motions of resource scheduler II

After the process that obtained the resource in the previous step finishes its work with the resource, it transfers the resource to the budgeted process. This transfers not only the memory but also the data in the memory, as Figure 17 depicts.

Figure 17 Motions of resource scheduler III

As shown in Figure 18, the process that already holds a grant requests another memory, and the scheduler gives it a grant. A process can always request memory, whether or not it already owns any. The scheduler gives it a grant when a memory is available.

As shown in Figure 19, the scheduler can handle the requests of two processes at the same time. Moreover, if any process requests a memory at this point, the scheduler does not reply until a memory is released. While a process gets no response, it stops its work.

Figure 19 Motions of resource scheduler V

The example in Figure 20 depicts the model of the scheduler. Scheduling of other resources can follow the same form. In the figure, the F1 block and the F2
