
Chapter 2 Preliminary

2.4 The OFDM System

The OFDM system used in this case study was provided by Meng-Lin Ku and Chia-Chi Huang. Figure 3, Figure 4, and Figure 5 depict the blocks of the system.

Figure 3 depicts the transmitter architecture and Figure 4 depicts the receiver architecture. Figure 5 shows the Channel Estimation block of the receiver in more detail [4].

Figure 3 Transmitter architecture

Figure 4 Receiver architecture

Figure 5 Channel estimation block

Chapter 3 Our Platform

In this work, we apply the design flow described in 2.1 to the OFDM system described in 2.4. We refine the model of the OFDM system step by step; each refinement adds information and yields a more realistic result.

Note that this work focuses on estimating the execution time of the whole system, so timing information plays an important role.

3.1 Coding Guideline of Sequential Executable Code

We propose five coding guidelines that algorithm designers should follow when writing their C/C++ programs. A C/C++ program written according to these guidelines is easy to refine into a process network. The guidelines are described below, and the code in Figure 6 illustrates them.

//Constant data

Figure 6 An example of coding guidelines

We define communication variables as variables used to exchange data between modules, and local variables as variables used only inside one module.

The first rule is that whoever writes the sequential executable code to verify the algorithm of the system should keep communication variables and local variables separate. This also implies that global variables should not be used. The rule makes it easier to find the code range, inputs, and outputs of each process, which is hard in a general C/C++ program. When communication variables and local variables are separated, the inputs and outputs of a function block are clear and easy to find, and the code range of the module can then be inferred. The code in Figure 6 gives an example of this rule.

In the code in Figure 6, local variables such as a, b, and c are used only for computation inside a block, whereas communication variables such as the arrays “dat1” and “dat2” carry data between blocks. For block 1, for example, it is clear that the inputs are the arrays “in1” and “pilot” and the output is the array “dat1”. The inputs and outputs of the other blocks are just as easy to find.
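Since the listing of Figure 6 is not reproduced here, the following is only a minimal sketch of the first rule. The names in1, pilot, dat1, dat2, a, b, and c follow the text; the symbol length, the function names, and the arithmetic are placeholders assumed for illustration.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t kSymbolLen = 4;          // hypothetical symbol length
using Symbol = std::array<double, kSymbolLen>;

// Block 1: communication variables appear only in the signature.
// Inputs: in1, pilot. Output: dat1.
void block1(const Symbol& in1, const Symbol& pilot, Symbol& dat1) {
    for (std::size_t i = 0; i < kSymbolLen; ++i) {
        double a = in1[i];    // local variable, used only inside this block
        double b = pilot[i];  // local variable
        dat1[i] = a + b;      // placeholder computation
    }
}

// Block 2: input dat1, output dat2. No global variables are touched.
void block2(const Symbol& dat1, Symbol& dat2) {
    for (std::size_t i = 0; i < kSymbolLen; ++i) {
        double c = 2.0 * dat1[i];  // local variable
        dat2[i] = c;
    }
}
```

Because every communication variable is a parameter, the inputs and outputs of each block can be read directly from its signature, which is what the refinement into processes relies on.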

When refining the sequential code into the process network, we must verify that the behavior of the process network is the same as that of the sequential code.

If all random functions in the program draw from the same random sequence, the order in which random numbers are drawn changes after refinement, because the processes may execute in parallel in the process network. Therefore, our second rule is that each random function in the program uses its own random sequence. If this rule is followed, each random function gets the same random numbers in the sequential code and in the process network, so the two behaviors are easy to compare. Statements (1) and (2) in the code in Figure 6 are instances of this rule.
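The effect of the second rule can be sketched as follows. The type NoiseSource, the helper draw, and the seeds are hypothetical; std::mt19937 merely stands in for whatever generator the original code uses.

```cpp
#include <cstdint>
#include <random>
#include <vector>

// Each block owns its own generator with its own seed (seeds are hypothetical).
struct NoiseSource {
    std::mt19937 engine;
    explicit NoiseSource(std::uint32_t seed) : engine(seed) {}
    std::uint32_t next() { return static_cast<std::uint32_t>(engine()); }
};

// Draws n values from src. Before each draw, `interleave` values are drawn
// from other, imitating another block that runs in between.
std::vector<std::uint32_t> draw(NoiseSource& src, NoiseSource& other,
                                int n, int interleave) {
    std::vector<std::uint32_t> out;
    for (int i = 0; i < n; ++i) {
        for (int k = 0; k < interleave; ++k) other.next();
        out.push_back(src.next());
    }
    return out;
}
```

With separate generators, the values seen by src do not depend on how often other is called in between, so the sequential code and the process network see the same numbers. With one shared generator, the interleaving would change them.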

Complex data is often used in a communication system. Modeling it with different data types in different parts of a program makes the program troublesome to modify. Therefore, our third rule is that the complex-number support of the C/C++ standard library should be used, for readability and to ease data modeling. If it is used, modeling the data exchange between blocks in the process network is straightforward and the program is easy to understand. Otherwise, different modules may use different data types, and modeling the data exchange takes much more time. Statement (3) and the declarations of “comp” in the code show how to follow this rule.
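As an illustration of the third rule, the sketch below uses std::complex for every complex value exchanged between blocks. The alias Complex and the function equalize (a per-subcarrier division by a channel estimate) are assumptions made for the example and are not taken from the original code.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using Complex = std::complex<double>;  // one complex type shared by all blocks

// Hypothetical block: divides each received subcarrier by its channel estimate.
std::vector<Complex> equalize(const std::vector<Complex>& rx,
                              const std::vector<Complex>& h) {
    assert(rx.size() == h.size());
    std::vector<Complex> out(rx.size());
    for (std::size_t i = 0; i < rx.size(); ++i) {
        out[i] = rx[i] / h[i];  // complex division provided by the standard library
    }
    return out;
}
```

Since every block speaks the same type, a FIFO between two blocks can carry the data unchanged, with no conversion code in between.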

The fourth rule is to consider the data grain size used by the hardware blocks. When refining the sequential code into the process network, we want the behavior of the process network to be similar to that of the real system, so we exchange data at the real grain size. If this is considered when the sequential code is written, refinement is easier and takes less time. In the code in Figure 6, the functions process one symbol of data at a time rather than one packet. This matches the common situation in hardware and serves as an example of this rule.

The fifth rule is to keep the block operations in topological order. This makes it easier to model the inputs and outputs of the whole system, which matters when the system has many blocks. The data exchange in the code in Figure 6 can be modeled by the directed graph in Figure 7, and the block operations in the code clearly follow a topological order of that graph.

Figure 7 The data exchange of the code example
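A topological order of the data-exchange graph can be computed mechanically, for instance with Kahn's algorithm as sketched below. The function name and the integer node numbering are assumptions; the graph of Figure 7 itself is not reproduced here.

```cpp
#include <cstddef>
#include <queue>
#include <utility>
#include <vector>

// Returns a topological order of nodes 0..n-1 for the given directed edges
// (producer, consumer), or an empty vector if the graph contains a cycle.
std::vector<int> topological_order(int n,
                                   const std::vector<std::pair<int, int>>& edges) {
    std::vector<std::vector<int>> successors(n);
    std::vector<int> indegree(n, 0);
    for (const auto& e : edges) {
        successors[e.first].push_back(e.second);
        ++indegree[e.second];
    }
    std::queue<int> ready;  // blocks whose inputs are all produced already
    for (int v = 0; v < n; ++v) {
        if (indegree[v] == 0) ready.push(v);
    }
    std::vector<int> order;
    while (!ready.empty()) {
        int v = ready.front();
        ready.pop();
        order.push_back(v);
        for (int w : successors[v]) {
            if (--indegree[w] == 0) ready.push(w);
        }
    }
    if (order.size() != static_cast<std::size_t>(n)) order.clear();  // cycle
    return order;
}
```

Calling the blocks of the sequential code in such an order guarantees that every block runs only after all blocks that produce its inputs.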

3.2 A Process Network Model

We build the process network described in 2.3 to model the OFDM system with parallel computation. This section explains how the process network model is built. Note that, for convenience, we use the TLM library described in [8] to model the system.

First, we decide how to model a node of a process network in SystemC.

The code in Figure 8 is the pattern of a node of the system written in SystemC. As an example, we model node A of the process network mentioned in 3.1. The SystemC module declaration has three parts: the port declarations, the body function “proc”, and the SystemC constructor. In the function “proc”, statement (1) does not complete until input_1, read through the get interface of the TLM library, has data. The same holds for statement (2). That is, both are blocking reads. This guarantees the firing rule of our process network model: a process does not execute until all its input queues have data. The SystemC constructor runs the function “proc” in its own thread. The SystemC code of the other nodes is written in the same way.

SC_MODULE(NODE_A) {

port<tlm_get_if<packet> > input_1;

port<tlm_get_if<packet> > input_2;

port<tlm_put_if<packet> > output;

void proc() {

Figure 8 SystemC code of a node of the process network
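The firing rule obtained from blocking reads can also be sketched without SystemC. The class BlockingFifo below is a standard-library stand-in for a FIFO channel and does not reproduce the TLM API; fire_node_a and the addition are placeholders assumed for the example.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>

// Minimal unbounded FIFO with a blocking read (a stand-in, not the TLM API).
template <typename T>
class BlockingFifo {
public:
    void put(const T& value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push(value);
        }
        not_empty_.notify_one();
    }
    T get() {  // blocks until an item is available
        std::unique_lock<std::mutex> lock(mutex_);
        not_empty_.wait(lock, [this] { return !queue_.empty(); });
        T value = queue_.front();
        queue_.pop();
        return value;
    }
private:
    std::mutex mutex_;
    std::condition_variable not_empty_;
    std::queue<T> queue_;
};

// One firing of a two-input node: the body runs only after both reads return.
inline void fire_node_a(BlockingFifo<int>& input_1, BlockingFifo<int>& input_2,
                        BlockingFifo<int>& output) {
    int x = input_1.get();  // blocking read, like statement (1)
    int y = input_2.get();  // blocking read, like statement (2)
    output.put(x + y);      // placeholder computation
}
```

In a full model, each node would call its firing function in a loop on its own thread, for example with std::thread, which plays the role of the thread created in the SystemC constructor.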

The code in Figure 9 is the pattern of the top-level view of the system written in SystemC. It illustrates how the connections between nodes and FIFOs are implemented.

The process network described by the code in Figure 9 is the same as the one described by the directed graph in Figure 2. The SystemC module declaration contains the declarations of the nodes and FIFOs and a SystemC constructor. The statements in the constructor describe how the FIFOs and the nodes are connected. For example, the statement “C.output(p)” connects the output port of node C to the FIFO p.

SC_MODULE(platform) {

NODE_A A; NODE_B B; NODE_C C; NODE_D D;

tlm_fifo<packet> p; tlm_fifo<packet> q; tlm_fifo<packet> r;

SC_CTOR(platform) : A("NODE_A"), B("NODE_B"),

Figure 9 The system top-view written in SystemC

Therefore, we use a process network to model the OFDM system in this work.

We describe our procedure for building the process network model here. First, we divide the whole system into processes according to the original data flow of the system. Then, we use FIFOs to connect these processes. In particular, while modeling the OFDM

model of a system for being ease to observe the data flow of the system.

In this work, we use a process network to model an OFDM system. Because we want to observe the operation of the receiver, we divide the receiver into small processes. Figure 10 and Figure 11 depict how the receiver blocks of the OFDM system are divided into processes. We use four processes, “Guard”, “Fine signal detection”, “Estimation”, and “Coarse signal detection”, together with a delay element and some FIFOs connecting these nodes, to model the receiver of the OFDM system. The blocks “Down Converter” and “A/D LPF” are folded into the channel model, and the blocks “Signal Demapper” and “P/S” are ignored in the original C code. Thus, we ignore these blocks in our receiver model. We use the method described in this section to build the process network in this work.

Figure 10 Receiver architecture divided into processes

Figure 11 Receiver architecture divided into processes

Following the method above, we divide the OFDM system into the process network in Figure 12. We also divide the receiver process into four smaller processes and one delay element to reflect the detailed operation of the receiver.

Figure 12 depicts our process network model of the OFDM system.

Figure 12 Process network model of the OFDM system

When using the process network to model the OFDM system, the loop including

estimation value of the last iteration to estimate the channel at this iteration.

Both “Estimation” and “Coarse signal detection” need the result of “Estimation” from the last iteration. Thus, “Coarse signal detection” and “Estimation” cannot run at the same time, because “Coarse signal detection” must wait for the result of “Estimation”.

Consequently, we try adding a second delay element after the original one, as shown in Figure 13. The OFDM system then uses the channel estimation value from two iterations earlier, so “Coarse signal detection” and “Estimation” can run at the same time. This may increase the bit error rate of the OFDM system, but it also increases the parallelization efficiency. We use this example to show that changing the architecture at the process network level of our design flow is useful.

Figure 13 Process network model of the OFDM system (with two delay elements)
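The effect of one versus two delay elements can be sketched with a small delay line. The class DelayElement and the initial value are assumptions made for illustration; in a process network the same behavior is obtained from a FIFO preloaded with initial tokens.

```cpp
#include <cstddef>
#include <deque>

// Delay element of a given depth: the value pushed at iteration n
// comes out at iteration n + depth. Until then the initial value is returned.
class DelayElement {
public:
    DelayElement(std::size_t depth, double initial) : line_(depth, initial) {}

    double step(double in) {
        line_.push_back(in);
        double out = line_.front();
        line_.pop_front();
        return out;
    }
private:
    std::deque<double> line_;
};
```

With depth 1, the consumer at iteration n sees the estimate of iteration n - 1 and must wait for it. With depth 2, it sees the estimate of iteration n - 2, which is already available while the estimate of iteration n - 1 is still being computed, so the two processes can overlap.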

3.3 Timed Functional Model

After establishing the process network of a system, we add timing information to it to obtain the timed functional model of the OFDM system. We must therefore collect timing information. When doing so, we must consider whether each function is implemented in hardware or in software: we collect software timing information for a block implemented in software and hardware timing information for a block implemented in hardware.

Before we collect software timing information, we must make the code realistic. Hence, some things must be done before running the sequential executable code on an instruction set simulator (ISS). For the timed functional model of the OFDM system, two things must be done. First, fixed-point conversion should be considered: if we want to estimate a function that executes in an environment without a floating-point unit, we should convert it into a fixed-point function. Second, we should use table lookup to accelerate frequently used functions, because we want realistic timing information.
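Both preparation steps can be sketched as follows. The Q15 format, the table size, and the choice of a sine table are assumptions made for illustration; the text does not say which number format or which functions the original code uses.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

using q15_t = std::int16_t;  // Q15 fixed point: real value = raw / 32768

inline q15_t to_q15(double x) {
    assert(x >= -1.0 && x < 1.0);  // representable range of Q15
    return static_cast<q15_t>(std::lround(x * 32768.0));
}

// Q15 multiply with rounding and saturation, using integer operations only.
// An arithmetic right shift is assumed (guaranteed since C++20).
inline q15_t q15_mul(q15_t a, q15_t b) {
    std::int32_t product = static_cast<std::int32_t>(a) * b;
    std::int32_t result = (product + (1 << 14)) >> 15;
    if (result > 32767) result = 32767;  // only reachable for (-1) * (-1)
    return static_cast<q15_t>(result);
}

constexpr std::size_t kTableSize = 256;  // hypothetical table resolution

// Sine table filled once; later calls cost one array access.
inline const std::array<q15_t, kTableSize>& sine_table() {
    static const std::array<q15_t, kTableSize> table = [] {
        std::array<q15_t, kTableSize> t{};
        const double two_pi = 6.283185307179586;
        for (std::size_t i = 0; i < kTableSize; ++i) {
            double s = std::sin(two_pi * static_cast<double>(i) / kTableSize);
            t[i] = static_cast<q15_t>(std::lround(s * 32767.0));
        }
        return t;
    }();
    return table;
}

// phase covers one period in 256 steps and wraps naturally as an 8-bit value.
inline q15_t sin_lut(std::uint8_t phase) { return sine_table()[phase]; }
```

On a target without a floating-point unit, q15_mul and sin_lut compile to a handful of integer instructions, so the cycle counts reported by the ISS are closer to those of the final software.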

Then we run the modified sequential executable code on the ISS and obtain the software timing information.

Moreover, we obtain hardware timing information from the lower design layer or from reasonable assumptions.

Finally, we add the timing information to the process network model of the system and thus obtain a timed functional model of the system.

SC_MODULE(NODE_A) {

port<tlm_get_if<packet> > input_1;

port<tlm_get_if<packet> > input_2;

port<tlm_put_if<packet> > output;

void proc() {

Figure 14 SystemC code of timed functional model

The code in Figure 14 is the form of a timed functional process model written in SystemC. The difference between the process network model and the timed functional process model is that the latter carries timing information. The boldface statement in the figure, a “wait” statement, is the only difference in this example. When the simulator accounts for the execution time, it waits at this statement for 10 microseconds in this example. Thus, we establish the timed functional model: a process does not fire until all its input ports have data, then accounts for its computation time and puts its result on its output ports.
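The timing behavior described above can be approximated outside a SystemC simulator by attaching a timestamp to every token. The types Token and TimedNode are hypothetical; only the 10-microsecond delay comes from the text, and the addition is a placeholder computation.

```cpp
#include <algorithm>
#include <cstdint>

struct Token {
    int value;
    std::uint64_t time_us;  // simulated time at which the token is available
};

// A two-input node that needs delay_us of simulated time per firing.
class TimedNode {
public:
    explicit TimedNode(std::uint64_t delay_us) : delay_us_(delay_us) {}

    Token fire(const Token& a, const Token& b) {
        // The node starts when both inputs have arrived and it is idle again.
        std::uint64_t start = std::max({a.time_us, b.time_us, free_at_us_});
        free_at_us_ = start + delay_us_;  // plays the role of the wait statement
        return Token{a.value + b.value, free_at_us_};
    }
private:
    std::uint64_t delay_us_;
    std::uint64_t free_at_us_ = 0;
};
```

For a node with a delay of 10 microseconds whose inputs arrive at 5 and 8 microseconds, the output token is stamped 18 microseconds. The timestamp of the last output of the whole network then gives an estimate of the system execution time.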

3.4 Resource Management and Scheduling

When real design issues are discussed, limited resources must be taken into account, so a resource scheduler is needed. In this case study, we are concerned with memory scheduling and FFT scheduling. Once this mechanism is in place, simulating a system with finite resources gives a more realistic result.

First of all, we establish a model of a resource scheduler to handle resource requests in the system model. In this model, we use mutexes to decide which process gets a resource; each mutex represents one available resource. When a process requests a resource, the scheduler decides whether the process gets it. If a resource is available, the process gets it; otherwise, the process gets none. When the process completes its work with the resource, it usually releases the resource. However, if we would like to hand the resource, together with whatever is attached to it, to another process, the resource is transferred to a process designated in advance. For example, a process may want to transfer a memory and the data in it to another process. Such a transfer is planned when the system is designed. We give an example below.
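Transferring a resource together with its contents maps naturally onto move semantics. In the sketch below, MemoryPool and MemoryUnit are hypothetical names, and a failed request returns nullptr instead of blocking, which keeps the example single-threaded.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

struct MemoryUnit {
    std::vector<int> data;  // contents travel with the unit
};

class MemoryPool {
public:
    explicit MemoryPool(int units) : free_(units) {}

    // Grants one unit, or returns nullptr when none is free
    // (in the full model the requesting process would wait instead).
    std::unique_ptr<MemoryUnit> request() {
        if (free_ == 0) return nullptr;
        --free_;
        return std::make_unique<MemoryUnit>();
    }

    // Takes the unit back; its contents are discarded.
    void release(std::unique_ptr<MemoryUnit> unit) {
        assert(unit != nullptr);
        ++free_;
    }

    int free_units() const { return free_; }
private:
    int free_;
};
```

A process hands its unit to the next process with std::move. The pool is not involved, so the number of free units stays the same, and the receiving process finds the data already in place.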

In Figure 15, a mutex represents a resource and a man represents a process. We use figures of this style to illustrate our memory scheduling mechanism.

Figure 15 Motions of resource scheduler I

As shown in Figure 16, when a process requests a memory, the scheduler checks whether a memory is available. If so, the scheduler grants the process the use of a memory.

Figure 16 Motions of resource scheduler II

After the process that got the resource in the last step completes its work with it, it transfers the resource to the designated process. This transfers not only the memory but also the data in the memory, as Figure 17 depicts.

Figure 17 Motions of resource scheduler III

As shown in Figure 18, a process that already holds a grant requests another memory and the scheduler grants it. Whether or not a process already owns memory, it can always request more; the scheduler grants the request when a memory is available.

As shown in Figure 19, the scheduler can handle the requests of two processes at the same time. Moreover, if a process requests a memory when none is available, the scheduler does not reply until a memory is released. While the process gets no response, it stops working.

Figure 19 Motions of resource scheduler V

The example in Figure 20 depicts the scheduler model; scheduling of other resources can follow the same form. In the figure, the F1 and F2 blocks are general processes, and the FFT1 and FFT2 blocks represent processes that use FFT functions. We assume that there is only one FFT block in hardware in this example. First, FFT1 requests the hardware FFT block and gets the grant. Second, FFT2 requests the hardware FFT block, gets no grant, and waits until the hardware FFT block becomes available. Third, FFT1 completes its FFT work and releases the hardware FFT block; FFT2 then gets the grant and starts its work.

Figure 20 The model of resource scheduling

SC_MODULE(sched)

Figure 21 The pseudo-code of resource scheduler

Figure 21 shows the code pattern of a scheduler written in SystemC. The SC_MODULE in Figure 21 has three member functions: Request, Release, and Proc. The SystemC constructor runs Proc in an independent thread. Proc then finds the next job to execute, signals the grant event, and waits for the release event.

When a process wants a resource, it calls the Request function, which pushes a job onto the queue and waits for the grant event. When the process completes its work with the resource, it calls the Release function.

We use the mechanism described above to model our scheduler.
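A comparable request and release protocol can be sketched with the C++ standard library. This is not the code of Figure 21: instead of a separate Proc thread and grant events, the sketch uses a ticket counter and a condition variable to serve requests in arrival order, and the class and member names are assumptions.

```cpp
#include <condition_variable>
#include <cstdint>
#include <mutex>

// First-come-first-served scheduler for `capacity` identical resources,
// for example a single hardware FFT block.
class ResourceScheduler {
public:
    explicit ResourceScheduler(int capacity) : free_(capacity) {}

    // Blocks until the caller is first in line and a resource is free.
    void request() {
        std::unique_lock<std::mutex> lock(mutex_);
        const std::uint64_t ticket = next_ticket_++;
        changed_.wait(lock, [&] { return ticket == now_serving_ && free_ > 0; });
        ++now_serving_;
        --free_;
        changed_.notify_all();  // the next ticket may be grantable as well
    }

    void release() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            ++free_;
        }
        changed_.notify_all();
    }

    int free_units() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return free_;
    }
private:
    mutable std::mutex mutex_;
    std::condition_variable changed_;
    std::uint64_t next_ticket_ = 0;
    std::uint64_t now_serving_ = 0;
    int free_;
};
```

A process that uses the resource calls request(), performs its computation, and then calls release(). With a capacity of one, a second process calling request() in the meantime blocks until the first one releases, which matches the FFT1 and FFT2 scenario of Figure 20.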

The code in Figure 22 is the pattern of a process connected to a scheduler, written in SystemC. The process requests the hardware resource, waits for the grant, computes its function, and releases the hardware resource.

SC_MODULE(estimation) {

port<tlm_get_if<packet> > input;

port<tlm_put_if<packet> > output;

port<sched> fft_sched;

Figure 22 SystemC code of timed functional model with resource scheduling

For the OFDM system used in this case study, two aspects of scheduling concern us: FFT scheduling and memory scheduling. We describe them in detail below with two figures, Figure 23 and Figure 24.

Figure 23 depicts the operation of the memory scheduling mechanism in our case study.

We describe below how it works within one iteration, which consists of several steps. At first, we always give the delay element a grant of one unit of memory; the first arrow indicates this transaction. When the whole system starts, the process “Guard” requests one unit of memory; the second arrow indicates this transaction. “Guard” starts to compute its result once it gets the grant. After finishing its work, it transfers its result and the grant of the memory it got to “Coarse signal detection”, “Estimation”, and “Fine signal detection”. Then, the process “Coarse signal detection” starts to work and requests another unit of memory to store its data; the third arrow indicates this transaction. “Coarse signal detection” cannot store its output in the memory obtained by “Guard”, because the result of “Guard” is still used by the two other processes, “Estimation” and “Fine signal detection”. After “Coarse signal detection” finishes its work, its result and that memory are transferred to “Estimation”. Thus, “Estimation” uses the memory obtained by “Guard” and by “Coarse signal detection” to finish its work, and so does “Fine signal detection”. “Estimation” transfers its data and the memory obtained from “Coarse signal detection” to “Fine signal detection”. Finally, the fourth arrow indicates that “Fine signal detection” finishes its work and releases the two

