Resource Management and Scheduling

Chapter 3 Our Platform

3.4 Resource Management and Scheduling

Any realistic design must account for limited resources, so the model needs a resource scheduler. In this case study, we are concerned with memory scheduling and FFT scheduling. Once this mechanism is in place, simulating a system with finite resources gives more realistic results.

First, we build a model of a resource scheduler that handles resource requests in the system model. In this model, mutexes determine which process obtains a resource, and each mutex represents one available resource. When a process requests a resource, the scheduler decides whether to grant it. If a resource is available, the process obtains it; otherwise, it gets nothing. When a process finishes its work with a resource, it usually releases the resource. However, if the resource and the data associated with it should be handed to another process, the resource is transferred to a budgeted (predetermined) process. For example, a process may transfer a memory, together with the data stored in it, to another process. Such transfers are budgeted when the system is designed. We give an example below.
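The request, release, and transfer operations described above can be summarized in a small plain-C++ model. This is a hypothetical, single-threaded sketch rather than the thesis's SystemC scheduler. It assumes resources are numbered units and processes are identified by name, and the class `ResourcePool` with its `request`, `release`, and `transfer` methods is an illustrative name. It shows that a transfer changes the owner of a unit without returning the unit to the pool.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical single-threaded model of the scheduler's bookkeeping. Each unit
// of a resource has an integer id; owner_[id] names the process holding it,
// and an empty string means the unit is free.
class ResourcePool {
 public:
  explicit ResourcePool(int units) {
    for (int id = 0; id < units; ++id) owner_[id] = "";
  }

  // Grants one free unit to `process` and returns its id, or -1 if no unit is
  // free (the process gets no grant and must wait for a later release).
  int request(const std::string& process) {
    for (auto& [id, owner] : owner_) {
      if (owner.empty()) {
        owner = process;
        return id;
      }
    }
    return -1;
  }

  // Returns a unit to the pool; only its current owner may release it.
  void release(int id, const std::string& process) {
    assert(owner_.at(id) == process);
    owner_[id].clear();
  }

  // Hands a unit (and, conceptually, the data stored in it) to the budgeted
  // process. The unit stays allocated, so the free count does not change.
  void transfer(int id, const std::string& from, const std::string& to) {
    assert(owner_.at(id) == from);
    owner_[id] = to;
  }

  int freeUnits() const {
    int n = 0;
    for (const auto& entry : owner_) n += entry.second.empty() ? 1 : 0;
    return n;
  }

  const std::string& ownerOf(int id) const { return owner_.at(id); }

 private:
  std::map<int, std::string> owner_;
};
```

With two units, a third request returns -1; after one unit is transferred and then released by its new owner, one unit becomes free again.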

In Figure 15, each mutex represents a resource and each human figure represents a process. We use figures in this style to illustrate our memory-scheduling mechanism.

Figure 15 Motions of resource scheduler I

As shown in Figure 16, when a process requests memory, the scheduler checks whether any memory is available. If so, the scheduler grants the process one unit of memory.

Figure 16 Motions of resource scheduler II

After the process that obtained the resource in the previous step finishes its work with it, the process transfers the resource to the budgeted process. This transfer moves not only the memory but also the data stored in it, as depicted in Figure 17.

Figure 17 Motions of resource scheduler III

As shown in Figure 18, the process holding a grant requests another unit, and the scheduler grants it. A process can always request memory, whether or not it already owns any; the scheduler grants the request whenever memory is available.

As shown in Figure 19, the scheduler can handle requests from two processes at the same time. Moreover, if a process requests memory when none is free, the scheduler does not reply until some memory is released. While the process waits for a response, it suspends its work.

Figure 19 Motions of resource scheduler V
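The blocking behavior just described, where a request receives no reply until some memory is released, can be sketched with a mutex and a condition variable. This plain-C++ sketch is not SystemC. `BlockingScheduler` and `secondRequesterWaits` are hypothetical names. Waiting requesters are served in FIFO order; that is an assumption made here to avoid starvation, not something the text specifies.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>

// Counting scheduler whose request() blocks until a unit is free.
// Waiters are granted in the order in which they asked (FIFO tickets).
class BlockingScheduler {
 public:
  explicit BlockingScheduler(int units) : free_(units) { assert(units >= 0); }

  // Blocks the calling thread until it is granted one unit.
  void request() {
    std::unique_lock<std::mutex> lock(m_);
    const unsigned long ticket = next_ticket_++;
    queue_.push_back(ticket);
    cv_.wait(lock, [&] { return free_ > 0 && queue_.front() == ticket; });
    queue_.pop_front();
    --free_;
    cv_.notify_all();  // the next waiter in line may also be grantable
  }

  // Returns one unit and wakes the waiting requesters.
  void release() {
    {
      std::lock_guard<std::mutex> lock(m_);
      ++free_;
    }
    cv_.notify_all();
  }

  int freeUnits() {
    std::lock_guard<std::mutex> lock(m_);
    return free_;
  }

 private:
  std::mutex m_;
  std::condition_variable cv_;
  std::deque<unsigned long> queue_;
  unsigned long next_ticket_ = 0;
  int free_;
};

// With one unit held, a second requester stays blocked until release().
// The waiter can only set `granted` after the main thread releases, so the
// result does not depend on the sleep; the sleep just lets the waiter block.
bool secondRequesterWaits() {
  BlockingScheduler sched(1);
  std::atomic<bool> granted{false};
  sched.request();  // the first process holds the only unit
  std::thread waiter([&] {
    sched.request();  // blocks: no unit is free
    granted = true;
    sched.release();
  });
  std::this_thread::sleep_for(std::chrono::milliseconds(50));
  const bool blocked = !granted;  // still waiting while the unit is held
  sched.release();                // wakes the waiter
  waiter.join();
  return blocked && granted;
}
```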

The example in Figure 20 depicts the scheduler model, and the scheduling of other resources can follow exactly the same form. In the figure, F1 and F2 are general processes, while FFT1 and FFT2 represent processes that perform FFT functions. We assume the hardware contains only one FFT block. First, FFT1 requests the hardware FFT block and is granted it. Second, FFT2 requests the FFT block, receives no grant, and waits until an FFT block becomes available. Third, FFT1 completes its FFT work and releases the FFT block; FFT2 then receives the grant and starts its work.

Figure 20 The model of resource scheduling
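The timeline of the Figure 20 example can be reproduced with a small first-come, first-served calculation. The sketch assumes each FFT process requests the block at a known time and holds it for a fixed duration. The `Job` type, the `grantTimes` function, and the time values are hypothetical. With one FFT block, FFT2's grant is delayed until FFT1 releases the block; with two blocks, both processes are granted on request.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <queue>
#include <vector>

// One request for a shared hardware block (e.g. an FFT unit).
struct Job {
  int arrival;   // time at which the process requests the block
  int duration;  // time the process holds the block before releasing it
};

// Returns the grant time of each job when `units` identical blocks are granted
// first-come, first-served. Jobs must be sorted by arrival time.
std::vector<int> grantTimes(const std::vector<Job>& jobs, int units) {
  assert(units > 0);
  // Min-heap of the times at which each block becomes free again.
  std::priority_queue<int, std::vector<int>, std::greater<int>> freeAt;
  for (int u = 0; u < units; ++u) freeAt.push(0);

  std::vector<int> grants;
  for (const Job& job : jobs) {
    const int earliestFree = freeAt.top();
    freeAt.pop();
    const int grant = std::max(job.arrival, earliestFree);  // wait if busy
    grants.push_back(grant);
    freeAt.push(grant + job.duration);  // block is released after the work
  }
  return grants;
}
```

For example, if FFT1 requests at time 0 and holds the block for 10 time units while FFT2 requests at time 2, `grantTimes({{0, 10}, {2, 10}}, 1)` grants FFT2 at time 10, whereas with two units it is granted at time 2.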

SC_MODULE(sched)

Figure 21 The pseudo-code of resource scheduler

Figure 21 shows the code pattern of a scheduler written in SystemC. The SC_MODULE in Figure 21 has three member functions: Request, Release, and Proc. The module constructor registers Proc as an independent thread. Proc repeatedly finds the next job to execute, notifies the grant event, and waits for the release event.

When a process wants a resource, it calls Request, which pushes a job onto the queue and waits for the grant event. When the process completes its work with the resource, it calls Release.

We use the mechanism described above to model our scheduler.
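For readers without a SystemC installation, the Request/Release/Proc structure can be approximated in standard C++. In this sketch, `std::thread` stands in for the SystemC thread process registered in the constructor, and a condition variable stands in for the grant and release events. The class `Sched` arbitrates a single resource; its details are assumptions, not a transcription of Figure 21. All holders must call `Release` before the scheduler is destroyed.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>

// Plain-C++ analogue of the Request/Release/Proc pattern (single resource).
class Sched {
 public:
  Sched() : proc_([this] { Proc(); }) {}  // the constructor starts Proc as a thread

  ~Sched() {
    {
      std::lock_guard<std::mutex> lock(m_);
      stop_ = true;
    }
    cv_.notify_all();
    proc_.join();
  }

  // Pushes a job onto the queue and waits until Proc grants it.
  void Request() {
    std::unique_lock<std::mutex> lock(m_);
    const unsigned long job = next_job_++;
    queue_.push_back(job);
    cv_.notify_all();  // wake Proc
    cv_.wait(lock, [&] { return holding_ && granted_ == job; });
  }

  // Signals that the current holder is done with the resource.
  void Release() {
    std::lock_guard<std::mutex> lock(m_);
    assert(holding_);
    holding_ = false;  // the "release event"
    cv_.notify_all();
  }

 private:
  void Proc() {
    std::unique_lock<std::mutex> lock(m_);
    while (true) {
      // Find the next job to execute.
      cv_.wait(lock, [&] { return stop_ || !queue_.empty(); });
      if (stop_ && queue_.empty()) return;
      granted_ = queue_.front();
      queue_.pop_front();
      holding_ = true;
      cv_.notify_all();  // the "grant event"
      // Wait for the release event before granting the next job.
      cv_.wait(lock, [&] { return !holding_; });
    }
  }

  std::mutex m_;
  std::condition_variable cv_;
  std::deque<unsigned long> queue_;
  unsigned long next_job_ = 0;
  unsigned long granted_ = ~0UL;  // id of the job currently granted
  bool holding_ = false;
  bool stop_ = false;
  std::thread proc_;  // declared last so all other members exist before Proc runs
};
```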

The code in Figure 22 is the pattern of a process connected to a scheduler, written in SystemC. The process requests the hardware resource, waits for the grant, computes its function, and releases the hardware resource.

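```cpp
SC_MODULE(estimation) {
  port<tlm_get_if<packet> > input;   // receives packets from the upstream process
  port<tlm_put_if<packet> > output;  // sends results to the downstream process
  port<sched> fft_sched;             // connection to the FFT resource scheduler
  // ... (the rest of the module in Figure 22 is not reproduced here)
};
```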

Figure 22 SystemC code of timed functional model with resource scheduling
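The request, wait, compute, release sequence of Figure 22 can be sketched without SystemC. This is a hypothetical illustration: `std::deque` queues replace the TLM get/put ports, `FftSched` is a minimal counting scheduler standing in for `sched`, and a naive DFT stands in for the hardware FFT. Wrapping the grant in an RAII object (`FftGrant`) is a design choice of this sketch; it guarantees the FFT unit is released when the computation scope ends.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <vector>

using cplx = std::complex<double>;
using Packet = std::vector<cplx>;  // hypothetical packet: one block of samples

// Minimal counting scheduler for FFT hardware units (stand-in for `sched`).
class FftSched {
 public:
  explicit FftSched(int units) : free_(units) {}
  void request() {
    std::unique_lock<std::mutex> lock(m_);
    cv_.wait(lock, [&] { return free_ > 0; });
    --free_;
  }
  void release() {
    {
      std::lock_guard<std::mutex> lock(m_);
      ++free_;
    }
    cv_.notify_one();
  }

 private:
  std::mutex m_;
  std::condition_variable cv_;
  int free_;
};

// RAII grant: request in the constructor, release in the destructor.
class FftGrant {
 public:
  explicit FftGrant(FftSched& s) : s_(s) { s_.request(); }
  ~FftGrant() { s_.release(); }
  FftGrant(const FftGrant&) = delete;
  FftGrant& operator=(const FftGrant&) = delete;

 private:
  FftSched& s_;
};

// Naive O(N^2) DFT standing in for the hardware FFT computation.
Packet dft(const Packet& x) {
  const double pi = std::acos(-1.0);
  const std::size_t n = x.size();
  Packet y(n);
  for (std::size_t k = 0; k < n; ++k)
    for (std::size_t t = 0; t < n; ++t)
      y[k] += x[t] * std::polar(1.0, -2.0 * pi * double(k * t) / double(n));
  return y;
}

// One activation of the process: get input, request the FFT unit, wait for the
// grant, compute, release (end of scope), and put the result.
void estimationStep(std::deque<Packet>& input, std::deque<Packet>& output,
                    FftSched& fftSched) {
  Packet in = input.front();
  input.pop_front();
  Packet out;
  {
    FftGrant grant(fftSched);  // blocks until an FFT unit is granted
    out = dft(in);
  }  // FFT unit released here
  output.push_back(out);
}
```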

For the OFDM system in this case study, scheduling concerns two aspects: FFT scheduling and memory scheduling. We describe them in detail below with Figure 23 and Figure 24.

Figure 23 depicts how memory scheduling operates in our case study. We describe the steps of one iteration below.

First, the delay element is always granted one unit of memory; the first arrow indicates this transaction. When the system starts, the process “Guard” requests one unit of memory, as indicated by the second arrow, and starts computing its result once the unit is granted. After finishing its work, “Guard” transfers its result, together with the memory it was granted, to “Coarse signal detection”, “Estimation”, and “Fine signal detection”. The process “Coarse signal detection” then starts and requests another unit of memory to store its data, as indicated by the third arrow. “Coarse signal detection” cannot store its output in the memory obtained by “Guard” because the result of “Guard” is still needed by two other processes, “Estimation” and “Fine signal detection”. After “Coarse signal detection” finishes, its result and its memory are transferred to “Estimation”. Thus “Estimation” uses the memory obtained by “Guard” and by “Coarse signal detection” to finish its work, and “Fine signal detection” does likewise. “Estimation” transfers its data and the memory obtained from “Coarse signal detection” to “Fine signal detection”. Finally, as the fourth arrow indicates, “Fine signal detection” finishes its work and releases both memory units.

Figure 23 OFDM memory scheduling
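Counting memory units step by step makes the iteration of Figure 23 concrete. This simplified sketch assumes one iteration executes in isolation and that every grant is one unit; `Step`, `heldAfterEachStep`, `peak`, and `kOfdmIteration` are hypothetical names. Transfers leave the count unchanged, so the peak occurs right after “Coarse signal detection” is granted its unit.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// One step of an iteration: a grant (+1), a release (-n), or a transfer (0).
struct Step {
  std::string what;
  int delta;  // change in the number of memory units held by all processes
};

// Units held after each step, assuming `initial` units are already held.
std::vector<int> heldAfterEachStep(const std::vector<Step>& steps, int initial) {
  std::vector<int> held;
  int current = initial;
  for (const Step& s : steps) {
    current += s.delta;
    assert(current >= 0);
    held.push_back(current);
  }
  return held;
}

int peak(const std::vector<int>& held) {
  assert(!held.empty());
  return *std::max_element(held.begin(), held.end());
}

// The iteration of Figure 23 as described in the text (one unit per grant).
const std::vector<Step> kOfdmIteration = {
    {"delay element is granted one unit (arrow 1)", +1},
    {"Guard is granted one unit (arrow 2)", +1},
    {"Guard transfers its result and unit to the three consumers", 0},
    {"Coarse signal detection is granted one unit (arrow 3)", +1},
    {"Coarse signal detection transfers its unit to Estimation", 0},
    {"Estimation transfers the Coarse unit to Fine signal detection", 0},
    {"Fine signal detection releases both units (arrow 4)", -2},
};
```

Under these assumptions, `peak(heldAfterEachStep(kOfdmIteration, 0))` is three units (delay element, “Guard”, and “Coarse signal detection”), and one unit, the delay element's, is still held at the end. In the timed model, iterations can overlap, so this isolated count does not by itself give the memory requirement of the whole system.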

Figure 24 depicts how FFT scheduling operates in our case study. Processes with FFT functions are connected to the FFT scheduler. When such a process needs a hardware FFT block, it sends a request to the FFT scheduler. If it receives a grant, it computes its FFT; otherwise, it waits until the scheduler grants it a block. After finishing its FFT work, it releases the hardware FFT block back to the FFT scheduler. Because the hardware FFT block does not carry data from one process to another, FFT scheduling is simpler than memory scheduling.

Figure 24 OFDM FFT scheduling

Observing the operation of the timed functional model with a scheduler, we see that the process “Guard” obtains memory too often, which makes the system inefficient. We expect that fixing the order of memory grants in this case study may make the system more efficient. We examine this effect in Chapter 4.

We describe how to set this order below. As shown in Figure 25, the delay element always holds one unit of memory, and in each iteration “Guard” and “Coarse signal detection” each obtain one further unit. Within an iteration, “Guard” sends its request to the memory scheduler before “Coarse signal detection” does. We therefore require that “Guard” is not granted memory for a new iteration until “Coarse signal detection” has been granted its memory in the previous iteration.

Figure 25 OFDM memory static scheduling
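The ordering rule can be expressed as an eligibility check applied before each grant. The sketch is hypothetical: it assumes all listed requests are already pending and memory is plentiful, and it abbreviates “Coarse signal detection” as `Coarse`; `MemRequest` and `grantOrder` are illustrative names. It shows how the rule interleaves the two processes instead of letting “Guard” run ahead.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

struct MemRequest {
  std::string process;  // "Guard" or "Coarse" (Coarse signal detection)
  int iteration;
};

// Grants pending requests one at a time, scanning in arrival order.
// With `ordered` set, Guard's request for iteration i is held back until
// Coarse signal detection has been granted memory for iteration i - 1.
std::vector<MemRequest> grantOrder(std::vector<MemRequest> pending, bool ordered) {
  std::vector<MemRequest> granted;
  int lastCoarse = -1;  // latest iteration in which Coarse was granted memory
  while (!pending.empty()) {
    auto eligible = std::find_if(
        pending.begin(), pending.end(), [&](const MemRequest& r) {
          if (!ordered || r.process != "Guard") return true;
          return r.iteration == 0 || lastCoarse >= r.iteration - 1;
        });
    assert(eligible != pending.end() && "ordering rule blocked every request");
    if (eligible->process == "Coarse")
      lastCoarse = std::max(lastCoarse, eligible->iteration);
    granted.push_back(*eligible);
    pending.erase(eligible);
  }
  return granted;
}
```

With pending requests Guard 0, Guard 1, Coarse 0, Guard 2, Coarse 1 (in that arrival order), the ordered policy grants Guard 0, Coarse 0, Guard 1, Coarse 1, Guard 2, while the unordered policy simply follows arrival order.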

Chapter 4 Experimental Results

4.1 Experimental Information

In this chapter, we present experiments with different hardware architectures and scheduling strategies on our platform. These experiments confirm the practicability of our platform.

As mentioned earlier, building the timed functional model requires either actual hardware timing or reasonable timing assumptions. In this case study, the relevant factors are the FFT architecture and the latency of matrix operations. Table 1 lists our assumptions about these two factors. In addition, every experiment processes 30000 symbols.

Item                                    Clock
Complex number adder timing             100 MHz
Complex number multiplier timing        100 MHz
Butterfly processing element timing     100 MHz

Table 1 Experimental fundamental assumptions of timing
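The clock assumptions in Table 1 translate module work into time. The sketch below shows that arithmetic for a memory-based radix-4 FFT under explicitly hypothetical assumptions that do not come from the thesis: an FFT size N that is a power of 4, and one butterfly processing element completing one radix-4 butterfly per cycle with no memory stalls. At 100 MHz, one cycle lasts 10 ns. The function names are illustrative.

```cpp
#include <cassert>

// Number of radix-4 stages, log4(N), for N a power of 4.
constexpr long long radix4Stages(long long n) {
  long long stages = 0;
  while (n > 1) {
    n /= 4;
    ++stages;
  }
  return stages;
}

// Hypothetical cycle count: N/4 butterflies in each of log4(N) stages,
// one butterfly per cycle on a single processing element.
constexpr long long radix4FftCycles(long long n) {
  return (n / 4) * radix4Stages(n);
}

// Cycles to nanoseconds at a clock of `clockMHz` (one cycle = 1000 / f ns).
constexpr double cyclesToNs(long long cycles, double clockMHz) {
  return static_cast<double>(cycles) * 1000.0 / clockMHz;
}
```

For a hypothetical N = 64, this gives 3 stages × 16 butterflies = 48 cycles, or 480 ns per FFT at 100 MHz. These numbers only illustrate the calculation; they are not the timings used in the experiments.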

We derive the timing of each module in the system from these assumptions.

We then simulate systems with different scheduling strategies and hardware architectures to evaluate the effects of static scheduling, of adding a delay element to increase parallelism, and of different FFT architectures. These experiments also show that platform-based design supports design space exploration. The experimental results are presented in Section 4.2.

4.2 Experimental Results

4.2.1 Experiments

Experiment I is our baseline case. Its result serves as the reference against which we evaluate the effects of different scheduling strategies and hardware architectures. The hardware architecture of the system in experiment I is shown in Table 2.

Item                 Architecture
Memory               2048 bytes, unlimited memory bandwidth
FFT                  Memory-based, radix-4
Delay element        One
Static scheduling    None

Table 2 Experimental assumptions of architecture in experiment I

Figure 26 shows the result of experiment I.


In Section 3.2, we conjectured that adding a delay element may make the system more efficient than the original system. Experiment II tests whether this strategy is effective. The hardware architecture of the system in experiment II is shown in Table 3.

Item                 Architecture
Memory               2048 bytes, unlimited memory bandwidth
FFT                  Memory-based, radix-4
Delay element        Two
Static scheduling    None

Table 3 Experimental assumptions of architecture in experiment II

Figure 27 shows the result of experiment II.


Figure 27 OFDM execution time of experiment II (ms)

The figure shows that using more than two FFT hardware units or more than seven memory units does not further reduce the execution time, and that under this configuration the system produces no result with fewer than five memory units.

Experiment III

In Section 3.4, we conjectured that our static scheduling mechanism reduces the resource requirement. Experiment III tests whether the proposed mechanism is effective. The hardware architecture of the system in experiment III is shown in Table 4.

Item                 Architecture
Memory               2048 bytes, unlimited memory bandwidth
FFT                  Memory-based, radix-4
Delay element        One
Static scheduling    Set the order of memory grant

Table 4 Experimental assumptions of architecture in experiment III

Figure 28 shows the result of experiment III.

Figure 28 OFDM execution time of experiment III (ms)

The figure shows that using more than two FFT hardware units or more than five memory units does not further reduce the execution time, and that under this configuration the system produces no result with fewer than three memory units.

Experiment IV combines the delay-element strategy proposed in Section 3.2 with the static scheduling mechanism proposed in Section 3.4. Table 5 lists the hardware architecture of the system in experiment IV.

Item                 Architecture
Memory               2048 bytes, unlimited memory bandwidth
FFT                  Memory-based, radix-4
Delay element        Two
Static scheduling    Set the order of memory grant

Table 5 Experimental assumptions of architecture in experiment IV

Figure 29 shows the result of experiment IV.

Figure 29 OFDM execution time of experiment IV (ms)

The ease of changing the architecture in a system model is one of the main advantages of platform-based design. It allows us to try different architectures in a system to find the best hardware configuration. Experiment V demonstrates this convenience.

Table 6 lists the hardware architecture of the system in experiment V, in which we change the FFT architecture.

Item                 Architecture
Memory               2048 bytes, unlimited memory bandwidth
FFT                  Memory-based, radix-4
Delay element        One
Static scheduling    None

Table 6 Experimental assumptions of architecture in experiment V

Figure 30 shows the result of experiment V.


Figure 30 OFDM execution time of experiment V (ms)

The figure shows that using more than two FFT hardware units or more than six memory units does not further reduce the execution time, and that under this configuration the system produces no result with fewer than four memory units. It also shows the effect of the FFT architecture.

4.2.2 Performance Analysis

architecture. We omit the cases using three FFT hardware units here because their results are identical to those using two units, as shown above. The table reveals several interesting points, discussed below.

In experiment II, we add a delay element to increase parallelism. The strategy

sufficient memory. Thus, it performs well in the cases using seven and eight memory units.

In experiment III, we use static scheduling to make better use of memory. Thus, it performs well in the cases using three and five memory units.

In experiment IV, we use static scheduling and add a delay element at the same time. It performs well in the case using six memory units.

These results show that design space exploration at the electronic system level is useful.

It helps us choose among architectures under different conditions. For example, in this case study, we can use these experimental results to decide which architecture to adopt, such as using static scheduling without an additional delay element when only three memory units are available.

The experimental results also show the effect of the FFT architecture. Note that a process does not always release its memory; in some cases it transfers the memory to another process instead. Consequently, some configurations without enough memory cannot produce results.

Chapter 5 Conclusion and Future Work

In this work, we apply a top-down design methodology to an OFDM design. By incorporating the process network model and the timed functional model, we can simulate systems with different configurations in a short time. Important parameters are then extracted from the simulation results, so the performance of the system can be assessed before it is implemented.

Concretely, we establish a framework comprising a process network model, a timed functional model, and resource scheduling for design space exploration and design modeling. With it, the performance of systems with different configurations can be examined.

We also use the framework to verify our static scheduling mechanism and our approach to increasing parallelism. The experimental results show that our platform is practical.


VITA

Guan-Hao Chen was born in Hualien, Taiwan on April 6, 1982. He received the B.S. degree in Electronics Engineering from National Chiao Tung University in June 2004 and entered the Institute of Electronics, National Chiao Tung University in September 2004. His research interests include electronic design automation (EDA) and VLSI design. He received the M.S. degree from National Chiao Tung University in August 2006.

Source document: Design Methodology at the Electronic System Level: A Case Study of an OFDM System