
4.1 Experiment environment

4.1.1 SystemC model

The experiment environment is developed in SystemC v2.1 with the transaction-level modeling (TLM) library [28, 29]. Figure 4.1 shows a simple example system with N masters and one slave. Each master has a thread that generates bus traffic and puts its requests into its own tlm transport channel.

After a master issues a request to its tlm transport channel, it waits for the corresponding response from the slave. A thread in the arbiter polls all of the request FIFOs, decides which pending request is the most important, and at the same time forwards that request from the master to the slave. After the slave responds, the arbiter puts the response into the response FIFO through the relevant tlm transport channel.

The master then picks up the corresponding response and completes the transaction. This high-abstraction-level model is used to evaluate the system performance under different arbitration algorithms.

The arbiter can adopt any available algorithm to make arbitration decisions. In the following experiments, five arbitration algorithms are evaluated for a comprehensive comparison: static priority, Lottery, TDMA + Lottery (TDMA at the first arbitration level and Lottery at the second), RT lottery, and RB lottery.
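The structure described above can be sketched in a few lines. The following is only an illustrative Python sketch, not the SystemC model used in the experiments: the class names (Request, Slave, Arbiter), the step method and the placeholder policy lowest_index_first are all hypothetical, and timing is not modeled. It shows how an arbiter can poll per-master request FIFOs, pick one request with a pluggable policy, and route the slave's response back to the issuing master.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    master: int  # index of the issuing master
    beats: int   # burst length in beats


class Slave:
    def respond(self, req):
        # One response per request; latency is not modeled in this sketch.
        return f"resp(master={req.master}, beats={req.beats})"


class Arbiter:
    def __init__(self, n_masters, policy):
        # One request FIFO and one response FIFO per master (assumed layout).
        self.req_fifos = [deque() for _ in range(n_masters)]
        self.rsp_fifos = [deque() for _ in range(n_masters)]
        self.policy = policy  # callable: list of pending master ids -> winner

    def step(self, slave):
        """Poll all request FIFOs, grant one request, route the response."""
        pending = [m for m, fifo in enumerate(self.req_fifos) if fifo]
        if not pending:
            return None
        winner = self.policy(pending)
        req = self.req_fifos[winner].popleft()
        self.rsp_fifos[winner].append(slave.respond(req))
        return winner


def lowest_index_first(pending):
    # Placeholder policy (hypothetical): the smallest master index wins.
    return min(pending)
```

Any of the five arbitration algorithms would plug in as the policy argument; only the decision function changes, while the polling and routing stay the same.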

The high-abstraction-level model provides a fast simulation speed of up to one million cycles per second. We can therefore efficiently estimate the system performance and explore suitable system design parameters in this environment.

Figure 4.1: An example system using transaction level model in SystemC

4.1.2 Traffic types

We define three high-abstraction-level traffic models to emulate the behavior of IP cores in SoC systems. They share the following parameters. First, the length of a request is given by its beat number; for example, a request with a beat number of 4 is a 4-beat transaction. Second, the interval time determines when the next request can be initiated. Third, the real-time requirement is represented by Rcycle, which is the deadline of a request.

• D type (D for dependency):

A D type master has no real-time requirement and initiates its next request at a time that depends on the finish time of the current request. As shown in Figure 4.2, the beat number is 5 and the interval time is 10. If the first request is initiated at cycle 2 and granted at cycle 5, it is completed at cycle 9. The next request is then initiated at cycle 19, which is 10 cycles after the finish time.

Figure 4.2: D type master (beat number = 5; interval time = 15)

• D R type (D for dependency, R for real-time):

A D R type master behaves like a D type master with an additional real-time requirement: each request must be completed before its deadline. Figure 4.3 is an example with the same parameters as Figure 4.2, except that the master has an extra real-time requirement, Rcycle, which is set to 10 cycles. Consequently, the request initiated at cycle 2 must be completed before cycle 12, which is 10 cycles after its initiation time. It is a real-time violation if the request has not been completed by cycle 12.

Figure 4.3: D R type master (beat number = 5; interval time = 15; Rcycle = 10)

• ND R type (ND for non-dependency, R for real-time):

An ND R type master is another kind of master with a real-time requirement. However, the initiation time of a request from an ND R type master is independent of the completion time of its previous request. In other words, ND R type masters issue requests periodically. Figure 4.4 uses the same parameters as Figure 4.3. Since the interval time is 15, the second request is initiated at cycle 17, which depends only on the initiation time of the first request.

Figure 4.4: ND R type master (beat number = 5; interval time = 15; R_cycle = 10)
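The timing rules of the three master types can be written down as small helper functions. This is a sketch under stated assumptions: the function names are hypothetical, a granted transfer is assumed to occupy one cycle per beat starting at the grant cycle, and finishing exactly at the deadline cycle is assumed not to count as a violation (the text does not settle this boundary case).

```python
def transfer_finish(grant_cycle, beats):
    # Assumption: a transfer granted at cycle g with b beats occupies
    # cycles g .. g + b - 1, so it finishes at g + b - 1.
    return grant_cycle + beats - 1


def next_request_dependent(finish_cycle, interval):
    # D and D R types: the next request is initiated `interval` cycles
    # after the current request finishes.
    return finish_cycle + interval


def next_request_periodic(initiate_cycle, interval):
    # ND R type: the next request is initiated `interval` cycles after
    # the current request was initiated, regardless of its finish time.
    return initiate_cycle + interval


def deadline(initiate_cycle, rcycle):
    # D R and ND R types: the deadline is Rcycle cycles after initiation.
    return initiate_cycle + rcycle


def violates_deadline(initiate_cycle, finish_cycle, rcycle):
    # Assumption: finishing after the deadline cycle is a violation.
    return finish_cycle > deadline(initiate_cycle, rcycle)
```

With the numbers quoted in the text, transfer_finish(5, 5) gives 9 and next_request_dependent(9, 10) gives 19 for the D type example, deadline(2, 10) gives 12 for the D R type example, and next_request_periodic(2, 15) gives 17 for the ND R type example.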

4.1.3 Traffic behavior

In the following experiments, we set up a system with eight masters. In Table 4.1, the second column gives the master type and the third column gives the real-time requirement, Rcycle. The Rcycle of a D type master is left undefined, since D type masters have no real-time requirement. The fourth and fifth columns give the probability distributions of the beat number and of the interval time between two successive requests initiated by a master, respectively.

For example, the row for Master 1 shows that each of its requests is an 8-beat transaction with 50% probability and a 16-beat transaction with 50% probability. Its next request waits for 6, 7, 8, 9 or 10 cycles with probabilities of 10%, 20%, 40%, 20% and 10%, respectively.
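Such a traffic generator can be sketched by sampling from the per-master distributions. The snippet below is a minimal Python illustration, not the generator used in the experiments: the dictionary layout and the helper names draw and expected are assumptions, and the values are the Master 1 entries of Table 4.1 (value mapped to probability in percent).

```python
import random

# Master 1 entries from Table 4.1: value -> probability in percent.
MASTER1_BEATS = {8: 50, 16: 50}
MASTER1_INTERVAL = {6: 10, 7: 20, 8: 40, 9: 20, 10: 10}


def draw(dist, rng):
    """Sample one value from a {value: percent} distribution."""
    values = list(dist.keys())
    weights = list(dist.values())
    return rng.choices(values, weights=weights, k=1)[0]


def expected(dist):
    """Mean of a {value: percent} distribution."""
    return sum(v * p for v, p in dist.items()) / sum(dist.values())


rng = random.Random(0)          # fixed seed only for reproducibility
beats = draw(MASTER1_BEATS, rng)
interval = draw(MASTER1_INTERVAL, rng)
```

For Master 1 the mean beat number is 12 and the mean interval time is 8 cycles, which is what makes it one of the heavy traffic masters.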

Moreover, Master 1, Master 2, Master 3 and Master 4 are D type masters, which only require a fraction of the bandwidth. Master 5, Master 6, Master 7 and Master 8 are masters with real-time requirements. They not only require a fraction of the bandwidth but also constrain the completion time of each request.

Master 1, Master 3, Master 5 and Master 7 are heavy traffic masters and the others are light traffic masters. The heavy traffic masters have larger burst beats and shorter interval time than the light ones. In other words, the heavy traffic masters generate a heavier traffic load to the shared bus than the light ones do.

4.2 Experiment 1

In Experiment 1, we compare the performance of different arbitration algorithms: static priority, Lottery, TDMA + Lottery (with Lottery as the second-level arbitration), RT lottery and RB lottery. The difficulty of meeting both real-time and bandwidth requirements generally depends on the bus workload, expressed as the percentage of bus bandwidth utilization. We therefore randomly generate patterns for different bus workloads and compare the results. In Table 4.2, the first column gives the bus workload, which varies from 60% to 95%. For each bus workload, 100 random patterns of different required bandwidth combinations for the eight masters are generated. We then simulate the input patterns with each arbitration algorithm. The results over 102400 simulation cycles are recorded and analyzed to see whether the arbitration algorithm meets the real-time and bandwidth requirements simultaneously. A pattern fails if, during the simulation, the real-time requirements are not all met or the allocated bandwidth is less than the required bandwidth beyond a 2% error range.

Table 4.1: The behavior of each master in the experiments

            type    Rcycle    beat/probability(%)    interval/probability(%)
Master 1    D       -         8/50 16/50             6/10 7/20 8/40 9/20 10/10
Master 2    D       -         1/50 4/50              10/10 11/20 12/40 13/20 14/10
Master 3    D       -         8/50 16/50             6/10 7/20 8/40 9/20 10/10
Master 4    D       -         1/50 4/50              10/10 11/20 12/40 13/20 14/10
Master 5    D R     128       8/50 16/50             10/10 11/20 12/40 13/20 14/10
Master 6    D R     196       1/50 4/50              10/10 11/20 12/40 13/20 14/10
Master 7    ND R    65        8/50 16/50             65/10 66/20 67/40 68/20 69/10
Master 8    ND R    85        1/50 4/50              85/10 86/20 87/40 88/20 89/10
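The pass/fail criterion for a pattern can be expressed as a short predicate. This is a sketch, with two assumptions that the text does not spell out: bandwidths are given per master as percentages of the bus cycles, and the 2% error range is read as an absolute margin of two percentage points. The function name is_failed_pattern is hypothetical.

```python
def is_failed_pattern(deadline_misses, allocated, required, tolerance=2.0):
    """Return True if a simulated pattern counts as failed.

    deadline_misses: number of requests that missed their deadline.
    allocated, required: per-master bandwidth in percent of bus cycles.
    tolerance: error range in percentage points (assumed interpretation).
    """
    if deadline_misses > 0:
        return True  # any real-time violation fails the pattern
    # A master is under-served if it falls short by more than the tolerance.
    return any(a < r - tolerance for a, r in zip(allocated, required))
```

For instance, a master that requires 30% and receives 28.5% is still within the error range, while one that receives 27% makes the pattern fail.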

The parameters of compared arbitration algorithms are set as follows:

• fixed priority:

The priority of each master is assigned according to its required bandwidth: the master with the higher required bandwidth has the higher priority.

• Lottery:

The weight of each master is assigned according to its required bandwidth, so the weight ratio equals the required bandwidth ratio.

• TDMA + Lottery:

1st level (TDMA): Masters with real-time requirements are allocated time slots accordingly.

2nd level (Lottery): The weight of each master is assigned according to its required bandwidth. The required bandwidth ratio is regarded as the weight ratio.

• RT lottery:

The weight of each master is initially assigned according to its bandwidth requirement and traffic behavior. To achieve better bandwidth allocation, a weight tuning mechanism is used to redistribute tickets among the masters. More details can be found in [21].

• RB lottery:

The weight of each master is assigned and tuned in the same way as in RT lottery. The observation window size and the bandwidth variance are set to 256 and 10 cycles, respectively.
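The two simplest settings in the list above, fixed priority and Lottery, can be sketched as decision functions over the set of pending masters. This is an illustrative Python sketch rather than the arbiter implementation: the function names are hypothetical, ties in fixed priority are assumed to go to the lower master index, and lottery tickets are assumed to be positive integers proportional to the required bandwidth.

```python
import random


def fixed_priority(pending, required_bw):
    # Higher required bandwidth means higher priority.
    # Assumption: ties are broken in favor of the lower master index.
    return max(pending, key=lambda m: (required_bw[m], -m))


def lottery(pending, tickets, rng):
    # Draw one ticket uniformly among the tickets held by pending masters.
    # Assumption: tickets are positive integers, so the total is nonzero.
    total = sum(tickets[m] for m in pending)
    ticket = rng.randrange(total)
    for m in pending:
        ticket -= tickets[m]
        if ticket < 0:
            return m


# Example: required bandwidths (percent) double as ticket counts.
required_bw = [30, 10, 20]
rng = random.Random(0)  # fixed seed only for reproducibility
winner_fp = fixed_priority([0, 1, 2], required_bw)
winner_lottery = lottery([1, 2], required_bw, rng)
```

Only pending masters take part in a draw, so over many draws among the same contenders each master wins in proportion to its ticket count.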

In Table 4.2, the first column is the total required bandwidth, varying from 60% to 95%. As described above, each case of total required bandwidth is exercised with 100 random patterns of different required bandwidth combinations for the eight masters, each simulated for 102400 cycles, and a pattern is counted as failed if any one of the requirements is violated.

Table 4.2: The number of fail patterns under different arbitration algorithms

               Fixed                 TDMA +                   RB lottery
Workload(%)    Priority   Lottery    Lottery    RT lottery    FRB    ARB
60             100        100        95         0             0      0

The second and third columns show the simulation results of fixed priority and Lottery, respectively. Since neither takes real-time requirements into consideration, both fail all 100 patterns at every workload.

The fourth column shows the simulation results of TDMA + Lottery. It handles the real-time requirements at the first level and provides control over bandwidth requirements at the second level. As a result, it may survive when the workload is below 70%, but it still fails to meet the requirements under high workload. The fifth column gives the results of RT lottery. Compared with the other existing arbitration algorithms, RT lottery is outstanding: because of its hard real-time guarantees and fine control over bandwidth, it meets the requirements of most patterns. However, it still loses bandwidth controllability under high workload. The sixth and seventh columns show the results of the proposed algorithm, RB lottery. The first fail pattern appears when the workload is 85%, which is an extremely high traffic load. Under the same observation window, we also find that ARB performs better than FRB.

Figure 4.5: Figure of Table 4.2

As shown in Figure 4.5, the number of fail patterns increases monotonically as the workload increases. The proposed algorithm, RB lottery, still meets the requirements in more than half of the patterns when the workload is 95%.

From this experiment, we draw the following conclusions. Since fixed priority and Lottery do not consider real-time requirements, they fail all 100 random patterns at every bus workload. TDMA + Lottery may survive under low bus workload. Compared with the other existing arbitration algorithms, RT lottery is remarkably good. However, RB lottery performs even better: its first failed case appears only when the bus workload reaches 85%, which is an extremely high traffic load. The number of fail cases increases monotonically as the total required bandwidth rises, and RB lottery still succeeds in more than 50% of the cases even when the workload is 95%.

Table 4.3: Summary of Experiment 1

                  real-time capability    bandwidth capability
fixed priority    no consideration        poor
Lottery           no consideration        requires weight tuning
TDMA + Lottery    no guarantees           requires weight tuning
RT lottery        always holds            good except on a highly loaded bus
RB lottery        always holds            outperforms the others

4.3 Experiment 2

The observation window size is one of the key parameters of the proposed algorithm. This experiment compares the performance under different window sizes. We vary the observation window size from 128 to infinity and observe the performance of RB lottery, including both FRB and ARB. As in Experiment 1, we generate 100 random required bandwidth combinations for each workload and simulate 102400 cycles for each case.

As shown in Table 4.4, a larger observation window in FRB provides better performance. On a highly loaded bus, for example at 95% workload, only 27 of the 100 random patterns miss the requirements when the observation window is large enough (seventh column). However, a larger observation window leads to higher hardware cost, so there is a tradeoff between performance and cost.

Table 4.4: The number of fail patterns under different sizes of observation window in FRB

               the size of observation window in FRB
Workload(%)    128    256    512    1024    2048    ∞
85             4      1      0      0       0       0

ARB shows the same trend as FRB in Table 4.5: a larger observation window results in better performance. However, an unlimited observation window is not useful for ARB. The bandwidth variance works together with the windows; if there are no windows during operation, ARB works like FRB. The seventh column shows that the number of fail patterns is then the same as for FRB in Table 4.4.

Comparing the performance of ARB and FRB in Figure 4.6, ARB handles real-time and bandwidth requirements simultaneously better than FRB under the same conditions. About 30% to 100% of the fail cases in FRB are resolved in ARB due to the dynamic bias boundary.

Table 4.5: The number of fail patterns under different sizes of observation window in ARB

               the size of observation window in ARB
Workload(%)    128    256    512    1024    2048    ∞
85             1      0      0      0       0       0
87             5      1      0      0       0       0
89             13     4      0      0       0       0
91             32     13     5      4       0       4
93             34     23     16     10      9       12
95             52     39     27     22      20      27
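As a quick check of the range quoted above, the share of FRB fail patterns that ARB resolves can be computed directly from the two tables. The helper name solved_fraction is hypothetical; the sample values are the 85% workload entries of Table 4.4 (FRB) and Table 4.5 (ARB) for window sizes 128 and 256.

```python
def solved_fraction(frb_fails, arb_fails):
    """Fraction of FRB fail patterns that no longer fail under ARB."""
    if frb_fails == 0:
        return None  # nothing to resolve
    return (frb_fails - arb_fails) / frb_fails


# 85% workload: FRB fails 4 and 1 patterns, ARB fails 1 and 0.
window_128 = solved_fraction(4, 1)
window_256 = solved_fraction(1, 0)
```

Both values, 0.75 and 1.0, fall inside the stated 30% to 100% range.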

Figure 4.6: Figures of Table 4.5 and Table 4.4

4.4 Experiment 3

In this experiment, we compare different bandwidth variances in the proposed architecture. As in Experiment 1, we generate 100 random patterns with different required bandwidth combinations for each workload and simulate 102400 cycles for each case. The bandwidth variance varies from 5% to 30% of the required bandwidth, and the observation window is set to 256.

As shown in Table 4.6, a larger bandwidth variance in ARB provides better performance. For example, when the workload is 95%, there are 42 fail patterns when the bandwidth variance is 5% (second column) and 33 fail patterns when the bandwidth variance is 30% (seventh column). The bandwidth variance reflects the relationship between adjacent windows. A larger bandwidth variance records more of the communication behavior of the windows while ARB dynamically tunes the bandwidth allocation. However, a larger bandwidth variance also leads to higher hardware cost, so the tradeoff between performance and cost has to be made at design time.

Table 4.6: The number of fail patterns under different bandwidth variances in ARB

Conclusions

A three-level arbitration algorithm, RB lottery, is proposed in this paper. It provides not only a hard real-time guarantee but also better bandwidth control. The bandwidth regulator dynamically monitors the bus communication and can thus precisely control the bandwidth allocation.

Four existing arbitration algorithms, static priority, Lottery, TDMA + Lottery, and RT lottery, are compared with RB lottery. The experimental results clearly show that RB lottery is the best among these five algorithms.

Hence, the lottery-based arbiter with a bandwidth regulator can be a better choice for those SoC buses with both the real-time and bandwidth constraints.

Bibliography

[1] “Peripheral Interconnect Bus Architecture.” http://www.omimo.be

[2] “Open Core Protocol Specification – v1.0.” http://www.sonics.com, 1999.

[3] Virtual Socket Interface Alliance, http://www.vsi.org

[4] “IBM Microelectronics CoreConnect Bus Architecture.” http://www.chips.ibm.com/products/coreconnect

[5] “AMBA 2.0 Specification.” http://www.arm.com/armtech/AMBA

[6] “Sonics Integration Architecture.” Sonics Inc., http://www.sonicsinc.com

[7] J. L. Hennessy and D. A. Patterson, Computer Architecture: A Quantitative Approach. Morgan Kaufmann Publishers, 2002.

[8] J. L. Hennessy and D. A. Patterson, Computer Organization and Design: The Hardware/Software Interface. Morgan Kaufmann Publishers, 2004.

[9] H. Chang, L. Cooke, M. Hunt, G. Martin, A. McNelly, and L. Todd, Surviving the SoC Revolution. Kluwer Academic Publishers, 1999.

[10] J. Liang, S. Swaminathan, and R. Tessier, “ASOC: A Scalable, Single-Chip Communications Architecture,” Parallel Architectures and Compilation Techniques, 2000, Page(s): 37-46.

[11] F. Poletti, D. Bertozzi, L. Benini, and A. Bogliolo, “Performance Analysis of Arbitration Policies for SoC Communication Architectures,” ACM Transactions on Embedded Computing Systems, 2003, Page(s): 189-210.

[12] M. Yang, S. Q. Zheng, Bhagyavati, and S. Kurkovsky, “Programmable Weighted Arbiters for Constructing Switch Schedulers,” Workshop on High Performance Switching and Routing, 2004, Page(s): 203-206.

[13] C.-H. Pyoun, C.-H. Lin, H.-S. Kim, and J.-W. Chong, “The Efficient Bus Arbitration Scheme in SoC Environment,” System-on-Chip for Real-Time Applications, 2003, Page(s): 311-315.

[14] I. E. Sutherland and J. Ebergen, “Computers Without Clocks,” Scientific American, INC, 2002, Page(s): 62-69.

[15] L. Benini and G. De Micheli, “Powering Networks on Chips: Energy-Efficient and Reliable Interconnect Design for SoCs,” 14th International Symposium on Systems Synthesis, 2001, Page(s): 33-38.

[16] K. Goossens, J. van Meerbergen, A. Peeters, and P. Wielage, “Networks on Silicon: Combining Best-Effort and Guaranteed Services,” Design, Automation and Test in Europe, 2002, Page(s): 423-425.

[17] E. Rijpkema, K. Goossens, A. Rădulescu, J. van Meerbergen, P. Wielage, and E. Waterlander, “Trade-offs in the Design of a Router with both Guaranteed and Best-effort Services for Networks on Chip,” Design Automation and Test in Europe, 2003, Page(s): 294-302.

[18] K. Lahiri, A. Raghunathan, G. Lakshminarayana, and S. Dey, “Design of High-Performance System-on-Chips using Communication Architecture Tuners,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2004, Page(s): 620-636.

[19] K. Lahiri, A. Raghunathan, and G. Lakshminarayana, “LOTTERYBUS: A New High-Performance Communication Architecture for System-on-Chip Designs,” Design Automation Conference, 2001, Page(s): 15-20.

[20] C. A. Waldspurger and W. E. Weihl, “Lottery Scheduling: Flexible Proportional-Share Resource Management,” Proceedings of the First Symposium on Operating Systems Design and Implementation, 1994, Page(s): 1-11.

[21] C.-H. Chen, G.-W. Lee, J.-D. Huang, and J.-Y. Jou, “A Real-Time and Bandwidth Guaranteed Arbitration Algorithm for SoC Bus Communication,” Asia South Pacific Design Automation Conference, 2006, Page(s): 600-605.

[22] Y. Zhang, “Architecture and Performance Comparison of A Statistic-Based Lottery Arbiter for Shared Bus on Chip,” Asia South Pacific Design Automation Conference, 2004, Page(s): 1313-1316.

[23] S. Ross, A First Course in Probability. Prentice Hall, 2002.

[24] C. Liu and J. Layland, “Scheduling Algorithms for Multiprogramming in a Hard Real-time Environment,” Journal of the ACM, 1973, Page(s): 46-61.

[25] J. Lehoczky, L. Sha, and Y. Ding, “The Rate Monotonic Scheduling Algorithm: Exact Characterization and Average Case Behavior,” IEEE Real-time Systems Symposium, 1989, Page(s): 201-209.

[26] L. Sha and J. B. Goodenough, “Real-Time Scheduling Theory and Ada,” IEEE Computer, 1990, Page(s): 53-62.

[27] W. D. Weber, J. Chou, I. Swarbrick, and D. Wingard, “A Quality-of-Service Mechanism for Interconnection Networks in System-on-Chips,” Design, Automation and Test in Europe, 2005, Page(s): 1530-1591.

[28] The Open SystemC Initiative. http://www.systemc.org

[29] A. Rose, S. Swan, J. Pierce, and J.-M. Fernandez, “Transaction Level Modeling in SystemC,” http://www.systemc.org, 2005.
