Thesis organization - 應用於匯流排矩陣系統之仲裁器權重調整演算法

Chapter 1 Introduction

1.4 Thesis organization

The remainder of this thesis is organized as follows. The lottery-based arbitration algorithms and an existing weight tuning algorithm proposed in [16] are briefly introduced in Chapter 2. Chapter 3 presents the detail of the proposed weight tuning algorithm, MC weight tuning algorithm. Experimental results are reported in Chapter 4. Finally, we conclude this thesis in Chapter 5.

Chapter 2 Preliminary

We introduce previous works in this chapter. First, we briefly introduce the traffic models.

Then, lottery-based arbitration algorithms and local-bus weight tuning algorithm are introduced briefly. Finally, we show some motivational examples for our weight tuning algorithm.

2.1 Traffic models of masters

Four parameters are defined to describe the behaviors of a master. The first parameter of a request is the beat number. For example, if the beat number of a request is 4, it means that it is a 4-beat transaction. In other words, the request needs 4 cycles to complete it works. Second, the time of next request can be initiated is determined as the interval time.

For example, if the interval time is 17, the next request initiates after 17 cycles. Third, the real-time requirement is represented as Rcycle which is the dead-line of a request. For

example, if the Rcycle is 10, the request must complete in 10 cycles. At last, we classify three high abstract-level traffic types to emulate the masters behavior [24]. That is D type master, D_R type master, and ND_R type master. The behavior of three different type masters is shown in the following.

z D type (D for dependency):

The D type master has no real-time requirement. The time of the D type master initiating a request depends on the finish time of the previous request. In Figure 6, the beat number is 4 and the interval time is 17. If a request is initiated at cycle 2 and granted at cycle 5, the request is completed at cycle 9. The next request is initiated at cycle 26 which is 17 cycles later than the finish time.

Figure 6 : D type master (beat number = 4; interval time = 17) z D_R type (D for dependency, R for real-time):

The behavior of the D_R type master is the same as the D type master except the real-time requirement. Requests of the D_R type master has real-time requirement. In Figure 7, we use the same parameters used in Figure 6 as an example. Because of the real-time requirement, a new parameter, Rcycle, is added in Figure 7. Rcycle is 10 cycles in the example. The first request is also initiated at cycle 2 and the request must be completed before cycle 12 because of the Rcycle is 10 cycles. It is a real-time violation, if the request is not completed

Figure 7 : D_R type master (beat number = 4; interval time = 17; Rcycle = 10) z ND_R type (ND for non-dependency, R for real-time):

The ND_R type master is another kind of master with the real-time requirement.

The behavior of ND_R type master is similar to D_R type master except on one thing. The time of the ND_R type master initiating a request does not depend on the finish time of the previous request. Actually, the time of the ND_R type master initiating a request depends on the initiated time of the previous request.

In other words, the ND_R type masters initiate requests periodically. In Figure 8, the same parameters are used in Figure 7. Since the interval time is at cycle 17 and the initiated time of the first request is at cycle 2, the second request is initiated at cycle 19, which directly depends on the initiated time of the first request.

Figure 8 : ND_R type master (beat number = 4; interval time = 17; Rcycle = 10) There is a limitation for all masters. If a master had initiated a request, it cannot initiate a new request when the previous request has not been finished. For example, a state machine is involved by a master. The master begins next state after previous state is completed. In other words, all masters are in serial execution.

2.2 Lottery-based arbitration algorithms

Lottery-based arbitration algorithms are probabilistic arbitration algorithm [16, 17, 24].It stochastically grants one of the contending masters according to the ticket assigned to them, either statically or dynamically. Each master holds a number of tickets for lottery-based algorithms. When a bus contention occurs, the lottery manager accumulates tickets of masters. According the tickets assignment, lottery manager probabilistically choose a master granted to access bus. As shown in Figure 9, there are four masters and each of them has a number of lottery tickets as the probability of bus granted. First, the lottery manager accumulates tickets of masters which has pending request. Then the lottery manager probabilistically chooses a master granted to access the bus from all contending masters. In other words, the lottery tickets act as the weight and lottery-based arbitration algorithms are weighted random arbitration algorithm to grant a master while contention.

Let the set of masters, M , M , ..., M , and each of them has ₁ ₂ _n t , t , ..., t tickets ₁ ₂ _n respectively. A set of Boolean variables, r , r , ..., r , represents the corresponding ₁ ₂ _n pending request. r is 1 if _i M has pending requests. Otherwise, _i r is 0. _i

The first step, the lottery manager accumulates the total tickets of masters which has pending requests, given by

∑

. Then the lottery manager generates a random

number from the range

[

^0,T

)

. The symbol

[

^0,T

)

means that all integers between 0 to T are included except T. If the random number lies in the range

1 and hold 1, 2, 3, and 4 tickets respectively. Three of them have pending requests, M1, M3, and M4, and the lottery manager accumulates their tickets

And it generates a random number, e.g. 5, from the range

[

^0,8

)

. The number lies between

1 1 2 2 3 3 4

r t +r t +r t = and r t_{1 1}+r t_{2 2}+r t_{3 3}+r t_{4 4} = , and then the bus is granted to M4. The 8 probability of M granted to access the bus is shown in Equation2.1. _i

Figure 10 : An example of Lottery

Since tickets act as granted probabilities of each master for lottery-based arbitration algorithms, ticket assignment is important to system performance. Lottery-based arbitration algorithms need additional algorithm to assign tickets to each master. The additional algorithm is called weight tuning algorithm. We introduce a weight tuning algorithm in the following section.

2.3 Local-bus weight tuning algorithm

The lottery-based arbitration algorithms need a weight tuning algorithm to result proper ticket assignment for all masters. A weight tuning algorithm can result tickets for each master. Proper ticket assignment makes masters meet their requirements as many as possible. By Equation2.1, tickets can decide grated probabilities for each master, and grated probabilities can obviously affect allocated bandwidth of each master. In other words, tickets have strong impact on bandwidth allocation for each master. For example, a shared bus system has three masters (named M0, M1, and M2) with same traffic models.

We simulate with different ticket assignments and respective bandwidth allocation is shown in Figure 11. The notation “10:1:1” means that M0 has tickets ten times larger than M1; and also ten times larger than M2. In Figure 11, it is easily observed that different ticket assignments result totally different bandwidth allocations. The weight tuning algorithm redistributes tickets between masters and result proper ticket assignment for masters. Masters with proper ticket assignment meet their requirements as many as possible.

0%5%

1:1:1 2:1:1 4:1:1 6:1:1 8:1:1 10:1:1

tickets ratio

bandwidth allocation(%)

M0 M1 M2

Figure 11 : Bandwidth allocation under different tickets assignment ratio

Finding an efficient weight tuning algorithm is a difficult challenge. Most of requirements can be met by ticket assignment resulting from an efficient weight tuning algorithm. In Figure 12 and Table 1, an example shows that why an efficient weight tuning algorithm is a difficult challenge. There are four masters with their traffic models in Table 1. Lottery algorithm with total 1024 tickets is used in Figure 12. We simulate with three different tickets assignments, and simulation results are shown in Figure 13. The notation

“252:143:220:409” in Figure 13 means M0 has 252 tickets, M1 has 143 tickets, M2 has 220 tickets, and M3 has 409 tickets. Comparing with three different ticket assignments, we only move the tickets from M0 to M2, but allocated bandwidth of all masters are changed.

Bandwidth allocation is totally changed when we redistribute tickets of two masters. When a weight tuning algorithm redistributes tickets between masters, the bandwidth allocation is disordered and not easily predictable.

Figure 12 : An example of single share bus architecture with four masters

Table 1 : The traffic models of Figure 12 Type Beat Interval

For single shared bus architecture, an efficient weight tuning algorithm, local-bus weight tuning algorithm, is proposed in [16]. Local-bus weight tuning algorithm results proper tickets for each master and masters can meet most of their requirements. The simple flow of local-bus weight tuning algorithm is shown in Figure 14. At first, the local-bus weight tuning algorithm analyzes the simulation result according the bandwidth allocation of each master. If a master gets bandwidth more than its requirement, it is grouped into S_more. If a master gets bandwidth less than its requirement, it is grouped into S_less. If a master gets bandwidth almost equal to its requirement, it is grouped into Smet. The master in S_more who gets the most bandwidth than its requirement is called M_most. The master in Sless who gets the least bandwidth than its requirements is called Mleast. Each master in Sless

gets insufficient bandwidth because each of them does not have enough tickets. If the master does not meet its bandwidth requirement, the local-bus weight tuning algorithm increases its tickets. When tickets of a master are increased, the granted probability is increased and the master can get more bandwidth than before. The local-bus weight tuning algorithm redistributes the tickets of masters in S_more and S_less, and tries to meet bandwidth requirements of each master.

The local-bus weight tuning algorithm is efficient for single shared bus system.

Lottery-basd arbitration algorithms with local-bus weight tuning algorithm in single shared bus system can meet hard real-time requirements and bandwidth requirements simultaneously with very high successful probability [24].

2.4 Motivation

In our thesis, we choose lottery-based arbitration algorithms as our arbitration algorithm of bus matrix architecture. With the evolution of process, more and more SoC systems need bus matrix architecture to deal with the complex communication between massive components. Since the lottery-based arbitration algorithms meet the hard real-time and bandwidth requirement simultaneously with high successful probability, using the lottery-based arbitration algorithms for bus matrix architecture is a good choice.

In this thesis, we find how to result proper ticket assignment for bus matrix architecture because we choose lottery-based arbitration algorithms as our arbitration algorithm. Proper ticket assignment of masters is important to lottery-based arbitration algorithms because it makes masters meet their requirements as many as possible. A weight tuning algorithm is needed for bus matrix architecture. Local-bus weight tuning algorithm produces proper ticket assignment for single shared bus architecture. In following, two methods are introduced that try to achieve our goal, finding proper ticket assignment for bus matrix architecture.

2.4.1 Model bus matrix architecture by shared bus architecture

Since lottery-based arbitration algorithms often use local-bus weight tuning algorithm before, we try to the use same weight tuning algorithm for bus matrix architecture. Because local-bus weight tuning algorithm used to be with single shared bus architecture, bus matrix architecture is intuitionally separated into many “single shared bus architecture”. In other words, bus matrix architecture is modeled by shared bus architecture. As shown in Figure 15 and Figure 16, bus matrix architecture has two buses, but two buses are

separated into two “single shared bus architecture” intuitionally. Bus 0 is single shared bus architecture, and bus 1 is single shared bus architecture, too. The traffic behavior of M0 on the bus 0 is independent to M0 on the bus 1. In other words, two buses are independent to each other. Therefore, the architecture which separates from bus matrix architecture is called “independent buses” or “IB” for simplification.

Figure 15 : An example of Bus matrix architecture

Figure 16 : Independent buses architecture separated from Figure 15

After independent buses architecture is generated, traffic model of bus matrix architecture has to be modified for independent buses architecture. In Figure 15, for example, the request ratio of M0 is 40% on the bus 0. It means that M0 on the bus 0 holds about 40% of total traffic amount of M0. In Figure 16, M0 on the bus 0 holds 100% of total traffic amount of M0. To make the behavior of independent buses architecture more similar to bus matrix architecture, traffic model of bus matrix has to be modified. The traffic model of Figure 15 is shown in Table 2. We simulate the bus matrix system and record the traffic

comparing with bus matrix architecture. We modify the parameters of Table 2 continuously until the traffic behavior of independent buses architecture is similar to bus matrix architecture. The modified traffic model of Table 2 is shown in Table 3. In Table 3, the interval time of each master is increased comparing with Table 2. The modified traffic model for independent buses architecture is to reflect similar behavior as original.

Table 2 : The traffic model of Figure 15 Bus Master Type Beat Interval

M0 D 12 5

Table 3 : The modified traffic model of Figure 16

Bus Master Type Beat Interval

M0 D 12 50

With independent buses architecture, modified traffic model, and bandwidth requirements (shown in Table 4, and same bandwidth requirements are used for independent buses architecture and bus matrix architecture.), local-bus weight tuning can result a ticket assignment (shown in column3 of Table 5) after simulation. All masters meet their bandwidth requirements with the ticket assignment for independent buses architecture (shown in column5 of Table 5). We then check whether the ticket assignment is a proper

ticket assignment or not for bus matrix architecture. The bandwidth allocation for each master is shown in column6 of Table 5 after bus matrix architecture is simulated with the ticket assignment resulted from independent buses architecture. As shown in Table 5, the ticket assignment resulted from independent buses is not a proper ticket assignment for bus matrix architecture because some of bandwidth requirements are not met (shown in column6 of Table 5). In other words, independent buses architecture fails to model behavior of bus matrix architecture.

Table 4 : The bandwidth requirement of Figure 15 and Figure 16 Bus Master Required bandwidth

M0 15 Bus Master Tickets Required

bandwidth

In section 2.1, we have introduced the limitation of masters; if a master had initiated a

request is not completed, M0 can not initiate any new request. The traffic behavior on different bus is not independent to each other on the bus matrix architecture. As shown in Figure 18, if M0 had initiated a request on bus 1, M0 can initiate another request on bus 0.

Traffic behavior of bus matrix architecture and independent buses architecture is totally different. In other words, it is no suitably that using independent buses architecture models behavior of bus matrix architecture.

Figure 17 : An example of the request limitation of masters

Figure 18 : An example shows the difference between IB and Figure 17

2.4.2 Bus matrix architecture with local-bus weight tuning

In fact, bus matrix architecture can use local-bus weight tuning directly. Local-bus weight tuning algorithm deals with ticket redistribution on only one bus at one time.

Local-bus weight tuning algorithm takes information of only one bus into consider first, and then it redistributes tickets of masters on the bus. After it completes ticket redistribution on previous bus, local-bus weight tuning considers information of another bus and redistributes tickets of masters on that bus. Local-bus weight tuning algorithm

redistributes tickets bus by bus. In Figure 19, for example, there are two buses, bus 0 and bus 1. Bus 0 consists of M0, M1, M2, and M3, and Bus 1 consists of M0, M1, M4, and M5.

Assume M_least of bus 0 is M1 and M_least of bus 1 is M0 (section2.3). Local-bus weight tuning algorithm takes information of bus 0 into consideration at first. According to information of bus 0, local-bus weight tuning algorithm increases tickets to M1 on the bus 0. After redistributing tickets on the bus 0, then local-bus weight tuning algorithm take information of bus 1 into consideration. According to information of bus 1, local-bus weight tuning algorithm increases tickets to M0 on the bus 1.

With bus matrix architecture shown in Figure 19 and respective traffic model shown in Table 6, local-bus weight tuning can result a ticket assignment for bus matrix architecture after simulation. Resultant ticket assignment and allocated bandwidth of masters are shown in Table 7. As shown in Table 7, each bus has total 1024 tickets. M1 on the bus 0 has more than 50% tickets of total tickets of bus 0, but it does not meet its bandwidth requirement; M0 on the bus 1 has more than 50% tickets of total tickets of bus 1, but it does not meet its bandwidth requirement either.

Figure 19 : An example of the bus matrix architecture

Table 6 : The traffic model of Figure 19 Bus Master Type Beat Interval

M0 D 12 5

Table 7 : The simulation result of Figure 19 with the local-bus weight tuning Bus Master Tickets Required

bandwidth

In this example, we observe that the bandwidth allocation of a MM on each bus has a fixed proportion. For example, M0 is a MM connecting with bus 0 and bus 1 in Figure 20.

When M0 initiates a request, the request has 20% probabilities to bus 0 and 80%

probabilities to bus 1. The request ratio of M0 is 1:4. Assume the beat number of M0 on bus 0 is as same as on bus 1. We observe that if M0 gets 1% bandwidth on bus 0, M0 gets about 4% bandwidth on bus 1. If M0 gets 8% bandwidth on bus 1, M0 gets about 2%

bandwidth on bus 0. The ratio of bandwidth allocation on bus 0 to bandwidth allocation on

bus 1 is 1:4. The bandwidth allocation of an MM is proportional to the request ratio if the beat numbers of the MM on all connecting buses are the same. In other words, the bandwidth allocation is related to the request ratio.

Figure 20 : An example of a MM with the request ratio equaling to 1:4

The relationship between bandwidth allocation and request ratio implies that if an MM misses one bandwidth requirement on one bus, the MM misses all bandwidth requirements on other buses which connect to the MM. As shown in Figure 21, the bandwidth requirement of M0 on bus 0 is 4%, and the bandwidth requirement of M0 on bus 0 is 16%.

Assume the beat number of M0 on bus 0 is as same as on bus 1, and the request ratio of M0 is 1:4. If M0 gets 2% bandwidth on the bus 0 which is 2% less than the bandwidth requirement, and then M0 gets 8% bandwidth on the bus 1 which is 8%less than the bandwidth requirement. If M0 gets 4% bandwidth on the bus 1 which is 12% less than the bandwidth requirement, and then M0 gets 1% bandwidth on the bus 0 which is 3% less than the bandwidth requirement. Only if M0 gets 4% bandwidth on the bus 0 or 16%

bandwidth on the bus 1, M0 can meet all bandwidth requirements on different buses.

Figure 21 : An MM misses or meets all bandwidth requirements

The relationship between bandwidth allocation and request ratio implies another thing.

Figure 22 is as same as Table 7. As shown in Figure 22, M0 on the bus 0 holds too fewer tickets and misses it bandwidth requirement. It induce that M0 misses its bandwidth requirement on bus 1. An MM misses its bandwidth requirement on a bus because it does not hold enough tickets to meet the bandwidth requirement on other buses.

When the local-bus weight tuning algorithm tunes the tickets of masters, it only considers with the information of one bus at one time and every bus tunes its tickets individually. If a master misses its bandwidth requirement on a bus, the local-bus weight tuning algorithm thought that the master needs more tickets and increase its tickets on this bus. But, an MM may miss its bandwidth requirement on a bus because it does not hold

在文檔中應用於匯流排矩陣系統之仲裁器權重調整演算法 (頁 17-0)