Thesis Organization - Modelling of Bidirectional-Channel Network-on-Chip

CHAPTER 1 INTRODUCTION

1.4 Thesis Organization

The rest of this thesis is organized as follows. Chapter 2 introduces the background: the architecture of BiNoC and several routing algorithms. Chapter 3 first describes a three-dimensional model of BiNoC, and then presents a routing algorithm and a flow control mechanism based on that model. Chapter 4 explains how we implement our router, and Chapter 5 shows the experimental simulation results. Finally, Chapter 6 draws a conclusion.

CHAPTER 2 BACKGROUND

In this chapter, we present the architecture of the bidirectional channel network-on-chip (BiNoC) in Section 2.1 [11][12]. In a conventional router, all channels are unidirectional, which can lead to a scenario where one channel is busy or congested while another channel is idle because the idle channel is fixed as an input channel. BiNoC is proposed to overcome this problem by making all channels bidirectional. In Section 2.2, deadlock, deadlock avoidance, and some deadlock-free routing algorithms are introduced.

2.1 Bidirectional Channel in BiNoC

A bidirectional channel network-on-chip (BiNoC) architecture is proposed to enhance the performance of on-chip communication [11][12]. The BiNoC allows each communication channel to be dynamically self-configured to transmit flits in either direction in order to better utilize on-chip hardware resources. For example, in the task graph shown in Fig. 2-1(a), every vertex represents a task with a value of its computation time, and every edge represents a computing dependence with a value of communication volume, which is the inverse of the bandwidth in a data channel. A mesh NoC with the best mapping is shown in Fig. 2-1(b). The NoC uses only three channels and leaves the others idle. The timing analysis in Fig. 2-2 indicates that the conventional NoC needs 80 cycles to execute. However, if we apply the same mapping to a BiNoC, which can dynamically change the direction of each channel between each pair of routers as in the architecture of Fig. 2-3, the bandwidth utilization improves and the total execution time is reduced to 55 cycles.

Fig. 2-1. Example of (a) Task Graph Mapping to (b) a Conventional NoC, and (c) BiNoC [11].

Fig. 2-2. Detailed Execution Schedules of Typical NoC and BiNoC [11].
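As a quick check of the two cycle counts quoted above (80 cycles on the conventional NoC and 55 cycles on BiNoC), the relative reduction and the speedup can be worked out directly. The symbols T_NoC and T_BiNoC are introduced here only for this illustration.

```latex
\[
\frac{T_{\mathrm{NoC}} - T_{\mathrm{BiNoC}}}{T_{\mathrm{NoC}}}
  = \frac{80 - 55}{80} = 0.3125 \approx 31\%,
\qquad
\frac{T_{\mathrm{NoC}}}{T_{\mathrm{BiNoC}}} = \frac{80}{55} \approx 1.45
\]
```

These numbers hold for this one mapping example only and say nothing about other task graphs.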

BiNoC reconfigures the direction of channels with a request-based design, as shown in Fig. 2-3, where each bidirectional channel is connected to an InOut Buffer that controls the data conveyed over the channel. Furthermore, a finite state machine (FSM) is needed to control the availability of each channel; otherwise a conflict occurs when both routers want to deliver packets. Once the availability signal is asserted, the Switch Controller can use the channel to deliver packets. The FSM cooperates with another FSM in the downstream router so that only one direction is used to convey packets. For each direction of a router, a high-priority channel is set up as the default direction. For example, in Fig. 2-3, channel A is a default link for router 2

In_req_A, will be sent to router 1, and router 1 will have no availability for channel A, which prevents the channel from changing direction frequently if both routers have packets to deliver. However, this may lead to unfairness and cause starvation.

Under this architecture, if both routers have packets to deliver, each router can use only one channel. A more detailed explanation of the router design is given in Chapter 4.

Fig. 2-3. Bidirectional Transmission Scheme.
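The priority idea described in this section can be illustrated with a minimal Python sketch. It is a simplification and not the FSM of [11]: the function name `grant_channel`, the router labels, and the one-decision-per-cycle arbitration are assumptions made purely for illustration.

```python
def grant_channel(default_owner, other, wants_to_send):
    """Pick which router may drive one bidirectional channel in this cycle.

    Assumption: the router for which the channel is the default
    (high-priority) direction always wins when it has a pending request.
    """
    if wants_to_send.get(default_owner, False):
        return default_owner
    if wants_to_send.get(other, False):
        return other          # the channel is borrowed in the reverse direction
    return default_owner      # an idle channel falls back to its default direction


# Channel A is treated as the default link of router 2, as in Fig. 2-3.
only_r1 = grant_channel("router2", "router1", {"router1": True})
both = grant_channel("router2", "router1", {"router1": True, "router2": True})
```

Under these assumptions `only_r1` is `"router1"` and `both` is `"router2"`. If router 2 requests the channel in every cycle, router 1 is never granted channel A, which is the starvation risk noted in the text.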

2.2 Deadlock-Free Routing Algorithms

The routing algorithm strongly influences the performance of an NoC. Different routing algorithms suit different applications, but all share a common requirement: they must be deadlock-free. This section introduces the issue.

2.2.1 Deadlock and Deadlock-Avoidance

Deadlock is a crucial problem in NoC. If it is not handled appropriately, the performance of the interconnection network degrades severely, and packets may never be delivered to their destinations. A deadlock-avoidance theorem proposed in [13] states that a connected and adaptive routing function R for an interconnection network I is deadlock-free if there is no cycle in its channel dependency graph (CDG). Fig. 2-4 illustrates the idea: the four packets in the four channels cannot make progress because they wait for each other cyclically.

Another way to deal with deadlock is deadlock recovery. We focus on deadlock avoidance in this work because deadlock recovery additionally needs deadlock detection and deadlock handling mechanisms, which increase overhead. Recovery also increases packet latency, and if deadlock is frequent, performance degrades heavily.

Fig. 2-4. (a) A Deadlock Configuration and (b) Channel Dependency Graph.
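The acyclicity condition on the CDG can be checked mechanically. The following Python sketch is a generic depth-first cycle test, not code from this work; the channel names `c1` to `c4` and the function name `has_cycle` are hypothetical and only mimic the four-channel situation of Fig. 2-4.

```python
def has_cycle(dependencies):
    """Return True if the directed graph given as (from, to) pairs has a cycle."""
    graph = {}
    for src, dst in dependencies:
        graph.setdefault(src, []).append(dst)
        graph.setdefault(dst, [])
    # 0 = unvisited, 1 = on the current DFS path, 2 = fully explored
    state = {node: 0 for node in graph}

    def visit(node):
        state[node] = 1
        for nxt in graph[node]:
            if state[nxt] == 1:              # back edge: a cycle is closed
                return True
            if state[nxt] == 0 and visit(nxt):
                return True
        state[node] = 2
        return False

    return any(state[node] == 0 and visit(node) for node in graph)


# Four channels, each waiting for the next one (assumed naming).
deadlocked = [("c1", "c2"), ("c2", "c3"), ("c3", "c4"), ("c4", "c1")]
# Removing one dependency, for example by prohibiting one turn, breaks the cycle.
broken = deadlocked[:-1]
```

Here `has_cycle(deadlocked)` is true and `has_cycle(broken)` is false, which matches the intuition that removing a single dependency from the cycle is enough to satisfy the theorem for this small graph.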

2.2.2 XY Routing Algorithm

The XY routing algorithm presented in [14] is a deterministic routing algorithm. With XY routing, a packet first traverses along the x dimension and then along the y dimension. XY routing is deadlock-free because no packet can traverse from the y dimension to the x dimension in a 2-D mesh NoC. However, for a given source and destination, XY routing always generates the same path, so it has poor load-balancing ability, which leads to congestion. Fig. 2-5 gives an example.

Fig. 2-5. An Example of XY Routing.

All three packets A, B, and C traverse along the x dimension and then along the y dimension. Inevitably, some paths become heavily congested, such as column 2 in Fig. 2-5.
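XY routing as described above can be sketched in a few lines of Python. The coordinate convention (a node is an `(x, y)` pair) and the function name `xy_route` are assumptions for illustration; this is not the router implementation of Chapter 4.

```python
def xy_route(src, dst):
    """Return the nodes visited by XY routing from src to dst, both included."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:                  # first resolve the x dimension
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:                  # then resolve the y dimension
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


route = xy_route((0, 0), (2, 1))
```

Because the function is deterministic, every packet between the same pair of nodes follows the same path, which is the source of the load-balancing problem discussed above.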

2.2.3 Turn Model Based Routing Algorithm

For better load balance, we prefer an adaptive routing algorithm, which offers more paths for delivering a packet. There are two kinds of adaptive routing algorithms. One is minimal adaptive routing, which routes a packet without using detour paths. The other is non-minimal adaptive routing, which may route a packet along detour paths. Because non-minimal routing has to handle the livelock problem and has higher latency, we focus on minimal adaptive routing. Glass and Ni presented the elegant concept of the turn model [15]. Its basic idea is to prohibit the minimum number of turns needed to break all cycles in the CDG, so that routing algorithms based on the turn model are deadlock-free. Three adaptive routing algorithms, namely west-first, north-last, and negative-first, are designed based on the turn model. We show four cases of prohibited turns related to these three routing algorithms in Fig. 2-6. Note that solid lines indicate allowed turns and dashed lines indicate prohibited turns. For example, Case two uses the turn model that prohibits the S-W turn and the N-W turn. According to this turn model, west-first routing delivers a packet to the west first if the packet needs to travel west at all. Similarly, negative-first routing and north-last routing are designed according to the other turn models.

However, not every choice of turns can be used to prevent deadlock. As indicated in Case four, a cycle will still be generated if we choose the W-N turn in the clockwise cycle and the S-W turn in the counterclockwise cycle. The turn model provides a simple way to design deadlock-free adaptive routing; nevertheless, its adaptiveness is highly uneven from a global view. At least half of the source-destination pairs are limited to only one minimal path, while full adaptiveness is provided for the rest of the pairs.

Fig. 2-6. Four Cases of Turn Models.
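The unevenness of the turn model can be seen in a minimal Python sketch of west-first routing restricted to minimal paths. The coordinate convention (x grows to the east, y grows to the north) and the function name `west_first_choices` are assumptions for illustration only.

```python
def west_first_choices(cur, dst):
    """Return the output directions a minimal west-first router may pick."""
    dx = dst[0] - cur[0]
    dy = dst[1] - cur[1]
    if dx < 0:
        # S-W and N-W turns are prohibited, so all westward hops come first.
        return ["W"]
    choices = []
    if dx > 0:
        choices.append("E")
    if dy > 0:
        choices.append("N")
    if dy < 0:
        choices.append("S")
    return choices


westward = west_first_choices((2, 2), (0, 3))
eastward = west_first_choices((0, 0), (2, 2))
```

A packet whose destination lies to the west has a single choice until it reaches the destination column, whereas a packet heading north-east can choose adaptively at each hop. This is the imbalance between source-destination pairs described above.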

To solve the unfairness of the turn model, an odd-even turn model was presented by Chiu in [10]. The odd-even turn model restricts certain turns based on location, so that no type of turn is eliminated from the NoC entirely. Chiu specified that no packet is allowed to take an E-N turn or an E-S turn at any node located in an even column, and no packet is allowed to take an N-W turn or an S-W turn at any node located in an odd column. Fig. 2-7 shows the odd-even turn model in a 4x3 mesh.

Fig. 2-7. Odd-Even Turn Model in a 4x3 Mesh.
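The odd-even restriction stated above reduces to a small predicate. In the Python sketch below, a turn is written as a pair (current heading, next heading), and columns are assumed to be numbered from 0; both conventions and the function name are illustrative assumptions, not taken from [10].

```python
PROHIBITED_IN_EVEN_COLUMN = {("E", "N"), ("E", "S")}
PROHIBITED_IN_ODD_COLUMN = {("N", "W"), ("S", "W")}


def odd_even_turn_allowed(column, heading, next_heading):
    """Return True if the turn is permitted at a node in the given column."""
    turn = (heading, next_heading)
    if column % 2 == 0:
        return turn not in PROHIBITED_IN_EVEN_COLUMN
    return turn not in PROHIBITED_IN_ODD_COLUMN


en_in_even = odd_even_turn_allowed(0, "E", "N")
en_in_odd = odd_even_turn_allowed(1, "E", "N")
```

The same E-N turn is rejected in column 0 and accepted in column 1, so no turn type disappears from the network as a whole.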

Although the odd-even turn model still restricts some turns, these restrictions are far less noticeable from a global view. Therefore, the odd-even turn model offers higher path diversity than the original turn model. Based on the odd-even turn model, various deadlock-free adaptive routing algorithms can be designed. Fig. 2-8 shows an example that follows the odd-even turn model in an 8x8 mesh NoC. We call it OE-Routing.

Fig. 2-8. Flow Control of OE-Routing.

2.3 Another Implementation of Bidirectional Channel

Another work similar to BiNoC is the bandwidth-adaptive NoC presented by Cho et al. [16]. Its aim is the same as that of BiNoC: using idle channels to increase bandwidth.

Nevertheless, one difference is that the bandwidth-adaptive NoC has several bidirectional channels between two routers. Cho et al. added a bandwidth allocator between two routers to decide the direction of the channels. The bandwidth allocator takes a signal named pressure as input: the two routers send their pressure to the bandwidth allocator and let it arbitrate the direction of each channel. The block diagram is shown in Fig. 2-9.

Fig. 2-9. Connection Between two Network Nodes Through a Bidirectional Link [16].

With this architecture, the bandwidth allocator can arbitrate more than two channels and can dynamically decide how much bandwidth each router should have. An example is shown in Fig. 2-10.

Fig. 2-10. Adaptability of a Mesh Network with Bidirectional Links [16].

CHAPTER 3 MODELLING OF BIDIRECTIONAL-CHANNEL NOC

In this chapter, we present a three-dimensional model of BiNoC and two case studies that exploit the characteristics of bidirectional channels based on this model. The first is a new routing algorithm for BiNoC called bidirectional routing (BI-Routing), which uses the bidirectional channels to route packets and provides higher path diversity. The second is a new mechanism called TDM-BiNoC, which uses the time-division multiplexing (TDM) concept to dynamically allocate the bandwidth of channels. Extensive simulation results in Chapter 5 indicate that both can improve the performance of a mesh-based NoC.

3.1 Three-Dimensional Model of BiNoC

BiNoC can reduce packet latency and achieve higher bandwidth utilization by making channels bidirectional, as shown in Fig. 3-1. In contrast to the conventional NoC, each of the two channels can switch its direction, which generates four channel patterns.

Fig. 3-1. Variation of Channels in BiNoC.

Since the original model of a mesh NoC cannot show the behavior of BiNoC, we represent these four kinds of bidirectional channel patterns as a three-dimensional model in Fig. 3-2. The new Z-dimension is time related and shows how the channel directions vary over time. Notice that not all channel directions in an NoC change from layer to layer; we only care about the channel direction between two neighboring routers in a layer. In other words, we build our model from the viewpoint of one packet. We could of course use any combination of the four patterns within a layer to represent the behavior of a BiNoC, but the combinations are too numerous and hard to represent or understand. The three-dimensional graph in Fig. 3-2 is not a physical three-dimensional IC, but a conceptual model of the behavior of a BiNoC.

We can use this model to express several lines of research related to BiNoC, such as Quality of Service (QoS) [17] and fault tolerance [18]. To manage on-chip network resources adequately, traffic flows can be categorized into guaranteed service (GS) and best effort (BE), and a QoS-BiNoC lets GS packets be delivered through more layers of the model, so GS packets have higher path diversity than BE packets. For a fault-tolerant BiNoC, our model indicates that a bidirectional channel provides another layer to route through when a channel on the original path fails. Moreover, as shown in Fig. 3-2, the odd-even turn model in BiNoC can also be represented in our three-dimensional model. With the time-division multiplexing concept, the odd-even turn model in the three-dimensional model is complete: the odd-even turn model in the L4 layer is complete when packets are delivered to other layers, such as the L3 layer, at another time. This three-dimensional model also suggests new ideas. We will show how to use it to develop a routing algorithm and a time-division multiplexing mechanism for BiNoC.

Fig. 3-2. Three-Dimensional Model of BiNoC.

3.2 Bidirectional Routing Algorithm

The three-dimensional model of BiNoC described in Section 3.1 indicates that BiNoC has higher path diversity than the original NoC. We use this path diversity to develop the BI-Routing algorithm for BiNoC in this section. After a brief explanation of the motivation, we introduce the details of BI-Routing in Subsection 3.2.2.

3.2.1 Motivation of BI-Routing

In previous work, adaptive routing algorithms based on the turn model or the odd-even turn model prohibit some turns to keep the NoC free of deadlock; hence, some paths are kept idle. The BI-Routing idea is shown in Fig. 3-3. A deadlock cycle that would be formed on a conventional NoC by the paths of packet A, packet B, packet C, and packet D can be broken by using another layer of channels (in the Z-dimension). Therefore, we need not prohibit any turn, and all paths can be included in the feasible routing set of BI-Routing.

Fig. 3-3. Cycles Breaking in BiNoC.

Fig. 3-4 shows that BI-Routing offers many more available paths than OE-Routing. There are two cases in this example. In Case one, the source is at the upper right corner and the destination is at the lower left corner; in Case two, the source is at the lower left corner and the destination is at the upper right corner.

Fig. 3-4. Comparison Between OE-Routing and BI-Routing in a 4x4 Mesh NoC.

This is only a simple example, with a 4x4 mesh NoC and a single packet, showing that BI-Routing has more routing choices than OE-Routing. The larger the topology, the greater the advantage.
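To make the notion of path diversity concrete, the following Python sketch counts the minimal paths between two nodes of a 2-D mesh subject to a turn predicate. It models a single layer only and does not reproduce the numbers of Fig. 3-4; the function names, the `turn_allowed` callback, and the coordinate convention (x grows to the east, y grows to the north) are assumptions for illustration.

```python
from math import comb


def count_minimal_paths(src, dst, turn_allowed=lambda node, heading, nxt: True):
    """Count minimal paths from src to dst whose every turn passes turn_allowed."""
    x_step = ("E", 1) if dst[0] >= src[0] else ("W", -1)
    y_step = ("N", 1) if dst[1] >= src[1] else ("S", -1)

    def walk(node, heading):
        if node == dst:
            return 1
        moves = []
        if node[0] != dst[0]:
            moves.append((x_step[0], (node[0] + x_step[1], node[1])))
        if node[1] != dst[1]:
            moves.append((y_step[0], (node[0], node[1] + y_step[1])))
        total = 0
        for nxt_heading, nxt_node in moves:
            going_straight = heading is None or heading == nxt_heading
            if going_straight or turn_allowed(node, heading, nxt_heading):
                total += walk(nxt_node, nxt_heading)
        return total

    return walk(src, None)


def xy_only(node, heading, nxt_heading):
    """Turn predicate for XY routing: only turns out of the x dimension."""
    return heading in ("E", "W")


unrestricted = count_minimal_paths((0, 0), (3, 3))
deterministic = count_minimal_paths((0, 0), (3, 3), xy_only)
```

With no prohibited turns, the corner-to-corner count in a 4x4 mesh equals `comb(6, 3)`, that is 20, while the XY predicate leaves exactly one path. A routing algorithm that prohibits no turn can in principle reach the unrestricted count, and any turn restriction can only lower it.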

3.2.2 Bidirectional Routing

Bidirectional routing (BI-Routing) is a minimal adaptive routing algorithm. As mentioned in Subsection 3.2.1, BI-Routing is deadlock-free without prohibiting any path. We develop BI-Routing based on Theorem 1, proposed by Duato [19].

Theorem 1: A connected and adaptive routing function R for an interconnection network I is deadlock-free if there are no cycles in its channel dependency graph [19].

A channel dependency graph D for a given interconnection network I and routing function R is a directed graph, D = G(C, E). The vertices of D are the channels of I. The arcs of D are the pairs of channels $(c_i, c_j)$ such that there is a direct dependency from $c_i$ to $c_j$. A routing function is connected if, for any packet, it can find a path that delivers the packet to its destination.

Therefore, by Theorem 1, if we can break every cycle in the channel dependency graph, the routing algorithm is deadlock-free. Hence, we propose three rules for our BI-Routing algorithm.

Rule 1: Packets use a reverse channel at the E-S turn and the E-N turn.

As mentioned in Subsection 3.2.1, we escape from deadlock in a BiNoC by using another layer to route, as shown in Fig. 3-3. Here, we choose the E-S turn and the E-N turn as the breaking positions in clockwise and counter-clockwise cycles, respectively. An example is shown in Fig. 3-5, where the red lines mark a path that would lead to deadlock and the blue lines represent our concept: escaping from the deadlock cycle by using a reverse channel in BiNoC. The other channel of the pair can point in either of two directions, so we use two blue lines in two layers to represent both network paths. In other words, packets use L2 or L3 to route when taking an E-S turn, as shown in Fig. 3-5(a), and packets use L4 or L3 to route when taking an E-N turn, as shown in Fig. 3-5(b). Although Fig. 3-5 shows just one clockwise cycle and one counter-clockwise cycle, Rule 1 breaks every other cycle of this kind in the network as well. However, it is obvious that Rule 1 alone is not sufficient to remove deadlock: a new cycle may form after Rule 1 is applied, so further rules are needed.

Fig. 3-5. Example of Rule 1 for (a) a Clockwise Cycle, and (b) a Counter-Clockwise Cycle.

We use Rule 1 to escape from the deadlock cycle. Rule 2 and Rule 3 are needed to prevent a packet from connecting back to layer 1, that is, to prevent the blue line in Fig. 3-5 from connecting back to the green line and forming a cycle.

Rule 2: Packets that arrive from a south (north) reverse channel and are delivered to the north (south) must keep using a reverse channel.

Without Rule 2, an inter-layer deadlock can appear. Rule 2 states that packets should keep using a reverse channel toward the south or north, so that inter-layer deadlock is removed. In terms of the three-dimensional model, packets keep using L2 (L3) to route when delivered from the north in the L2 (L3) layer, as shown in Fig. 3-6(a), and packets keep using L4 (L3) to route when delivered from the south in the L4 (L3) layer, as shown in Fig. 3-6(b). In Fig. 3-6, blue lines represent paths obeying Rule 2, and red dotted lines represent paths violating Rule 2.

Fig. 3-6. Example of Rule 2 for (a) a Clockwise Cycle, and (b) a Counter-Clockwise Cycle.

Rule 3: Packets from a reverse channel cannot take an S-W turn or an N-W turn.

The essence of Rule 3 is similar to the conventional turn model: it eliminates a turn in a higher layer. Without a rule to regulate packets, the reverse channels themselves would make up a cycle. In terms of the three-dimensional model, packets cannot take an S-W turn or an N-W turn when they are not in the L1 layer. For a clearer understanding, Fig. 3-7(a) shows an example, where the red dotted lines represent prohibited turns in a higher layer. By Theorem 1, BI-Routing is deadlock-free because these three rules keep the channel dependency graph acyclic. A more detailed example is shown in Fig. 3-7(b), and a traffic condition comparison is shown in Fig. 3-7(c).

Fig. 3-7. (a) Example of Rule 3, (b) Example of BI-Routing, and (c) Traffic Condition Comparison.

In Fig. 3-7(b), P4 is transferred to an upper layer by Rule 1, which breaks a cycle that could otherwise be constructed from P6, P7, and P8. Another, bigger cycle could be constructed from P6, P7, P9, and P3; however, Rule 1 stops it at P3. Rule 2 plays an important role in preventing inter-layer deadlock in BI-Routing. Notice that only reverse-south and reverse-north channels are within the scope of Rule 2, and packets can be routed to the east freely. P2 and P1 illustrate these cases: P2 in a middle layer connects back to the bottom layer, as shown in Fig. 3-7(b); nevertheless, Rule 1 stops P1 from forming a deadlock cycle. Finally, no inter-layer cycle exists in the top layer because of Rule 3.

Although BI-Routing still needs three rules, it is a fully adaptive routing algorithm. A fully adaptive routing algorithm can spread the traffic load over the whole network instead of keeping some parts of the network heavily congested. Fig. 3-7(c) compares OE-Routing and BI-Routing. In OE-Routing, three packets have no choice but to route on the same path, which produces one congested node and two mildly congested nodes. In contrast, in BI-Routing those three packets have many more choices and can be delivered in up to two dimensions, so they can choose paths that improve the load balance of the BiNoC.
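The three rules of this section can be read as a per-hop check. The Python sketch below encodes one possible reading of them and is not the router logic of Chapter 4: the function name `bi_routing_decision`, the representation of a turn as a pair (current heading, next heading), and the boolean flag `on_reverse` that stands in for the layers L2 to L4 are all assumptions made for illustration.

```python
RULE1_TURNS = {("E", "S"), ("E", "N")}
RULE3_TURNS = {("S", "W"), ("N", "W")}


def bi_routing_decision(heading, on_reverse, next_heading):
    """Return (allowed, must_use_reverse) for one hop under Rules 1 to 3."""
    turn = (heading, next_heading)
    # Rule 3: a packet arriving on a reverse channel may not turn S-W or N-W.
    if on_reverse and turn in RULE3_TURNS:
        return (False, False)
    # Rule 1: E-S and E-N turns are taken on a reverse channel.
    if turn in RULE1_TURNS:
        return (True, True)
    # Rule 2: a packet moving south or north on a reverse channel and
    # continuing in the same direction stays on a reverse channel.
    if on_reverse and heading in ("S", "N") and next_heading == heading:
        return (True, True)
    return (True, False)


after_east = bi_routing_decision("E", False, "S")
keep_south = bi_routing_decision("S", True, "S")
blocked = bi_routing_decision("S", True, "W")
```

Under this reading, `after_east` and `keep_south` both require the reverse channel, and `blocked` is rejected, while the same S-W turn taken from a normal channel remains allowed. Whether a packet that turns east from a reverse channel returns to the normal layer is left open here; the sketch simply does not force the reverse channel in that case.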
