Chapter 1   Introduction

1.2  Thesis Organization

The rest of this thesis is organized as follows. Chapter 2 reviews related work and introduces the problems of the NSI-mesh and their solution. Chapter 3 presents the proposed Transport-Layer Assisted Routing algorithm. Chapter 4 shows the experimental results of the proposed techniques. Chapter 5 introduces the proposed network interface and router architecture design, and compares the area of the synthesis results under the proposed memory reduction techniques. Chapter 6 concludes the thesis and points out potential directions for future research.

Chapter 2

Related Works and Problem Description

This chapter introduces related and prior work on 3D thermal management and on reactive routing for thermal-aware 3D NoC. To exploit the interconnection advantage of 3D NoC, we adopt the related architecture designs as our baseline design. We evaluate the prior work on thermal management and reactive routing and analyze its pros and cons. Finally, we identify the problems that remain and propose solutions to them.

2.1 Run-Time Thermal Management for 3D NoC System

Since the thermal issue becomes more serious in 3D chips, Chao et al. proposed a thermal management scheme for 3D NoC in [22], which is based on ThermalHerd [19]. The following techniques are proposed to control temperature:

Considering the heat flow in 3D ICs, the authors proposed Vertical Throttling, a throttling scheme that shuts down the routers along a vertical pillar.

To cool the chip efficiently, we should throttle a vertical path so that heat can conduct to the heat sink; the throttled pillar acts like a cooling channel. However, throttling routers greatly degrades system performance. To limit this degradation, Vertical Throttling determines different throttling ratios based on the temperature of the routers, as shown in Fig. 2-1. The experimental results show that Vertical Throttling cools the NoC more efficiently than the conventional distributed throttling [19].

Fig. 2-1 Vertical throttling at different emergency levels [22].

2.2 Definition of Non-Stationary Irregular Mesh 3D NoC

Before describing the problems of the related work in detail, we define a new term: the non-stationary irregular mesh (NSI-mesh). An irregular mesh is a network from which routers or links have been removed, so its topology is no longer regular. We define a time-varying irregular-mesh topology as a Non-Stationary Irregular Mesh (NSI-mesh). A mesh-based thermal-aware 3D NoC that adopts vertical throttling falls into the NSI-mesh category. In an NSI-mesh, distributed thermal sensors detect the temperature and trigger throttling, and throttling changes the topology during online operation. These topology changes cause packet delivery problems in a thermal-aware 3D NoC. For faster cooling, the traffic quota of a near-overheated router has to be very small, or even zero. When the router reaches its quota limit, it has to fully block incoming packets. As a result, the topology becomes an NSI-mesh, as shown in Fig. 2-2.

Although we cannot know in advance which routers will be throttled, we do know that the topology changes on the order of every 10^7 cycles. This differs from a fault-tolerant NoC, whose topology may change at any cycle.

Fig. 2-2 An example of a Non-Stationary Irregular Mesh (NSI-mesh) network. The topology changes during online operation because of throttling by run-time thermal management (RTM).

The key characteristics of the NSI-mesh of Thermal-Aware Vertical Throttling (TAVT) based 3D NoC are: (i) if a router is throttled, all the routers above it are throttled, and (ii) if a router is not throttled, all the routers below it are not throttled.
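As a minimal sketch, the two characteristics above can be expressed as a check on one vertical pillar of routers. The indexing is an assumption made for illustration: index 0 is taken to be the layer nearest the heat sink ("below"), and higher indices are layers farther from it ("above"). The function names are hypothetical.

```python
from typing import List, Sequence


def is_valid_tavt_pillar(throttled: Sequence[bool]) -> bool:
    """Check characteristics (i) and (ii) for one vertical pillar.

    throttled[0] is assumed to be the layer nearest the heat sink;
    throttled[k] is True when the router in layer k is throttled.
    """
    seen_throttled = False
    for state in throttled:
        if seen_throttled and not state:
            # An active router sits above a throttled one, which violates (i)
            # and, read from the active router's side, also violates (ii).
            return False
        seen_throttled = seen_throttled or state
    return True


def throttled_from_level(num_layers: int, first_throttled: int) -> List[bool]:
    """Build a valid pillar in which every layer >= first_throttled is throttled."""
    return [layer >= first_throttled for layer in range(num_layers)]


if __name__ == "__main__":
    print(is_valid_tavt_pillar([False, False, True, True]))   # valid pattern
    print(is_valid_tavt_pillar([False, True, False, True]))   # invalid pattern
```

Under this reading, a pillar is fully described by a single number, the lowest throttled layer, which is what `throttled_from_level` constructs.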

As shown in Fig. 2-2, the topology in the offline stage and at the start of the online stage is a traditional mesh. As the temperature rises, near-overheated routers are throttled, and the topology turns into an irregular mesh. Given the accuracy of temperature sensing, we can assume that the network changes its topology at intervals on the order of milliseconds, that is, 10 to 100 million cycles (10^7 to 10^8 cycles) if the NoC operates at 1 GHz.
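The conversion from a topology-change interval to a cycle count can be checked with a few lines of arithmetic. This is only an illustrative sketch; the function name is hypothetical, and the 10 ms and 100 ms intervals are the ones that correspond to 10^7 and 10^8 cycles at 1 GHz.

```python
def cycles_per_interval(interval_ms: int, clock_hz: int) -> int:
    """Number of clock cycles in an interval given in milliseconds."""
    return interval_ms * clock_hz // 1000


CLOCK_HZ = 10**9  # 1 GHz, as assumed in the text

if __name__ == "__main__":
    for interval_ms in (10, 100):
        cycles = cycles_per_interval(interval_ms, CLOCK_HZ)
        print(f"{interval_ms} ms at 1 GHz = {cycles} cycles")
```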

2.3 Related Works of Reactive Routing for Thermal-Aware 3D NoC

To prevent packets from being congested by throttled tiles, we need reactive routing that detours around throttled routers or avoids throttled paths. A downward routing algorithm [22] was proposed to migrate horizontal routing to the bottom layer. In addition, a traffic-aware downward level selection scheme was proposed to prevent network saturation; it compares the features of different downward levels. The work in [22] also points out that the spatial thermal distribution in a 3D NoC system is non-uniform even when the traffic load is balanced. To balance the spatial thermal distribution, downward routing provides different downward levels, and the maximum improvement in network throughput is reported under the normal thermal limit.

The reactive routing is simply an extension of Downward Routing. Because Vertical Throttling guarantees that the layer closest to the heat sink is always available, the reactive routing detours packets around throttled routers by using different downward levels.

In the worst case, all packets are transported in the layer closest to the heat sink, as shown in Fig. 2-3.

Fig. 2-3 Downward routing is applied to detour throttled routers. [22]
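The idea of detouring through a lower layer can be sketched as follows. This is not the algorithm of [22]; it is a simplified illustration under stated assumptions: routers are addressed as (x, y, z) with z = 0 nearest the heat sink, the packet first moves vertically to a chosen lateral layer, then routes in X and Y on that layer, and finally moves vertically to the destination layer. The names `downward_route`, `lateral_layer`, and `avoids_throttled` are hypothetical.

```python
from typing import List, Set, Tuple

Coord = Tuple[int, int, int]  # (x, y, z); z = 0 is assumed nearest the heat sink


def _steps(start: int, end: int) -> List[int]:
    """Values visited after `start` when moving one hop at a time toward `end`."""
    if start == end:
        return []
    step = 1 if end > start else -1
    return list(range(start + step, end + step, step))


def downward_route(src: Coord, dst: Coord, lateral_layer: int = 0) -> List[Coord]:
    """Route vertically to lateral_layer, then X, then Y, then vertically to dst."""
    x, y, z = src
    path = [src]
    for nz in _steps(z, lateral_layer):   # move to the lateral layer
        z = nz
        path.append((x, y, z))
    for nx in _steps(x, dst[0]):          # lateral routing in X
        x = nx
        path.append((x, y, z))
    for ny in _steps(y, dst[1]):          # lateral routing in Y
        y = ny
        path.append((x, y, z))
    for nz in _steps(z, dst[2]):          # move to the destination layer
        z = nz
        path.append((x, y, z))
    return path


def avoids_throttled(path: List[Coord], throttled: Set[Coord]) -> bool:
    """True when no router on the path is throttled."""
    return all(router not in throttled for router in path)


if __name__ == "__main__":
    throttled = {(1, 0, 1), (1, 0, 2)}    # a pillar throttled above the bottom layer
    same_layer = downward_route((0, 0, 2), (2, 0, 2), lateral_layer=2)
    bottom = downward_route((0, 0, 2), (2, 0, 2), lateral_layer=0)
    print(avoids_throttled(same_layer, throttled), avoids_throttled(bottom, throttled))
```

In this toy case the same-layer route crosses the throttled pillar, while the route through the bottom layer avoids it at the cost of four extra vertical hops, which illustrates why traffic concentrates near the heat sink in the worst case.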

2.4 Problem of Data Delivery in Non-Stationary Irregular Mesh 3D NoC

Downward routing can successfully detour around throttled routers and guarantees a routing path to the destination router. However, if we detour around throttled routers by routing downward, packets congest around the throttled routers and in the bottom layer, the traffic becomes unbalanced, and performance quickly degrades. We do not consider this a high-performance algorithm for a throttling-based system. We should therefore examine why packets are blocked in a 3D NoC and how our proposed algorithm can resolve each cause.

Before describing the reasons for failed delivery, we note that a NoC can be divided into five layers [33]: the application layer, transport layer, network layer, data link layer, and physical layer, as shown in Fig. 2-4.

Fig. 2-4 NoC composed of five layers.

The five layers are defined as follows:

• Application layer: Network architectures and control algorithms constitute the infrastructure and provide communication services to the end nodes, which are programmable in most cases.

• Transport layer: Atop the network layer, the transport layer decomposes messages into packets at the source. It also resequences and reassembles the messages at the destination. Packetization granularity presents a critical design decision because most network-control algorithms are highly sensitive to packet size.

• Network layer: This layer implements end-to-end delivery control in network architectures with many communication channels.

• Data link layer: Data-link protocols increase the reliability of the link, up to a minimum required level, under the assumption that the physical layer by itself is not sufficiently reliable.

• Physical layer: The physical layer is an unreliable digital link in which the probability of bit upsets is non-null. It provides the basic physical connection between nodes.

Routing and throttling both occur in the network layer. In [22], when the temperature of a router exceeds the thermal limit, throttling is triggered to prevent overheating, which would make the system unstable or damage it. To cool overheated tiles efficiently, the Vertical Throttling in [22] shuts down the overheated tile and the tiles below it, except for the tile in the bottom layer. However, whenever throttling is triggered (for a single router or for a pillar of routers), performance degrades. Since the throttled tiles are unavailable, packets cannot pass directly through them and must detour around them. Consequently, the routers neighboring the throttled tiles become more congested and, more seriously, the throttled routers may leave packets in the network layer with no routing path to their destination.

To ensure successful packet delivery in an NSI-mesh network, we must prevent all four of the following cases:

(i). Source router is not serving

(ii). Destination router is not serving

(iii). Any router on the selected path is not serving

(iv). Any required channel on the selected path is occupied (head-of-line blocking, HoL)

In the first case, shown in Fig. 2-5(a), the source router is fully throttled. In the second case, shown in Fig. 2-5(b), the destination router is fully throttled. In the third case, shown in Fig. 2-5(c), some router on the routing path is fully throttled.

The last case is shown in Fig. 2-5(d), where the channels on the routing path are occupied by other blocked packets; the figure uses the vertical dimension as an example. Case (iv) also arises in the horizontal dimension, and we refer to it as head-of-line (HoL) blocking.

Fig. 2-5 Problems of conventional reactive routing: (a) Source router is not serving. (b) Destination router is not serving. (c) A router on the selected path is not serving. (d) Head-of-line blocking.
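The four cases can be made concrete with a small classifier that, given a selected path, reports which of the cases (i) to (iv) apply. This is a sketch for illustration only; the data representation (a path as a list of router coordinates, a set of throttled routers, and a set of occupied directed channels) and the function name are assumptions.

```python
from typing import List, Set, Tuple

Router = Tuple[int, int, int]
Channel = Tuple[Router, Router]  # directed link between two adjacent routers


def delivery_blockers(path: List[Router],
                      throttled: Set[Router],
                      occupied: Set[Channel]) -> List[str]:
    """Return the labels of the failure cases that apply to this path."""
    cases = []
    if path[0] in throttled:
        cases.append("i")      # source router is not serving
    if path[-1] in throttled:
        cases.append("ii")     # destination router is not serving
    if any(router in throttled for router in path[1:-1]):
        cases.append("iii")    # an intermediate router is not serving
    if any(channel in occupied for channel in zip(path, path[1:])):
        cases.append("iv")     # a required channel is occupied (HoL blocking)
    return cases


if __name__ == "__main__":
    path = [(0, 0, 1), (1, 0, 1), (2, 0, 1)]
    print(delivery_blockers(path, {(1, 0, 1)}, set()))
    print(delivery_blockers(path, set(), {((1, 0, 1), (2, 0, 1))}))
```

Delivery on the selected path succeeds only when the returned list is empty.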

Routing takes place in the network layer, and so does congestion. To solve problems (i)-(iv), we consider which layers of the NoC can be applied to each. Throttling affects performance, and we should minimize its influence, but the problems caused by run-time thermal management cannot be solved by the network layer alone; other layers must help. Problem (iv) can be handled by flow control in the data link layer, for example with virtual channels or with additional buffers or links, so that a packet is not blocked by other packets. Moreover, solving the other three problems removes the long-lasting form of HoL blocking. After they are solved, HoL blocking may still occasionally occur, but it lasts at most hundreds of cycles, whereas blocking caused by run-time thermal management lasts about 10^7 cycles and would degrade performance severely. We therefore focus on (i) to (iii), because they affect performance far more than (iv).

The application layer is concerned with the overall system and the algorithms it executes. It does not need to consider these low-level problems, so we exclude it. The physical layer is too low-level for us to address this problem there, so we exclude it as well.

The transport layer can solve problems (i) and (ii). If the transport layer has the status of the source and destination routers, it knows whether each of them is serving. It can therefore decide whether or not to pass a packet to the network layer, which eliminates these two problems.
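A minimal sketch of this transport-layer decision is given below. It assumes a per-router status map and a payload queue whose entries carry a destination router identifier; these data structures and the function name `release_payloads` are hypothetical and chosen only for illustration.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple

Payload = Tuple[int, str]  # (destination router id, data); hypothetical format


def release_payloads(queue: Deque[Payload],
                     src: int,
                     active: Dict[int, bool]) -> List[Payload]:
    """Pop and return the payloads that may enter the network layer.

    Payloads that cannot be delivered stay in the queue in their original order.
    """
    if not active[src]:
        return []                      # case (i): the source router is not serving
    released: List[Payload] = []
    held: Deque[Payload] = deque()
    while queue:
        payload = queue.popleft()
        if active[payload[0]]:
            released.append(payload)
        else:
            held.append(payload)       # case (ii): the destination is not serving
    queue.extend(held)
    return released


if __name__ == "__main__":
    status = {0: True, 1: True, 2: False, 3: True}
    pending = deque([(1, "a"), (2, "b"), (3, "c")])
    print(release_payloads(pending, src=0, active=status), list(pending))
```

Letting payloads for serving destinations overtake held ones is a simplification made here; whether reordering across destinations is acceptable depends on the transport protocol.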

For the remaining problem (iii), throttling arises in the network layer, so we first consider solving it there. However, it cannot be solved in the network layer alone. A router cannot predict whether the routers further along the routing path are throttled, so it cannot guarantee that the path is free of throttled routers. Nor can a router inspect the status of every buffer and router in the NoC to determine which routing path is routable, because that would amount to source routing, which conflicts with our constraints and goals. Fig. 2-6 gives an example.

Fig. 2-6 (a) In case (iii), the router cannot guarantee that the routing path it chooses is routable. (b) The blocked packet blocks other packets, as in case (iv).

In Fig. 2-6(a), the source router knows that its eastern neighbor is not throttled, so the packet may be forwarded east. After the packet arrives at the eastern router, however, the only remaining route is through the northern router, which is throttled. The packet is therefore blocked by the throttled router for about 10^7 cycles, and other packets are blocked behind it. As shown in Fig. 2-6(b), a packet blocked by a throttled router blocks other packets, and the congestion tree quickly grows to cover the whole network.
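The situation in Fig. 2-6(a) can be reproduced on a small 2D mesh. The sketch below is an illustration under assumptions, not the thesis's algorithm: `greedy_minimal_route` makes hop-by-hop minimal decisions using only the status of the immediate neighbors (preferring the X direction), while `shortest_active_path` is a breadth-first search that uses the full topology and serves only to show that a deliverable path exists. Coordinates are (x, y) with east = +x and north = +y.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

Node = Tuple[int, int]  # (x, y); east = +x, north = +y


def greedy_minimal_route(src: Node, dst: Node,
                         throttled: Set[Node]) -> Tuple[List[Node], bool]:
    """Minimal routing that only looks at the next hop; returns (path, delivered)."""
    path, cur = [src], src
    while cur != dst:
        x, y = cur
        candidates = []
        if x != dst[0]:
            candidates.append((x + (1 if dst[0] > x else -1), y))
        if y != dst[1]:
            candidates.append((x, y + (1 if dst[1] > y else -1)))
        nxt = next((n for n in candidates if n not in throttled), None)
        if nxt is None:
            return path, False         # every minimal next hop is throttled
        path.append(nxt)
        cur = nxt
    return path, True


def shortest_active_path(src: Node, dst: Node, throttled: Set[Node],
                         width: int, height: int) -> Optional[List[Node]]:
    """Breadth-first search over active routers using global topology knowledge."""
    parent: Dict[Node, Optional[Node]] = {src: None}
    frontier = deque([src])
    while frontier:
        cur = frontier.popleft()
        if cur == dst:
            path: List[Node] = []
            node: Optional[Node] = cur
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        x, y = cur
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            inside = 0 <= nxt[0] < width and 0 <= nxt[1] < height
            if inside and nxt not in throttled and nxt not in parent:
                parent[nxt] = cur
                frontier.append(nxt)
    return None


if __name__ == "__main__":
    throttled = {(1, 1)}
    print(greedy_minimal_route((0, 0), (1, 2), throttled))
    print(shortest_active_path((0, 0), (1, 2), throttled, width=3, height=3))
```

With the router at (1, 1) throttled, the local decision forwards the packet east and strands it at (1, 0), although a path through (0, 1) and (0, 2) exists. This is the gap that topology information from another layer is meant to close.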

To completely remove case (iii), we have to jointly consider the information available in the network layer and the transport layer. We choose distributed routing instead of source routing for performance reasons. Although traditional source routing could be applied in this scheme, its computation overhead for optimizing the performance of an NSI-mesh is too high. In addition, source routing cannot balance the network load by adapting to network information as adaptive routing does, and it becomes difficult to use when the topology is far from a regular mesh. The short topology-change interval and the wide range in the number of inactive routers in a throttled NSI-mesh make conventional routing algorithms infeasible. The routing algorithms for irregular meshes [34] are not feasible because of the non-stationary characteristics. Furthermore, their restrictions on the location of oversized IPs make them infeasible, because throttling may be required for all the upper-layer routers. Moreover, the offline optimization effort for routing in an irregular mesh is not affordable as online computation in a throttled NSI-mesh.

Fault-tolerant routing algorithms, which detour packets around faulty routers, could be candidates. However, the characteristics of a faulty NSI-mesh and a throttled NSI-mesh are very different. The number of faulty routers is non-decreasing but usually small. The interval between topology changes in a faulty NSI-mesh is also much longer and unpredictable. A topology change in a faulty NSI-mesh occurs only after detection, testing, and reconfiguration of the system. The latter two operations are usually even performed during the reboot sequence, which reduces the problem to the traditional offline irregular mesh. Similarly, the restrictions on the location of faulty routers make conventional fault-tolerant routing algorithms infeasible, because throttling may be required for all the upper-layer routers.

Using the transport layer alone to solve (iii) does not work either. Even if we know the status of the source and destination routers, we still cannot guarantee that the packet can be routed to the destination router successfully. When a packet is injected from the transport layer into the network layer, the conventional assumption is that it will reach the destination router, apart from delays caused by congestion or head-of-line blocking. This assumption holds in a normal NoC, but it fails in a throttled NoC.

We conclude that the throttling problem arises in the network layer, but (iii) cannot be solved in the network layer alone or in the transport layer alone. If (iii) is left unsolved, the whole network stalls for a period of time. Therefore, we combine these two layers to solve (iii), as shown in Fig. 2-7.

Fig. 2-7 Transport layer and network layer operation.

2.5 Summary

We reviewed related work on thermal-aware 3D NoC and identified a new problem: the Non-Stationary Irregular Mesh (NSI-mesh). We found four problems in packet delivery and discussed how the transport layer and the network layer can address them. Finally, we concluded that the transport layer and the network layer must work jointly to solve the problems caused by the NSI-mesh.

Chapter 3

Transport-Layer Assisted Routing

Transport-layer assisted routing is composed of the transport-layer assisted routing schemes and the dual-mode routing algorithms. The transport layer shares topology information with the network layer to achieve high performance in an NSI-mesh. The network layer follows the initial routing decision provided by the transport layer and tries to balance the lateral traffic load. In this chapter, we introduce the proposed Transport-Layer Assisted Routing (TLAR) schemes and algorithms.

3.1 Operation Flow of Transport-Layer

The proposed operation flow of transport-layer assisted routing is as follows.

The 3D NoC system switches between the normal stage and the reconfiguration stage. In the normal stage, the 3D NoC works as a usual regular or irregular mesh network. We assume that a distributed thermal sensing mechanism is embedded in the network so that each router can obtain its own temperature, and that each router has a timer for synchronizing the operation stages. After an N-cycle normal stage, the network enters the R-cycle reconfiguration stage. In the reconfiguration stage, the management and control steps are carried out that allow the 3D NoC to keep operating correctly in the next normal stage. Compared with the number of cycles in the normal stage, the number of cycles required for reconfiguration is very small. Here we assume that the network operates at 1 GHz. In each 10 ms interval, 10^4 cycles are more than sufficient for each tile to reconfigure, and N is around 10^7. The reconfiguration stage therefore occupies only about 0.1% of the total available time, so its overhead is negligible. If the interval is 100 ms, the overhead is about 0.01%, which is even smaller. The reconfiguration stage, shown in Fig. 3-1, consists of three sub-stages: (i) cleaning up and policy determination; (ii) synchronization of topology information; (iii) routing mode checking and throttling. The details are described as follows:

Fig. 3-1 Network states and operation stages in transforming topology for run-time thermal management.

(i) Cleaning up and policy determination: To make sure that packets are delivered to their destination routers successfully in the next normal stage, the network has to be cleaned up before the topology changes. In this sub-stage, the packetization of payloads from the transport layer to the network layer is paused. As shown in Fig. 3-2, the payloads stay in the transmitter payload queue. We must not only stop transmitting packets from the transport layer to the network layer, but also drain the packets that are still in the network layer. This means that the transmitter packet queue becomes empty after a short period of time. Meanwhile, the distributed thermal-aware controller in each tile determines whether the router within the tile is throttled in the next normal stage.

The thermal-aware management can be implemented in the transport layer controller or in the application layer as a software routine. Regardless of the layer in which the policy is determined, the application layer and the transport layer share the control policy information of the tile. What matters is that the new throttling pattern takes effect in the next normal stage, and we must guarantee that no packet remaining in the network layer is blocked in that stage.

Fig. 3-2 Block diagram of transport layer in the tile of thermal-aware 3D NoC.

(ii) Synchronization of topology information: When throttling is triggered, every router in the 3D NoC must learn which routers are throttled and how the topology changes in the next normal stage. In this sub-stage, all routers transmit packets containing their throttling information to all their upstream and downstream routers. Regardless of whether a router is fully throttled in the current normal stage, it is not throttled in this sub-stage. Because no router is throttled in this sub-stage, the network is a regular mesh in each layer. Fig. 3-2 shows the topology table, which is shared by the application layer and the transport layer. In this table, one bit represents the state of each router in the next normal stage. If a router is fully throttled in the next normal stage, the corresponding bit is set to inactive; otherwise it is set to active. The topology information is then synchronized to each tile. The technique for transmitting the throttling information is outside the scope of this work. Because throttling is triggered every 10 ms and transmitting the throttling information takes at most hundreds of cycles, the transmission takes well below 0.1% of the 10 ms interval (10^7 cycles at 1 GHz). About 99.99% of the time remains for normal operation, during which the routers hold correct throttling information and make correct routing selections. As shown in Fig. 3-3, the transmission of throttling information is therefore not a concern in the NSI-mesh.

Fig. 3-3 Required time of transmitting throttling information.

(iii) Decisions of routing mode and throttling: In this sub-stage, the throttling of routers takes effect. If no router in the 3D NoC is throttled, routing proceeds as in a regular mesh. When throttling is triggered, however, the transport layer must determine the routing mode for transmission toward each destination router. If the source router is throttled, the payloads stay in the transmitter payload queue. If the source router is not throttled, transport-layer assisted routing (TLAR) is executed to check the routing mode for every destination router, ensuring that no packet is blocked by run-time thermal management. After TLAR is executed, the network returns to the normal stage, and packet injection resumes for the tiles whose routers are not throttled.
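The stage timing and the one-bit-per-router topology table described above can be sketched as follows. The sketch rests on assumptions that the text does not fix: one period is taken to be N + R cycles, the three sub-stages are not modeled separately, and routers are numbered by a flat index so that the table fits in a single integer bit mask. All names are hypothetical.

```python
from typing import Sequence

N_CYCLES = 10**7  # normal stage length (about 10 ms at 1 GHz, as in the text)
R_CYCLES = 10**4  # reconfiguration stage length (10^4 cycles, as in the text)


def stage_at(cycle: int, n: int = N_CYCLES, r: int = R_CYCLES) -> str:
    """Stage that a router's synchronized timer reports at a global cycle count."""
    return "normal" if cycle % (n + r) < n else "reconfiguration"


def reconfiguration_overhead(n: int, r: int) -> float:
    """Fraction of time spent in the reconfiguration stage."""
    return r / (n + r)


def pack_topology(active: Sequence[bool]) -> int:
    """Topology table: bit k is 1 if router k is active in the next normal stage."""
    table = 0
    for k, state in enumerate(active):
        if state:
            table |= 1 << k
    return table


def is_active(table: int, router: int) -> bool:
    """Look up one router's bit in the topology table."""
    return bool((table >> router) & 1)


if __name__ == "__main__":
    print(stage_at(0), stage_at(N_CYCLES))
    print(f"{reconfiguration_overhead(10**7, 10**4):.4%}")
    print(f"{reconfiguration_overhead(10**8, 10**4):.4%}")
    table = pack_topology([True, False, True, True])
    print(bin(table), is_active(table, 1))
```

With these values the overhead comes out at roughly 0.1% for a 10 ms interval and roughly 0.01% for a 100 ms interval, in line with the estimates in the text. A flat bit mask also keeps the table small, since an L x W x H mesh needs only L·W·H bits per tile.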
