Chapter 2 Related Works and Problem Description
2.3 Related Works of Reactive Routing for Thermal-Aware 3D NoC
To keep packets from congesting around throttled tiles, reactive routing is needed to detour around throttled routers or to avoid throttled paths. A downward routing algorithm is proposed in [22] to migrate horizontal routing to the bottom layer. In addition, a traffic-aware downward level selection scheme is proposed to prevent network saturation; it compares the features of different downward levels. The work in [22] also points out that the spatial thermal distribution in a 3D NoC system is non-uniform even when the traffic load is balanced. To balance the spatial thermal distribution, downward routing provides different downward levels, and the maximum network throughput is shown to improve under the normal thermal limit.
Reactive routing is simply an extension of downward routing. While Vertical Throttling guarantees that the layer closest to the heat sink is always available, reactive routing detours packets around throttled routers by using different downward levels.
In the worst case, all packets are transported on the layer closest to the heat sink, as shown in Fig. 2-3.
Fig. 2-3 Downward routing is applied to detour throttled routers. [22]
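The detour described above can be sketched as follows. This is a minimal illustration under our own assumptions, not the algorithm of [22]: routers are addressed as (x, y, z), layer z = 0 is assumed to be the layer closest to the heat sink, horizontal movement uses plain XY ordering, and the function name downward_route and its level parameter are hypothetical.

```python
def step_toward(a, b):
    """Move coordinate a one step toward coordinate b."""
    return a + (1 if b > a else -1)


def downward_route(src, dst, level):
    """Return the hop list from src to dst when horizontal hops use layer `level`.

    Assumption: z = 0 is the layer closest to the heat sink.
    The packet moves vertically to `level`, routes X then Y on that
    layer, and finally moves vertically to the destination layer.
    """
    x, y, z = src
    path = [src]
    while z != level:          # vertical move to the chosen downward level
        z = step_toward(z, level)
        path.append((x, y, z))
    while x != dst[0]:         # horizontal routing, X dimension first
        x = step_toward(x, dst[0])
        path.append((x, y, z))
    while y != dst[1]:         # then Y dimension
        y = step_toward(y, dst[1])
        path.append((x, y, z))
    while z != dst[2]:         # vertical move to the destination layer
        z = step_toward(z, dst[2])
        path.append((x, y, z))
    return path


# Worst case: every horizontal hop is taken on layer 0.
worst_case = downward_route((0, 0, 2), (2, 0, 2), level=0)
```

With level = 0 every horizontal hop is taken on the layer closest to the heat sink, which corresponds to the worst case of Fig. 2-3; choosing level equal to the source layer degenerates to ordinary XY routing on that layer.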
2.4 Problem of Data Delivery in Non-Stationary Irregular Mesh 3D NoC
Downward routing can successfully detour throttled routers and guarantees a routing path to the destination router. However, if downward routing is used to detour a throttled router vertically, packets congest around the throttled router and on the bottom layer, the traffic becomes unbalanced, and performance soon degrades. This is not what we consider a high-performance algorithm under throttling. We should therefore examine why packets are blocked in a 3D NoC and how our proposed algorithm can resolve each cause.
Before describing the reasons for failed delivery, we note that an NoC can be divided into five layers [33]: the application layer, transport layer, network layer, data link layer, and physical layer, as shown in Fig. 2-4.
Fig. 2-4 NoC composed of five layers.
The five layers are defined as follows:
Application layer: Network architectures and control algorithms constitute the infrastructure and provide communication services to the end nodes, which are programmable in most cases.
Transport layer: Atop the network layer, the transport layer decomposes messages into packets at the source. It also resequences and reassembles the messages at the destination. Packetization granularity presents a critical design decision because most network-control algorithms are highly sensitive to packet size.
Network layer: This layer implements end-to-end delivery control in network architectures with many communication channels.
Data link layer: Data-link protocols increase the reliability of the link, up to a minimum required level, under the assumption that the physical layer by itself is not sufficiently reliable.
Physical layer: The physical layer is an unreliable digital link in which the probability of bit upsets is non-null. It also forms the basic physical connection between nodes.
We know that routing and throttling occur in the network layer. In [22], when the temperature of a router exceeds the thermal limit, throttling is triggered to prevent overheating, which would make the system unstable or damage it. To cool down overheated tiles efficiently, the Vertical Throttling in [22] shuts down the overheated tile and the tiles below it, except at the bottom layer. However, whenever throttling is triggered (whether for a single router or a pillar of routers), performance degrades. Since the throttled tiles are unavailable, network packets cannot pass directly through them and must turn to detour around them. Consequently, routers neighboring the throttled tiles become more congested and, more seriously, the throttled routers can leave packets in the network layer with no routing path to their destination.
To ensure successful packet delivery in an NSI-mesh network, we should prevent the occurrence of all of the following four cases:
(i). Source router is not serving
(ii). Destination router is not serving
(iii). Any router on the selected path is not serving
(iv). Any required channel on the selected path is occupied (head-of-line, HoL, blocking)
In the first case, shown in Fig. 2-5(a), the source router is fully throttled. In the second case, shown in Fig. 2-5(b), the destination router is fully throttled. In the third case, shown in Fig. 2-5(c), one or more routers on the routing path are fully throttled.
The last case is shown in Fig. 2-5(d), where the channels on the routing path are occupied by other blocked packets; we take the vertical dimension as an example. Case (iv) also emerges in the horizontal dimension, and we refer to it as head-of-line (HoL) blocking.
Fig. 2-5 Problem of usual reactive routing: (a) Source router is not serving. (b) Destination router is not serving. (c) Any router on selected path is not serving. (d) Head-of-line blocking.
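The four cases can be checked mechanically for a given candidate path. The following is a minimal sketch under our own assumptions: a path is a list of router identifiers from source to destination, throttled is the set of routers that are not serving, busy_channels is a set of (from, to) router pairs whose channel is currently occupied, and all names are hypothetical.

```python
from enum import Enum


class Failure(Enum):
    NONE = 0
    SOURCE_NOT_SERVING = 1        # case (i)
    DESTINATION_NOT_SERVING = 2   # case (ii)
    PATH_ROUTER_NOT_SERVING = 3   # case (iii)
    HOL_BLOCKING = 4              # case (iv)


def classify_delivery(path, throttled, busy_channels):
    """Return which of the cases (i)-(iv) would stop delivery on `path`."""
    if path[0] in throttled:
        return Failure.SOURCE_NOT_SERVING
    if path[-1] in throttled:
        return Failure.DESTINATION_NOT_SERVING
    # Intermediate routers only: the endpoints were checked above.
    if any(router in throttled for router in path[1:-1]):
        return Failure.PATH_ROUTER_NOT_SERVING
    # A channel is identified by the pair of routers it connects.
    for channel in zip(path, path[1:]):
        if channel in busy_channels:
            return Failure.HOL_BLOCKING
    return Failure.NONE
```

The checks are ordered (i) to (iv), so a path that suffers from several cases at once is reported under the first one that applies.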
We know that routing takes place in the network layer, and congestion also emerges in the network layer. We should try to solve problems (i)-(iv) by considering which layers of the NoC to apply. Throttling affects performance, and its influence should be minimized, but the problems caused by run-time thermal management cannot be solved by the network layer alone. We should consider the other layers as well. Problem (iv) can be solved in the flow-control (data link) layer, for example by using virtual channels or by adding buffers or links, so that a packet is not blocked by another packet. Moreover, solving the other three problems removes the HoL blocking that throttling causes. Even with those three problems eliminated, HoL blocking will still emerge occasionally, but it lasts for at most hundreds of cycles, which is far shorter than the 10^7 cycles of blocking caused by run-time thermal management. Blocking for 10^7 cycles would degrade performance severely. We therefore focus on (i) to (iii), because they influence performance more than (iv).
The application layer considers the whole system and the algorithms it should execute. It does not need to consider these detailed problems, so we exclude it. The physical layer is too low-level for us to address this problem there, so we exclude it as well.
In the transport layer, we can solve two problems, (i) and (ii). If the transport layer has the status of the source router and the destination router, it knows whether each of them is serving. It can then decide whether or not to pass a packet to the network layer. These two problems can therefore be eliminated in the transport layer.
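The transport-layer decision for (i) and (ii) can be illustrated with a short sketch. It assumes that the transport layer holds a map serving from router identifier to a Boolean status (how this status is collected is not specified here), and the names admit and split_queue are hypothetical.

```python
def admit(packet, serving):
    """Return True if both endpoints of the packet are serving.

    A router missing from the status map is conservatively
    treated as not serving.
    """
    return serving.get(packet["src"], False) and serving.get(packet["dst"], False)


def split_queue(queue, serving):
    """Split pending packets into those to inject now and those to hold back."""
    ready = [p for p in queue if admit(p, serving)]
    held = [p for p in queue if not admit(p, serving)]
    return ready, held
```

Packets in held never enter the network layer while their source or destination router is throttled, so they cannot occupy channels and block other traffic.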
For the remaining problem, (iii), we know that throttling occurs in the network layer, so we first consider solving this problem in the network layer. Nevertheless, it cannot be solved in the network layer alone. Because a router cannot predict whether the path beyond the next router is free of throttled routers, we cannot guarantee that the routing path contains no throttled routers. Additionally, we cannot inspect the status of every buffer and router in the NoC to determine which routing path is routable, because that would amount to source routing, which conflicts with our constraints and goal. We take Fig. 2-6 as an example.
Fig. 2-6 (a) In case (iii), the routing path chosen by a router cannot be guaranteed to be routable. (b) The blocked packet blocks other packets, which is the same as case (iv).
In Fig. 2-6(a), the source router knows that its eastern neighbor is not throttled, so the packet may be forwarded east. After the packet arrives at the eastern router, however, the only remaining route is through the northern router, which is throttled. The packet is therefore blocked by the throttled router for 10^7 cycles, and other packets are in turn blocked by this packet. As shown in Fig. 2-6(b), a packet blocked by throttled routers blocks other packets, and the congestion tree soon grows to cover the whole network.
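The dead end of Fig. 2-6(a) can be reproduced with a minimal sketch of hop-by-hop minimal routing in which each router sees only the throttling status of its immediate neighbors. The 2D coordinates (x increasing eastward, y increasing northward), the concrete router positions, and the function names are assumptions made for illustration.

```python
def productive_hops(cur, dst):
    """Neighbors of `cur` that lie on a minimal path to `dst` (X listed first)."""
    x, y = cur
    hops = []
    if dst[0] != x:
        hops.append((x + (1 if dst[0] > x else -1), y))
    if dst[1] != y:
        hops.append((x, y + (1 if dst[1] > y else -1)))
    return hops


def route_with_local_view(src, dst, throttled):
    """Forward hop by hop, checking only the next router's status.

    Returns (path, delivered). delivered is False when every
    productive neighbor of the current router is throttled.
    """
    path = [src]
    cur = src
    while cur != dst:
        candidates = [h for h in productive_hops(cur, dst) if h not in throttled]
        if not candidates:
            return path, False   # blocked in front of a throttled router
        cur = candidates[0]
        path.append(cur)
    return path, True


# East neighbor (1, 0) is serving, so the packet goes east first,
# then finds that its only productive hop, (1, 1), is throttled.
path, delivered = route_with_local_view((0, 0), (1, 2), throttled={(1, 1)})
```

Here the packet stops at (1, 0) even though a path through (0, 1) and (0, 2) exists, which shows that a next-hop check alone cannot guarantee a routable path.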
To completely remove case (iii), we have to jointly consider the information available in the network layer and the transport layer. Here we choose distributed routing instead of source routing for performance reasons. Although traditional source routing can be applied in this scheme, the computation overhead of source routing for optimizing the performance of an NSI-mesh is too high. In addition, source routing cannot balance the network load by adapting to network information as adaptive routing does. If the topology of the network is far from a regular mesh, source routing becomes difficult to use. The short topology-change interval and the wide range in the number of inactive routers that characterize a throttled NSI-mesh make conventional routing algorithms infeasible. The routing algorithms for irregular meshes [34] are not feasible owing to the non-stationary characteristics. Moreover, their restrictions on the location of oversized IPs make the conventional algorithms infeasible, because throttling may be required for all the upper-layer routers. Furthermore, the offline optimization effort for routing in an irregular mesh is not affordable for the online computation of the throttled NSI-mesh. Fault-tolerant routing algorithms, which detour packets around faulty routers, could be candidates. However, the characteristics of a faulty NSI-mesh and a throttled NSI-mesh are very different. The number of faulty routers is non-decreasing but usually small. In addition, the interval between topology changes in a faulty NSI-mesh is much longer and unpredictable. A topology change in a faulty NSI-mesh occurs only after detection, testing, and reconfiguration of the system. The latter two operations are usually even performed in the reboot sequence, which reduces the problem to the traditional offline irregular mesh. Similarly, the restrictions on the location of faulty routers make the conventional fault-tolerant routing algorithms infeasible, because throttling may be required for all the upper-layer routers.
Using the transport layer alone to solve (iii) does not work either. Even if we know the status of the source and destination routers, we still cannot guarantee that the packet can be routed to the destination router successfully. When a packet is injected from the transport layer into the network layer, it is assumed, based on prior knowledge, that it can reach the destination router, apart from congestion or head-of-line blocking. This assumption holds in a normal NoC, but it fails in a throttled NoC.
We conclude that the throttling problem emerges in the network layer, but (iii) cannot be solved in the network layer alone or in the transport layer alone. If (iii) is left unsolved, the whole network will stall for a period of time. Therefore, we should combine these two layers to solve (iii), as shown in Fig. 2-7.
Fig. 2-7 Transport layer and network layer operation.