Chapter 3 Flow Control for On-Chip Interconnection Network
3.3 Buffered Flow Control
3.3.2 Flit-Buffer Flow Control
Cut-through routing is assumed to occur at the flit level with the routing information contained in one flit. This model assumes that there is no time penalty for cutting through a router if the output buffer and output channel are free. Depending on the speed of operation of the routers, this may not be realistic. Note that only the header experiences routing delay, as well as the switching delay and wire delay at each router.
This is because the transmission is pipelined and the switch is buffered at the input and output. Once the header flit reaches the destination, the cycle time of this message pipeline is determined by the maximum of the switching delay and wire delay between routers. If the switch had been buffered only at the input, then in one cycle of operation, a flit would traverse both the switch and the channel between routers. In this case, the coefficient of the second term and the pipeline cycle time would be the sum of the switching delay and wire delay. Note that the unit of message flow control is a packet. Therefore, even though the message may cut through the router, sufficient buffer space must be allocated for a complete packet in case the header is blocked.
3.3.2 Flit-Buffer Flow Control
3.3.2.1 Wormhole Flow Control
The need to buffer complete packets within a router can make it difficult to construct small, compact, and fast routers. Wormhole flow control operates like cut-through, but with channels and buffers allocated to flits rather than packets. In wormhole switching, the buffer requirements within the routers are substantially reduced compared with the requirements for VCT switching. The primary difference between wormhole switching and VCT switching is that, in the former, the unit of message flow control is a single flit and, as a consequence, small buffers can be used.
Wormhole flow control requires only a small number of flit buffers per virtual channel. In contrast, cut-through flow control requires several packets of buffer space, which is typically at least an order of magnitude more storage than wormhole flow control. This saving in buffer space, however, comes at the expense of some throughput, since wormhole flow control may block a channel mid-packet. Blocking may occur with wormhole flow control because the channel is owned by a packet, but buffers are allocated on a flit-by-flit basis.
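The base latency of a wormhole-switched message can be computed as follows:
$$t_{wormhole} = D\,(t_r + t_s + t_w) + \max(t_s, t_w)\left\lceil \frac{L}{W} \right\rceil$$
where $D$ is the number of hops, $t_r$, $t_s$, and $t_w$ are the routing, switching, and wire delays, $L$ is the message length in bits, and $W$ is the channel width in bits.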
This expression assumes flit buffers at the router inputs and outputs. Note that in the absence of contention, VCT and wormhole switching have the same latency. Once the header flit arrives at the destination, the message pipeline cycle time is determined by the maximum of the switch delay and wire delay. For an input-only or output-only buffered switch, this cycle time would be given by the sum of the switch and wire delays.
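As a rough illustration of the latency model described above, the following Python sketch computes the contention-free latency of a message. The function name `base_latency` and all parameter names are hypothetical, chosen only for this example. It assumes the header pays the routing, switching, and wire delay at each hop, and that the message then drains at one flit per pipeline cycle.

```python
import math

def base_latency(hops, t_r, t_s, t_w, length_bits, width_bits,
                 buffered_both_sides=True):
    """Contention-free latency, in the same time unit as the delays.

    hops                 number of routers/links the header traverses (assumed)
    t_r, t_s, t_w        routing, switching, and wire delay per hop
    length_bits          message length in bits
    width_bits           channel width in bits (one flit per transfer, assumed)
    buffered_both_sides  True for input-and-output buffering,
                         False for input-only or output-only buffering
    """
    flits = math.ceil(length_bits / width_bits)
    # Header term: routing + switching + wire delay at every hop.
    header_delay = hops * (t_r + t_s + t_w)
    # Pipeline cycle time depends on where the switch is buffered.
    cycle = max(t_s, t_w) if buffered_both_sides else t_s + t_w
    return header_delay + cycle * flits
```

For example, with three hops, delays of 2, 1, and 1 time units, and a 128-bit message on a 32-bit channel, `base_latency(3, 2, 1, 1, 128, 32)` evaluates to 3·4 + 1·4 = 16 time units, and to 20 time units when `buffered_both_sides=False`.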
Figure 3.5 Time-space diagram of a wormhole-switched message [3.1]
3.3.2.2 Virtual Channel Flow Control
The preceding switching techniques were described assuming that messages or parts of messages were buffered at the input and output of each physical channel.
Buffers are commonly operated as FIFO queues. Therefore, once a message occupies a buffer for a channel, no other message can access the physical channel, even if the message is blocked. Alternatively, a physical channel may support several logical or virtual channels multiplexed across the physical channel. Each unidirectional virtual channel is realized by an independently managed pair of message buffers, as illustrated in Figure 3.6. Consider wormhole switching with a message in each virtual channel [3.3]. Each message can share the physical channel on a flit-by-flit basis. The physical channel protocol must be able to distinguish between the virtual channels using the physical channel. Logically, with two virtual channels sharing the link, each virtual channel operates as if it were using a distinct physical channel operating at half the speed. Virtual channels were originally introduced to solve the problem of deadlock in wormhole-switched networks.
Deadlock is a network state where no messages can advance because each message requires a channel occupied by another message. By allowing messages to share a physical channel, messages can make progress rather than remain blocked. Virtual channels can also be used to improve message latency and network throughput.
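The flit-by-flit sharing of a physical channel described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `multiplex` and `demultiplex` are hypothetical, the link is modeled as a simple list, and tagging every flit with a virtual channel identifier is just one assumed way for the channel protocol to tell the virtual channels apart.

```python
from itertools import zip_longest

def multiplex(messages):
    """Interleave flits from several virtual channels onto one link.

    messages[i] is the list of flits queued on virtual channel i.
    Flits are sent round-robin, each tagged with its channel id
    (an assumed tagging scheme, not taken from the text).
    """
    link = []
    for group in zip_longest(*messages):
        for vc_id, flit in enumerate(group):
            if flit is not None:          # this channel has run out of flits
                link.append((vc_id, flit))
    return link

def demultiplex(link, num_vcs):
    """Sort arriving flits back into one buffer per virtual channel."""
    buffers = [[] for _ in range(num_vcs)]
    for vc_id, flit in link:
        buffers[vc_id].append(flit)
    return buffers
```

With `messages = [["a0", "a1", "a2"], ["b0", "b1"]]`, the link carries the flits in the order a0, b0, a1, b1, a2, and `demultiplex(multiplex(messages), 2)` recovers the two original messages. While both messages are active, each one advances on every second link cycle, which matches the half-speed view of a virtual channel.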
Virtual-channel flow control decouples the allocation of channel state from channel bandwidth. This decoupling prevents a packet that acquires channel state and then blocks from holding channel bandwidth idle. This permits virtual-channel flow control to achieve substantially higher throughput than wormhole flow control.
As in wormhole flow control, an arriving head flit must allocate a virtual channel, a downstream flit buffer, and channel bandwidth to advance. Subsequent body flits from the packet use the virtual channel allocated by the header and still must allocate a flit buffer and channel bandwidth. However, unlike wormhole flow control, these flits are not guaranteed access to channel bandwidth because other virtual channels may be competing to transmit flits of their packets across the same link.
In fact, given the same total amount of buffer space, virtual-channel flow control also outperforms cut-through flow control because it is more efficient to allocate buffer space as multiple short virtual-channel flit buffers than as a single large cut-through packet buffer.
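To make the effect of decoupling channel state from channel bandwidth more concrete, here is a minimal single-link sketch. Everything in it is an assumption made for illustration: the function name `simulate_link`, the packet tuples, the one-flit-per-cycle link, and the round-robin arbitration. A packet marked as blocked holds its virtual channel but cannot send, which stands in for a packet stalled further downstream.

```python
from collections import deque

def simulate_link(packets, num_vcs, cycles):
    """Count flits sent per packet over one physical link.

    packets: list of (name, flits, blocked) in arrival order.
    num_vcs=1 behaves like plain wormhole flow control on this link.
    """
    waiting = deque(packets)
    vcs = [None] * num_vcs            # each entry: [name, flits_left, blocked]
    sent = {name: 0 for name, _, _ in packets}
    next_vc = 0                       # round-robin pointer
    for _ in range(cycles):
        # A head flit claims any free virtual channel.
        for i in range(num_vcs):
            if vcs[i] is None and waiting:
                vcs[i] = list(waiting.popleft())
        # Link bandwidth goes to one ready virtual channel per cycle.
        for k in range(num_vcs):
            i = (next_vc + k) % num_vcs
            vc = vcs[i]
            if vc is not None and not vc[2]:
                vc[1] -= 1
                sent[vc[0]] += 1
                if vc[1] == 0:
                    vcs[i] = None     # tail flit releases the virtual channel
                next_vc = (i + 1) % num_vcs
                break
    return sent
```

With `packets = [("A", 4, True), ("B", 4, False)]`, a single virtual channel leaves the link idle because the blocked packet A owns it, so B sends nothing in six cycles. With two virtual channels, B delivers all four of its flits while A remains blocked.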
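We can envision continuing to add virtual channels to further reduce the blocking experienced by each message. The result is increased network throughput measured in flits/s, due to increased physical channel utilization. However, each additional virtual channel improves performance by a smaller amount, and the increased multiplexing of the physical channel reduces the data rate of each individual message, which increases message latency. This increase in latency due to multiplexing will eventually overshadow the reduction in latency due to blocking, leading to an overall increase in average message latency.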
Figure 3.6 Virtual channels [3.1]