Three types of networks have been developed for on-chip communications [8]. A point-to-point communication network, shown in Fig. 2.1, is constructed using a dedicated channel between each source and destination. Because no channel is shared with other communication traffic, this network has minimum run-time uncertainty, but it requires a large silicon area due to the large number of communication paths. These communication paths also need to be identified at design time. Hence, this communication network is often employed in application-specific designs. Fig. 2.2 shows a bus-based communication network, which is often used in IP-reuse designs. It is a centralized network and needs an arbitration mechanism to decide which processing element (PE) can use the bus. Such a centralized network becomes a communication bottleneck as the number of PEs increases. Since on-chip communication is becoming more complex and traffic is becoming heavier in SoC designs, the switch-based network has been developed. A conceptual sketch of the switch-based network is shown in Fig. 2.3. This network is decentralized and concurrent, and hence its energy is not wasted on meaningless signal transitions.
Figure 2.1: A point-to-point network.
Figure 2.2: A bus-based network.
Figure 2.3: A switch-based network.
Circuit switching and packet switching are mostly used for network communications.
The connectionless approach and the connection-oriented approach are usually employed in packet switched networks. Store-and-forward switching, virtual cut-through switching, and wormhole switching are connectionless approaches. Wormhole switching is suitable for on-chip communications due to its good average latency and low memory usage, but it has unpredictable latency under heavy traffic. The connection-oriented approach applies the circuit switched concept to the packet switched network, and is hence called virtual-circuit switching.
For Network-on-Chip (NoC), circuit switching and wormhole switching are widely used for on-chip communications [10]. To achieve both high resource utilization and a performance guarantee, a hybrid method combining circuit switching and wormhole switching was proposed [11]. However, some applications need both high resource utilization and a performance guarantee on each path. In this work, an NoC generator is developed based on the virtual-circuit switching typically used in computer networks to achieve high resource utilization and a performance guarantee.
The rest of this chapter is organized as follows. Section 2.1 introduces network switching. Section 2.2 introduces the developed switch architecture and the NoC platform design. In Section 2.3, the communication-aware task binding methodology is described. Then, experimental results and discussions are given in Section 2.4. Finally, a summary is given.
2.1 Overview of Network Switching
In this section, circuit switching, connectionless packet switching, and virtual-circuit switching are briefly reviewed.
2.1.1 Circuit Switching
Circuit switching uses dedicated resources to meet real-time requirements. However, the dedicated resources are wasted if traffic is not continuous. Since on-chip traffic usually consists of burst transactions, circuit switching is not well suited to such applications. On the other hand, it is satisfactory for real-time applications.
Figure 2.4: Circuit switched network.
Time-division multiplexing (TDM) is employed in circuit switching for transmitting data through dedicated channels. When a connection path is established, the required time slots and buffers are reserved for its data transactions. Hence, contention does not occur and the performance can be guaranteed.
Fig. 2.4 shows the conceptual plot of a circuit switched network, where T1 to T4 are the time slots in a round. The highlighted time slots and buffers are reserved for the path indicated by the solid arrows in Fig. 2.4. If the time slots are well arranged, the latency can be reduced, but the throughput will not be improved.
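The slot reservation described above can be illustrated with a minimal sketch. It assumes a single link with four slots per round (T1 to T4); the class name `TdmLink`, its methods, and the connection labels "A" and "B" are hypothetical and are not part of the design described in this chapter.

```python
# Hypothetical sketch of TDM slot reservation on one link (names are illustrative).
SLOTS_PER_ROUND = 4  # T1..T4 in a round


class TdmLink:
    """One physical link whose time slots can be reserved for connections."""

    def __init__(self, slots_per_round=SLOTS_PER_ROUND):
        # owner[i] is the connection holding slot i, or None if the slot is free.
        self.owner = [None] * slots_per_round

    def reserve(self, slot, connection):
        """Reserve a slot at path set-up; fail if another connection holds it."""
        if self.owner[slot] is not None and self.owner[slot] != connection:
            return False
        self.owner[slot] = connection
        return True

    def may_send(self, cycle, connection):
        """A connection may drive the link only during its own slots."""
        return self.owner[cycle % len(self.owner)] == connection

    def utilization(self, active_connections):
        """Fraction of slots carrying data when only some connections are active."""
        used = sum(1 for o in self.owner if o in active_connections)
        return used / len(self.owner)


link = TdmLink()
link.reserve(0, "A")  # connection A reserves T1
link.reserve(2, "A")  # ... and T3
link.reserve(1, "B")  # connection B reserves T2
print(link.may_send(4, "A"), link.may_send(5, "A"))
print(link.utilization({"A"}))
```

Because each slot has exactly one owner, two connections never compete for the link in the same cycle. The `utilization` helper also shows the drawback noted in Section 2.1.1: slots reserved by an idle connection carry no data.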
Figure 2.5: Packet switched network.
2.1.2 Connectionless Packet Switching
Connectionless packet switching is widely used in data communications. This switching approach employs shared resources to achieve high resource utilization, but it has unpredictable latency. Under heavy traffic, the shared resources become fully occupied and the performance of the packet switched network degrades.
In connectionless packet switching, buffers are shared by all transactions. If the network has no handshaking scheme, buffers may overflow and packets will be dropped. If a handshaking scheme exists, a transaction stalls until the buffers of the destination are released. The conceptual sketch of a packet switched network is shown in Fig. 2.5. The buffers BF1 and BF2 in a switch are shared by all packets regardless of source and destination. When the switch receives a packet, it reserves a buffer for this packet, and then releases this buffer when the packet passes through to the destination. Hence, the buffers in the packet switched network can achieve high utilization.
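The shared-buffer behaviour, with and without handshaking, can be sketched as follows. This is only an illustration under simplifying assumptions: a switch is reduced to a counter of free buffers, and the class name `SharedBufferSwitch` and its methods are hypothetical.

```python
# Hypothetical sketch of a switch whose buffers are shared by all packets.
class SharedBufferSwitch:
    def __init__(self, num_buffers=2, handshaking=True):
        self.capacity = num_buffers  # e.g. BF1 and BF2
        self.free = num_buffers      # buffers currently available
        self.handshaking = handshaking
        self.dropped = 0             # packets lost due to overflow

    def receive(self):
        """Try to accept one packet: 'accepted', 'stalled', or 'dropped'."""
        if self.free > 0:
            self.free -= 1           # reserve a buffer for this packet
            return "accepted"
        if self.handshaking:
            return "stalled"         # sender must wait for a buffer release
        self.dropped += 1            # no handshaking: the packet is lost
        return "dropped"

    def forward(self):
        """A packet leaves the switch and its buffer returns to the shared pool."""
        if self.free < self.capacity:
            self.free += 1


sw = SharedBufferSwitch(num_buffers=2, handshaking=True)
print([sw.receive() for _ in range(3)])  # the third packet finds no free buffer
sw.forward()                             # one packet passes through
print(sw.receive())                      # the stalled sender can now proceed
```

With `handshaking=False` the third packet would be dropped instead of stalled, which corresponds to the overflow case described above.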
For store-and-forward switched networks, a switch receives a complete packet and then forwards it toward the destination. Hence, the switch must reserve sufficient buffers for the whole packet. For virtual cut-through switched networks, a switch also needs to reserve enough buffers for a complete packet, but it can start forwarding the packet toward the destination before the packet has been completely received. For wormhole switched networks, a switch can directly forward the received packet toward the destination without reserving buffers for a complete packet.
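The difference between waiting for a complete packet and forwarding it immediately can be made concrete with a first-order latency estimate. The sketch below uses a common textbook model rather than anything stated in this chapter. It assumes no contention, one flit transferred per link per cycle, and no routing delay in the switches; the function name and the scheme labels are hypothetical.

```python
# Hypothetical zero-load latency model (assumptions: no contention, one flit
# per link per cycle, routing delay inside the switches ignored).

# Whether a switch reserves buffers for a complete packet, as described above.
RESERVES_WHOLE_PACKET = {
    "store_and_forward": True,
    "virtual_cut_through": True,
    "wormhole": False,
}


def zero_load_latency(scheme, hops, packet_flits):
    """Cycles until the last flit arrives after traversing `hops` links."""
    if hops < 1 or packet_flits < 1:
        raise ValueError("hops and packet_flits must be positive")
    if scheme == "store_and_forward":
        # Every link transfers the whole packet before the next link starts.
        return hops * packet_flits
    if scheme in ("virtual_cut_through", "wormhole"):
        # The header advances one link per cycle; the rest follows pipelined.
        return hops + packet_flits - 1
    raise ValueError("unknown scheme: " + scheme)


for name in RESERVES_WHOLE_PACKET:
    print(name, zero_load_latency(name, hops=3, packet_flits=8))
```

Under these assumptions, virtual cut-through and wormhole switching have the same latency when the network is idle. They differ in buffer reservation, and therefore in how a blocked packet behaves under load.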
2.1.3 Virtual-circuit Switching
Virtual-circuit switching requires setting up a virtual connection from the source to the destination before sending packets. Fig. 2.6 shows the conceptual sketch of a virtual-circuit switched network, where a virtual-circuit identifier (VCI) is introduced to specify which virtual circuit accesses the physical wires. The VCI is not a global identifier; it has link-local scope and is carried inside the header of the packet. As shown in Fig. 2.6, the virtual-circuit table of a switch is initially established based on the routing paths, and is used to determine the VCI of the outgoing packet (Out VCI) from the packet's original VCI (In VCI). A packet delivered from the SW1 switch to the SW4 switch has the VCI in its header changed from In VCI to Out VCI according to each virtual-circuit table in the path. If enough buffers and bandwidth are reserved for this path, quality of service (QoS) can be provided.
Figure 2.6: Virtual-circuit switched network.
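The per-switch VCI translation can be sketched as a table lookup at every hop. The path (SW1, SW2, SW4), the VCI values, and the table layout below are hypothetical and are chosen only to show the mechanism; they are not taken from Fig. 2.6.

```python
# Hypothetical virtual-circuit tables: for each switch, In VCI -> (next switch, Out VCI).
# A next switch of None means the packet is delivered to the local processing element.
vc_tables = {
    "SW1": {5: ("SW2", 3)},
    "SW2": {3: ("SW4", 7)},
    "SW4": {7: (None, 7)},
}


def deliver(packet, first_switch, tables):
    """Forward a packet hop by hop, rewriting the VCI in its header at each switch."""
    trace = []
    switch = first_switch
    while switch is not None:
        in_vci = packet["vci"]
        # A KeyError here would mean that no virtual circuit was set up for this VCI.
        next_switch, out_vci = tables[switch][in_vci]
        packet["vci"] = out_vci  # the VCI has link-local scope only
        trace.append((switch, in_vci, out_vci))
        switch = next_switch
    return trace


packet = {"vci": 5, "payload": "data"}
for hop in deliver(packet, "SW1", vc_tables):
    print(hop)
```

Because the tables are filled in when the connection is set up, a switch needs only the incoming VCI to forward a packet, and the same VCI value can be reused on other links.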
2.2 Architecture Models and Platform Design
There are many different interconnection architectures for NoC platforms. P. P. Pande et al. [12] compare the performance and characteristics of a variety of NoC architectures and also obtain comparative results for a number of common NoC topologies. In this work, several assumptions are made as follows. First, we assume that the interconnection architecture of the NoC platform is a mesh-based topology, as illustrated in Fig. 2.7. Second, the platform consists of two kinds of components: identical processors and switches. Third, each processor contains local memory and is connected to its local switch. Fourth, each switch connects to the neighboring switches and the local processor.
Figure 2.7: Mesh-based interconnection architecture of the NoC platform. (NI = Network Interface, P = Processor Core, B = Buffer)
Three reasons are considered for choosing the 2-D mesh topology. First, simple connections and easy routing are preferred in parallel computing platforms [13]. Next, the uniform interconnection among the nodes gives balanced propagation delays between switches and ensures the overall scalability of the network. Finally, this topology matches the planar manufacturing technology of ICs.
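The connectivity implied by the mesh assumptions can be written down in a few lines. This is a minimal sketch: the coordinate scheme, the function names, and the processor labels are hypothetical, and the mesh size is an arbitrary example.

```python
# Hypothetical sketch of a 2-D mesh: one switch per (x, y) position, each attached
# to one local processor and to its neighboring switches.
def neighbors(x, y, cols, rows):
    """Switches directly connected to the switch at (x, y)."""
    candidates = [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
    return [(i, j) for i, j in candidates if 0 <= i < cols and 0 <= j < rows]


def build_mesh(cols, rows):
    """Map each switch position to its local processor and neighbor switches."""
    return {
        (x, y): {
            "processor": "P%d" % (y * cols + x),
            "neighbors": neighbors(x, y, cols, rows),
        }
        for y in range(rows)
        for x in range(cols)
    }


mesh = build_mesh(3, 3)
print(mesh[(1, 1)]["neighbors"])  # an inner switch has four neighbors
print(mesh[(0, 0)]["neighbors"])  # a corner switch has two
```

Every switch is described by the same rule regardless of the mesh size, which reflects the regularity that motivates the choice of this topology.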
Figure 2.8: Transformation from relay station to switch: (a) relay station-based; (b) switch-based.
2.2.1 Network Switching
We propose a switch architecture that is based on the latency-insensitive concepts [14], [15] and uses the virtual-circuit switching technique to achieve high bandwidth utilization, bandwidth guarantees, and predictable latency under heavy traffic conditions. In latency-insensitive design, relay stations (RSs) are used to pipeline long interconnects. The topology of the relay station connections is shown in Fig. 2.8 [15]. To improve the low utilization of the dedicated peer-to-peer connections, the RSs are replaced by our switches, and the connections between RSs are replaced by virtual channels.