Distributed Switched Networks - Interconnection Networks

Switched-media networks provide a very flexible framework to design communication subsystems external to the devices that need to communicate, as presented above. However, there are cases where it is convenient to more tightly integrate the end node devices with the network resources used to enable them to communicate. Instead of centralizing the switch fabric in an external subsystem, an alternative approach is to distribute the network switches among the end nodes, which then become network nodes or simply nodes, yielding a distributed switched network. As a consequence, each network switch has one or more end node devices directly connected to it, thus forming a network node. These nodes are directly connected to other nodes without indirectly going through some external switch, giving rise to another popular name for these networks: direct networks.

The topology for distributed switched networks takes on a form much different from centralized switched networks in that end nodes are connected across the area of the switch fabric, not just at one or two of the peripheral edges of the fabric. This causes the number of switches in the system to be equal to the total number of nodes. A quite obvious way of interconnecting nodes consists of connecting a dedicated link between each node and every other node in the network. This fully connected topology provides the best connectivity (full connectivity in fact), but it is more costly than a crossbar network, as the following example shows.

Example Compute the cost of interconnecting N nodes using a fully connected topology relative to doing so using a crossbar topology. Consider separately the relative cost of the unidirectional links and the relative cost of the switches. Switch cost is assumed to grow quadratically with the number of unidirectional ports for k × k switches but to grow only linearly with 1 × k switches.

Answer The crossbar topology requires an N × N switch, so the switch cost is proportional to N². The link cost is 2N, which accounts for the unidirectional links from the end nodes to the centralized crossbar, and vice versa. In the fully connected topology, two sets of 1 × (N − 1) switches (possibly merged into one set) are used in each of the N nodes to connect nodes directly to and from all other nodes. Thus, the total switch cost for all N nodes is proportional to 2N(N − 1).

Regarding link cost, each of the N nodes requires two unidirectional links in opposite directions between its end node device and its local switch. In addition, each of the N nodes has N − 1 unidirectional links from its local switch to other switches distributed across all the other end nodes. Thus, the total number of unidirectional links is 2N + N(N − 1), which is equal to N(N + 1) for all N nodes. The relative costs of the fully connected topology with respect to the crossbar are, therefore, the following:

Relative cost_switches = 2N(N − 1) / N² = 2(N − 1) / N = 2(1 − 1/N)

Relative cost_links = N(N + 1) / 2N = (N + 1) / 2

As the number of interconnected devices increases, the switch cost of the fully connected topology is nearly double the crossbar, with both being very high (i.e., quadratic growth). Moreover, the fully connected topology always has higher relative link cost, which grows linearly with the number of nodes. Again, keep in mind that end node links are different from switch links in their length and packaging, particularly for direct networks, so they usually have different associated costs. Despite its higher cost, the fully connected topology provides no extra performance benefits over the crossbar as both are nonblocking. Thus, crossbar networks are usually used in practice instead of fully connected networks.
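The relative costs in this example can be tabulated with a short script. This is only an illustrative sketch: the function names are hypothetical, and the cost model is exactly the one assumed in the example (quadratic cost for the N × N crossbar switch, linear cost for each 1 × (N − 1) switch, one cost unit per unidirectional link).

```python
def crossbar_cost(n):
    """Cost of a centralized N x N crossbar under the example's model."""
    return {"switches": n * n, "links": 2 * n}


def fully_connected_cost(n):
    """Cost of a fully connected topology under the example's model."""
    return {
        # Two sets of 1 x (N - 1) switches in each of the N nodes.
        "switches": 2 * n * (n - 1),
        # 2N device-to-switch links plus N(N - 1) switch-to-switch links.
        "links": 2 * n + n * (n - 1),
    }


def relative_cost(n):
    """Fully connected cost divided by crossbar cost, per cost component."""
    full = fully_connected_cost(n)
    xbar = crossbar_cost(n)
    return {key: full[key] / xbar[key] for key in full}


if __name__ == "__main__":
    for n in (4, 16, 64, 256):
        rel = relative_cost(n)
        print(f"N={n:4d}  switches: {rel['switches']:.3f}  links: {rel['links']:.1f}")
```

Running it for increasing N shows the switch ratio approaching 2 while the link ratio keeps growing as (N + 1)/2.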

A lower-cost alternative to fully connecting all nodes in the network is to directly connect nodes in sequence along a ring topology, as shown in Figure F.13.

For bidirectional rings, each of the N nodes now uses only 3 × 3 switches and just two bidirectional network links (shared by neighboring nodes), for a total of N switches and N bidirectional network links. This linear cost excludes the N injection-reception bidirectional links required within nodes.

Unlike shared-media networks, rings can allow many simultaneous transfers: the first node can send to the second while the second sends to the third, and so on. However, as dedicated links do not exist between logically nonadjacent node pairs, packets must hop across intermediate nodes before arriving at their destination, increasing their transport latency. For bidirectional rings, packets can be transported in either direction, with the shortest path to the destination usually being the one selected. In this case, packets must travel N/4 network switch hops, on average, with total switch hop count being one more to account for the local switch at the packet source node. Along the way, packets may block on network resources due to other packets contending for the same resources simultaneously.
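The N/4 average can be checked numerically. The sketch below is illustrative only and rests on assumptions not stated in the text: destinations are uniformly distributed over all N nodes (including the source itself, which contributes zero hops), N is even, and the shortest direction is always taken. The function names are hypothetical.

```python
def ring_distance(src, dst, n):
    """Shortest hop count between two nodes of a bidirectional ring of n nodes."""
    clockwise = (dst - src) % n
    return min(clockwise, n - clockwise)


def average_ring_hops(n, include_self=True):
    """Average shortest-path distance from node 0 to uniformly chosen destinations."""
    total = sum(ring_distance(0, dst, n) for dst in range(n))
    count = n if include_self else n - 1
    return total / count


if __name__ == "__main__":
    for n in (8, 64, 256):
        print(n, average_ring_hops(n), average_ring_hops(n, include_self=False))
```

Excluding the source from the set of destinations gives N²/(4(N − 1)), which is slightly above N/4 and converges to it as N grows.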

Fully connected and ring-connected networks delimit the two extremes of distributed switched topologies, but there are many points of interest in between for a given set of cost-performance requirements. Generally speaking, the ideal switched-media topology has cost approaching that of a ring but performance approaching that of a fully connected topology. Figure F.14 illustrates three popular direct network topologies commonly used in systems spanning the cost-performance spectrum. All of them consist of sets of nodes arranged along multiple dimensions with a regular interconnection pattern among nodes that can be expressed mathematically. In the mesh or grid topology, all the nodes in each dimension form a linear array. In the torus topology, all the nodes in each dimension form a ring. Both of these topologies provide direct communication to neighboring nodes with the aim of reducing the number of hops suffered by packets in the network with respect to the ring. This is achieved by providing greater connectivity through additional dimensions, typically no more than three in commercial systems. The hypercube or n-cube topology is a particular case of the mesh in which only two nodes are interconnected along each dimension, leading to a number of dimensions, n, that must be large enough to interconnect all N nodes in the system (i.e., n = log₂ N). The hypercube provides better connectivity than meshes and tori at the expense of higher link and switch costs, in terms of the number of links and number of ports per node.

Example Compute the cost of interconnecting N devices using a torus topology relative to doing so using a fat tree topology. Consider separately the relative cost of the bidirectional links and the relative cost of the switches, which is assumed to grow quadratically with the number of bidirectional ports. Provide an approximate expression for the case of switches being similar in size.

Answer Using k × k switches, the fat tree requires (2N/k) log_k/2 N switches, assuming the last stage (the root) has the same number of switches as each of the other stages.

Figure F.13 A ring network topology, folded to reduce the length of the longest link. Shaded circles represent switches, and black squares represent end node devices. The gray rectangle signifies a network node consisting of a switch, a device, and its connecting link.

Given that the number of bidirectional ports in each switch is k (i.e., there are k input ports and k output ports for a k × k switch) and that the switch cost grows quadratically with this, total network switch cost is proportional to 2kN log_k/2 N.

The link cost is N log_k/2 N as each of the log_k/2 N stages requires N bidirectional links, including those between the devices and the fat tree. The torus requires as many switches as nodes, each of them having 2n + 1 bidirectional ports, including the port to attach the communicating device, where n is the number of dimensions. Hence, total switch cost for the torus is (2n + 1)²N. Each of the torus nodes requires 2n + 1 bidirectional links for the n different dimensions and the connection for its end node device, but as the dimensional links are shared by two nodes, the total number of links is (2n/2 + 1)N = (n + 1)N bidirectional links for all N nodes. Thus, the relative costs of the torus topology with respect to the fat tree are

Relative cost_switches = (2n + 1)²N / (2kN log_k/2 N) = (2n + 1)² / (2k log_k/2 N)

Relative cost_links = (n + 1)N / (N log_k/2 N) = (n + 1) / log_k/2 N

Figure F.14 Direct network topologies that have appeared in commercial systems, mostly supercomputers. The shaded circles represent switches, and the black squares represent end node devices. Switches have many bidirectional network links, but at least one link goes to the end node device. These basic topologies can be supplemented with extra links to improve performance and reliability. For example, connecting the switches on the periphery of the 2D mesh, shown in (a), using the unused ports on each switch forms a 2D torus, shown in (b). The hypercube topology, shown in (c), is an n-dimensional interconnect for 2ⁿ nodes, requiring n + 1 ports per switch: one for each of the n nearest neighbor nodes and one for the end node device.

Panels: (a) 2D grid or mesh of 16 nodes; (b) 2D torus of 16 nodes.

When switch sizes are similar, 2n + 1 ≅ k. In this case, the relative cost is

Relative cost_switches = (2n + 1)² / (2k log_k/2 N) = (2n + 1) / (2 log_k/2 N) = k / (2 log_k/2 N)

When the number of switch ports (also called switch degree) is small, tori have lower cost, particularly when the number of dimensions is low. This is an especially useful property when N is large. On the other hand, when larger switches and/or a high number of tori dimensions are used, fat trees are less costly and preferable. For example, when interconnecting 256 nodes, a fat tree is four times more expensive in terms of switch and link costs when 4 × 4 switches are used. This higher cost is compensated for by lower network contention, on average. The fat tree is comparable in cost to the torus when 8 × 8 switches are used (e.g., for interconnecting 256 nodes). For larger switch sizes beyond this, the torus costs more than the fat tree as each node includes a switch. This cost can be amortized by connecting multiple end node devices per switch, called bristling.
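The 256-node figures quoted above can be reproduced from the expressions in the answer. The following sketch is illustrative, with hypothetical function names; it uses the example's cost model (switch cost quadratic in port count, one unit per bidirectional link) and, for the last function, the similar-switch-size approximation 2n + 1 ≅ k.

```python
import math


def fat_tree_cost(n_nodes, k):
    """Fat tree built from k x k switches: log_{k/2} N stages of 2N/k switches."""
    stages = math.log2(n_nodes) / math.log2(k / 2)
    return {
        "switches": 2 * k * n_nodes * stages,  # (2N/k) * stages * k^2
        "links": n_nodes * stages,
    }


def torus_cost(n_nodes, dims):
    """Torus with one switch per node and 2n + 1 bidirectional ports per switch."""
    ports = 2 * dims + 1
    return {"switches": ports ** 2 * n_nodes, "links": (dims + 1) * n_nodes}


def similar_size_switch_ratio(n_nodes, k):
    """Torus-to-fat-tree switch cost when 2n + 1 is approximately k."""
    return k / (2 * math.log2(n_nodes) / math.log2(k / 2))


if __name__ == "__main__":
    for k in (4, 8, 16):
        ratio = similar_size_switch_ratio(256, k)
        print(f"k={k:2d}  torus / fat tree switch cost = {ratio:.2f}")
```

A ratio below 1 means the torus is cheaper; for 256 nodes the ratio is 0.25 with 4 × 4 switches and 1 with 8 × 8 switches, matching the text.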

The topologies depicted in Figure F.14 all have in common the interesting characteristic of having their network links arranged in several orthogonal dimensions in a regular way. In fact, these topologies all happen to be particular instances of a larger class of direct network topologies known as k-ary n-cubes, where k signifies the number of nodes interconnected in each of the n dimensions. The symmetry and regularity of these topologies simplify network implementation (i.e., packaging) and packet routing as the movement of a packet along a given network dimension does not modify the number of remaining hops in any other dimension toward its destination. As we will see in the next section, this topological property can be readily exploited by simple routing algorithms.
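The regularity of k-ary n-cubes is easy to see when nodes are addressed by n coordinates, each in the range 0 to k − 1. The sketch below is illustrative only: the function names are hypothetical, and it models the wraparound (torus) form, so a mesh would need the modulo arithmetic removed. With k = 2 the two directions in each dimension reach the same node, which yields the n neighbors of a hypercube.

```python
def torus_neighbors(coord, k):
    """Neighbors of a node in a k-ary n-cube with wraparound links."""
    neighbors = set()
    for dim in range(len(coord)):
        for step in (-1, 1):
            nxt = list(coord)
            nxt[dim] = (coord[dim] + step) % k
            neighbors.add(tuple(nxt))
    neighbors.discard(tuple(coord))  # only matters in the degenerate case k = 1
    return neighbors


def torus_hops(src, dst, k):
    """Minimal hop count: each dimension contributes independently of the others."""
    return sum(min((d - s) % k, (s - d) % k) for s, d in zip(src, dst))


if __name__ == "__main__":
    print(sorted(torus_neighbors((0, 0), 4)))        # 4-ary 2-cube (2D torus)
    print(sorted(torus_neighbors((0, 0, 0, 0), 2)))  # 2-ary 4-cube (hypercube)
    print(torus_hops((0, 0), (3, 2), 4))
```

Because the hop count is a sum of per-dimension terms, moving along one dimension leaves the remaining hops in every other dimension unchanged.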

Like their indirect counterparts, direct networks can introduce blocking among packets that concurrently request the same path, or part of it. The only exception is fully connected networks. The same way that the number of stages and switch hops in indirect networks can be reduced by using larger switches, the hop count in direct networks can likewise be reduced by increasing the number of topological dimensions via increased switch degree.

It may seem to be a good idea always to maximize the number of dimensions for a system of a certain size and switch cost. However, this is not necessarily the case. Most electronic systems are built within our three-dimensional (3D) world using planar (2D) packaging technology such as integrated circuit chips, printed circuit boards, and backplanes. Direct networks with up to three dimensions can be implemented using relatively short links within this 3D space, independent of system size. Links in higher-dimensioned networks would require increasingly longer wires or fiber. This increase in link length with system size is also indicative of MINs, including fat trees, which require either long links within all the stages or increasingly longer links as more stages are added. As we saw in the first example given in Section F.2, flow-controlled buffers increase in size proportionally to link length, thus requiring greater silicon area. This is among the reasons why the supercomputer with the largest number of compute nodes existing in 2005, the IBM Blue Gene/L, implemented a 3D torus network for interprocessor communication. A fat tree would have required much longer links, rendering a 64K node system less feasible. This highlights the importance of correctly selecting the proper network topology that meets system requirements.

Besides link length, other constraints derived from implementing the topology may also limit the degree to which a topology can scale. These are available pin-out and achievable bisection bandwidth. Pin count is a local restriction on the bandwidth of a chip, printed circuit board, and backplane (or chassis) connector. In a direct network that integrates processor cores and switches on a single chip or multichip module, pin bandwidth is used both for interfacing with main memory and for implementing node links. In this case, limited pin count could reduce the number of switch ports or bit lines per link. In an indirect network, switches are implemented separately from processor cores, allowing most of the pins to be dedicated to communication bandwidth. However, as switches are grouped onto boards, the aggregate of all input-output links of the switch fabric on a board for a given topology must not exceed the board connector pin-outs.

The bisection bandwidth is a more global restriction that gives the interconnect density and bandwidth that can be achieved by a given implementation (packaging) technology. Interconnect density and clock frequency are related to each other: When wires are packed closer together, crosstalk and parasitic capacitance increase, which usually imposes a lower clock frequency. For example, the availability and spacing of metal layers limit wire density and frequency of on-chip networks, and copper track density limits wire density and frequency on a printed circuit board. To be implementable, the topology of a network must not exceed the available bisection bandwidth of the implementation technology. Most networks implemented to date are constrained more by pin-out limitations than by bisection bandwidth, particularly with the recent move to blade-based systems. Nevertheless, bisection bandwidth largely affects performance.

For a given topology, bisection bandwidth, BW_Bisection, is calculated by dividing the network into two roughly equal parts, each with half the nodes, and summing the bandwidth of the links crossing the imaginary dividing line. For nonsymmetric topologies, bisection bandwidth is the smallest of all pairs of equal-sized divisions of the network. For a fully connected network, the bisection bandwidth is proportional to N²/2 unidirectional links (or N²/4 bidirectional links), where N is the number of nodes. For a bus, bisection bandwidth is the bandwidth of just the one shared half-duplex link. For other topologies, values lie in between these two extremes. Network injection and reception bisection bandwidth is commonly used as a reference value, which is N/2 for a network with N injection and reception links, respectively. Any network topology that provides this bisection bandwidth is said to have full bisection bandwidth.
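For very small networks, the definition above can be applied literally by trying every equal-sized division and counting the links that cross it. The sketch below is illustrative only: the function names are hypothetical, links are counted as bidirectional with unit bandwidth, the end node injection and reception links are ignored, and the exhaustive search is practical only for a handful of nodes because the number of divisions grows combinatorially.

```python
from itertools import combinations


def ring_links(n):
    """Bidirectional links of an n-node ring."""
    return {frozenset((i, (i + 1) % n)) for i in range(n)}


def fully_connected_links(n):
    """Bidirectional links of an n-node fully connected network."""
    return {frozenset(pair) for pair in combinations(range(n), 2)}


def hypercube_links(dims):
    """Bidirectional links of a hypercube with 2**dims nodes."""
    return {frozenset((v, v ^ (1 << d))) for v in range(1 << dims) for d in range(dims)}


def bisection_links(n, links):
    """Minimum number of links cut over all divisions into two halves of n // 2 nodes."""
    best = None
    for half in combinations(range(n), n // 2):
        half = set(half)
        # A link crosses the cut when exactly one of its endpoints is in this half.
        crossing = sum(1 for link in links if len(link & half) == 1)
        if best is None or crossing < best:
            best = crossing
    return best


if __name__ == "__main__":
    print("ring:", bisection_links(8, ring_links(8)))
    print("hypercube:", bisection_links(8, hypercube_links(3)))
    print("fully connected:", bisection_links(8, fully_connected_links(8)))
```

For eight nodes, the fully connected result equals N²/4 bidirectional links, as stated in the text, while the ring is cut by only two links.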

Figure F.15 summarizes the number of switches and links required, the corresponding switch size, the maximum and average switch hop distances between nodes, and the bisection bandwidth in terms of links for several topologies discussed in this section for interconnecting 64 nodes.
