Storage Area Networks - Interconnection Networks

System area networks were originally designed for a single room or single floor (thus their distances are tens to hundreds of meters) and were for use in MPPs and clusters. In the intervening years, the acronym SAN has been co-opted to also mean storage area networks, whereby networking technology is used to connect storage devices to compute servers. Today, many refer to “storage” when they say SAN. The most widely used SAN example in 2006 was Fibre Channel (FC), which comes in many varieties, including various versions of Fibre Channel Arbitrated Loop (FC-AL) and Fibre Channel Switched (FC-SW). Not only are disk arrays attached to servers via FC links, but there are even some disks with FC links attached to switches so that storage area networks can enjoy the benefits of greater bandwidth and interconnectivity of switching.

In October 2000, the InfiniBand Trade Association announced the version 1.0 specification of InfiniBand [InfiniBand Trade Association 2001]. Led by Intel, HP, IBM, Sun, and other companies, it was targeted to the high-performance computing market as a successor to the PCI bus by having point-to-point links and switches with its own set of protocols. Its characteristics are potentially desirable both for system area networks to connect clusters and for storage area networks to connect disk arrays to servers. Consequently, it has had strong competition on both fronts. On the storage area networking side, the chief competition for InfiniBand has been the rapidly improving Ethernet technology widely used in LANs. The Internet Engineering Task Force proposed a standard called iSCSI to send SCSI commands over IP networks [Satran et al. 2001]. Given the cost advantages of the higher-volume Ethernet switches and interface cards, Gigabit Ethernet dominates the low-end and medium range for this market.

What’s more, the slow introduction of InfiniBand and its small market share delayed the development of chip sets incorporating native support for InfiniBand. Therefore, network interface cards had to be plugged into the PCI or PCI-X bus, thus never delivering on the promise of replacing the PCI bus.

It was another I/O standard, PCI-Express, that finally replaced the PCI bus. Like InfiniBand, PCI-Express implements a switched network but with point-to-point serial links. To its credit, it maintains software compatibility with the PCI bus, drastically simplifying migration to the new I/O interface. Moreover, PCI-Express benefited significantly from mass market production and has found application in the desktop market for connecting one or more high-end graphics cards, making gamers very happy. Every PC motherboard now implements one or more 16x PCI-Express interfaces. PCI-Express absolutely dominates the I/O interface, but the current standard does not provide support for interprocessor communication.

Yet another standard, Advanced Switching Interconnect (ASI), may emerge as a complementary technology to PCI-Express. ASI is compatible with PCI-Express, thus linking directly to current motherboards, but it also implements support for interprocessor communication as well as I/O. Its defenders believe that it will eventually replace both SANs and LANs with a unified network in the data center market, but ironically this was also said of InfiniBand. The interested reader is referred to Pinkston et al. [2003] for a detailed discussion of this topic.

There is also a new disk interface standard called Serial Advanced Technology Attachment (SATA) that is replacing parallel Integrated Device Electronics (IDE) with serial signaling technology to allow for increased bandwidth. Most disks in the market use this new interface, but keep in mind that Fibre Channel is still alive and well. Indeed, most of the promises made by InfiniBand in the SAN market were satisfied by Fibre Channel first, thus increasing its share of the market.

Some believe that Ethernet, PCI-Express, and SATA have the edge in the LAN, I/O interface, and disk interface areas, respectively. But the fate of the remaining storage area networking contenders depends on many factors. A wonderful characteristic of computer architecture is that such issues will not remain endless academic debates, unresolved as people rehash the same arguments repeatedly. Instead, the battle is fought in the marketplace, with well-funded and talented groups giving their best efforts at shaping the future. Moreover, constant changes to technology reward those who are either astute or lucky. The best combination of technology and follow-through has often determined commercial success. Time will tell us who will win and who will lose, at least for the next round!

On-Chip Networks

Relative to the other network domains, on-chip networks are in their infancy. As recently as the late 1990s, the traditional way of interconnecting devices such as caches, register files, ALUs, and other functional units within a chip was to use dedicated links aimed at minimizing latency or shared buses aimed at simplicity.

But with subsequent increases in the volume of interconnected devices on a single chip, the length and delay of wires to cross a chip, and chip power consumption, it has become important to share on-chip interconnect bandwidth in a more structured way, giving rise to the notion of a network on-chip. Among the first to recognize this were Agarwal [Waingold et al. 1997] and Dally [Dally 1999; Dally and Towles 2001]. They and others argued that on-chip networks that route packets allow efficient sharing of burgeoning wire resources between many communication flows and also facilitate modularity to mitigate chip-crossing wire delay problems identified by Ho, Mai, and Horowitz [2001]. Switched on-chip networks were also viewed as providing better fault isolation and tolerance. Challenges in designing these networks were later described by Taylor et al. [2005], who also proposed a 5-tuple model for characterizing the delay of OCNs. A design process for OCNs that provides a complete synthesis flow was proposed by Bertozzi et al. [2005]. Following these early works, much research and development has gone into on-chip network design, making this a very hot area of microarchitecture activity.

Multicore and tiled designs featuring on-chip networks have become very popular since the turn of the millennium. Pinkston and Shin [2005] provide a survey of on-chip networks used in early multicore/tiled systems. Most designs exploit the reduced wiring complexity of switched OCNs as the paths between cores/tiles can be precisely defined and optimized early in the design process, thus enabling improved power and performance characteristics. With typically tens of thousands of wires attached to the four edges of a core or tile as “pin-outs,” wire resources can be traded off for improved network performance by having very wide channels over which data can be sent broadside (and possibly scaled up or down according to the power management technique), as opposed to serializing the data over fixed narrow channels.
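As a rough illustration of the trade-off between wide channels and serialization described above, the following Python sketch counts the cycles needed to move one packet across a channel. The function name, the 512-bit packet size, and the model of one channel-wide transfer per cycle are assumptions made for this example; none of them come from the text.

```python
def serialization_cycles(packet_bits: int, channel_width_bits: int) -> int:
    """Cycles to push one packet across a channel, assuming one
    channel-wide transfer per cycle (a simplifying assumption)."""
    if packet_bits <= 0 or channel_width_bits <= 0:
        raise ValueError("sizes must be positive")
    # Integer ceiling division: a partial final transfer still costs a cycle.
    return (packet_bits + channel_width_bits - 1) // channel_width_bits


if __name__ == "__main__":
    packet_bits = 512  # hypothetical packet size
    for width in (16, 64, 256, 512):
        cycles = serialization_cycles(packet_bits, width)
        print(f"{width:4d}-bit channel: {cycles} cycle(s)")
```

Under this simple model, each doubling of channel width halves the serialization cycles until the channel is as wide as the packet, which is the sense in which wire resources can be traded for network performance.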

Rings, meshes, and crossbars are straightforward to implement in planar chip technology and routing is easily defined on them, so these were popular topological choices in early switched OCNs. It will be interesting to see if this trend continues in the future when several tens to hundreds of heterogeneous cores and tiles will likely be interconnected within a single chip, possibly using 3D integration technology. Considering that processor microarchitecture has evolved significantly from its early beginnings in response to application demands and technological advancements, we would expect to see vast architectural improvements to on-chip networks as well.
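To give a sense of how simply routing can be defined on these topologies, here is a minimal Python sketch. It assumes dimension-order (X-then-Y) routing on a 2D mesh and shortest-direction routing on a bidirectional ring. These are common choices used here for illustration only; the text above does not name a specific routing algorithm, and the function names are hypothetical.

```python
def xy_route(src, dst):
    """Nodes visited on a 2D mesh when routing fully in X, then in Y."""
    x, y = src
    dst_x, dst_y = dst
    path = [(x, y)]
    while x != dst_x:            # first correct the X coordinate
        x += 1 if dst_x > x else -1
        path.append((x, y))
    while y != dst_y:            # then correct the Y coordinate
        y += 1 if dst_y > y else -1
        path.append((x, y))
    return path


def ring_hops(num_nodes, src, dst):
    """Hop count on a bidirectional ring, taking the shorter direction."""
    clockwise = (dst - src) % num_nodes
    counterclockwise = (src - dst) % num_nodes
    return min(clockwise, counterclockwise)


if __name__ == "__main__":
    print(xy_route((0, 0), (2, 1)))
    print(ring_hops(8, 1, 6))
```

A crossbar needs no multi-hop route at all in this view, since every input connects to every output through a single switch.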

References

Agarwal, A. [1991]. “Limits on interconnection network performance,” IEEE Trans. on Parallel and Distributed Systems 2:4 (April), 398–412.

Alles, A. [1995]. “ATM internetworking” (May), www.cisco.com/warp/public/614/12.html.

Anderson, T. E., D. E. Culler, and D. Patterson [1995]. “A case for NOW (networks of workstations),” IEEE Micro 15:1 (February), 54–64.

Anjan, K. V., and T. M. Pinkston [1995]. “An efficient, fully-adaptive deadlock recovery scheme: Disha,” Proc. 22nd Annual Int’l. Symposium on Computer Architecture, June 22–24, 1995, Santa Margherita Ligure, Italy.

Arpaci, R. H., D. E. Culler, A. Krishnamurthy, S. G. Steinberg, and K. Yelick [1995]. “Empirical evaluation of the Cray-T3D: A compiler perspective,” Proc. 22nd Annual Int’l. Symposium on Computer Architecture, June 22–24, 1995, Santa Margherita Ligure, Italy.

Bell, G., and J. Gray [2001]. Crays, Clusters and Centers, MSR-TR-2001-76, Microsoft Corporation, Redmond, Wash.

Benes, V. E. [1962]. “Rearrangeable three stage connecting networks,” Bell System Technical Journal 41, 1481–1492.

Bertozzi, D., A. Jalabert, S. Murali, R. Tamhankar, S. Stergiou, L. Benini, and G. De Micheli [2005]. “NoC synthesis flow for customized domain specific multiprocessor systems-on-chip,” IEEE Trans. on Parallel and Distributed Systems 16:2 (February), 113–130.

Bhuyan, L. N., and D. P. Agrawal [1984]. “Generalized hypercube and hyperbus structures for a computer network,” IEEE Trans. on Computers 32:4 (April), 322–333.

Brewer, E. A., and B. C. Kuszmaul [1994]. “How to get good performance from the CM-5 data network,” Proc. Eighth Int’l Parallel Processing Symposium, April 26–29, 1994, Cancun, Mexico.

Clos, C. [1953]. “A study of non-blocking switching networks,” Bell System Technical Journal 32 (March), 406–424.

Dally, W. J. [1990]. “Performance analysis of k-ary n-cube interconnection networks,” IEEE Trans. on Computers 39:6 (June), 775–785.

Dally, W. J. [1992]. “Virtual channel flow control,” IEEE Trans. on Parallel and Distributed Systems 3:2 (March), 194–205.

Dally, W. J. [1999]. “Interconnect limited VLSI architecture,” Proc. of the Int’l. Interconnect Technology Conference, May 24–26, 1999, San Francisco, Calif.

Dally, W. J., and C. I. Seitz [1986]. “The torus routing chip,” Distributed Computing 1:4, 187–196.

Dally, W. J., and B. Towles [2001]. “Route packets, not wires: On-chip interconnection networks,” Proc. of the 38th Design Automation Conference, June 18–22, 2001, Las Vegas, Nev.

Dally, W. J., and B. Towles [2004]. Principles and Practices of Interconnection Networks, Morgan Kaufmann Publishers, San Francisco.

Davie, B. S., L. L. Peterson, and D. Clark [1999]. Computer Networks: A Systems Approach, 2nd ed., Morgan Kaufmann Publishers, San Francisco.

Duato, J. [1993]. “A new theory of deadlock-free adaptive routing in wormhole networks,” IEEE Trans. on Parallel and Distributed Systems 4:12 (December), 1320–1331.

Duato, J., I. Johnson, J. Flich, F. Naven, P. Garcia, and T. Nachiondo [2005]. “A new scalable and cost-effective congestion management strategy for lossless multistage interconnection networks,” Proc. 11th Int’l. Symposium on High Performance Computer Architecture, February 12–16, 2005, San Francisco.

Duato, J., O. Lysne, R. Pang, and T. M. Pinkston [2005]. “Part I: A theory for deadlock-free dynamic reconfiguration of interconnection networks,” IEEE Trans. on Parallel and Distributed Systems 16:5 (May), 412–427.

Duato, J., and T. M. Pinkston [2001]. “A general theory for deadlock-free adaptive routing using a mixed set of resources,” IEEE Trans. on Parallel and Distributed Systems 12:12 (December), 1219–1235.

Duato, J., S. Yalamanchili, and L. Ni [2003]. Interconnection Networks: An Engineering Approach, 2nd printing, Morgan Kaufmann Publishers, San Francisco.

Flich, J., and D. Bertozzi [2010]. Designing Network-on-Chip Architectures in the Nanoscale Era, CRC Press, Boca Raton, FL.

Glass, C. J., and L. M. Ni [1992]. “The Turn Model for adaptive routing,” Proc. 19th Int’l. Symposium on Computer Architecture, May, Gold Coast, Australia.

Gunther, K. D. [1981]. “Prevention of deadlocks in packet-switched data transport systems,” IEEE Trans. on Communications COM–29:4 (April), 512–524.

Ho, R., K. W. Mai, and M. A. Horowitz [2001]. “The future of wires,” Proc. of the IEEE 89:4 (April), 490–504.

Holt, R. C. [1972]. “Some deadlock properties of computer systems,” ACM Computing Surveys 4:3 (September), 179–196.

Hoskote, Y., S. Vangal, A. Singh, N. Borkar, and S. Borkar [2007]. “A 5-GHz mesh interconnect for a teraflops processor,” IEEE Micro 27:5, 51–61.

Howard, J., S. Dighe, Y. Hoskote, S. Vangal, S. Finan, G. Ruhl, D. Jenkins, H. Wilson, N. Borkar, G. Schrom, F. Pailet, S. Jain, T. Jacob, S. Yada, S. Marella, P. Salihundam, V. Erraguntla, M. Konow, M. Riepen, G. Droege, J. Lindemann, M. Gries, T. Apel, K. Henriss, T. Lund-Larsen, S. Steibl, S. Borkar, V. De, R. Van Der Wijngaart, and T. Mattson [2010]. “A 48-core IA-32 message-passing processor with DVFS in 45nm CMOS,” IEEE International Solid-State Circuits Conference Digest of Technical Papers, pp. 58–59.

InfiniBand Trade Association [2001]. InfiniBand Architecture Specifications Release 1.0.a, www.infinibandta.org.

Jantsch, A., and H. Tenhunen, eds. [2003]. Networks on Chips, Kluwer Academic Publishers, The Netherlands.

Kahn, R. E. [1972]. “Resource-sharing computer communication networks,” Proc. IEEE 60:11 (November), 1397–1407.

Kermani, P., and L. Kleinrock [1979]. “Virtual cut-through: A new computer communication switching technique,” Computer Networks 3 (January), 267–286.

Kurose, J. F., and K. W. Ross [2001]. Computer Networking: A Top-Down Approach Featuring the Internet, Addison-Wesley, Boston.

Leiserson, C. E. [1985]. “Fat trees: Universal networks for hardware-efficient supercomputing,” IEEE Trans. on Computers C–34:10 (October), 892–901.

Merlin, P. M., and P. J. Schweitzer [1980]. “Deadlock avoidance in store-and-forward networks. I. Store-and-forward deadlock,” IEEE Trans. on Communications COM–28:3 (March), 345–354.

Metcalfe, R. M. [1993]. “Computer/network interface design: Lessons from Arpanet and Ethernet,” IEEE J. on Selected Areas in Communications 11:2 (February), 173–180.

Metcalfe, R. M., and D. R. Boggs [1976]. “Ethernet: Distributed packet switching for local computer networks,” Comm. ACM 19:7 (July), 395–404.

Partridge, C. [1994]. Gigabit Networking, Addison-Wesley, Reading, Mass.

Peh, L. S., and W. J. Dally [2001]. “A delay model and speculative architecture for pipelined routers,” Proc. 7th Int’l. Symposium on High Performance Computer Architecture, January 20–24, 2001, Monterrey, Mexico.

Pfister, G. F. [1998]. In Search of Clusters, 2nd ed., Prentice Hall, Upper Saddle River, N.J.

Pinkston, T. M. [2004]. “Deadlock characterization and resolution in interconnection networks,” in Deadlock Resolution in Computer-Integrated Systems, M. C. Zhu and M. P. Fanti, eds., CRC Press, Boca Raton, Fl., 445–492.

Pinkston, T. M., A. Benner, M. Krause, I. Robinson, and T. Sterling [2003]. “InfiniBand: The ‘de facto’ future standard for system and local area networks or just a scalable replacement for PCI buses?” Cluster Computing (Special Issue on Communication Architecture for Clusters) 6:2 (April), 95–104.

Pinkston, T. M., and J. Shin [2005]. “Trends toward on-chip networked microsystems,” Int’l. J. of High Performance Computing and Networking 3:1, 3–18.

Pinkston, T. M., and S. Warnakulasuriya [1997]. “On deadlocks in interconnection networks,” Proc. 24th Int’l. Symposium on Computer Architecture, June 2–4, 1997, Denver, Colo.

Puente, V., R. Beivide, J. A. Gregorio, J. M. Prellezo, J. Duato, and C. Izu [1999]. “Adaptive bubble router: A design to improve performance in torus networks,” Proc. 28th Int’l. Conference on Parallel Processing, September 21–24, 1999, Aizu-Wakamatsu, Japan.

Rodrigo, S., J. Flich, J. Duato, and M. Hummel [2008]. “Efficient unicast and multicast support for CMPs,” Proc. 41st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-41), November 8–12, 2008, Lake Como, Italy, pp. 364–375.

Saltzer, J. H., D. P. Reed, and D. D. Clark [1984]. “End-to-end arguments in system design,” ACM Trans. on Computer Systems 2:4 (November), 277–288.

Satran, J., D. Smith, K. Meth, C. Sapuntzakis, M. Wakeley, P. Von Stamwitz, R. Haagens, E. Zeidner, L. Dalle Ore, and Y. Klein [2001]. “iSCSI,” IPS working group of IETF, Internet draft, www.ietf.org/internet-drafts/draft-ietf-ips-iscsi-07.txt.

Scott, S. L., and J. Goodman [1994]. “The impact of pipelined channels on k-ary n-cube networks,” IEEE Trans. on Parallel and Distributed Systems 5:1 (January), 1–16.

Senior, J. M. [1993]. Optical Fiber Communications: Principles and Practice, 2nd ed., Prentice Hall, Hertfordshire, U.K.

Spurgeon, C. [2006]. “Charles Spurgeon’s Ethernet Web Site,” www.ethermanage.com/ethernet/ethernet.html.

Sterling, T. [2001]. Beowulf PC Cluster Computing with Windows and Beowulf PC Cluster Computing with Linux, MIT Press, Cambridge, Mass.

Stevens, W. R. [1994–1996]. TCP/IP Illustrated (three volumes), Addison-Wesley, Reading, Mass.

Tamir, Y., and G. Frazier [1992]. “Dynamically-allocated multi-queue buffers for VLSI communication switches,” IEEE Trans. on Computers 41:6 (June), 725–734.

Tanenbaum, A. S. [1988]. Computer Networks, 2nd ed., Prentice Hall, Englewood Cliffs, N.J.

Taylor, M. B., W. Lee, S. P. Amarasinghe, and A. Agarwal [2005]. “Scalar operand networks,” IEEE Trans. on Parallel and Distributed Systems 16:2 (February), 145–162.

Thacker, C. P., E. M. McCreight, B. W. Lampson, R. F. Sproull, and D. R. Boggs [1982]. “Alto: A personal computer,” in Computer Structures: Principles and Examples, D. P. Siewiorek, C. G. Bell, and A. Newell, eds., McGraw-Hill, New York, 549–572.

TILE-GX, http://www.tilera.com/sites/default/files/productbriefs/PB025_TILE-Gx_Processor_A_v3.pdf.

von Eicken, T., D. E. Culler, S. C. Goldstein, and K. E. Schauser [1992]. “Active messages: A mechanism for integrated communication and computation,” Proc. 19th Annual Int’l. Symposium on Computer Architecture, May 19–21, 1992, Gold Coast, Australia.

Vaidya, A. S., A. Sivasubramaniam, and C. R. Das [1997]. “Performance benefits of virtual channels and adaptive routing: An application-driven study,” Proc. 11th ACM Int’l Conference on Supercomputing, July 7–11, 1997, Vienna, Austria.

Van Leeuwen, J., and R. B. Tan [1987]. “Interval Routing,” The Computer Journal 30:4, 298–307.

Waingold, E., M. Taylor, D. Srikrishna, V. Sarkar, W. Lee, V. Lee, J. Kim, M. Frank, P. Finch, R. Barua, J. Babb, S. Amarasinghe, and A. Agarwal [1997]. “Baring it all to software: Raw Machines,” IEEE Computer 30 (September), 86–93.

Yang, Y., and G. Mason [1991]. “Nonblocking broadcast switching networks,” IEEE Trans. on Computers 40:9 (September), 1005–1015.

Exercises

Solutions to “starred” exercises are available for instructors who register at textbooks.elsevier.com.

✪

F.1 [15] <F.2, F.3> Is electronic communication always faster than nonelectronic means for longer distances? Calculate the time to send 1000 GB using 25 8-mm tapes and an overnight delivery service versus sending 1000 GB by FTP over the Internet. Make the following four assumptions:

■ The tapes are picked up at 4 P.M. Pacific time and delivered 4200 km away at 10 A.M. Eastern time (7 A.M. Pacific time).

■ On one route the slowest link is a T3 line, which transfers at 45 Mbits/sec.

■ On another route the slowest link is a 100-Mbit/sec Ethernet.

■ You can use 50% of the slowest link between the two sites.

Will all the bytes sent by either Internet route arrive before the overnight delivery person arrives?
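For readers setting up this kind of comparison, the arithmetic can be sketched as a small Python helper. The function name and the numbers in the usage lines are illustrative placeholders, deliberately not the values from the exercise. The sketch also assumes decimal units (1 GB = 10^9 bytes) and a transfer limited only by the slowest link.

```python
def transfer_time_seconds(size_bytes, link_bits_per_sec, usable_fraction=1.0):
    """Time to push size_bytes through a link of which only
    usable_fraction of the rated bandwidth is available."""
    if not 0 < usable_fraction <= 1:
        raise ValueError("usable_fraction must be in (0, 1]")
    effective_bits_per_sec = link_bits_per_sec * usable_fraction
    return size_bytes * 8 / effective_bits_per_sec


if __name__ == "__main__":
    # Hypothetical example: 10 GB over a 10-Mbit/sec link at 50% utilization.
    seconds = transfer_time_seconds(10e9, 10e6, usable_fraction=0.5)
    print(f"{seconds:.0f} s = {seconds / 3600:.2f} hours")
```

The resulting time can then be compared with the courier's elapsed time between pickup and delivery, taking care to express both in the same time zone.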

✪

F.2 [10] <F.2, F.3> For the same assumptions as Exercise F.1, what is the bandwidth of overnight delivery for a 1000-GB package?

✪

F.3 [10] <F.2, F.3> For the same assumptions as Exercise F.1, what is the minimum bandwidth of the slowest link to beat overnight delivery? What standard network options match that speed?

✪

F.4 [15] <F.2, F.3> The original Ethernet standard was for 10 Mbits/sec and a maximum distance of 2.5 km. How many bytes could be in flight in the original Ethernet? Assume you can use 90% of the peak bandwidth.

✪

F.5 [15] <F.2, F.3> Flow control is a problem for WANs due to the long time of flight, as the example on page F-14 illustrates. Ethernet did not include flow control when it was first standardized at 10 Mbits/sec. Calculate the number of bytes in flight for a 10-Gbit/sec Ethernet over a 100 meter link, assuming you can use 90% of peak bandwidth. What does your answer mean for network designers?
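A bytes-in-flight calculation of the kind asked for in Exercises F.4 and F.5 can be sketched in Python as usable bandwidth multiplied by one-way time of flight. The propagation speed of 2 × 10^8 m/sec (roughly two-thirds the speed of light) is an assumption of this sketch, as are the function name and the placeholder values in the usage lines, which are not the exercise's values.

```python
# Assumed signal propagation speed: about two-thirds the speed of light.
PROPAGATION_M_PER_SEC = 2.0e8


def bytes_in_flight(peak_bits_per_sec, distance_m, usable_fraction=1.0,
                    propagation_m_per_sec=PROPAGATION_M_PER_SEC):
    """Bytes occupying the link at once: usable bandwidth times the
    one-way time of flight (round-trip effects are ignored here)."""
    time_of_flight_sec = distance_m / propagation_m_per_sec
    bits = peak_bits_per_sec * usable_fraction * time_of_flight_sec
    return bits / 8


if __name__ == "__main__":
    # Hypothetical example: a 1-Gbit/sec link spanning 1 km at full rate.
    print(f"{bytes_in_flight(1e9, 1000):.0f} bytes in flight")
```

If a flow-control scheme must cover a round trip, the one-way figure from this sketch would need to be doubled.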

✪

F.6 [15] <F.2, F.3> Assume the total overhead to send a zero-length data packet on an Ethernet is 100 µs and that an unloaded network can transmit at 90% of the peak 1000-Mbit/sec rating. For the purposes of this question, assume that the size of the Ethernet header and trailer is 56 bytes. Assume a continuous stream of packets of the same size. Plot the delivered bandwidth of user data in Mbits/sec as the payload data size varies from 32 bytes to the maximum size of 1500 bytes in 32-byte increments.
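One way to model delivered bandwidth for a plot like this is sketched below in Python. It assumes that each packet costs the fixed overhead plus the time to transmit payload and header/trailer at the usable link rate, with no overlap between the two. That is one plausible reading rather than the only one. The function name and the numbers in the usage lines are illustrative placeholders, not the exercise's values.

```python
def delivered_bits_per_sec(payload_bytes, overhead_sec, header_bytes,
                           peak_bits_per_sec, usable_fraction=1.0):
    """User-data bandwidth when every packet pays a fixed overhead plus
    the wire time of payload and header (assumed not to overlap)."""
    usable_rate = peak_bits_per_sec * usable_fraction
    wire_time_sec = (payload_bytes + header_bytes) * 8 / usable_rate
    return payload_bytes * 8 / (overhead_sec + wire_time_sec)


if __name__ == "__main__":
    # Hypothetical example: 50-microsecond overhead, 40-byte header,
    # 100-Mbit/sec link used at its full rate.
    for payload in (250, 500, 1000):
        rate = delivered_bits_per_sec(payload, 50e-6, 40, 100e6)
        print(f"{payload:5d}-byte payload: {rate / 1e6:.2f} Mbits/sec")
```

Looping this function over the payload sizes of interest yields the points for the plot; the fixed overhead is why small payloads deliver only a fraction of the link rate.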

✪

F.7 [10] <F.2, F.3> Exercise F.6 suggests that the delivered Ethernet bandwidth to a single user may be disappointing. Making the same assumptions as in that exercise, by how much would the maximum payload size have to be increased to deliver half of the peak bandwidth?

✪

F.8 [10] <F.2, F.3> One reason that ATM has a fixed transfer size is that when a short message is behind a long message, a node may need to wait for an entire transfer to complete. For applications that are time sensitive, such as when transmitting voice or video, the large transfer size may result in transmission delays that are too long for the application. On an unloaded interconnection, what is the worst-case delay in microseconds if a node must wait for one full-size Ethernet packet versus an ATM transfer? See Figure F.30 (page F-78) to find the packet sizes. For this question assume that you can transmit at 100% of the 622-Mbits/sec ATM network and 100% of the 1000-Mbit/sec Ethernet.

✪

F.9 [10] <F.2, F.3> Exercise F.7 suggests the need for expanding the maximum payload to increase the delivered bandwidth, but Exercise F.8 suggests the impact on worst-case latency of making it longer. What would be the impact on latency of increasing the maximum payload size by the answer to Exercise F.7?

✪

F.10 [12/12/20] <F.4> The Omega network shown in Figure F.11 on page F-31 consists of three columns of four switches, each with two inputs and two outputs. Each switch can be set to straight, which connects the upper switch input to the upper switch output and the lower input to the lower output, and to exchange, which connects the upper input to the lower output and vice versa for the lower input. For each column of switches, label the inputs and outputs 0, 1, . . . , 7 from top to bottom, to correspond with the numbering of the processors.

a. [12] <F.4> When a switch is set to exchange and a message passes through, what is the relationship between the label values for the switch input and output used by the message? (Hint: Think in terms of operations on the digits of
