Other References

The concept of using virtual memory to implement a shared address space among distinct machines was pioneered in Kai Li’s Ivy system in 1988. There have been subsequent papers exploring hardware support issues, software mechanisms, and programming issues. Amza et al. [1996] described a system built on workstations using a new consistency model, Kontothanassis et al. [1997] described a software shared-memory scheme using remote writes, and Erlichson et al. [1996] described the use of shared virtual memory to build large-scale multiprocessors using SMPs as nodes.

There is an almost unbounded amount of information on multiprocessors and multicomputers: Conferences, journal papers, and even books seem to appear faster than any single person can absorb the ideas. No doubt many of these papers will go unnoticed, not unlike the past. Most of the major architecture conferences contain papers on multiprocessors. An annual conference, Supercomputing XY (where X and Y are the last two digits of the year), brings together users, architects, software developers, and vendors, and the proceedings are published in book, CD-ROM, and online (see www.scXY.org) form. Two major journals, Journal of Parallel and Distributed Computing and the IEEE Transactions on Parallel and Distributed Systems, contain papers on all aspects of parallel processing. Several books focusing on parallel processing are included in the following references, with Culler, Singh, and Gupta [1999] being the most recent, large-scale effort. For years, Eugene Miya of NASA Ames Research Center has collected an online bibliography of parallel-processing papers. The bibliography, which now contains more than 35,000 entries, is available online at liinwww.ira.uka.de/bibliography/Parallel/Eugene/index.html.

In addition to documenting the discovery of concepts now used in practice, these references also provide descriptions of many ideas that have been explored and found wanting, as well as ideas whose time has just not yet come. Given the move toward multicore and multiprocessors as the future of high-performance computer architecture, we expect that many new approaches will be explored in the years ahead. A few of them will manage to solve the hardware and software problems that have been the key to using multiprocessing for the past 40 years!

References

Adve, S. V., and K. Gharachorloo [1996]. “Shared memory consistency models: A tutorial,” IEEE Computer 29:12 (December), 66–76.

Adve, S. V., and M. D. Hill [1990]. “Weak ordering—a new definition,” Proc. 17th Annual Int’l. Symposium on Computer Architecture (ISCA), May 28–31, 1990, Seattle, Wash., 2–14.

Agarwal, A., R. Bianchini, D. Chaiken, K. Johnson, and D. Kranz [1995]. “The MIT Alewife machine: Architecture and performance,” 22nd Annual Int’l. Symposium on Computer Architecture (ISCA), June 22–24, 1995, Santa Margherita, Italy, 2–13.

Agarwal, A., J. L. Hennessy, R. Simoni, and M. A. Horowitz [1988]. “An evaluation of directory schemes for cache coherence,” Proc. 15th Annual Int’l. Symposium on Computer Architecture, May 30–June 2, 1988, Honolulu, Hawaii, 280–289.

Agarwal, A., J. Kubiatowicz, D. Kranz, B.-H. Lim, D. Yeung, G. D’Souza, and M. Parkin [1993]. “Sparcle: An evolutionary processor design for large-scale multiprocessors,” IEEE Micro 13 (June), 48–61.

Alles, A. [1995]. “ATM Internetworking,” White Paper (May), Cisco Systems, Inc., San Jose, Calif. (www.cisco.com/warp/public/614/12.html).

Almasi, G. S., and A. Gottlieb [1989]. Highly Parallel Computing, Benjamin/Cummings, Redwood City, Calif.

Alverson, G., R. Alverson, D. Callahan, B. Koblenz, A. Porterfield, and B. Smith [1992]. “Exploiting heterogeneous parallelism on a multithreaded multiprocessor,” Proc. ACM/IEEE Conf. on Supercomputing, November 16–20, 1992, Minneapolis, Minn., 188–197.

Amdahl, G. M. [1967]. “Validity of the single processor approach to achieving large scale computing capabilities,” Proc. AFIPS Spring Joint Computer Conf., April 18–20, 1967, Atlantic City, N.J., 483–485.

Amza, C., A. L. Cox, S. Dwarkadas, P. Keleher, H. Lu, R. Rajamony, W. Yu, and W. Zwaenepoel [1996]. “Treadmarks: Shared memory computing on networks of workstations,” IEEE Computer 29:2 (February), 18–28.

Anderson, T. E., D. E. Culler, and D. Patterson [1995]. “A case for NOW (networks of workstations),” IEEE Micro 15:1 (February), 54–64.

Ang, B., D. Chiou, D. Rosenband, M. Ehrlich, L. Rudolph, and Arvind [1998]. “StarT-Voyager: A flexible platform for exploring scalable SMP issues,” Proc. ACM/IEEE Conf. on Supercomputing, November 7–13, 1998, Orlando, Fla.

Archibald, J., and J.-L. Baer [1986]. “Cache coherence protocols: Evaluation using a multiprocessor simulation model,” ACM Trans. on Computer Systems 4:4 (November), 273–298.

Arpaci, R. H., D. E. Culler, A. Krishnamurthy, S. G. Steinberg, and K. Yelick [1995]. “Empirical evaluation of the CRAY-T3D: A compiler perspective,” Proc. 22nd Annual Int’l. Symposium on Computer Architecture (ISCA), June 22–24, 1995, Santa Margherita, Italy.

Baer, J.-L., and W.-H. Wang [1988]. “On the inclusion properties for multi-level cache hierarchies,” Proc. 15th Annual Int’l. Symposium on Computer Architecture, May 30–June 2, 1988, Honolulu, Hawaii, 73–80.

Balakrishnan, H., V. N. Padmanabhan, S. Seshan, and R. H. Katz [1997]. “A comparison of mechanisms for improving TCP performance over wireless links,” IEEE/ACM Trans. on Networking 5:6 (December), 756–769.

Barroso, L. A., K. Gharachorloo, and E. Bugnion [1998]. “Memory system characterization of commercial workloads,” Proc. 25th Annual Int’l. Symposium on Computer Architecture (ISCA), July 3–14, 1998, Barcelona, Spain, 3–14.

Baskett, F., T. Jermoluk, and D. Solomon [1988]. “The 4D-MP graphics superworkstation: Computing + graphics = 40 MIPS + 40 MFLOPS and 10,000 lighted polygons per second,” Proc. IEEE COMPCON, February 29–March 4, 1988, San Francisco, 468–471.

BBN Laboratories. [1986]. Butterfly Parallel Processor Overview, Tech. Rep. 6148, BBN Laboratories, Cambridge, Mass.

Bell, C. G. [1985]. “Multis: A new class of multiprocessor computers,” Science 228 (April 26), 462–467.

Bell, C. G. [1989]. “The future of high performance computers in science and engineering,” Communications of the ACM 32:9 (September), 1091–1101.

Bell, C. G., and J. Gray [2001]. Crays, Clusters and Centers, Tech. Rep. MSR-TR-2001-76, Microsoft Research, Redmond, Wash.

Bell, C. G., and J. Gray [2002]. “What’s next in high performance computing,” CACM, 45:2 (February), 91–95.

Bouknight, W. J., S. A. Deneberg, D. E. McIntyre, J. M. Randall, A. H. Sameh, and D. L. Slotnick [1972]. “The Illiac IV system,” Proc. IEEE 60:4, 369–379. Also appears in D. P. Siewiorek, C. G. Bell, and A. Newell, Computer Structures: Principles and Examples, McGraw-Hill, New York, 1982, 306–316.

Brain, M. [2000]. Inside a Digital Cell Phone, www.howstuffworks.com/inside-cell-phone.htm.

Brewer, E. A., and B. C. Kuszmaul [1994]. “How to get good performance from the CM-5 data network,” Proc. Eighth Int’l. Parallel Processing Symposium (IPPS), April 26–29, 1994, Cancun, Mexico.

Brin, S., and L. Page [1998]. “The anatomy of a large-scale hypertextual Web search engine,” Proc. 7th Int’l. World Wide Web Conf., April 14–18, 1998, Brisbane, Queensland, Australia, 107–117.

Burkhardt III, H., S. Frank, B. Knobe, and J. Rothnie [1992]. Overview of the KSR1 Computer System, Tech. Rep. KSR-TR-9202001, Kendall Square Research, Boston.

Censier, L., and P. Feautrier [1978]. “A new solution to coherence problems in multicache systems,” IEEE Trans. on Computers C-27:12 (December), 1112–1118.

Chandra, R., S. Devine, B. Verghese, A. Gupta, and M. Rosenblum [1994]. “Scheduling and page migration for multiprocessor compute servers,” Proc. Sixth Int’l. Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS), October 4–7, 1994, San Jose, Calif., 12–24.

Charlesworth, A. [1998]. “Starfire: Extending the SMP envelope,” IEEE Micro 18:1 (January/February), 39–49.

Clark, W. A. [1957]. “The Lincoln TX-2 computer development,” Proc. Western Joint Computer Conference, February 26–28, 1957, Los Angeles, 143–145.

Comer, D. [1993]. Internetworking with TCP/IP, 2nd ed., Prentice Hall, Englewood Cliffs, N.J.

Culler, D. E., J. P. Singh, and A. Gupta [1999]. Parallel Computer Architecture: A Hardware/Software Approach, Morgan Kaufmann, San Francisco.

Dally, W. J., and C. L. Seitz [1986]. “The torus routing chip,” Distributed Computing 1:4, 187–196.

Davie, B. S., L. L. Peterson, and D. Clark [1999]. Computer Networks: A Systems Approach, 2nd ed., Morgan Kaufmann, San Francisco.

Desurvire, E. [1992]. “Lightwave communications: The fifth generation,” Scientific American (International Edition) 266:1 (January), 96–103.

Dongarra, J., T. Sterling, H. Simon, and E. Strohmaier [2005]. “High-performance computing: Clusters, constellations, MPPs, and future directions,” Computing in Science & Engineering, 7:2 (March/April), 51–59.

Dubois, M., C. Scheurich, and F. Briggs [1988]. “Synchronization, coherence, and event ordering,” IEEE Computer 21:2 (February), 9–21.

Dunigan, W., K. Vetter, K. White, and P. Worley [2005]. “Performance evaluation of the Cray X1 distributed shared memory architecture,” IEEE Micro, January/February, 30–40.

Eggers, S. [1989]. “Simulation Analysis of Data Sharing in Shared Memory Multiprocessors,” Ph.D. thesis, Computer Science Division, University of California, Berkeley.

Elder, J., A. Gottlieb, C. K. Kruskal, K. P. McAuliffe, L. Randolph, M. Snir, P. Teller, and J. Wilson [1985]. “Issues related to MIMD shared-memory computers: The NYU Ultracomputer approach,” Proc. 12th Annual Int’l. Symposium on Computer Architecture (ISCA), June 17–19, 1985, Boston, Mass., 126–135.

Erlichson, A., N. Nuckolls, G. Chesson, and J. L. Hennessy [1996]. “SoftFLASH: Analyzing the performance of clustered distributed virtual shared memory,” Proc. Seventh Int’l. Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS), October 1–5, 1996, Cambridge, Mass., 210–220.

Falsafi, B., and D. A. Wood [1997]. “Reactive NUMA: A design for unifying S-COMA and CC-NUMA,” Proc. 24th Annual Int’l. Symposium on Computer Architecture (ISCA), June 2–4, 1997, Denver, Colo., 229–240.

Flynn, M. J. [1966]. “Very high-speed computing systems,” Proc. IEEE 54:12 (December), 1901–1909.

Forgie, J. W. [1957]. “The Lincoln TX-2 input-output system,” Proc. Western Joint Computer Conference, February 26–28, 1957, Los Angeles, 156–160.

Frank, S. J. [1984]. “Tightly coupled multiprocessor systems speed memory access time,” Electronics 57:1 (January), 164–169.

Gajski, D., D. Kuck, D. Lawrie, and A. Sameh [1983]. “CEDAR—a large scale multiprocessor,” Proc. Int’l. Conf. on Parallel Processing (ICPP), August, Columbus, Ohio, 524–529.

Galles, M. [1996]. “Scalable pipelined interconnect for distributed endpoint routing: The SGI SPIDER chip,” Proc. IEEE HOT Interconnects ’96, August 15–17, 1996, Stanford University, Palo Alto, Calif.

Gehringer, E. F., D. P. Siewiorek, and Z. Segall [1987]. Parallel Processing: The Cm* Experience, Digital Press, Bedford, Mass.

Gharachorloo, K., A. Gupta, and J. L. Hennessy [1992]. “Hiding memory latency using dynamic scheduling in shared-memory multiprocessors,” Proc. 19th Annual Int’l. Symposium on Computer Architecture (ISCA), May 19–21, 1992, Gold Coast, Australia.

Gharachorloo, K., D. Lenoski, J. Laudon, P. Gibbons, A. Gupta, and J. L. Hennessy [1990]. “Memory consistency and event ordering in scalable shared-memory multiprocessors,” Proc. 17th Annual Int’l. Symposium on Computer Architecture (ISCA), May 28–31, 1990, Seattle, Wash., 15–26.

Gibson, J., R. Kunz, D. Ofelt, M. Horowitz, J. Hennessy, and M. Heinrich [2000]. “FLASH vs. (simulated) FLASH: Closing the simulation loop,” Proc. Ninth Int’l. Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS), November 12–15, Cambridge, Mass., 49–58.

Goodman, J. R. [1983]. “Using cache memory to reduce processor memory traffic,” Proc. 10th Annual Int’l. Symposium on Computer Architecture (ISCA), June 5–7, 1983, Stockholm, Sweden, 124–131.

Goralski, W. [1997]. SONET: A Guide to Synchronous Optical Network, McGraw-Hill, New York.

Grice, C., and M. Kanellos [2000]. “Cell phone industry at crossroads: Go high or low?” CNET News (August 31), technews.netscape.com/news/0-1004-201-2518386-0.html?tag=st.ne.1002.tgif.sf.

Groe, J. B., and L. E. Larson [2000]. CDMA Mobile Radio Design, Artech House, Boston.

Hagersten, E., and M. Koster [1998]. “WildFire: A scalable path for SMPs,” Proc. Fifth Int’l. Symposium on High-Performance Computer Architecture, January 9–12, 1999, Orlando, Fla.

Hagersten, E., A. Landin, and S. Haridi [1992]. “DDM—a cache-only memory architecture,” IEEE Computer 25:9 (September), 44–54.

Hill, M. D. [1998]. “Multiprocessors should support simple memory consistency models,” IEEE Computer 31:8 (August), 28–34.

Hillis, W. D. [1985]. The Connection Machine, MIT Press, Cambridge, Mass.

Hirata, H., K. Kimura, S. Nagamine, Y. Mochizuki, A. Nishimura, Y. Nakase, and T. Nishizawa [1992]. “An elementary processor architecture with simultaneous instruction issuing from multiple threads,” Proc. 19th Annual Int’l. Symposium on Computer Architecture (ISCA), May 19–21, 1992, Gold Coast, Australia, 136–145.

Hockney, R. W., and C. R. Jesshope [1988]. Parallel Computers 2: Architectures, Programming and Algorithms, Adam Hilger, Ltd., Bristol, England.

Holland, J. H. [1959]. “A universal computer capable of executing an arbitrary number of subprograms simultaneously,” Proc. East Joint Computer Conf. 16, 108–113.

Hord, R. M. [1982]. The Illiac-IV, The First Supercomputer, Computer Science Press, Rockville, Md.

Hristea, C., D. Lenoski, and J. Keen [1997]. “Measuring memory hierarchy performance of cache-coherent multiprocessors using micro benchmarks,” Proc. ACM/IEEE Conf. on Supercomputing, November 15–21, 1997, San Jose, Calif.

Hwang, K. [1993]. Advanced Computer Architecture and Parallel Programming, McGraw-Hill, New York.

IBM. [2005]. “Blue Gene,” IBM J. of Research and Development, 49:2/3 (special issue).

Infiniband Trade Association. [2001]. InfiniBand Architecture Specifications Release 1.0.a, www.infinibandta.org.

Jordan, H. F. [1983]. “Performance measurements on HEP—a pipelined MIMD computer,” Proc. 10th Annual Int’l. Symposium on Computer Architecture (ISCA), June 5–7, 1983, Stockholm, Sweden, 207–212.

Kahn, R. E. [1972]. “Resource-sharing computer communication networks,” Proc. IEEE 60:11 (November), 1397–1407.

Keckler, S. W., and W. J. Dally [1992]. “Processor coupling: Integrating compile time and runtime scheduling for parallelism,” Proc. 19th Annual Int’l. Symposium on Computer Architecture (ISCA), May 19–21, 1992, Gold Coast, Australia, 202–213.

Kontothanassis, L., G. Hunt, R. Stets, N. Hardavellas, M. Cierniak, S. Parthasarathy, W. Meira, S. Dwarkadas, and M. Scott [1997]. “VM-based shared memory on low-latency, remote-memory-access networks,” Proc. 24th Annual Int’l. Symposium on Computer Architecture (ISCA), June 2–4, 1997, Denver, Colo.

Kurose, J. F., and K. W. Ross [2001]. Computer Networking: A Top-Down Approach Featuring the Internet, Addison-Wesley, Boston.

Kuskin, J., D. Ofelt, M. Heinrich, J. Heinlein, R. Simoni, K. Gharachorloo, J. Chapin, D. Nakahira, J. Baxter, M. Horowitz, A. Gupta, M. Rosenblum, and J. L. Hennessy [1994]. “The Stanford FLASH multiprocessor,” Proc. 21st Annual Int’l. Symposium on Computer Architecture (ISCA), April 18–21, 1994, Chicago.

Lamport, L. [1979]. “How to make a multiprocessor computer that correctly executes multiprocess programs,” IEEE Trans. on Computers C-28:9 (September), 241–248.

Laudon, J., A. Gupta, and M. Horowitz [1994]. “Interleaving: A multithreading technique targeting multiprocessors and workstations,” Proc. Sixth Int’l. Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS), October 4–7, 1994, San Jose, Calif., 308–318.

Laudon, J., and D. Lenoski [1997]. “The SGI Origin: A ccNUMA highly scalable server,” Proc. 24th Annual Int’l. Symposium on Computer Architecture (ISCA), June 2–4, 1997, Denver, Colo., 241–251.

Lenoski, D., J. Laudon, K. Gharachorloo, A. Gupta, and J. L. Hennessy [1990]. “The Stanford DASH multiprocessor,” Proc. 17th Annual Int’l. Symposium on Computer Architecture (ISCA), May 28–31, 1990, Seattle, Wash., 148–159.

Lenoski, D., J. Laudon, K. Gharachorloo, W.-D. Weber, A. Gupta, J. L. Hennessy, M. A. Horowitz, and M. Lam [1992]. “The Stanford DASH multiprocessor,” IEEE Computer 25:3 (March), 63–79.

Li, K. [1988]. “IVY: A shared virtual memory system for parallel computing,” Proc. Int’l. Conf. on Parallel Processing (ICPP), August, The Pennsylvania State University, University Park, Penn.

Lo, J., L. Barroso, S. Eggers, K. Gharachorloo, H. Levy, and S. Parekh [1998]. “An analysis of database workload performance on simultaneous multithreaded processors,” Proc. 25th Annual Int’l. Symposium on Computer Architecture (ISCA), July 3–14, 1998, Barcelona, Spain, 39–50.

Lo, J., S. Eggers, J. Emer, H. Levy, R. Stamm, and D. Tullsen [1997]. “Converting thread-level parallelism into instruction-level parallelism via simultaneous multithreading,” ACM Trans. on Computer Systems 15:2 (August), 322–354.

Lovett, T., and S. Thakkar [1988]. “The Symmetry multiprocessor system,” Proc. Int’l. Conf. on Parallel Processing (ICPP), August, The Pennsylvania State University, University Park, Penn., 303–310.

Mellor-Crummey, J. M., and M. L. Scott [1991]. “Algorithms for scalable synchronization on shared-memory multiprocessors,” ACM Trans. on Computer Systems 9:1 (February), 21–65.

Menabrea, L. F. [1842]. “Sketch of the analytical engine invented by Charles Babbage,” Bibliothèque Universelle de Genève, 82 (October).

Metcalfe, R. M. [1993]. “Computer/network interface design: Lessons from Arpanet and Ethernet,” IEEE J. on Selected Areas in Communications 11:2 (February), 173–180.

Metcalfe, R. M., and D. R. Boggs [1976]. “Ethernet: Distributed packet switching for local computer networks,” Communications of the ACM 19:7 (July), 395–404.

Mitchell, D. [1989]. “The Transputer: The time is now,” Computer Design (RISC suppl.), 40–41.

Miya, E. N. [1985]. “Multiprocessor/distributed processing bibliography,” Computer Architecture News 13:1, 27–29.

National Research Council. [1997]. The Evolution of Untethered Communications, Computer Science and Telecommunications Board, National Academy Press, Washington, D.C.

Nikhil, R. S., G. M. Papadopoulos, and Arvind [1992]. “*T: A multithreaded massively parallel architecture,” Proc. 19th Annual Int’l. Symposium on Computer Architecture (ISCA), May 19–21, 1992, Gold Coast, Australia, 156–167.

Noordergraaf, L., and R. van der Pas [1999]. “Performance experiences on Sun’s WildFire prototype,” Proc. ACM/IEEE Conf. on Supercomputing, November 13–19, 1999, Portland, Ore.

Partridge, C. [1994]. Gigabit Networking, Addison-Wesley, Reading, Mass.

Pfister, G. F. [1998]. In Search of Clusters, 2nd ed., Prentice Hall, Upper Saddle River, N.J.

Pfister, G. F., W. C. Brantley, D. A. George, S. L. Harvey, W. J. Kleinfekder, K. P. McAuliffe, E. A. Melton, V. A. Norton, and J. Weiss [1985]. “The IBM research parallel processor prototype (RP3): Introduction and architecture,” Proc. 12th Annual Int’l. Symposium on Computer Architecture (ISCA), June 17–19, 1985, Boston, Mass., 764–771.

Reinhardt, S. K., J. R. Larus, and D. A. Wood [1994]. “Tempest and Typhoon: User-level shared memory,” Proc. 21st Annual Int’l. Symposium on Computer Architecture (ISCA), April 18–21, 1994, Chicago, 325–336.

Rettberg, R. D., W. R. Crowther, P. P. Carvey, and R. S. Towlinson [1990]. “The Monarch parallel processor hardware design,” IEEE Computer 23:4 (April), 18–30.

Rosenblum, M., S. A. Herrod, E. Witchel, and A. Gupta [1995]. “Complete computer simulation: The SimOS approach,” in IEEE Parallel and Distributed Technology (now called Concurrency) 4:3, 34–43.

Saltzer, J. H., D. P. Reed, and D. D. Clark [1984]. “End-to-end arguments in system design,” ACM Trans. on Computer Systems 2:4 (November), 277–288.

Satran, J., D. Smith, K. Meth, C. Sapuntzakis, M. Wakeley, P. Von Stamwitz, R. Haagens, E. Zeidner, L. Dalle Ore, and Y. Klein [2001]. “iSCSI,” IPS Working Group of IETF, Internet draft www.ietf.org/internet-drafts/draft-ietf-ips-iscsi-07.txt.

Saulsbury, A., T. Wilkinson, J. Carter, and A. Landin [1995]. “An argument for Simple COMA,” Proc. First IEEE Symposium on High-Performance Computer Architectures, January 22–25, 1995, Raleigh, N.C., 276–285.

Schwartz, J. T. [1980]. “Ultracomputers,” ACM Trans. on Programming Languages and Systems 4:2, 484–521.

Scott, S. L. [1996]. “Synchronization and communication in the T3E multiprocessor,” Seventh Int’l. Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS), October 1–5, 1996, Cambridge, Mass., 26–36.

Scott, S. L., and G. M. Thorson [1996]. “The Cray T3E network: Adaptive routing in a high-performance 3D torus,” Proc. IEEE HOT Interconnects ’96, August 15–17, 1996, Stanford University, Palo Alto, Calif., 14–156.

Seitz, C. L. [1985]. “The Cosmic Cube (concurrent computing),” Communications of the ACM 28:1 (January), 22–33.

Singh, J. P., J. L. Hennessy, and A. Gupta [1993]. “Scaling parallel programs for multiprocessors: Methodology and examples,” Computer 26:7 (July), 22–33.

Slotnick, D. L., W. C. Borck, and R. C. McReynolds [1962]. “The Solomon computer,” Proc. AFIPS Fall Joint Computer Conf., December 4–6, 1962, Philadelphia, Penn., 97–107.

Smith, B. J. [1978]. “A pipelined, shared resource MIMD computer,” Proc. Int’l. Conf. on Parallel Processing (ICPP), August, Bellaire, Mich., 6–8.

Soundararajan, V., M. Heinrich, B. Verghese, K. Gharachorloo, A. Gupta, and J. L. Hennessy [1998]. “Flexible use of memory for replication/migration in cache-coherent DSM multiprocessors,” Proc. 25th Annual Int’l. Symposium on Computer Architecture (ISCA), July 3–14, 1998, Barcelona, Spain, 342–355.

Spurgeon, C. [2001]. “Charles Spurgeon’s Ethernet Web site,” wwwhost.ots.utexas.edu/ethernet/ethernet-home.html.

Stenström, P., T. Joe, and A. Gupta [1992]. “Comparative performance evaluation of cache-coherent NUMA and COMA architectures,” Proc. 19th Annual Int’l. Symposium on Computer Architecture (ISCA), May 19–21, 1992, Gold Coast, Australia, 80–91.

Sterling, T. [2001]. Beowulf PC Cluster Computing with Windows and Beowulf PC Cluster Computing with Linux, MIT Press, Cambridge, Mass.

Stevens, W. R. [1994–1996]. TCP/IP Illustrated (three volumes), Addison-Wesley, Reading, Mass.

Stone, H. [1991]. High Performance Computers, Addison-Wesley, New York.

Swan, R. J., A. Bechtolsheim, K. W. Lai, and J. K. Ousterhout [1977]. “The implementation of the Cm* multi-microprocessor,” Proc. AFIPS National Computing Conf., June 13–16, 1977, Dallas, Tex., 645–654.

Swan, R. J., S. H. Fuller, and D. P. Siewiorek [1977]. “Cm*—a modular, multi-microprocessor,” Proc. AFIPS National Computing Conf., June 13–16, 1977, Dallas, Tex., 637–644.

Tanenbaum, A. S. [1988]. Computer Networks, 2nd ed., Prentice Hall, Englewood Cliffs, N.J.

Tang, C. K. [1976]. “Cache design in the tightly coupled multiprocessor system,” Proc. AFIPS National Computer Conf., June 7–10, 1976, New York, 749–753.

Thacker, C. P., E. M. McCreight, B. W. Lampson, R. F. Sproull, and D. R. Boggs [1982]. “Alto: A personal computer,” in D. P. Siewiorek, C. G. Bell, and A. Newell, eds., Computer Structures: Principles and Examples, McGraw-Hill, New York, 549–572.

Thekkath, R., A. P. Singh, J. P. Singh, S. John, and J. L. Hennessy [1997]. “An evaluation of a commercial CC-NUMA architecture—the CONVEX Exemplar SPP1200,” Proc. 11th Int’l. Parallel Processing Symposium (IPPS), April 1–7, 1997, Geneva, Switzerland.

Tullsen, D. M., S. J. Eggers, J. S. Emer, H. M. Levy, J. L. Lo, and R. L. Stamm [1996]. “Exploiting choice: Instruction fetch and issue on an implementable simultaneous multithreading processor,” Proc. 23rd Annual Int’l. Symposium on Computer Architecture (ISCA), May 22–24, 1996, Philadelphia, Penn., 191–202.

Tullsen, D. M., S. J. Eggers, and H. M. Levy [1995]. “Simultaneous multithreading: Maximizing on-chip parallelism,” Proc. 22nd Annual Int’l. Symposium on Computer Architecture (ISCA), June 22–24, 1995, Santa Margherita, Italy, 392–403.

Unger, S. H. [1958]. “A computer oriented towards spatial problems,” Proc. Institute of Radio Engineers 46:10 (October), 1744–1750.

Walrand, J. [1991]. Communication Networks: A First Course, Aksen Associates: Irwin, Homewood, Ill.

Wilson, A. W., Jr. [1987]. “Hierarchical cache/bus architecture for shared-memory multiprocessors,” Proc. 14th Annual Int’l. Symposium on Computer Architecture (ISCA), June 2–5, 1987, Pittsburgh, Penn., 244–252.

Wolfe, A., and J. P. Shen [1991]. “A variable instruction stream extension to the VLIW architecture,” Proc. Fourth Int’l. Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS), April 8–11, 1991, Palo Alto, Calif., 2–14.

Wood, D. A., and M. D. Hill [1995]. “Cost-effective parallel computing,” IEEE Computer 28:2 (February), 69–72.

Wulf, W., and C. G. Bell [1972]. “C.mmp—A multi-mini-processor,” Proc. AFIPS Fall Joint Computer Conf., December 5–7, 1972, Anaheim, Calif., 765–777.

Wulf, W., and S. P. Harbison [1978]. “Reflections in a pool of processors—an experience report on C.mmp/Hydra,” Proc. AFIPS National Computing Conf., June 5–8, 1978, Anaheim, Calif., 939–951.

Yamamoto, W., M. J. Serrano, A. R. Talcott, R. C. Wood, and M. Nemirosky [1994]. “Performance estimation of multistreamed, superscalar processors,” Proc. 27th Hawaii Int’l. Conf. on System Sciences, January 4–7, 1994, Wailea, 195–204.

In this section, we cover the development of clusters that were the foundation of warehouse-scale computers (WSCs) and of utility computing. (Readers interested in learning more should start with Barroso and Hölzle [2009] and the blog postings and talks of James Hamilton at http://perspectives.mvdirona.com.)
