
The ability to simplify means to eliminate the unnecessary so that the necessary may speak.

Hans Hoffman Search for the Real (1967)

Fallacy It is possible to design a flawless architecture.

All architecture design involves trade-offs made in the context of a set of hardware and software technologies. Over time those technologies are likely to change, and decisions that may have been correct at one time later look like mistakes. For example, in 1975 the VAX designers overemphasized the importance of code size efficiency and underestimated how important ease of decoding and pipelining would be 10 years later. And almost all architectures eventually succumb to the lack of sufficient address space. Avoiding these problems in the long run, however, would probably mean compromising the efficiency of the architecture in the short run.

Fallacy An architecture with flaws cannot be successful.

The IBM 360 is often criticized in the literature: the branches are not PC-relative, and the address is too small in displacement addressing. Yet the machine has been an enormous success because it correctly handled several new problems. First, the architecture has a large amount of address space. Second, it is byte addressed and handles bytes well. Third, it is a general-purpose register machine. Finally, it is simple enough to be efficiently implemented across a wide performance and cost range.

The Intel 8086 provides an even more dramatic example. The 8086 architecture is the only widespread architecture in existence today that is not truly a general-purpose register machine. Furthermore, the segmented address space of the 8086 causes major problems for both programmers and compiler writers. Nevertheless, the 8086 architecture, because of its selection as the microprocessor in the IBM PC, has been enormously successful.

MIPS                                VAX

Saving registers
sort:    addi $29,$29,-36           sort:    .word  ^m<r2,r3,r4,r5,r6,r7>
         sw   $15, 0($29)
         sw   $16, 4($29)
         sw   $17, 8($29)
         sw   $18,12($29)
         sw   $19,16($29)
         sw   $20,20($29)
         sw   $24,24($29)
         sw   $25,28($29)
         sw   $31,32($29)

Procedure body

Move parameters
         move $18, $4                        moval  r7,8(ap)
         move $20, $5                        moval  r5,4(ap)

Outer loop
         add  $19, $0, $0                    clrl   r6
for1tst: slt  $8, $19, $20          for1tst: cmpl   r6,(r7)
         beq  $8, $0, exit1                  bgeq   exit1

Inner loop
         addi $17, $19, -1                   subl3  r4,r6,#1
for2tst: slti $8, $17, 0            for2tst: blss   exit2
         bne  $8, $0, exit2                  movl   r3,(r5)
         muli $15, $17, 4                    addl3  r2,r4,#1
         add  $16, $18, $15                  cmpl   (r3)[r4],(r3)[r2]
         lw   $24, 0($16)                    bleq   exit2
         lw   $25, 4($16)
         slt  $8, $25, $24
         beq  $8, $0, exit2

Pass parameters and call
         move $4, $18                        pushl  (r5)
         move $5, $17                        pushl  r4
         jal  swap                           calls  #2,swap

Inner loop
         addi $17, $17, -1                   decl   r4
         j    for2tst                        brb    for2tst

Outer loop
exit2:   addi $19, $19, 1           exit2:   incl   r6
         j    for1tst                        brb    for1tst

Restoring registers
exit1:   lw   $15, 0($29)
         lw   $16, 4($29)
         lw   $17, 8($29)
         lw   $18,12($29)
         lw   $19,16($29)
         lw   $20,20($29)
         lw   $24,24($29)
         lw   $25,28($29)
         lw   $31,32($29)
         addi $29,$29, 36

Procedure return
         jr   $31                   exit1:   ret

Figure K.56 MIPS32 versus VAX assembly version of procedure sort in Figure K.55 on page K-33.
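For readers who want the control flow without the register bookkeeping, the loop structure that both listings implement can be sketched in Python. This is a reading of the assembly above, not a reproduction of Figure K.55; the body of swap is not shown in this section, so its behavior (exchanging element k with element k + 1) is an assumption.

```python
def swap(v, k):
    """Exchange v[k] and v[k + 1] (assumed behavior of the swap procedure)."""
    v[k], v[k + 1] = v[k + 1], v[k]


def sort(v, n):
    """Sort the first n elements of v in place, mirroring the assembly's loops."""
    i = 0                                   # add $19,$0,$0         / clrl r6
    while i < n:                            # for1tst
        j = i - 1                           # addi $17,$19,-1       / subl3 r4,r6,#1
        while j >= 0 and v[j] > v[j + 1]:   # for2tst: the two branches to exit2
            swap(v, j)                      # jal swap              / calls #2,swap
            j -= 1                          # addi $17,$17,-1       / decl r4
        i += 1                              # exit2: addi $19,$19,1 / incl r6


data = [5, 2, 9, 1]
sort(data, len(data))
print(data)  # [1, 2, 5, 9]
```

The comments pair each Python statement with the MIPS and VAX instructions that play the same role in the figure.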

Fallacy The architecture that executes fewer instructions is faster.

Designers of VAX machines performed a quantitative comparison of VAX and MIPS for implementations with comparable organizations, the VAX 8700 and the MIPS M2000. Figure K.57 shows the ratio of the number of instructions executed and the ratio of performance measured in clock cycles. MIPS executes about twice as many instructions as the VAX while the MIPS M2000 has almost three times the performance of the VAX 8700.

Concluding Remarks

The Virtual Address eXtension of the PDP-11 architecture… provides a virtual address of about 4.3 gigabytes which, even given the rapid improvement of memory technology, should be adequate far into the future.

William Strecker

“VAX-11/780—A Virtual Address Extension to the PDP-11 Family,”

AFIPS Proc., National Computer Conference (1978)

We have seen that instruction sets can vary quite dramatically, both in how they access operands and in the operations that can be performed by a single instruction. Figure K.58 compares instruction usage for both architectures for two programs; even very different architectures behave similarly in their use of instruction classes.

[Bar chart: MIPS/VAX ratio (vertical axis, 0 to 4) of instructions executed and of performance for the programs spice, matrix, nasa7, fpppp, tomcatv, doduc, espresso, eqntott, and li.]

Figure K.57 Ratio of MIPS M2000 to VAX 8700 in instructions executed and performance in clock cycles using SPEC89 programs. On average, MIPS executes a little over twice as many instructions as the VAX, but the CPI for the VAX is almost six times the MIPS CPI, yielding almost a threefold performance advantage. (Based on data from “Performance from Architecture: Comparing a RISC and CISC with Similar Hardware Organization,” by D. Bhandarkar and D. Clark, in Proc. Symp. Architectural Support for Programming Languages and Operating Systems IV, 1991.)
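The numbers in the caption fit together through the usual relation that execution time in clock cycles equals instruction count times CPI. The sketch below works that arithmetic. The function name is illustrative, and the inputs 2 and 6 are rounded stand-ins for "a little over twice" and "almost six times", not measured values.

```python
def mips_advantage_in_cycles(instr_ratio_mips_to_vax, cpi_ratio_vax_to_mips):
    """Return (VAX clock cycles) / (MIPS clock cycles) for the same program.

    cycles = instruction_count * CPI, so
    cycles_vax / cycles_mips = (CPI_vax / CPI_mips) / (IC_mips / IC_vax).
    """
    return cpi_ratio_vax_to_mips / instr_ratio_mips_to_vax


# Rounded inputs (assumptions standing in for the caption's approximate figures).
advantage = mips_advantage_in_cycles(2.0, 6.0)
print(advantage)  # 3.0
```

Executing twice as many instructions costs MIPS a factor of two, but each instruction takes about one-sixth as many cycles, so the net advantage is about a factor of three.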

A product of its time, the VAX emphasis on code density and complex operations and addressing modes conflicts with the current emphasis on easy decoding, simple operations and addressing modes, and pipelined performance.

With more than 600,000 sold, the VAX architecture has had a very successful run. In 1991, DEC made the transition from VAX to Alpha.

Orthogonality is key to the VAX architecture; the opcode is independent of the addressing modes, which are independent of the data types and even the number of unique operands. Thus, a few hundred operations expand to hundreds of thousands of instructions when accounting for the data types, operand counts, and addressing modes.
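A rough count shows why orthogonality multiplies so quickly: if every operand may independently use any addressing mode, the number of opcode and mode combinations grows as a power of the operand count. The sketch below uses hypothetical round numbers chosen only to show the growth. They are not the VAX's actual opcode or addressing-mode counts, and it simplifies by giving every opcode the same number of operands.

```python
def orthogonal_combinations(num_opcodes, modes_per_operand, operands_per_instruction):
    """Count opcode/addressing-mode combinations when each operand
    can independently use any addressing mode."""
    return num_opcodes * modes_per_operand ** operands_per_instruction


# Hypothetical inputs for illustration only (not actual VAX figures).
total = orthogonal_combinations(
    num_opcodes=300, modes_per_operand=10, operands_per_instruction=3
)
print(total)  # 300000
```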

Exercises

K.1 [3]<K.4> The following VAX instruction decrements the location pointed to by register r5:

decl (r5)

What is the single MIPS instruction, or if it cannot be represented in a single instruction, the shortest sequence of MIPS instructions, that performs the same operation? What are the lengths of the instructions on each machine?

K.2 [5]<K.4> This exercise is the same as Exercise K.1, except this VAX instruction clears a location using autoincrement deferred addressing:

clrl @(r5)+

K.3 [5]<K.4> This exercise is the same as Exercise K.1, except this VAX instruction adds 1 to register r5, placing the sum back in register r5, compares the sum to register r6, and then branches to L1 if r5 < r6:

aoblss r6, r5, L1 # r5 = r5 + 1; if (r5 < r6) goto L1.

K.4 [5]<K.4> Show the single VAX instruction, or minimal sequence of instructions, for this C statement:

a = b + 100;

Assume a corresponds to register r3 and b corresponds to register r4.

K.5 [10]<K.4> Show the single VAX instruction, or minimal sequence of instructions, for this C statement:

x[i + 1] = x[i] + c;

Assume c corresponds to register r3, i to register r4, and x is an array of 32-bit words beginning at memory location 4,000,000 (base ten).

Program   Machine   Branch   Arithmetic/logical   Data transfer   Floating point   Totals
gcc       VAX       30%      40%                  19%                              89%
          MIPS      24%      35%                  27%                              86%
spice     VAX       18%      23%                  15%             23%              79%
          MIPS       4%      29%                  35%             15%              83%

Figure K.58 The frequency of instruction distribution for two programs on VAX and MIPS.

K.5 The IBM 360/370 Architecture for Mainframe Computers

Introduction

The term “computer architecture” was coined by IBM in 1964 for use with the IBM 360. Amdahl, Blaauw, and Brooks [1964] used the term to refer to the programmer-visible portion of the instruction set. They believed that a family of machines of the same architecture should be able to run the same software. Although this idea may seem obvious to us today, it was quite novel at the time. IBM, even though it was the leading company in the industry, had five different architectures before the 360. Thus, the notion of a company standardizing on a single architecture was a radical one. The 360 designers hoped that six different divisions of IBM could be brought together by defining a common architecture. Their definition of architecture was

… the structure of a computer that a machine language programmer must understand to write a correct (timing independent) program for that machine.

The term “machine language programmer” meant that compatibility would hold, even in assembly language, while “timing independent” allowed different implementations.

The IBM 360 was introduced in 1964 with six models and a 25:1 performance ratio. Amdahl, Blaauw, and Brooks [1964] discussed the architecture of the IBM 360 and the concept of permitting multiple object-code-compatible implementations. The notion of an instruction set architecture as we understand it today was the most important aspect of the 360. The architecture also introduced several important innovations, now in wide use:

1. 32-bit architecture

2. Byte-addressable memory with 8-bit bytes

3. 8-, 16-, 32-, and 64-bit data sizes

4. 32-bit single-precision and 64-bit double-precision floating-point data

In 1971, IBM shipped the first System/370 (models 155 and 165), which included a number of significant extensions of the 360, as discussed by Case and Padegs [1978], who also discussed the early history of System/360. The most important addition was virtual memory, though virtual memory 370s did not ship until 1972, when a virtual memory operating system was ready. By 1978, the high-end 370 was several hundred times faster than the low-end 360s shipped 10 years earlier. In 1984, the 24-bit addressing model built into the IBM 360 needed to be abandoned, and the 370-XA (eXtended Architecture) was introduced. While old 24-bit programs could be supported without change, several instructions could not function in the same manner when extended to a 32-bit addressing model (31-bit addresses supported) because they would not produce 31-bit addresses. Converting the operating system, which was written mostly in assembly language, was no doubt the biggest task.
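To put the two address widths in perspective: with byte addressing, n address bits reach 2^n distinct bytes. The following sketch (the function and constant names are illustrative) computes the reach of the 24-bit and 31-bit addresses mentioned above.

```python
def addressable_bytes(address_bits):
    """Number of distinct byte addresses for a given address width."""
    return 2 ** address_bits


MIB = 2 ** 20  # bytes per mebibyte

for bits in (24, 31):
    size = addressable_bytes(bits)
    print(f"{bits}-bit addresses: {size:,} bytes = {size // MIB:,} MiB")
```

A 24-bit address reaches 16 MiB, while a 31-bit address reaches 2,048 MiB (2 GiB), a 128-fold increase.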

Several studies of the IBM 360 and instruction measurement have been made. Shustek’s thesis [1978] is the best known and most complete study of the 360/370 architecture. He made several observations about instruction set complexity that were not fully appreciated until some years later. Another important study of the 360 is the Toronto study by Alexander and Wortman [1975] done on an IBM 360 using 19 XPL programs.