Putting It All Together: Measurements of Instruction Set Usage

In this section, we present detailed measurements for the 80x86 and then compare the measurements to MIPS for the same programs. To facilitate comparisons among dynamic instruction set measurements, we use a subset of the SPEC92 programs. The 80x86 results were taken in 1994 using the Sun Solaris FORTRAN and C compilers V2.0 and executed in 32-bit mode. These compilers were comparable in quality to the compilers used for MIPS.

Remember that these measurements depend on the benchmarks chosen and the compiler technology used. Although we feel that the measurements in this section are reasonably indicative of the usage of these architectures, other programs may behave differently from any of the benchmarks here, and different compilers may yield different results. In doing a real instruction set study, the architect would want to have a much larger set of benchmarks, spanning as wide an application range as possible, and consider the operating system and its usage of the instruction set. Single-user benchmarks like those measured here do not necessarily behave in the same fashion as the operating system.

We start with an evaluation of the features of the 80x86 in isolation, and later compare instruction counts with those of DLX.

Measurements of 80x86 Operand Addressing

We start with addressing modes. Figure K.40 shows the distribution of the operand types in the 80x86. These measurements cover the “second” operand of the operation; for example,

mov EAX, [45]

counts as a single memory operand. If the types of the first operand were counted, the percentage of register usage would increase by about a factor of 1.5.
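As a rough illustration of how such a count could be taken, the sketch below classifies the second operand of an instruction written in the Intel syntax used above. The function name `second_operand_type` and the small register set are assumptions made for this example; they are not the measurement tools behind Figure K.40.

```python
# Hypothetical helper: classify the second operand of an Intel-syntax
# instruction as register, immediate, or memory.

# Assumed subset of the 32-bit general-purpose registers, for illustration only.
REGISTERS = {"EAX", "EBX", "ECX", "EDX", "ESI", "EDI", "EBP", "ESP"}


def second_operand_type(instruction):
    """Return 'register', 'immediate', 'memory', or 'none'."""
    # Drop the mnemonic, then split the operand list on the first comma.
    _, _, operands = instruction.strip().partition(" ")
    _, comma, second = operands.partition(",")
    second = second.strip()
    if not comma or not second:
        return "none"          # zero or one operand
    if second.startswith("[") and second.endswith("]"):
        return "memory"        # bracketed address expression
    if second.upper() in REGISTERS:
        return "register"
    return "immediate"         # anything else is treated as a constant


print(second_operand_type("mov EAX, [45]"))   # memory
```

Tallying the returned labels over a dynamic instruction trace would give a distribution of the kind the figure reports.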

The 80x86 memory operands are divided into their respective addressing modes in Figure K.41. Probably the biggest surprise is the popularity of the addressing modes added by the 80386, the last four rows of the figure. They account for about half of all the memory accesses. Another surprise is the popularity of direct addressing. On most other machines, the equivalent of the direct addressing mode is rare. Perhaps the segmented address space of the 80x86 makes direct addressing more useful, since the address is relative to a base address from the segment register.

Operand type   Integer average   FP average
Register       45%               22%
Immediate      16%               6%
Memory         39%               72%

Figure K.40 Operand type distribution for the average of five SPECint92 programs (compress, eqntott, espresso, gcc, li) and the average of five SPECfp92 programs (doduc, ear, hydro2d, mdljdp2, su2cor).

These addressing modes largely determine the size of the Intel instructions. Figure K.42 shows the distribution of instruction sizes. The average number of bytes per instruction for integer programs is 2.8, with a standard deviation of 1.5, and 4.1 with a standard deviation of 1.9 for floating-point programs. The difference in length arises partly from the differences in the addressing modes: Integer programs rely more on the shorter register indirect and 8-bit displacement addressing modes, while floating-point programs more frequently use the 80386 addressing modes with the longer 32-bit displacements.

Given that the floating-point instructions have aspects of both stacks and registers, how are they used? Figure K.43 shows that, at least for the compilers used in this measurement, the stack model of execution is rarely followed. (See Section L.3 for a historical explanation of this observation.)

Finally, Figures K.44 and K.45 show the instruction mixes for 10 SPEC92 programs.

Addressing mode                        Integer average   FP average
Base + scaled indexed + 8-bit disp.    0%
Base + scaled indexed + 32-bit disp.   4%

Figure K.41 Operand addressing mode distribution by program. This chart does not include addressing modes used by branches or control instructions.

[Bar chart: percentage of instructions at each instruction length, for the integer average and the floating-point average.]

Figure K.42 Averages of the histograms of 80x86 instruction lengths for five SPECint92 programs and for five SPECfp92 programs, all running in 32-bit mode.

Option                                doduc   ear     hydro2d  mdljdp2  su2cor  FP average
Stack (2nd operand ST(1))             1.1%    0.0%    0.0%     0.2%     0.6%    0.4%
Register (2nd operand ST(i), i > 1)   17.3%   63.4%   14.2%    7.1%     30.7%   26.5%
Memory                                81.6%   36.6%   85.8%    92.7%    68.7%   73.1%

Figure K.43 The percentage of instructions for the floating-point operations (add, sub, mul, div) that use each of the three options for specifying a floating-point operand on the 80x86. The three options are (1) the strict stack model of implicit operands on the stack, (2) register version naming an explicit operand that is not one of the top two elements of the stack, and (3) memory operand.

Instruction doduc ear hydro2d mdljdp2 su2cor FP average

Figure K.44 80x86 instruction mix for five SPECfp92 programs.

Comparative Operation Measurements

Figures K.46 and K.47 show the number of instructions executed for each of the 10 programs on the 80x86 and the ratio of instruction execution compared with that for DLX: Numbers less than 1.0 mean that the 80x86 executes fewer instructions than DLX. The instruction count is surprisingly close to DLX for many integer programs, since you would expect a load-store instruction set architecture like DLX to execute more instructions than a register-memory architecture like the 80x86. The floating-point programs always have higher counts for the 80x86, presumably due to the lack of floating-point registers and the use of a stack architecture.

Another question is the total amount of data traffic for the 80x86 versus DLX, since the 80x86 can specify memory operands as part of operations while DLX can access memory only via loads and stores. Figures K.46 and K.47 also show the data reads, data writes, and data read-modify-writes for these 10 programs. The total accesses ratio to DLX of each memory access type is shown in the bottom rows, with the read-modify-write counting as one read and one write. The 80x86 performs about two to four times as many data accesses as DLX for floating-point programs, and 1.25 times as many for integer programs. Finally, Figure K.48 shows the percentage of instructions in each category for 80x86 and DLX.
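The accounting rule for read-modify-writes can be sketched in a few lines. The counts below are the SPECint92 values from Figure K.46; the function name `total_accesses` is an assumption made for this example.

```python
# Figure K.46 counts for the five SPECint92 programs (millions of accesses).
PROGRAMS = ["compress", "eqntott", "espresso", "gcc", "li"]
READS = [589, 229, 622, 1079, 1459]
WRITES = [311, 39, 191, 661, 981]
READ_MODIFY_WRITES = [26, 1, 129, 48, 48]


def total_accesses(reads, writes, rmw):
    """Count each read-modify-write as one read plus one write."""
    total_reads = reads + rmw
    total_writes = writes + rmw
    return total_reads, total_writes, total_reads + total_writes


for name, r, w, rmw in zip(PROGRAMS, READS, WRITES, READ_MODIFY_WRITES):
    print(name, total_accesses(r, w, rmw))
```

Because the published counts are rounded to millions, sums computed this way can differ from the totals in the figure by one in the last digit (for example, 311 + 26 gives 337 data writes for compress, where the figure lists 338).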

Instruction                compress  eqntott  espresso  gcc (cc1)  li      Int. average
Compare                    8.2%      27.7%    15.3%     13.5%      7.7%    16%
Mov reg-reg                7.9%      0.6%     5.0%      4.2%       7.8%    4%
Cond. branch               15.5%     28.6%    18.9%     17.4%      15.4%   20%
Uncond. branch             1.2%      0.2%     0.9%      2.2%       2.2%    1%
Return, jmp indirect       0.5%      0.4%     0.7%      1.5%       3.2%    1%
Compare FP                 0%
Mov reg-reg FP             0%
Other (abs, sqrt, . . .)   0%

Figure K.45 80x86 instruction mix for five SPECint92 programs.

Concluding Remarks

Beauty is in the eye of the beholder.

Old Adage

As we have seen, “orthogonal” is not a term found in the Intel architectural dictionary. To fully understand which registers and which addressing modes are available, you need to see the encoding of all addressing modes and sometimes the encoding of the instructions.

                                              compress  eqntott  espresso  gcc (cc1)  li     Int. avg.
Instructions executed on 80x86 (millions)     2226      1203     2216      3770       5020
Instructions executed ratio to DLX            0.61      1.74     0.85      0.96       0.98   1.03
Data reads on 80x86 (millions)                589       229      622       1079       1459
Data writes on 80x86 (millions)               311       39       191       661        981
Data read-modify-writes on 80x86 (millions)   26        1        129       48         48
Total data reads on 80x86 (millions)          615       230      751       1127       1507
Total data writes on 80x86 (millions)         338       40       319       709        1029
Total data accesses on 80x86 (millions)       953       269      1070      1836       2536

Figure K.46 Instructions executed and data accesses on 80x86 and ratios compared to DLX for five SPECint92 programs.
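To make the ratio row concrete, the sketch below derives the DLX instruction counts implied by Figure K.46 and checks the average ratio. It assumes the ratio is defined as the 80x86 count divided by the DLX count, as the text describes; the function names are hypothetical, and the implied counts are only approximate because the published ratios are rounded.

```python
# Instruction counts (millions) and 80x86-to-DLX ratios from Figure K.46.
X86_MILLIONS = {"compress": 2226, "eqntott": 1203, "espresso": 2216,
                "gcc": 3770, "li": 5020}
RATIO_TO_DLX = {"compress": 0.61, "eqntott": 1.74, "espresso": 0.85,
                "gcc": 0.96, "li": 0.98}


def implied_dlx_millions(program):
    """DLX count implied by ratio = (80x86 count) / (DLX count)."""
    return X86_MILLIONS[program] / RATIO_TO_DLX[program]


def mean_ratio():
    """Arithmetic mean of the per-program ratios."""
    return sum(RATIO_TO_DLX.values()) / len(RATIO_TO_DLX)


for program in X86_MILLIONS:
    print(f"{program}: about {implied_dlx_millions(program):.0f} million DLX instructions")
print(f"mean ratio: {mean_ratio():.2f}")
```

The arithmetic mean of the five ratios comes to 1.03, matching the Int. avg. column. A ratio below 1.0, as for compress, implies a DLX count larger than the 80x86 count.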

                                              doduc   ear      hydro2d  mdljdp2  su2cor  FP average
Instructions executed on 80x86 (millions)     1223    15,220   13,342   6197     6197
Instructions executed ratio to DLX            1.19    1.19     2.53     2.09     1.62    1.73
Data reads on 80x86 (millions)                515     6007     5501     3696     3643
Data writes on 80x86 (millions)               260     2205     2085     892      892
Data read-modify-writes on 80x86 (millions)   1       0        189      124      124
Total data reads on 80x86 (millions)          517     6007     5690     3820     3767
Total data writes on 80x86 (millions)         261     2205     2274     1015     1015
D                                             .68     33.25    38.74    16.74    9.35    20.35
Total data accesses on 80x86 (millions)       778     8212     7965     4835     4782

Figure K.47 Instructions executed and data accesses for five SPECfp92 programs on 80x86 and ratio to DLX.

Some argue that the inelegance of the 80x86 instruction set is unavoidable, the price that must be paid for rampant success by any architecture. We reject that notion. Obviously, no successful architecture can jettison features that were added in previous implementations, and over time some features may be seen as undesirable. The awkwardness of the 80x86 began at its core with the 8086 instruction set and was exacerbated by the architecturally inconsistent expansions of the 8087, 80286, and 80386.

A counterexample is the IBM 360/370 architecture, which is much older than the 80x86. It dominates the mainframe market just as the 80x86 dominates the PC market. Due undoubtedly to a better base and more compatible enhancements, this instruction set makes much more sense than the 80x86 more than 30 years after its first implementation.

For better or worse, Intel had a 16-bit microprocessor years before its competitors’ more elegant architectures, and this head start led to the selection of the 8086 as the CPU for the IBM PC. What it lacks in style is made up in quantity, making the 80x86 beautiful from the right perspective.

The saving grace of the 80x86 is that its architectural components are not too difficult to implement, as Intel has demonstrated by rapidly improving performance of integer programs since 1978. High floating-point performance is a larger challenge in this architecture.

                           Integer average    FP average
Category                   x86      DLX       x86      DLX
Total data transfer        34%      36%       28%      2%
Total integer arithmetic   34%      31%       16%      12%
Total control              24%      20%       6%       10%
Total logical              8%       13%       3%       2%
Total FP data transfer     0%       0%        22%      33%
Total FP arithmetic        0%       0%        25%      41%

Figure K.48 Percentage of instructions executed by category for 80x86 and DLX for the averages of five SPECint92 and SPECfp92 programs of Figures K.46 and K.47.

K.4 The VAX Architecture

VAX: the most successful minicomputer design in industry history . . . the VAX was probably the hacker’s favorite machine . . . . Especially noted for its large, assembler-programmer-friendly instruction set—an asset that became a liability after the RISC revolution.

Eric Raymond
The New Hacker’s Dictionary (1991)

Introduction

To enhance your understanding of instruction set architectures, we chose the VAX as the representative Complex Instruction Set Computer (CISC) because it is so different from MIPS and yet still easy to understand. By seeing two such divergent styles, we are confident that you will be able to learn other instruction sets on your own.

At the time the VAX was designed, the prevailing philosophy was to create instruction sets that were close to programming languages in order to simplify compilers. For example, because programming languages had loops, instruction sets should have loop instructions. As VAX architect William Strecker said (“VAX-11/780—A Virtual Address Extension to the PDP-11 Family,” AFIPS Proc., National Computer Conference, 1978):

A major goal of the VAX-11 instruction set was to provide for effective compiler generated code. Four decisions helped to realize this goal: 1) A very regular and consistent treatment of operators . . . . 2) An avoidance of instructions unlikely to be generated by a compiler . . . . 3) Inclusions of several forms of common operators . . . . 4) Replacement of common instruction sequences with single instructions . . . . Examples include procedure calling, multiway branching, loop control, and array subscript calculation.

Recall that DRAMs of the mid-1970s contained less than 1/1000th the capacity of today’s DRAMs, so code space was also critical. Hence, another prevailing philosophy was to minimize code size, which is de-emphasized in fixed-length instruction sets like MIPS. For example, MIPS address fields always use 16 bits, even when the address is very small. In contrast, the VAX allows instructions to be a variable number of bytes, so there is little wasted space in address fields.
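The space argument can be made concrete with a small sketch. It assumes that a variable-length encoding stores a signed displacement in 1, 2, or 4 bytes, whichever is the smallest that fits, and compares that with a field that always occupies 16 bits. The function name is hypothetical, and the encoding is simplified for illustration; it is not a description of the actual VAX or MIPS instruction formats.

```python
def variable_displacement_bytes(value):
    """Smallest of 1, 2, or 4 bytes that holds a signed displacement."""
    for size in (1, 2, 4):
        bits = 8 * size
        if -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
            return size
    raise ValueError("displacement does not fit in 32 bits")


FIXED_FIELD_BYTES = 2  # a 16-bit address field, regardless of the value


for displacement in (4, 100, 1000, 100000):
    print(displacement, variable_displacement_bytes(displacement), FIXED_FIELD_BYTES)
```

Small values such as 4 or 100 need one byte instead of two under this assumed encoding, while a value such as 100000 needs four bytes and would not fit in a 16-bit field at all.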

Whole books have been written just about the VAX, so this VAX extension cannot be exhaustive. Hence, the following sections describe only a few of its addressing modes and instructions. To show the VAX instructions in action, later sections show VAX assembly code for two C procedures. The general style will be to contrast these instructions with the MIPS code that you are already familiar with.

The differing goals for VAX and MIPS have led to very different architectures. The VAX goals, simple compilers and code density, led to powerful addressing modes, powerful instructions, and efficient instruction encoding. The MIPS goals were high performance via pipelining, ease of hardware implementation, and compatibility with highly optimizing compilers. The MIPS goals led to simple instructions, simple addressing modes, fixed-length instruction formats, and a large number of registers.
