Z Stop the machine and ring the warning bell
2.2 Classifying Instruction Set Architectures
FIGURE 2.1 Operand locations for four instruction set architecture classes. The arrows indicate whether the operand is an input or the result of the ALU operation, or both an input and result. Lighter shades indicate inputs and the dark shade indicates the result. In (a), a Top Of Stack register (TOS) points to the top input operand, which is combined with the operand below. The first operand is removed from the stack, the result takes the place of the second operand, and TOS is updated to point to the result. All operands are implicit. In (b), the Accumulator is both an implicit input operand and a result.
In (c) one input operand is a register, one is in memory, and the result goes to a register. All operands are registers in (d), and, like the stack architecture, can be transferred to memory only via separate instructions: push or pop for (a) and load or store for (d).
Stack     Accumulator   Register             Register
                        (register-memory)    (load-store)
Push A    Load A        Load R1,A            Load R1,A
Push B    Add B         Add R3,R1,B          Load R2,B
Add       Store C       Store R3,C           Add R3,R1,R2
Pop C                                        Store R3,C
FIGURE 2.2 The code sequence for C = A + B for four classes of instruction sets. Note that the Add instruction has implicit operands for stack and accumulator architectures, and explicit operands for register architectures. It is assumed that A, B, and C all belong in memory and that the values of A and B cannot be destroyed. Figure 2.1 shows the Add operation for each class of architecture.
[Figure 2.1 diagram. Panels: (a) Stack, (b) Accumulator, (c) Register-Memory, (d) Register-Register/Load-Store. Labels in the drawing: TOS, ALU, Processor, Memory.]
puters shipping today, keeps all operands in memory and is called a memory-memory architecture. Some instruction set architectures have more registers than a single accumulator, but place restrictions on uses of these special registers.
Such an architecture is sometimes called an extended accumulator or special-purpose register computer.
Although most early computers used stack or accumulator-style architectures, virtually every new architecture designed after 1980 uses a load-store register architecture. The major reasons for the emergence of general-purpose register (GPR) computers are twofold. First, registers, like other forms of storage internal to the processor, are faster than memory. Second, registers are more efficient for a compiler to use than other forms of internal storage. For example, on a register computer the expression (A*B) - (B*C) - (A*D) may be evaluated by doing the multiplications in any order, which may be more efficient because of the location of the operands or because of pipelining concerns (see Chapter 3). In contrast, on a stack computer the hardware must evaluate the expression in only one order, since operands are hidden on the stack, and it may have to load an operand multiple times.
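The reloading cost mentioned for the stack computer can be made concrete by counting memory loads for (A*B) - (B*C) - (A*D). This is a minimal sketch under stated assumptions: both machine models, the function names, and the sample values are hypothetical, and the stack model has no duplicate or swap instructions, so every variable reference is a push from memory.

```python
# Count memory loads for (A*B) - (B*C) - (A*D) on two toy machine models.
# Both models are illustrative assumptions, not real instruction sets.

mem = {"A": 2, "B": 3, "C": 5, "D": 7}

def stack_eval(program):
    """Run a postfix program; each variable reference pushes from memory."""
    stack, loads = [], 0
    for tok in program:
        if tok in mem:
            stack.append(mem[tok])
            loads += 1
        else:
            rhs, lhs = stack.pop(), stack.pop()
            stack.append(lhs * rhs if tok == "*" else lhs - rhs)
    return stack.pop(), loads

def register_eval():
    """Load each variable once, then combine registers in any order."""
    regs = {name: mem[name] for name in "ABCD"}  # four loads, one per variable
    loads = len(regs)
    t3 = regs["A"] * regs["D"]  # the three products are independent, so this
    t2 = regs["B"] * regs["C"]  # ordering is as valid as any other
    t1 = regs["A"] * regs["B"]
    return t1 - t2 - t3, loads

stack_result, stack_loads = stack_eval("A B * B C * - A D * -".split())
reg_result, reg_loads = register_eval()
```

With these models the stack version performs six loads (A and B are each fetched twice), while the register version performs four and is free to reorder the multiplications.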
More importantly, registers can be used to hold variables. When variables are allocated to registers, the memory traffic reduces, the program speeds up (since registers are faster than memory), and the code density improves (since a register can be named with fewer bits than can a memory location).
As explained in section 2.11, compiler writers would prefer that all registers be equivalent and unreserved. Older computers compromise this desire by dedicating registers to special uses, effectively decreasing the number of general-purpose registers. If the number of truly general-purpose registers is too small, trying to allocate variables to registers will not be profitable. Instead, the compiler will reserve all the uncommitted registers for use in expression evaluation. The dominance of hand-optimized code in the DSP community has led to DSPs with many special-purpose registers and few general-purpose registers.
How many registers are sufficient? The answer, of course, depends on the effectiveness of the compiler. Most compilers reserve some registers for expression evaluation, use some for parameter passing, and allow the remainder to be allocated to hold variables. Just as people tend to be bigger than their parents, new instruction set architectures tend to have more registers than their ancestors.
Two major instruction set characteristics divide GPR architectures. Both characteristics concern the nature of operands for a typical arithmetic or logical instruction (ALU instruction). The first concerns whether an ALU instruction has two or three operands. In the three-operand format, the instruction contains one result operand and two source operands. In the two-operand format, one of the operands is both a source and a result for the operation. The second distinction among GPR architectures concerns how many of the operands may be memory addresses in ALU instructions. The number of memory operands supported by a typical ALU instruction may vary from none to three. Figure 2.3 shows combinations of these two attributes with examples of computers. Although there are seven possible combinations, three serve to classify nearly all existing computers. As we mentioned earlier, these three are register-register (also called load-store), register-memory, and memory-memory.
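The count of seven combinations follows from pairing each operand format with every allowable number of memory operands. A short sketch of that arithmetic, where the variable names are illustrative and the class labels use the (m, n) pairs given in Figure 2.4:

```python
# Enumerate (memory operands m, total operands n) for a typical ALU instruction.
# n is 2 or 3; m ranges from 0 up to n.
combos = [(m, n) for n in (2, 3) for m in range(n + 1)]

# The combinations that Figure 2.4 names as the common classes.
common = {
    (0, 3): "register-register (load-store)",
    (1, 2): "register-memory",
    (2, 2): "memory-memory",
    (3, 3): "memory-memory",
}

for m, n in combos:
    print((m, n), common.get((m, n), "uncommon"))
```

The two-operand format contributes three pairs and the three-operand format contributes four, for seven in total.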
Figure 2.4 shows the advantages and disadvantages of each of these alternatives. Of course, these advantages and disadvantages are not absolutes: They are qualitative and their actual impact depends on the compiler and implementation strategy. The memory-memory operations of a GPR computer could easily be ignored by the compiler, which would then use it as a register-register computer. One of the most pervasive architectural impacts is on instruction encoding and the number of instructions needed to perform a task. We will see the impact of these architectural alternatives on implementation approaches in Chapters 3 and 4.

Number of memory   Maximum number of    Type of             Examples
addresses          operands allowed     architecture
0                  3                    Register-register   Alpha, ARM, MIPS, PowerPC, SPARC, SuperH, Trimedia TM5200
1                  2                    Register-memory     IBM 360/370, Intel 80x86, Motorola 68000, TI TMS320C54x

FIGURE 2.3 Typical combinations of memory operands and total operands per typical ALU instruction with examples of computers. Computers with no memory reference per ALU instruction are called load-store or register-register computers. Computers whose typical ALU instruction has memory operands are called register-memory or memory-memory, according to whether it has one or more than one memory operand.

Type: Register-register (0,3)
Advantages: Simple, fixed-length instruction encoding. Simple code-generation model. Instructions take similar numbers of clocks to execute (see App. A).
Disadvantages: Higher instruction count than architectures with memory references in instructions. More instructions and lower instruction density lead to larger programs.

Type: Register-memory (1,2)
Advantages: Data can be accessed without a separate load instruction first. Instruction format tends to be easy to encode and yields good density.
Disadvantages: Operands are not equivalent since a source operand in a binary operation is destroyed. Encoding a register number and a memory address in each instruction may restrict the number of registers. Clocks per instruction vary by operand location.

Type: Memory-memory (2,2) or (3,3)
Advantages: Most compact. Doesn’t waste registers for temporaries.
Disadvantages: Large variation in instruction size, especially for three-operand instructions. In addition, large variation in work per instruction. Memory accesses create memory bottleneck. (Not used today.)

FIGURE 2.4 Advantages and disadvantages of the three most common types of general-purpose register computers. The notation (m, n) means m memory operands and n total operands. In general, computers with fewer alternatives simplify the compiler’s task since there are fewer decisions for the compiler to make (see section 2.11). Computers with a wide variety of flexible instruction formats reduce the number of bits required to encode the program. The number of registers also affects the instruction size since you need log2(number of registers) bits for each register specifier in an instruction. Thus, doubling the number of registers takes 3 extra bits for a register-register architecture, or about 10% of a 32-bit instruction.
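The register-specifier arithmetic in the Figure 2.4 caption can be checked directly. In this sketch the function names are hypothetical, and the starting point of 32 registers is an assumption chosen for illustration; the caption does not name a register count. Three specifiers per instruction corresponds to the three-operand register-register format.

```python
def specifier_bits(num_registers: int) -> int:
    """Bits needed to name one of num_registers registers: ceil(log2(n))."""
    return (num_registers - 1).bit_length()

def extra_bits_when_doubling(num_registers: int, specifiers: int = 3) -> int:
    """Extra instruction bits needed if the register count is doubled."""
    before = specifier_bits(num_registers)
    after = specifier_bits(2 * num_registers)
    return specifiers * (after - before)

extra = extra_bits_when_doubling(32)   # going from 32 to 64 registers
fraction = extra / 32                  # share of a 32-bit instruction
print(extra, f"{fraction:.1%}")
```

Each specifier grows by one bit, so three specifiers cost three bits, which is 3/32 of a 32-bit instruction and rounds to the caption's "about 10%".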
Summary: Classifying Instruction Set Architectures
Here and at the end of sections 2.3 to 2.11 we summarize those characteristics we would expect to find in a new instruction set architecture, building the foundation for the MIPS architecture introduced in section 2.12. From this section we should clearly expect the use of general-purpose registers. Figure 2.4, combined with Appendix A on pipelining, leads to the expectation of a register-register (also called load-store) version of a general-purpose register architecture.
With the class of architecture covered, the next topic is addressing operands.
Independent of whether the architecture is register-register or allows any operand to be a memory reference, it must define how memory addresses are interpreted and how they are specified. The measurements presented here are largely, but not completely, computer independent. In some cases the measurements are significantly affected by the compiler technology. These measurements have been made using an optimizing compiler, since compiler technology plays a critical role.
Interpreting Memory Addresses
How is a memory address interpreted? That is, what object is accessed as a function of the address and the length? All the instruction sets discussed in this book (except some DSPs) are byte addressed and provide access for bytes (8 bits), half words (16 bits), and words (32 bits). Most of the computers also provide access for double words (64 bits).
There are two different conventions for ordering the bytes within a larger object. Little Endian byte order puts the byte whose address is “x...x000” at the least-significant position in the double word (the little end). The bytes are numbered:
7 6 5 4 3 2 1 0
Big Endian byte order puts the byte whose address is “x...x000” at the most-significant position in the double word (the big end). The bytes are numbered:
0 1 2 3 4 5 6 7
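As a concrete check of the two byte orders, the following sketch reads objects of different lengths from a small byte-addressed memory. The `load` helper and the memory contents are assumptions made for illustration; `int.from_bytes` is the Python standard library call that interprets a byte sequence under a given order.

```python
# Eight bytes at addresses 0..7 holding 0x10, 0x11, ..., 0x17 (arbitrary values).
memory = bytes(range(0x10, 0x18))

def load(address: int, length: int, byteorder: str) -> int:
    """Interpret `length` bytes starting at `address` as an unsigned integer."""
    return int.from_bytes(memory[address:address + length], byteorder)

# Double word (8 bytes) at address 0 under each convention.
little_double = load(0, 8, "little")  # byte at address 0 is least significant
big_double = load(0, 8, "big")        # byte at address 0 is most significant

# Shorter objects at the same address: half word (2 bytes) and word (4 bytes).
little_half = load(0, 2, "little")
big_word = load(0, 4, "big")

print(hex(little_double), hex(big_double))
```

The same eight bytes yield different double words: under Little Endian the byte at the lowest address lands in the low-order position, and under Big Endian it lands in the high-order position.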