Z Stop the machine and ring the warning bell
2.12 Putting It All Together: The MIPS Architecture
MIPS provides a good architectural model for study, not only because of the popularity of this type of processor (see Chapter 1), but also because it is an easy architecture to understand. We will use this architecture again in Chapters 3 and 4, and it forms the basis for a number of exercises and programming projects.
In the 15 years since the first MIPS processor, there have been many versions of MIPS (see Appendix B). We will use a subset of what is now called MIPS64, which we will often abbreviate to just MIPS, but the full instruction set is found in Appendix B.
Registers for MIPS
MIPS64 has 32 64-bit general-purpose registers (GPRs), named R0, R1, …, R31.
GPRs are also sometimes known as integer registers. Additionally, there is a set of 32 floating-point registers (FPRs), named F0, F1, ..., F31, which can hold 32 single-precision (32-bit) values or 32 double-precision (64-bit) values. (When holding one single-precision number, the other half of the FPR is unused.) Both single- and double-precision floating-point operations (32-bit and 64-bit) are provided. MIPS also includes instructions that operate on two single-precision operands in a single 64-bit floating-point register.
The value of R0 is always 0. We shall see later how we can use this register to synthesize a variety of useful operations from a simple instruction set.
A few special registers can be transferred to and from the general-purpose registers. An example is the floating-point status register, used to hold information about the results of floating-point operations. There are also instructions for moving between an FPR and a GPR.
Data types for MIPS
The data types are 8-bit bytes, 16-bit half words, 32-bit words, and 64-bit double words for integer data and 32-bit single precision and 64-bit double precision for floating point. Half words were added because they are found in languages like C and are popular in some programs, such as operating systems, that are concerned about the size of data structures. They will also become more popular if Unicode becomes widely used. Single-precision floating-point operands were added for similar reasons. (Remember the earlier warning that you should measure many more programs before designing an instruction set.)
The MIPS64 operations work on 64-bit integers and 32- or 64-bit floating point. Bytes, half words, and words are loaded into the general-purpose registers with either zeros or the sign bit replicated to fill the 64 bits of the GPRs. Once loaded, they are operated on with the 64-bit integer operations.
Addressing modes for MIPS data transfers
The only data addressing modes are immediate and displacement, both with 16-bit fields. Register indirect is accomplished simply by placing 0 in the 16-bit displacement field, and absolute addressing with a 16-bit field is accomplished by using register 0 as the base register. Embracing zero gives us four effective modes, although only two are supported in the architecture.
MIPS memory is byte addressable in Big Endian mode with a 64-bit address.
As it is a load-store architecture, all references between memory and either GPRs or FPRs are through loads or stores. Supporting the data types mentioned above, memory accesses involving GPRs can be to a byte, half word, word, or double word. The FPRs may be loaded and stored with single-precision or double-precision numbers. All memory accesses must be aligned.
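The way two addressing modes yield four effective ones can be sketched in a few lines of Python. This is only an illustrative model, not part of the MIPS definition: the helper names (`sign_extend`, `effective_address`, `is_aligned`) and the register contents are invented for the example, and the sketch assumes the 16-bit displacement is sign-extended before it is added to the base register.

```python
MASK64 = (1 << 64) - 1  # MIPS64 addresses are 64 bits wide


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's complement integer."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit


def effective_address(regs, base, displacement):
    """Displacement mode: contents of the base GPR plus the 16-bit field.

    Assumption: the displacement is sign-extended before the add.
    """
    base_value = 0 if base == 0 else regs[base]  # R0 always reads as 0
    return (base_value + sign_extend(displacement, 16)) & MASK64


def is_aligned(address, size_in_bytes):
    """An access is aligned when its address is a multiple of its size."""
    return address % size_in_bytes == 0


regs = [0] * 32
regs[2] = 0x1000  # hypothetical base address held in R2

displacement_mode = effective_address(regs, 2, 24)   # 24(R2)
register_indirect = effective_address(regs, 2, 0)    # 0(R2)
absolute_mode = effective_address(regs, 0, 1000)     # 1000(R0)
```

With these values, `24(R2)` is a legal double-word address because 0x1018 is a multiple of 8, whereas a double-word access to 0x101E would violate the alignment requirement even though a half-word access to the same address would not.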
MIPS Instruction Format
Since MIPS has just two addressing modes, these can be encoded into the opcode. Following the advice on making the processor easy to pipeline and decode, all instructions are 32 bits with a 6-bit primary opcode. Figure 2.27 shows the instruction layout. These formats are simple while providing 16-bit fields for displacement addressing, immediate constants, or PC-relative branch addresses.
I-type instruction
Field | Opcode | rs | rt | Immediate
Bits | 6 | 5 | 5 | 16
Encodes: Loads and stores of bytes, half words, words, double words
All immediates (rt ← rs op immediate)
Conditional branch instructions (rs is register, rd unused)
Jump register, jump and link register (rd = 0, rs = destination, immediate = 0)
R-type instruction
Field | Opcode | rs | rt | rd | shamt | funct
Bits | 6 | 5 | 5 | 5 | 5 | 6
Register-register ALU operations: rd ← rs funct rt
Function encodes the data path operation: Add, Sub, . . .
Read/write special registers and moves
J-type instruction
Field | Opcode | Offset added to PC
Bits | 6 | 26
Jump and jump and link
Trap and return from exception
FIGURE 2.27 Instruction layout for MIPS. All instructions are encoded in one of three types, with common fields in the same location in each format.
Appendix B shows a variant of MIPS, called MIPS16, which has 16-bit and 32-bit instructions to improve code density for embedded applications. We will stick to the traditional 32-bit format in this book.
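Because the common fields sit at the same bit positions in every format, a decoder can extract all of them with fixed shifts and masks before it even knows the instruction type. The Python sketch below illustrates this with the field widths of Figure 2.27. The function names are invented for the example, and the opcode and funct numbers used in the usage lines are placeholders chosen for illustration, not values taken from this text.

```python
def encode_i_type(opcode, rs, rt, immediate):
    """Pack an I-type word: 6-bit opcode, 5-bit rs, 5-bit rt, 16-bit immediate."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (immediate & 0xFFFF)


def encode_r_type(opcode, rs, rt, rd, shamt, funct):
    """Pack an R-type word: opcode, rs, rt, rd, shamt, and a 6-bit funct."""
    return ((opcode << 26) | (rs << 21) | (rt << 16)
            | (rd << 11) | (shamt << 6) | funct)


def decode_fields(word):
    """Extract every field of Figure 2.27 from a 32-bit instruction word.

    All fields are returned; which ones are meaningful depends on the
    format selected by the opcode.
    """
    return {
        "opcode": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "rd": (word >> 11) & 0x1F,
        "shamt": (word >> 6) & 0x1F,
        "funct": word & 0x3F,
        "immediate": word & 0xFFFF,   # I-type only
        "offset": word & 0x3FFFFFF,   # J-type only
    }


# Placeholder opcode/funct numbers, used only to exercise the layout.
load_word = encode_i_type(0x37, rs=2, rt=1, immediate=30)
alu_word = encode_r_type(0x00, rs=2, rt=3, rd=1, shamt=0, funct=0x2D)
load_fields = decode_fields(load_word)
alu_fields = decode_fields(alu_word)
```

Note that `rs` and `rt` are decoded identically for the I-type and R-type words, which is the property that lets hardware read the register file in parallel with opcode decoding.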
MIPS Operations
MIPS supports the list of simple operations recommended above plus a few others. There are four broad classes of instructions: loads and stores, ALU operations, branches and jumps, and floating-point operations.
Any of the general-purpose or floating-point registers may be loaded or stored, except that loading R0 has no effect. Figure 2.28 gives examples of the load and store instructions. Single-precision floating-point numbers occupy half a floating-point register. Conversions between single and double precision must be done explicitly. The floating-point format is IEEE 754 (see Appendix G). A list of all the MIPS instructions in our subset appears in Figure 2.31 (page 146).
Example instruction | Instruction name | Meaning
LD R1,30(R2) | Load double word | Regs[R1] ←_64 Mem[30+Regs[R2]]
LD R1,1000(R0) | Load double word | Regs[R1] ←_64 Mem[1000+0]
LW R1,60(R2) | Load word | Regs[R1] ←_64 (Mem[60+Regs[R2]]_0)^32 ## Mem[60+Regs[R2]]
LB R1,40(R3) | Load byte | Regs[R1] ←_64 (Mem[40+Regs[R3]]_0)^56 ## Mem[40+Regs[R3]]
LBU R1,40(R3) | Load byte unsigned | Regs[R1] ←_64 0^56 ## Mem[40+Regs[R3]]
LH R1,40(R3) | Load half word | Regs[R1] ←_64 (Mem[40+Regs[R3]]_0)^48 ## Mem[40+Regs[R3]] ## Mem[41+Regs[R3]]
L.S F0,50(R3) | Load FP single | Regs[F0] ←_64 Mem[50+Regs[R3]] ## 0^32
L.D F0,50(R2) | Load FP double | Regs[F0] ←_64 Mem[50+Regs[R2]]
SD R3,500(R4) | Store double word | Mem[500+Regs[R4]] ←_64 Regs[R3]
SW R3,500(R4) | Store word | Mem[500+Regs[R4]] ←_32 Regs[R3]
S.S F0,40(R3) | Store FP single | Mem[40+Regs[R3]] ←_32 Regs[F0]_{0..31}
S.D F0,40(R3) | Store FP double | Mem[40+Regs[R3]] ←_64 Regs[F0]
SH R3,502(R2) | Store half | Mem[502+Regs[R2]] ←_16 Regs[R3]_{48..63}
SB R2,41(R3) | Store byte | Mem[41+Regs[R3]] ←_8 Regs[R2]_{56..63}
FIGURE 2.28 The load and store instructions in MIPS. All use a single addressing mode and require that the memory value be aligned. Of course, both loads and stores are available for all the data types shown.
To understand these figures we need to introduce a few additional extensions to our C description language presented initially on page 107:
■ A subscript is appended to the symbol ← whenever the length of the datum being transferred might not be clear. Thus, ←_n means transfer an n-bit quantity. We use x, y ← z to indicate that z should be transferred to x and y.
■ A subscript is used to indicate selection of a bit from a field. Bits are labeled from the most-significant bit starting at 0. The subscript may be a single digit (e.g., Regs[R4]_0 yields the sign bit of R4) or a subrange (e.g., Regs[R3]_{56..63} yields the least-significant byte of R3).
■ The variable Mem, used as an array that stands for main memory, is indexed by a byte address and may transfer any number of bytes.
■ A superscript is used to replicate a field (e.g., 0^48 yields a field of zeros of length 48 bits).
■ The symbol ## is used to concatenate two fields and may appear on either side of a data transfer.
A summary of the entire description language appears on the back inside cover. As an example, assuming that R8 and R10 are 64-bit registers:
Regs[R10]_{32..63} ←_32 (Mem[Regs[R8]]_0)^24 ## Mem[Regs[R8]]
means that the byte at the memory location addressed by the contents of register R8 is sign-extended to form a 32-bit quantity that is stored into the lower half of register R10. (The upper half of R10 is unchanged.)
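The notation can be made concrete with a small Python model of its three primitives: bit selection (numbered from the most-significant bit), replication, and concatenation. The helper names `bits`, `replicate`, and `concat` are invented for this sketch, and the register and memory values are arbitrary test data rather than anything from the text.

```python
def bits(value, first, last, width=64):
    """Subscript notation: select bits first..last of a width-bit value,
    numbering the most-significant bit as 0."""
    length = last - first + 1
    shift = width - 1 - last
    return (value >> shift) & ((1 << length) - 1)


def replicate(bit, count):
    """Superscript notation: a field made of `count` copies of one bit."""
    return (1 << count) - 1 if bit else 0


def concat(*fields):
    """## notation: join (value, bit_length) pairs, leftmost most significant."""
    result = 0
    for value, length in fields:
        result = (result << length) | (value & ((1 << length) - 1))
    return result


def load_byte_into_lower_half(r10, mem_byte):
    """Model of: Regs[R10] bits 32..63 <- (sign bit of byte) x 24 ## byte."""
    sign = bits(mem_byte, 0, 0, width=8)          # Mem[...] bit 0 is the sign bit
    lower_half = concat((replicate(sign, 24), 24), (mem_byte, 8))
    upper_half = bits(r10, 0, 31)                 # left unchanged by the transfer
    return concat((upper_half, 32), (lower_half, 32))


negative_case = load_byte_into_lower_half(0x1122334455667788, 0x80)
positive_case = load_byte_into_lower_half(0x1122334455667788, 0x7F)
```

In the first case the byte 0x80 has its sign bit set, so the lower half becomes 0xFFFFFF80; in the second the sign bit is clear and the lower half becomes 0x0000007F. The upper half, 0x11223344, survives in both.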
All ALU instructions are register-register instructions. Figure 2.29 gives some examples of the arithmetic/logical instructions. The operations include simple arithmetic and logical operations: add, subtract, AND, OR, XOR, and shifts. Immediate forms of all these instructions are provided using a 16-bit sign-extended immediate. The operation LUI (load upper immediate) loads bits 32 to 47 of a register, while setting the rest of the register to 0. LUI allows a 32-bit constant to be built in two instructions, or a data transfer using any constant 32-bit address in one extra instruction.
As mentioned above, R0 is used to synthesize popular operations. Loading a constant is simply an add immediate where one source operand is R0, and a register-register move is simply an add where one of the sources is R0. (We sometimes use the mnemonic LI, standing for load immediate, to represent the former and the mnemonic MOV for the latter.)
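A short Python model shows how these pieces combine. It is a sketch under stated assumptions, not a definition of the instructions: the `RegisterFile` class and the function names are invented for illustration, LUI follows the description above (immediate placed in bits 32 to 47, remaining bits zero), and ORI is assumed to zero-extend its immediate so that it can fill in the low 16 bits after an LUI.

```python
MASK64 = (1 << 64) - 1


def sign_extend16(immediate):
    """Sign-extend a 16-bit immediate to a Python integer."""
    immediate &= 0xFFFF
    return (immediate ^ 0x8000) - 0x8000


class RegisterFile:
    """32 general-purpose registers; R0 always reads as 0."""

    def __init__(self):
        self._regs = [0] * 32

    def read(self, number):
        return self._regs[number]

    def write(self, number, value):
        if number != 0:                      # writes to R0 have no effect
            self._regs[number] = value & MASK64


def daddiu(rf, rd, rs, immediate):
    rf.write(rd, rf.read(rs) + sign_extend16(immediate))


def daddu(rf, rd, rs, rt):
    rf.write(rd, rf.read(rs) + rf.read(rt))


def lui(rf, rd, immediate):
    # Bits 32..47 (most-significant bit = 0) are bits 16..31 counting from the LSB.
    rf.write(rd, (immediate & 0xFFFF) << 16)


def ori(rf, rd, rs, immediate):
    # Assumption: the immediate is zero-extended for this logical operation.
    rf.write(rd, rf.read(rs) | (immediate & 0xFFFF))


rf = RegisterFile()
daddiu(rf, 1, 0, 5)        # LI  R1,#5   is  DADDIU R1,R0,#5
daddu(rf, 2, 1, 0)         # MOV R2,R1   is  DADDU  R2,R1,R0
lui(rf, 3, 0x1234)         # upper 16 bits of the 32-bit constant
ori(rf, 3, 3, 0x5678)      # lower 16 bits of the 32-bit constant
rf.write(0, 99)            # silently ignored
```

After these steps R1 and R2 both hold 5 and R3 holds 0x12345678, built in two instructions as the text describes.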
MIPS Control Flow Instructions
MIPS provides compare instructions, which compare two registers to see if the first is less than the second. If the condition is true, these instructions place a 1 in the destination register (to represent true); otherwise they place the value 0. Because these operations “set” a register, they are called set-equal, set-not-equal, set-less-than, and so on. There are also immediate forms of these compares.
Control is handled through a set of jumps and a set of branches. Figure 2.30 gives some typical branch and jump instructions. The four jump instructions are differentiated by the two ways to specify the destination address and by whether or not a link is made. Two jumps use a 26-bit offset that is shifted left 2 bits and then replaces the lower 28 bits of the program counter (of the instruction sequentially following the jump) to determine the destination address. The other two jump instructions specify a register that contains the destination address. There are two flavors of jumps: plain jump, and jump and link (used for procedure calls). The latter places the return address (the address of the next sequential instruction) in R31.
Example instruction | Instruction name | Meaning
DADDU R1,R2,R3 | Add unsigned | Regs[R1]←Regs[R2]+Regs[R3]
DADDIU R1,R2,#3 | Add immediate unsigned | Regs[R1]←Regs[R2]+3
LUI R1,#42 | Load upper immediate | Regs[R1]←0^32##42##0^16
SLL R1,R2,#5 | Shift left logical | Regs[R1]←Regs[R2]<<5
SLT R1,R2,R3 | Set less than | if (Regs[R2]<Regs[R3]) Regs[R1]←1 else Regs[R1]←0
FIGURE 2.29 Examples of arithmetic/logical instructions on MIPS, both with and without immediates.
Example instruction | Instruction name | Meaning
J name | Jump | PC_{36..63}←name
JAL name | Jump and link | Regs[R31]←PC+4; PC_{36..63}←name; ((PC+4)-2^27) ≤ name < ((PC+4)+2^27)
JALR R2 | Jump and link register | Regs[R31]←PC+4; PC←Regs[R2]
JR R3 | Jump register | PC←Regs[R3]
BEQZ R4,name | Branch equal zero | if (Regs[R4]==0) PC←name; ((PC+4)-2^17) ≤ name < ((PC+4)+2^17)
BNE R3,R4,name | Branch not equal | if (Regs[R3]!=Regs[R4]) PC←name; ((PC+4)-2^17) ≤ name < ((PC+4)+2^17)
MOVZ R1,R2,R3 | Conditional move if zero | if (Regs[R3]==0) Regs[R1]←Regs[R2]
FIGURE 2.30 Typical control-flow instructions in MIPS. All control instructions, except jumps to an address in a register, are PC-relative. Note that the branch distances are longer than the address field would suggest; since MIPS instructions are all 32 bits long, the byte branch address is multiplied by 4 to get a longer distance.
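The jump entries in Figure 2.30 can be read as a small computation: the 26-bit field is shifted left 2 bits and written into the low 28 bits of the address of the following instruction. The Python sketch below models that reading. The function names and the example addresses are invented for illustration.

```python
MASK64 = (1 << 64) - 1
LOW28 = (1 << 28) - 1


def jump_target(pc, offset26):
    """Destination of J/JAL: keep the upper bits of PC+4, replace the low 28."""
    next_pc = (pc + 4) & MASK64
    low_bits = (offset26 & 0x3FFFFFF) << 2      # word offset to byte offset
    return (next_pc & (MASK64 ^ LOW28)) | low_bits


def jal(pc, offset26):
    """Jump and link: returns (value saved in R31, new PC)."""
    return_address = (pc + 4) & MASK64
    return return_address, jump_target(pc, offset26)


def jr(register_value):
    """Jump register: the destination comes straight from a GPR."""
    return register_value & MASK64


link, destination = jal(0x10000000, 0x100)
```

Here the call at 0x10000000 saves 0x10000004 as the return address and transfers control to 0x10000400. Because only the low 28 bits are replaced, a jump placed in the last word of a 2^28-byte region takes its upper bits from the next region, which is why the text stresses that the program counter used is that of the instruction following the jump.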
Instruction type/opcode | Instruction meaning
Data transfers | Move data between registers and memory, or between the integer and FP or special registers; only memory address mode is 16-bit displacement + contents of a GPR
LB,LBU,SB | Load byte, load byte unsigned, store byte (to/from integer registers)
LH,LHU,SH | Load half word, load half word unsigned, store half word (to/from integer registers)
LW,LWU,SW | Load word, load word unsigned, store word (to/from integer registers)
LD,SD | Load double word, store double word (to/from integer registers)
L.S,L.D,S.S,S.D | Load SP float, load DP float, store SP float, store DP float
MFC0,MTC0 | Move from/to GPR to/from a special register
MOV.S,MOV.D | Copy one SP or DP FP register to another FP register
MFC1,MTC1 | Move 32 bits from/to FP registers to/from integer registers
Arithmetic/logical | Operations on integer or logical data in GPRs; signed arithmetic trap on overflow
DADD,DADDI,DADDU,DADDIU | Add, add immediate (all immediates are 16 bits); signed and unsigned
DSUB,DSUBU | Subtract, subtract immediate; signed and unsigned
DMUL,DMULU,DDIV,DDIVU | Multiply and divide, signed and unsigned; all operations take and yield 64-bit values
AND,ANDI | And, and immediate
OR,ORI,XOR,XORI | Or, or immediate, exclusive or, exclusive or immediate
LUI | Load upper immediate: loads bits 32 to 47 of register with immediate; then sign extends
DSLL,DSRL,DSRA,DSLLV,DSRLV,DSRAV | Shifts: both immediate (DS__) and variable form (DS__V); shifts are shift left logical, right logical, right arithmetic
SLT,SLTI,SLTU,SLTIU | Set less than, set less than immediate; signed and unsigned
Control | Conditional branches and jumps; PC-relative or through register
BEQZ,BNEZ | Branch GPR equal/not equal to zero; 16-bit offset from PC+4
BC1T,BC1F | Test comparison bit in the FP status register and branch; 16-bit offset from PC+4
J,JR | Jumps: 26-bit offset from PC+4 (J) or target in register (JR)
JAL,JALR | Jump and link: save PC+4 in R31, target is PC-relative (JAL) or a register (JALR)
TRAP | Transfer to operating system at a vectored address
ERET | Return to user code from an exception; restore user mode
Floating point | FP operations on DP and SP formats
ADD.D,ADD.S,ADD.PS | Add DP, SP numbers, and pairs of SP numbers
SUB.D,SUB.S,SUB.PS | Subtract DP, SP numbers, and pairs of SP numbers
MUL.D,MUL.S,MUL.PS | Multiply DP, SP floating point, and pairs of SP numbers
DIV.D,DIV.S,DIV.PS | Divide DP, SP floating point, and pairs of SP numbers
CVT._._ | Convert instructions: CVT.x.y converts from type x to type y, where x and y are L (64-bit integer), W (32-bit integer), D (DP), or S (SP). Both operands are FPRs.
C.__.D,C.__.S | DP and SP compares: “__” = LT,GT,LE,GE,EQ,NE; sets bit in FP status register
FIGURE 2.31 Subset of the instructions in MIPS64. Figure 2.27 lists the formats of these instructions. SP = single precision; DP = double precision. This list can also be found on the page preceding the back inside cover.
All branches are conditional. The branch condition is specified by the instruction, which may test the register source for zero or nonzero; the register may contain a data value or the result of a compare. There are also conditional branch instructions to test for whether a register is negative and for equality between two registers. The branch target address is specified with a 16-bit signed offset that is added to the program counter, which is pointing to the next sequential instruction. There is also a branch to test the floating-point status register for floating-point conditional branches, described below.
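The branch target arithmetic, and the reach of about 2^17 bytes in either direction quoted in Figure 2.30, follow directly from a 16-bit signed offset counted in 4-byte instructions. The Python sketch below works through it. The function names and addresses are invented for the example.

```python
MASK64 = (1 << 64) - 1


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's complement integer."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit


def branch_target(pc, offset16):
    """PC of the next instruction plus the signed word offset times 4."""
    return (pc + 4 + (sign_extend(offset16, 16) << 2)) & MASK64


def beqz(pc, register_value, offset16):
    """Branch equal zero: taken target if the register is 0, else fall through."""
    if register_value == 0:
        return branch_target(pc, offset16)
    return (pc + 4) & MASK64


pc = 0x40000
farthest_back = branch_target(pc, 0x8000)      # offset -2**15 words
farthest_forward = branch_target(pc, 0x7FFF)   # offset 2**15 - 1 words
```

The two extremes land at (PC+4) - 2^17 and (PC+4) + 2^17 - 4, which is exactly the range ((PC+4) - 2^17) ≤ name < ((PC+4) + 2^17) given for BEQZ and BNE.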
Chapters 3 and 4 show that conditional branches are a major challenge to pipelined execution; hence many architectures have added instructions to convert a simple branch into a conditional arithmetic instruction. MIPS included conditional move on zero or not zero. The value of the destination register either is left unchanged or is replaced by a copy of one of the source registers depending on whether or not the value of the other source register is zero.
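As a small illustration of replacing a branch with a conditional move, the Python sketch below computes the same selection two ways: once with an explicit branch and once with a model of MOVZ as listed in Figure 2.30. The function names are invented for the example, and the selection being computed is an arbitrary one chosen to show the pattern.

```python
def movz(destination, source, test):
    """MOVZ model: the new destination value is `source` if `test` is zero,
    otherwise the old destination value is kept."""
    return source if test == 0 else destination


def select_with_branch(a, b, c):
    """x = b; if (c == 0) x = a   written with a conditional branch."""
    x = b
    if c == 0:          # BNEZ around the assignment in a branching version
        x = a
    return x


def select_with_conditional_move(a, b, c):
    """The same selection with no change in control flow."""
    x = b               # unconditional copy
    x = movz(x, a, c)   # MOVZ x, a, c
    return x


results = [
    (select_with_branch(10, 20, c), select_with_conditional_move(10, 20, c))
    for c in (0, 1, 7)
]
```

Both versions always agree; the difference is that the second executes the same instructions regardless of the value tested, so there is no branch for a pipeline to resolve.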
MIPS Floating-Point Operations
Floating-point instructions manipulate the floating-point registers and indicate whether the operation to be performed is single or double precision. The operations MOV.S and MOV.D copy a single-precision (MOV.S) or double-precision (MOV.D) floating-point register to another register of the same type. The operations MFC1 and MTC1 move data between a single floating-point register and an integer register; moving a double-precision value to two integer registers requires two instructions. Conversions from integer to floating point are also provided, and vice versa.
The floating-point operations are add, subtract, multiply, and divide; a suffix D is used for double precision and a suffix S is used for single precision (e.g., ADD.D, ADD.S, SUB.D, SUB.S, MUL.D, MUL.S, DIV.D, DIV.S). Floating-point compares set a bit in the special floating-point status register that can be tested with a pair of branches: BC1T and BC1F, branch floating-point true and branch floating-point false.
To get greater performance for graphics routines, MIPS64 has instructions that perform two 32-bit floating-point operations on each half of the 64-bit floating-point register. These paired single operations include ADD.PS, SUB.PS, MUL.PS, and DIV.PS. (They are loaded and stored using double-precision loads and stores.) Giving a nod towards the importance of DSP applications, MIPS64 also includes both integer and floating-point multiply-add instructions: MADD, MADD.S, MADD.D, and MADD.PS. Unlike DSPs, the registers are all the same width in these combined operations.
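The idea behind the paired single operations can be sketched in Python by treating a 64-bit register image as two packed 32-bit IEEE 754 values. This is an illustration only: the function names are invented, and the choice of which value occupies the upper half of the register is an assumption made for the example, not something the text specifies.

```python
import struct


def pack_pair(upper, lower):
    """Build a 64-bit register image holding two single-precision values.

    Assumption: `upper` occupies the most-significant 32 bits.
    """
    return int.from_bytes(struct.pack(">ff", upper, lower), "big")


def unpack_pair(register):
    """Recover the two single-precision values from a 64-bit register image."""
    return struct.unpack(">ff", register.to_bytes(8, "big"))


def add_ps(a, b):
    """Model of ADD.PS: add the two halves independently."""
    a_upper, a_lower = unpack_pair(a)
    b_upper, b_lower = unpack_pair(b)
    return pack_pair(a_upper + b_upper, a_lower + b_lower)


def mul_ps(a, b):
    """Model of MUL.PS: multiply the two halves independently."""
    a_upper, a_lower = unpack_pair(a)
    b_upper, b_lower = unpack_pair(b)
    return pack_pair(a_upper * b_upper, a_lower * b_lower)


f0 = pack_pair(1.5, 0.5)
f2 = pack_pair(2.25, 0.25)
pair_sum = unpack_pair(add_ps(f0, f2))
pair_product = unpack_pair(mul_ps(f0, f2))
```

One instruction thus performs two additions (or multiplications), and because the pair occupies a full 64-bit register it can be moved to and from memory with an ordinary double-precision load or store.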
Figure 2.31 on page 146 contains a list of a subset of MIPS64 operations and their meaning.
MIPS Instruction Set Usage
To give an idea which instructions are popular, Figure 2.32 shows the frequency of instructions and instruction classes for five SPECint2000 programs and Figure 2.33 shows the same data for five SPECfp2000 programs. To give a more intuitive feeling, Figure 2.34 shows the data graphically for all instructions that are responsible on average for more than 1% of the instructions executed.
Instruction gap gcc gzip mcf perl Integer average
load 44.7% 35.5% 31.8% 33.2% 41.6% 37%
store 10.3% 13.2% 5.1% 4.3% 16.2% 10%
add 7.7% 11.2% 16.8% 7.2% 5.5% 10%
sub 1.7% 2.2% 5.1% 3.7% 2.5% 3%
mul 1.4% 0.1% 0%
compare 2.8% 6.1% 6.6% 6.3% 3.8% 5%
cond branch 9.3% 12.1% 11.0% 17.5% 10.9% 12%
cond move 0.4% 0.6% 1.1% 0.1% 1.9% 1%
jump 0.8% 0.7% 0.8% 0.7% 1.7% 1%
call 1.6% 0.6% 0.4% 3.2% 1.1% 1%
return 1.6% 0.6% 0.4% 3.2% 1.1% 1%
shift 3.8% 1.1% 2.1% 1.1% 0.5% 2%
and 4.3% 4.6% 9.4% 0.2% 1.2% 4%
or 7.9% 8.5% 4.8% 17.6% 8.7% 9%
xor 1.8% 2.1% 4.4% 1.5% 2.8% 3%
other logical 0.1% 0.4% 0.1% 0.1% 0.3% 0%
load FP 0%
store FP 0%
add FP 0%
sub FP 0%
mul FP 0%
div FP 0%
mov reg-reg FP 0%
compare FP 0%
cond mov FP 0%
other FP 0%
FIGURE 2.32 MIPS dynamic instruction mix for five SPECint2000 programs. Note that integer register-register move instructions are included in the or instruction. Blank entries have the value 0.0%.
Instruction applu art equake lucas swim FP average
load 32.2% 28.0% 29.0% 15.4% 27.5% 26%
store 2.9% 0.8% 3.4% 1.3% 2%
add 25.7% 20.2% 11.7% 8.2% 15.3% 16%
sub 2.5% 0.1% 2.1% 3.8% 2%
mul 2.3% 1.2% 1%
compare 7.4% 2.1% 2%
cond branch 2.5% 11.5% 2.9% 0.6% 1.3% 4%
cond mov 0.3% 0.1% 0%
jump 0.1% 0%
call 0.7% 0%
return 0.7% 0%
shift 0.7% 0.2% 1.9% 1%
and 0.2% 1.8% 0%
or 0.8% 1.1% 2.3% 1.0% 7.2% 2%
xor 3.2% 0.1% 1%
other logical 0.1% 0%
load FP 11.4% 12.0% 19.7% 16.2% 16.8% 15%
store FP 4.2% 4.5% 2.7% 18.2% 5.0% 7%
add FP 2.3% 4.5% 9.8% 8.2% 9.0% 7%
sub FP 2.9% 1.3% 7.6% 4.7% 3%
mul FP 8.6% 4.1% 12.9% 9.4% 6.9% 8%
div FP 0.3% 0.6% 0.5% 0.3% 0%
mov reg-reg FP 0.7% 0.9% 1.2% 1.8% 0.9% 1%
compare FP 0.9% 0.6% 0.8% 0%
cond mov FP 0.6% 0.8% 0%
other FP 1.6% 0%
FIGURE 2.33 MIPS dynamic instruction mix for five programs from SPECfp2000. Note that integer register-register move instructions are included in the or instruction. Blank entries have the value 0.0%.
FIGURE 2.34 Graphical display of instructions executed of the five programs from SPECint2000 in Figure 2.32 (top) and the five programs from SPECfp2000 in Figure 2.33 (bottom). Just as in Figures 2.16 and 2.18, the most popular instructions are simple. These instruction classes collectively are responsible on average for 96% of instructions executed for SPECint2000 and 97% of instructions executed for SPECfp2000.
[Bar charts not reproduced. Horizontal axis: total dynamic percentage, 0% to 40%. SPECint2000 panel (bars for gap, gcc, gzip, mcf, perl), average per class: load 37%, and/or/xor 16%, add/sub 13%, cond branch 12%, store 10%, compare 5%, call/return 3%. SPECfp2000 panel (bars for applu, art, equake, lucas, swim), average per class: load int 26%, add/sub int 20%, load FP 15%, add/sub FP 10%, mul FP 8%, store FP 7%, cond branch 4%, and/or/xor 4%, compare int 2%, store int 2%.]
Media processor is a name given to a class of embedded processors that are dedicated to multimedia processing, typically being cost sensitive like embedded processors but following the compiler orientation from desktop and server computing. Like DSPs, they operate on narrower data types than the desktop, and must often deal with infinite, continuous streams of data. Figure 2.35 gives a list of media application areas and benchmark algorithms for media processors.
The Trimedia TM32 CPU is a representative of this class. As multimedia applications have considerable parallelism in the processing of these data streams, the instruction set architectures often look different from the desktop. It is intended for products like set top boxes and advanced televisions.
First, there are many more registers: 128 32-bit registers, which contain either integer or floating-point data. Second, and not surprisingly, it offers the partitioned ALU or SIMD instructions to allow computations on multiple instances of narrower data, as described in Figure 2.17 on page 120. Third, showing its heritage, for integers it offers both two’s complement arithmetic favored by desktop processors and saturating arithmetic favored by DSPs. Figure 2.36 lists the operations found in the Trimedia TM32 CPU.
However, the most unusual feature from the perspective of the desktop is that the architecture allows the programmer to specify five independent operations to be issued at the same time. If there are not five independent instructions available for the compiler to schedule together (that is, the rest are dependent), then NOPs are placed in the leftover slots. This instruction coding technique is called, naturally enough, Very Long Instruction Word (VLIW), and it predates the Trimedia processors. VLIW is the subject of Chapter 4, so we just give a preview of VLIW here. An example helps explain how the Trimedia TM32 CPU works, and one can be found in Chapter 4 on page 279. This