Classes of Parallelism and Parallel Architectures
1.3 Defining Computer Architecture
The task the computer designer faces is a complex one: determine what attributes are important for a new computer, then design a computer to maximize performance and energy efficiency while staying within cost, power, and availability constraints. This task has many aspects, including instruction set design, functional organization, logic design, and implementation. The implementation may encompass integrated circuit design, packaging, power, and cooling. Optimizing the design requires familiarity with a very wide range of technologies, from compilers and operating systems to logic design and packaging.
A few decades ago, the term computer architecture generally referred to only instruction set design. Other aspects of computer design were called implementation, often insinuating that implementation is uninteresting or less challenging.
We believe this view is incorrect. The architect’s or designer’s job is much more than instruction set design, and the technical hurdles in the other aspects of the project are likely more challenging than those encountered in instruction set design. We’ll quickly review instruction set architecture before describing the larger challenges for the computer architect.
Instruction Set Architecture: The Myopic View of Computer Architecture
We use the term instruction set architecture (ISA) to refer to the actual programmer-visible instruction set in this book. The ISA serves as the boundary between the software and hardware. This quick review of ISA will use examples from 80x86, ARMv8, and RISC-V to illustrate the seven dimensions of an ISA.
The most popular RISC processors come from ARM (Advanced RISC Machine); they were in 14.8 billion chips shipped in 2015, roughly 50 times as many chips as shipped with 80x86 processors. Appendices A and K give more details on the three ISAs.
RISC-V (“RISC Five”) is a modern RISC instruction set developed at the University of California, Berkeley, which was made free and openly adoptable in response to requests from industry. In addition to a full software stack (compilers, operating systems, and simulators), there are several RISC-V implementations freely available for use in custom chips or in field-programmable gate arrays.
Developed 30 years after the first RISC instruction sets, RISC-V inherits its ancestors’ good ideas (a large set of registers, easy-to-pipeline instructions, and a lean set of operations) while avoiding their omissions or mistakes. It is a free and open, elegant example of the RISC architectures mentioned earlier, which is why more than 60 companies have joined the RISC-V foundation, including AMD, Google, HP Enterprise, IBM, Microsoft, Nvidia, Qualcomm, Samsung, and Western Digital. We use the integer core ISA of RISC-V as the example ISA in this book.
1. Class of ISA—Nearly all ISAs today are classified as general-purpose register architectures, where the operands are either registers or memory locations. The 80x86 has 16 general-purpose registers and 16 that can hold floating-point data, while RISC-V has 32 general-purpose and 32 floating-point registers (see Figure 1.4). The two popular versions of this class are register-memory ISAs, such as the 80x86, which can access memory as part of many instructions, and load-store ISAs, such as ARMv8 and RISC-V, which can access memory only with load or store instructions. All ISAs announced since 1985 are load-store.
2. Memory addressing—Virtually all desktop and server computers, including the 80x86, ARMv8, and RISC-V, use byte addressing to access memory operands.
Some architectures, like ARMv8, require that objects be aligned. An access to an object of size s bytes at byte address A is aligned if A mod s = 0. (See Figure A.5 on page A-8.) The 80x86 and RISC-V do not require alignment, but accesses are generally faster if operands are aligned.
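The alignment rule above is a one-line test. The following is a minimal sketch in Python; the function name `is_aligned` is ours for illustration and does not come from any ISA manual.

```python
def is_aligned(address: int, size: int) -> bool:
    """Return True if an access of `size` bytes at byte `address` is aligned,
    that is, if address mod size == 0."""
    if size <= 0:
        raise ValueError("size must be a positive number of bytes")
    return address % size == 0


# A 4-byte word at address 8 is aligned; the same word at address 10 is not.
print(is_aligned(8, 4))    # True
print(is_aligned(10, 4))   # False
# A 1-byte access is aligned at every address.
print(is_aligned(13, 1))   # True
```

Because the sizes of interest are powers of two, hardware can perform the same check by testing that the low-order address bits are zero.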
| Register | Name | Use | Saver |
| --- | --- | --- | --- |
| x0 | zero | The constant value 0 | N.A. |
| x1 | ra | Return address | Caller |
| x2 | sp | Stack pointer | Callee |
| x3 | gp | Global pointer | – |
| x4 | tp | Thread pointer | – |
| x5–x7 | t0–t2 | Temporaries | Caller |
| x8 | s0/fp | Saved register/frame pointer | Callee |
| x9 | s1 | Saved register | Callee |
| x10–x11 | a0–a1 | Function arguments/return values | Caller |
| x12–x17 | a2–a7 | Function arguments | Caller |
| x18–x27 | s2–s11 | Saved registers | Callee |
| x28–x31 | t3–t6 | Temporaries | Caller |
| f0–f7 | ft0–ft7 | FP temporaries | Caller |
| f8–f9 | fs0–fs1 | FP saved registers | Callee |
| f10–f11 | fa0–fa1 | FP function arguments/return values | Caller |
| f12–f17 | fa2–fa7 | FP function arguments | Caller |
| f18–f27 | fs2–fs11 | FP saved registers | Callee |
| f28–f31 | ft8–ft11 | FP temporaries | Caller |

Figure 1.4 RISC-V registers, names, usage, and calling conventions. In addition to the 32 general-purpose registers (x0–x31), RISC-V has 32 floating-point registers (f0–f31) that can hold either a 32-bit single-precision number or a 64-bit double-precision number. The registers that are preserved across a procedure call are labeled “Callee” saved.

3. Addressing modes—In addition to specifying registers and constant operands, addressing modes specify the address of a memory object. RISC-V addressing modes are Register, Immediate (for constants), and Displacement, where a constant offset is added to a register to form the memory address. The 80x86 supports those three modes, plus three variations of displacement: no register (absolute), two registers (based indexed with displacement), and two registers where one register is multiplied by the size of the operand in bytes (based with scaled index and displacement). It also has similar modes without the displacement field: register indirect, indexed, and based with scaled index.
ARMv8 has the three RISC-V addressing modes plus PC-relative addressing, the sum of two registers, and the sum of two registers where one register is multiplied by the size of the operand in bytes. It also has autoincrement and autodecrement addressing, where the calculated address replaces the contents of one of the registers used in forming the address.
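The memory addressing modes named above differ only in how the effective address is computed. The sketch below models three of them in Python. The register names (`r1`, `r2`), their values, and the function names are hypothetical and chosen only for illustration; addresses are assumed to wrap at 64 bits.

```python
# Toy model of effective-address calculation. Register names and values
# are hypothetical; addresses wrap at 64 bits.
MASK64 = (1 << 64) - 1


def displacement(regs, base, offset):
    """Displacement mode: register contents plus a constant offset."""
    return (regs[base] + offset) & MASK64


def pc_relative(pc, offset):
    """PC-relative mode: the program counter plus a constant offset."""
    return (pc + offset) & MASK64


def based_scaled_index(regs, base, index, operand_size, offset=0):
    """Base register plus index register scaled by the operand size in bytes,
    optionally plus a displacement (offset=0 models the form without one)."""
    return (regs[base] + regs[index] * operand_size + offset) & MASK64


regs = {"r1": 0x1000, "r2": 3}

print(hex(displacement(regs, "r1", 16)))                # 0x1010
print(hex(based_scaled_index(regs, "r1", "r2", 8)))     # 0x1018
print(hex(based_scaled_index(regs, "r1", "r2", 8, 4)))  # 0x101c
print(hex(pc_relative(0x400000, -8)))                   # 0x3ffff8
```

The scaled form is what makes array indexing a single memory operand: with a base of 0x1000 and 8-byte elements, index 3 lands at 0x1018.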
4. Types and sizes of operands—Like most ISAs, 80x86, ARMv8, and RISC-V support operand sizes of 8-bit (ASCII character), 16-bit (Unicode character or half word), 32-bit (integer or word), 64-bit (double word or long integer), and IEEE 754 floating point in 32-bit (single precision) and 64-bit (double precision). The 80x86 also supports 80-bit floating point (extended double precision).
5. Operations—The general categories of operations are data transfer, arithmetic logical, control (discussed next), and floating point. RISC-V is a simple and easy-to-pipeline instruction set architecture, and it is representative of the RISC architectures being used in 2017. Figure 1.5 summarizes the integer RISC-V ISA, and Figure 1.6 lists the floating-point ISA. The 80x86 has a much richer and larger set of operations (see Appendix K).
6. Control flow instructions—Virtually all ISAs, including these three, support conditional branches, unconditional jumps, procedure calls, and returns. All three use PC-relative addressing, where the branch address is specified by an address field that is added to the PC. There are some small differences.
RISC-V conditional branches (BEQ, BNE, etc.) test the contents of registers, whereas the 80x86 and ARMv8 branches test condition code bits set as side effects of arithmetic/logic operations. The ARMv8 and RISC-V procedure call places the return address in a register, whereas the 80x86 call (CALLF) places the return address on a stack in memory.
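The two branch styles can be contrasted with a small model. The sketch below is a simplification, not the definition used by any of the three ISAs: it assumes 64-bit registers, and the flag names Z, S, and V (zero, sign, signed overflow) are generic stand-ins for the condition code bits of a real machine. All function names are ours.

```python
BITS = 64
MASK = (1 << BITS) - 1
SIGN = 1 << (BITS - 1)


def to_signed(x):
    """Interpret the low 64 bits of x as a two's complement integer."""
    x &= MASK
    return x - (1 << BITS) if x & SIGN else x


def riscv_blt(rs1, rs2):
    """Compare-and-branch style: the branch tests register contents directly."""
    return to_signed(rs1) < to_signed(rs2)


def compare_set_flags(a, b):
    """Condition-code style, step 1: compute a - b and record flags."""
    a &= MASK
    b &= MASK
    result = (a - b) & MASK
    return {
        "Z": result == 0,
        "S": bool(result & SIGN),
        # Signed overflow: operands differ in sign and the result's sign
        # differs from the sign of the first operand.
        "V": bool((a ^ b) & (a ^ result) & SIGN),
    }


def branch_if_less(flags):
    """Condition-code style, step 2: the branch tests only the flags."""
    return flags["S"] != flags["V"]


print(riscv_blt(-5, 3), branch_if_less(compare_set_flags(-5, 3)))  # True True
print(riscv_blt(3, -5), branch_if_less(compare_set_flags(3, -5)))  # False False
```

Both styles reach the same decision. The difference is that the condition-code version passes state from one instruction to the next through the flags, while the register version keeps the comparison and the branch in a single instruction.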
7. Encoding an ISA—There are two basic choices on encoding: fixed length and variable length. All ARMv8 and RISC-V instructions are 32 bits long, which simplifies instruction decoding. Figure 1.7 shows the RISC-V instruction formats. The 80x86 encoding is variable length, ranging from 1 to 18 bytes.
Variable-length instructions can take less space than fixed-length instructions, so a program compiled for the 80x86 is usually smaller than the same program compiled for RISC-V. Note that choices mentioned previously will affect how the instructions are encoded into a binary representation. For example, the number of registers and the number of addressing modes both have a significant impact on the size of instructions, because the register field and addressing mode field can appear many times in a single instruction. (Note that ARMv8 and RISC-V later offered extensions, called Thumb-2 and RV64IC, that provide a mix of 16-bit and 32-bit length instructions, respectively, to reduce program size. Code size for these compact versions of RISC architectures is smaller than that of the 80x86. See Appendix K.)
| Instruction type/opcode | Instruction meaning |
| --- | --- |
| Data transfers | Move data between registers and memory, or between the integer and FP or special registers; only memory address mode is 12-bit displacement + contents of a GPR |
| lb, lbu, sb | Load byte, load byte unsigned, store byte (to/from integer registers) |
| lh, lhu, sh | Load half word, load half word unsigned, store half word (to/from integer registers) |
| lw, lwu, sw | Load word, load word unsigned, store word (to/from integer registers) |
| ld, sd | Load double word, store double word (to/from integer registers) |
| flw, fld, fsw, fsd | Load SP float, load DP float, store SP float, store DP float |
| fmv._.x, fmv.x._ | Copy from/to integer register to/from floating-point register; “_” = S for single-precision, D for double-precision |
| csrrw, csrrwi, csrrs, csrrsi, csrrc, csrrci | Read counters and write status registers, which include counters: clock cycles, time, instructions retired |
| Arithmetic/logical | Operations on integer or logical data in GPRs |
| add, addi, addw, addiw | Add, add immediate (all immediates are 12 bits), add 32-bits only & sign-extend to 64 bits, add immediate 32-bits only |
| sub, subw | Subtract, subtract 32-bits only |
| mul, mulw, mulh, mulhsu, mulhu | Multiply, multiply 32-bits only, multiply upper half, multiply upper half signed-unsigned, multiply upper half unsigned |
| div, divu, rem, remu | Divide, divide unsigned, remainder, remainder unsigned |
| divw, divuw, remw, remuw | Divide and remainder: as previously, but divide only lower 32-bits, producing 32-bit sign-extended result |
| and, andi | And, and immediate |
| or, ori, xor, xori | Or, or immediate, exclusive or, exclusive or immediate |
| lui | Load upper immediate; loads bits 31-12 of register with immediate, then sign-extends |
| auipc | Adds immediate in bits 31-12 with zeros in lower bits to PC; used with JALR to transfer control to any 32-bit address |
| sll, slli, srl, srli, sra, srai | Shifts: shift left logical, right logical, right arithmetic; both variable and immediate forms |
| sllw, slliw, srlw, srliw, sraw, sraiw | Shifts: as previously, but shift lower 32-bits, producing 32-bit sign-extended result |
| slt, slti, sltu, sltiu | Set less than, set less than immediate, signed and unsigned |
| Control | Conditional branches and jumps; PC-relative or through register |
| beq, bne, blt, bge, bltu, bgeu | Branch GPR equal/not equal; less than; greater than or equal, signed and unsigned |
| jal, jalr | Jump and link: save PC + 4, target is PC-relative (JAL) or a register (JALR); if x0 is specified as the destination register, then acts as a simple jump |
| ecall | Make a request to the supporting execution environment, which is usually an OS |
| ebreak | Used by debuggers to cause control to be transferred back to a debugging environment |
| fence, fence.i | Synchronize threads to guarantee ordering of memory accesses; synchronize instructions and data for stores to instruction memory |

Figure 1.5 Subset of the instructions in RISC-V. RISC-V has a base set of instructions (RV64I) and offers optional extensions: multiply-divide (RVM), single-precision floating point (RVF), double-precision floating point (RVD). This figure includes RVM and the next one shows RVF and RVD. Appendix A gives much more detail on RISC-V.
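Several Figure 1.5 entries distinguish full 64-bit operations from their "32-bits only" variants, whose results are sign-extended to 64 bits. The sketch below models that distinction for add, addw, and lui in Python. It is an illustration under our own assumptions (registers represented as signed Python integers, helper names invented here), not an excerpt from the RISC-V specification.

```python
def sign_extend(value, bits):
    """Keep the low `bits` bits of value and interpret them as two's complement."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit


def add(a, b):
    """64-bit add: the result wraps at 64 bits."""
    return sign_extend(a + b, 64)


def addw(a, b):
    """32-bit add: add the low 32 bits, then sign-extend the result to 64 bits."""
    return sign_extend(a + b, 32)


def lui(imm20):
    """Load upper immediate: place a 20-bit immediate in bits 31-12,
    zero the low 12 bits, then sign-extend from bit 31."""
    return sign_extend((imm20 & 0xFFFFF) << 12, 32)


print(add(0x7FFFFFFF, 1))    # 2147483648
print(addw(0x7FFFFFFF, 1))   # -2147483648
print(hex(lui(0x12345)))     # 0x12345000
print(lui(0x80000))          # -2147483648
```

The addw result shows why the word variants exist: 32-bit arithmetic overflows at bit 31 and the 64-bit register then holds the sign-extended 32-bit value, whereas add simply carries into bit 32.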
| Instruction type/opcode | Instruction meaning |
| --- | --- |
| Floating point | FP operations on DP and SP formats |
| fadd.d, fadd.s | Add DP, SP numbers |
| fsub.d, fsub.s | Subtract DP, SP numbers |
| fmul.d, fmul.s | Multiply DP, SP floating point |
| fmadd.d, fmadd.s, fnmadd.d, fnmadd.s | Multiply-add DP, SP numbers; negative multiply-add DP, SP numbers |
| fmsub.d, fmsub.s, fnmsub.d, fnmsub.s | Multiply-sub DP, SP numbers; negative multiply-sub DP, SP numbers |
| fdiv.d, fdiv.s | Divide DP, SP floating point |
| fsqrt.d, fsqrt.s | Square root DP, SP floating point |
| fmax.d, fmax.s, fmin.d, fmin.s | Maximum and minimum DP, SP floating point |
| fcvt._._, fcvt._._u, fcvt._u._ | Convert instructions: FCVT.x.y converts from type x to type y, where x and y are L (64-bit integer), W (32-bit integer), D (DP), or S (SP). Integers can be unsigned (U) |
| feq._, flt._, fle._ | Floating-point compare between floating-point registers and record the Boolean result in integer register; “_” = S for single-precision, D for double-precision |
| fclass.d, fclass.s | Writes to integer register a 10-bit mask that indicates the class of the floating-point number (-∞, +∞, -0, +0, NaN, …) |
| fsgnj._, fsgnjn._, fsgnjx._ | Sign-injection instructions that change only the sign bit: copy sign bit from other source, the opposite of sign bit of other source, XOR of the 2 sign bits |

Figure 1.6 Floating point instructions for RISC-V. RISC-V has a base set of instructions (RV64I) and offers optional extensions for single-precision floating point (RVF) and double-precision floating point (RVD). SP = single precision; DP = double precision.
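The sign-injection entries in Figure 1.6 are easiest to follow at the bit level. The sketch below models the double-precision forms in Python by treating each operand as a 64-bit IEEE 754 pattern. The helper names (`to_bits`, `from_bits`, and the `_d` functions) are ours, and the code illustrates the described behavior rather than reproducing the specification.

```python
import struct

SIGN_BIT = 1 << 63  # sign bit of a 64-bit IEEE 754 double


def to_bits(x: float) -> int:
    """Return the 64-bit pattern of a double as an unsigned integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def from_bits(bits: int) -> float:
    """Return the double whose 64-bit pattern is `bits`."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def _inject(rs1: float, sign: int) -> float:
    """Keep everything from rs1 except the sign bit, which comes from `sign`."""
    return from_bits((to_bits(rs1) & ~SIGN_BIT) | (sign & SIGN_BIT))


def fsgnj_d(rs1, rs2):
    """Copy the sign bit from the other source."""
    return _inject(rs1, to_bits(rs2))


def fsgnjn_d(rs1, rs2):
    """Use the opposite of the other source's sign bit."""
    return _inject(rs1, to_bits(rs2) ^ SIGN_BIT)


def fsgnjx_d(rs1, rs2):
    """Use the XOR of the two sign bits."""
    return _inject(rs1, to_bits(rs1) ^ to_bits(rs2))


print(fsgnj_d(1.5, -2.0))    # -1.5
print(fsgnjn_d(1.5, -2.0))   # 1.5
print(fsgnjx_d(-1.5, -2.0))  # 1.5
```

Passing the same value as both operands turns the three operations into a copy, a negation, and an absolute value, so one small family of instructions covers all three.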
Figure 1.7 The base RISC-V instruction set architecture formats. All instructions are 32 bits long. The R format is for integer register-to-register operations, such as ADD, SUB, and so on. The I format is for loads and immediate operations, such as LD and ADDI. The B format is for branches and the J format is for jumps and link. The S format is for stores. Having a separate format for stores allows the three register specifiers (rd, rs1, rs2) to always be in the same location in all formats. The U format is for the wide immediate instructions (LUI, AUIPC).
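As an illustration of fixed-length encoding, the sketch below packs and unpacks an R-format instruction in Python. The field positions and the opcode and function codes for ADD and SUB are not given in the text above; they are taken from the published RISC-V base ISA encoding and should be checked against the specification. The function names are ours.

```python
# R-format layout assumed here (from the RISC-V base ISA encoding):
#   funct7[31:25] rs2[24:20] rs1[19:15] funct3[14:12] rd[11:7] opcode[6:0]
OP = 0b0110011  # opcode shared by integer register-to-register operations


def encode_r_type(opcode, rd, funct3, rs1, rs2, funct7):
    """Pack the six R-format fields into one 32-bit instruction word."""
    fields = (("opcode", opcode, 7), ("rd", rd, 5), ("funct3", funct3, 3),
              ("rs1", rs1, 5), ("rs2", rs2, 5), ("funct7", funct7, 7))
    for name, value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
    return ((funct7 << 25) | (rs2 << 20) | (rs1 << 15)
            | (funct3 << 12) | (rd << 7) | opcode)


def decode_r_type(word):
    """Split a 32-bit instruction word back into its R-format fields."""
    return {
        "opcode": word & 0x7F,
        "rd": (word >> 7) & 0x1F,
        "funct3": (word >> 12) & 0x7,
        "rs1": (word >> 15) & 0x1F,
        "rs2": (word >> 20) & 0x1F,
        "funct7": (word >> 25) & 0x7F,
    }


add_word = encode_r_type(OP, rd=3, funct3=0b000, rs1=1, rs2=2, funct7=0b0000000)
sub_word = encode_r_type(OP, rd=3, funct3=0b000, rs1=1, rs2=2, funct7=0b0100000)

print(f"{add_word:#010x}")  # 0x002081b3  (add x3, x1, x2)
print(f"{sub_word:#010x}")  # 0x402081b3  (sub x3, x1, x2)
print(decode_r_type(add_word)["rs2"])  # 2
```

Because each register specifier is 5 bits wide, the three of them consume 15 of the 32 bits, which is the kind of cost the earlier remark about register count and instruction size refers to.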
The other challenges facing the computer architect beyond ISA design are particularly acute at the present, when the differences among instruction sets are small and when there are distinct application areas. Therefore, starting with the fourth edition of this book, beyond this quick review, the bulk of the instruction set material is found in the appendices (see Appendices A and K).