The 8086 provides support for both 8-bit (byte) and 16-bit (called word) data types. The data type distinctions apply to register operations as well as memory accesses. The 80386 adds 32-bit addresses and data, called double words. Almost every operation works on both 8-bit data and one longer data size. That size is determined by the mode and is either 16 or 32 bits.
Clearly some programs want to operate on data of all three sizes, so the 80x86 architects provided a convenient way to specify each version without expanding code size significantly. They decided that most programs would be dominated by either 16- or 32-bit data, so it made sense to be able to set a default large size. This default size is set by a bit in the descriptor of the current code segment (the segment selected by the CS register). To override the default size, an 8-bit prefix is attached to the instruction to tell the machine to use the other large size for that instruction.
The prefix solution was borrowed from the 8086, which allows multiple prefixes to modify instruction behavior. The three original prefixes override the default segment register, lock the bus so that a semaphore operation can be performed atomically (see Chapter 5), or repeat the following instruction until CX counts down to zero. This last prefix was intended to be paired with a byte move instruction to move a variable number of bytes. The 80386 also added a prefix to override the default address size.
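To make the prefix mechanism concrete, here is a minimal Python sketch of how a decoder might consume prefix bytes before the opcode. It assumes the standard IA-32 prefix encodings (0x66 operand size, 0x67 address size, 0xF0 LOCK, 0xF2/0xF3 repeat, and the segment-override bytes) and models the code segment's default-size bit as a boolean parameter. The function name `decode_prefixes` and its result fields are hypothetical, and the sketch ignores the rule that a repeat prefix is meaningful only with string instructions.

```python
# Hypothetical sketch: how 80386-style prefix bytes modify the next instruction.
# Byte values are the standard IA-32 prefix encodings.

SEGMENT_OVERRIDES = {0x26: "ES", 0x2E: "CS", 0x36: "SS", 0x3E: "DS",
                     0x64: "FS", 0x65: "GS"}  # FS/GS overrides arrived with the 80386
OPERAND_SIZE = 0x66
ADDRESS_SIZE = 0x67
LOCK = 0xF0
REPNE = 0xF2
REP = 0xF3  # REP/REPE


def decode_prefixes(code: bytes, default_32bit: bool) -> dict:
    """Scan leading prefix bytes and report effective sizes and modifiers.

    default_32bit stands in for the default-size bit of the code segment:
    False means 16-bit defaults, True means 32-bit defaults.
    """
    info = {"segment": None, "lock": False, "repeat": None,
            "operand_override": False, "address_override": False}
    i = 0
    while i < len(code):
        b = code[i]
        if b in SEGMENT_OVERRIDES:
            info["segment"] = SEGMENT_OVERRIDES[b]
        elif b == OPERAND_SIZE:
            info["operand_override"] = True
        elif b == ADDRESS_SIZE:
            info["address_override"] = True
        elif b == LOCK:
            info["lock"] = True
        elif b in (REP, REPNE):
            info["repeat"] = "REP" if b == REP else "REPNE"
        else:
            break  # the first non-prefix byte is the opcode
        i += 1
    # An override selects the *other* large size; it never selects 8 bits.
    info["operand_bits"] = 32 if default_32bit != info["operand_override"] else 16
    info["address_bits"] = 32 if default_32bit != info["address_override"] else 16
    info["opcode_index"] = i
    return info
```

Because an override simply flips the default, the same prefix byte selects 32 bits in a 16-bit code segment and 16 bits in a 32-bit one, which is why a single prefix per size suffices.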
The 80x86 integer operations can be divided into four major classes:
1. Data movement instructions, including move, push, and pop
2. Arithmetic and logic instructions, including logical operations, test, shifts, and integer and decimal arithmetic operations
3. Control flow, including conditional branches and unconditional jumps, calls, and returns
4. String instructions, including string move and string compare
[Figure K.32 diagram: three panels, Real mode (8086), Protected mode (80286), and Protected mode (80386, 80486, Pentium), each tracing a logical address (segment, offset) through segmentation, and in the 80386 panel also through paging, to a physical address.]
Figure K.32 The original segmented scheme of the 8086 is shown on the left. All 80x86 processors support this style of addressing, called real mode. It simply takes the contents of a segment register, shifts it left 4 bits, and adds it to the 16-bit offset, forming a 20-bit physical address. The 80286 (center) used the contents of the segment register to select a segment descriptor, which includes a 24-bit base address among other items. This base is added to the 16-bit offset to form the 24-bit physical address. The 80386 and successors (right) expand this base address in the segment descriptor to 32 bits and also add an optional paging layer below segmentation. A 32-bit linear address is first formed from the segment and offset, and then this address is divided into two 10-bit fields and a 12-bit page offset. The first 10-bit field selects the entry in the first-level page table, and this entry is then used in combination with the second 10-bit field to access the second-level page table, which supplies the upper 20 bits of the physical address. Prepending this 20-bit address to the final 12-bit field gives the 32-bit physical address. Paging can be turned off, in which case the 32-bit linear address is the physical address. Note that a “flat” 80x86 address space comes simply by loading the same value in all the segment registers; that is, it doesn't matter which segment register is selected.
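The 80386 path in the caption (segment base plus offset, then a two-level page walk) can be sketched in a few lines. This is an illustrative model under stated assumptions: page tables are Python dictionaries, and the present bits, access rights, and TLB that real hardware uses are omitted. The helper names `linear_address`, `split_linear`, and `translate` are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical sketch of 80386 address formation with two-level paging.
# Page tables are plain dicts; present bits, protection, and TLBs are omitted.


def linear_address(base: int, offset: int) -> int:
    """Segmentation step: 32-bit descriptor base plus offset, kept to 32 bits."""
    return (base + offset) & 0xFFFFFFFF


def split_linear(linear: int) -> Tuple[int, int, int]:
    """Split a 32-bit linear address into 10-bit, 10-bit, and 12-bit fields."""
    assert 0 <= linear < 1 << 32
    dir_index = (linear >> 22) & 0x3FF    # top 10 bits: first-level index
    table_index = (linear >> 12) & 0x3FF  # next 10 bits: second-level index
    page_offset = linear & 0xFFF          # low 12 bits: byte within the page
    return dir_index, table_index, page_offset


def translate(linear: int, directory: Dict[int, Dict[int, int]],
              paging: bool = True) -> int:
    """Map a linear address to a 32-bit physical address.

    directory maps a first-level index to a second-level table; each
    second-level table maps an index to a 20-bit physical page frame number.
    """
    if not paging:
        return linear  # paging off: the linear address is the physical address
    d, t, off = split_linear(linear)
    frame = directory[d][t]     # upper 20 bits of the physical address
    return (frame << 12) | off  # prepend the frame number to the page offset
```

For example, with `directory = {1: {2: 0xABCDE}}`, the linear address 0x00402345 splits into fields (1, 2, 0x345) and translates to 0xABCDE345.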
Figure K.33 shows some typical 80x86 instructions and their functions.
The data transfer, arithmetic, and logic instructions are unremarkable, except that the arithmetic and logic instruction operations allow the destination to be either a register or a memory location.
Control flow instructions must be able to address destinations in another segment. This is handled by having two types of control flow instructions: “near” for intrasegment (within a segment) and “far” for intersegment (between segments) transfers. In far jumps, which must be unconditional, two 16-bit quantities follow the opcode in 16-bit mode. One of these is used as the instruction pointer, while the other is loaded into CS and becomes the new code segment. In 32-bit mode the first field is expanded to 32 bits to match the 32-bit program counter (EIP).
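As an illustration of the far-jump format, the sketch below encodes and decodes the direct far jump (opcode 0xEA), in which the new instruction pointer precedes the new CS value, both stored little-endian. The helper names are hypothetical, and only this one far-jump encoding is modeled.

```python
import struct
from typing import Tuple

FAR_JMP_OPCODE = 0xEA  # direct far jump: JMP ptr16:16 or ptr16:32


def encode_far_jmp(offset: int, segment: int, mode_bits: int) -> bytes:
    """Encode opcode, then the new IP/EIP, then the new CS (little-endian)."""
    if mode_bits == 16:
        return bytes([FAR_JMP_OPCODE]) + struct.pack("<HH", offset, segment)
    if mode_bits == 32:
        return bytes([FAR_JMP_OPCODE]) + struct.pack("<IH", offset, segment)
    raise ValueError("mode_bits must be 16 or 32")


def decode_far_jmp(code: bytes, mode_bits: int) -> Tuple[int, int]:
    """Return (new instruction pointer, new CS) from an encoded far jump."""
    if code[0] != FAR_JMP_OPCODE:
        raise ValueError("not a direct far jump")
    if mode_bits not in (16, 32):
        raise ValueError("mode_bits must be 16 or 32")
    fmt = "<HH" if mode_bits == 16 else "<IH"  # IP field grows to 32 bits
    return struct.unpack_from(fmt, code, 1)
```

In 16-bit mode the instruction is 5 bytes long. The direct far call (opcode 0x9A) uses the same operand layout, which is why the return address saved by CALLF is IP+5.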
Calls and returns work similarly: a far call pushes the return instruction pointer and return segment on the stack and loads both the instruction pointer and the code segment. A far return pops both the instruction pointer and the code segment from the stack. Programmers or compiler writers must be sure to always use the same type of call and return for a procedure; a near return does not work with a far call, and vice versa.
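The following toy model shows why near and far calls and returns must be paired. It assumes a 16-bit machine whose stack is a Python list (top at the end) and uses the x86 ordering in which a far call pushes CS before the return IP, so a far return pops IP first. Instruction lengths of 3 bytes (near) and 5 bytes (far) are assumed for the direct calls. The class `Machine16` is hypothetical and abstracts away SP and SS.

```python
class Machine16:
    """Hypothetical toy 16-bit machine: CS, IP, and a word stack.

    The stack is a Python list whose last element is the top of stack;
    SP, SS, and memory addressing are abstracted away.
    """

    def __init__(self, cs: int, ip: int):
        self.cs, self.ip, self.stack = cs, ip, []

    def call_near(self, target: int, length: int = 3) -> None:
        self.stack.append((self.ip + length) & 0xFFFF)  # return IP only
        self.ip = target

    def ret_near(self) -> None:
        self.ip = self.stack.pop()  # CS is left unchanged

    def call_far(self, target: int, seg: int, length: int = 5) -> None:
        self.stack.append(self.cs)                      # old CS pushed first
        self.stack.append((self.ip + length) & 0xFFFF)  # then the return IP
        self.cs, self.ip = seg, target

    def ret_far(self) -> None:
        self.ip = self.stack.pop()  # IP is on top
        self.cs = self.stack.pop()  # then the saved CS
```

With `m = Machine16(0x1000, 0x0100)`, a `call_far` followed by `ret_far` restores CS=0x1000 and IP=0x0105 with an empty stack. Pairing `call_far` with `ret_near` instead leaves CS pointing at the callee's segment and the old CS stranded on the stack.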
String instructions are part of the 8080 ancestry of the 80x86 and are not commonly executed in most programs.
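Since MOVSB is meant to be combined with the repeat prefix, here is a sketch of what REP MOVSB does in real mode. It assumes the direction flag is clear (so SI and DI increment), a 16-bit address size (so CX is the counter), and a 1 MB bytearray standing in for memory. The function name `rep_movsb` is hypothetical.

```python
def rep_movsb(mem: bytearray, ds: int, si: int, es: int, di: int, cx: int):
    """Hypothetical model of REP MOVSB with the direction flag clear.

    mem is a 1 MB bytearray standing in for real-mode memory; segment and
    offset combine as (seg << 4) + offset, wrapped to 20 bits as on the 8086.
    Returns the final (SI, DI, CX).
    """
    while cx != 0:                       # REP tests CX before each iteration
        src = ((ds << 4) + si) & 0xFFFFF
        dst = ((es << 4) + di) & 0xFFFFF
        mem[dst] = mem[src]              # M[ES:DI] <-8 M[DS:SI]
        si = (si + 1) & 0xFFFF
        di = (di + 1) & 0xFFFF
        cx -= 1                          # count CX down to zero
    return si, di, cx
```

A count of zero copies nothing, which lets a loop over a variable-length block need no separate test for the empty case.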
Figure K.34 lists some of the integer 80x86 instructions. Many of the instructions are available in both byte and word formats.
Instruction         Function
JE name             if equal(CC) {IP←name}; IP–128 ≤ name < IP+128
JMP name            IP←name
CALLF name, seg     SP←SP–2; M[SS:SP]←CS; SP←SP–2; M[SS:SP]←IP+5; IP←name; CS←seg;
PUSH SI             SP←SP–2; M[SS:SP]←SI
POP DI              DI←M[SS:SP]; SP←SP+2
ADD AX,#6765        AX←AX+6765
SHL BX,1            BX←BX₁..₁₅ ## 0
TEST DX,#42         Set CC flags with DX & 42
MOVSB               M[ES:DI]←₈M[DS:SI]; DI←DI+1; SI←SI+1
MOVW BX,[DI+45]     BX←₁₆M[DS:DI+45]
Figure K.33 Some typical 80x86 instructions and their functions. A list of frequent operations appears in Figure K.34. We use the abbreviation SR:X to indicate the formation of an address with segment register SR and offset X. The effective address corresponding to SR:X is (SR<<4)+X. The CALLF saves the IP of the next instruction and the current CS on the stack.
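The real-mode rule (SR<<4)+X can be checked with a few lines of Python. This hypothetical helper masks the result to 20 bits, as on the 8086, and the examples show that many different segment:offset pairs name the same physical byte.

```python
def real_mode_address(segment: int, offset: int) -> int:
    """Hypothetical helper: effective address of SR:X in real mode.

    Computes (SR << 4) + X and keeps 20 bits, so the 8086 wraps at 1 MB.
    """
    assert 0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF
    return ((segment << 4) + offset) & 0xFFFFF


# Two different segment:offset pairs that alias the same physical byte.
a = real_mode_address(0x1234, 0x0005)
b = real_mode_address(0x1000, 0x2345)
print(hex(a), hex(b), a == b)
```

Running it prints `0x12345 0x12345 True`. Because segments start every 16 bytes and may overlap, the segment value alone does not identify a unique region of memory.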