
Machine Language

From THE INTEL MICROPROCESSORS (pages 131-141)

Machine language is the native binary code that the microprocessor understands and uses as its instructions to control its operation. Machine language instructions for the 8086 through the Core2 vary in length from 1 to as many as 13 bytes. Although machine language appears complex, it follows an orderly structure. There are well over 100,000 variations of machine language instructions, meaning that there is no complete list of these variations. Because of this, some binary bits in a machine language instruction are given, and the remaining bits are determined for each variation of the instruction.

Instructions for the 8086 through the 80286 are 16-bit mode instructions that take the form found in Figure 4–1(a). The 16-bit mode instructions are compatible with the 80386 and above if they are programmed to operate in the 16-bit instruction mode, but they may be prefixed, as shown in Figure 4–1(b). The 80386 and above assume that all instructions are 16-bit mode instructions when the machine is operated in the real mode (DOS). In the protected mode (Windows), the upper byte of the descriptor contains the D-bit that selects either the 16- or 32-bit instruction mode. At present, only Windows 95 through Windows XP and Linux operate in the 32-bit instruction mode. The 32-bit mode instructions are in the form shown in Figure 4–1(b).

FIGURE 4–1 The formats of the 8086–Core2 instructions. (a) The 16-bit form and (b) the 32-bit form (32-bit instruction mode, 80386 through Pentium 4 only).

FIGURE 4–2 Byte 1 of many machine language instructions, showing the position of the D- and W-bits. From left to right, the byte holds the opcode, the D-bit, and the W-bit.

These instructions occur in the 16-bit instruction mode by the use of prefixes, which are explained later in this chapter.

The first 2 bytes of the 32-bit instruction mode format are called override prefixes because they are not always present. The first modifies the size of the operand address used by the instruction and the second modifies the register size. If the 80386 through the Pentium 4 operate as 16-bit instruction mode machines (real or protected mode) and a 32-bit register is used, the register-size prefix (66H) is placed at the front of the instruction. If operated in the 32-bit instruction mode (protected mode only) and a 32-bit register is used, the register-size prefix is absent. If a 16-bit register appears in an instruction in the 32-bit instruction mode, the register-size prefix is present to select a 16-bit register. The address-size prefix (67H) is used in a similar fashion, as explained later in this chapter. The prefixes toggle the size of the register and operand address from 16-bit to 32-bit or from 32-bit to 16-bit for the prefixed instruction.

Note that the 16-bit instruction mode uses 8- and 16-bit registers and addressing modes, while the 32-bit instruction mode uses 8- and 32-bit registers and addressing modes by default. The prefixes override these defaults so that a 32-bit register can be used in the 16-bit mode or a 16-bit register can be used in the 32-bit mode. The mode of operation (16 or 32 bits) should be selected to function with the current application. If 8- and 32-bit data pervade the application, the 32-bit mode should be selected; likewise, if 8- and 16-bit data pervade, the 16-bit mode should be selected.

Normally, mode selection is a function of the operating system. (Remember that DOS can operate only in the 16-bit mode, whereas Windows can operate in both modes.)

The Opcode. The opcode selects the operation (addition, subtraction, move, and so on) that is performed by the microprocessor. The opcode is either 1 or 2 bytes long for most machine language instructions. Figure 4–2 illustrates the general form of the first opcode byte of many, but not all, machine language instructions. Here, the first 6 bits of the first byte are the binary opcode. The remaining 2 bits indicate the direction (D) of the data flow and whether the data are a byte or a word (W). The direction bit is not to be confused with the instruction mode bit (16/32) or the direction flag bit (used with string instructions). In the 80386 and above, words and doublewords are both specified when W = 1. The instruction mode and register-size prefix (66H) determine whether W represents a word or a doubleword.

If the direction bit (D) = 1, data flow to the register REG field from the R/M field located in the second byte of an instruction. If the D-bit = 0 in the opcode, data flow to the R/M field from the REG field. If the W-bit = 1, the data size is a word or doubleword; if the W-bit = 0, the data size is always a byte. The W-bit appears in most instructions, while the D-bit appears mainly with the MOV and some other instructions. Refer to Figure 4–3 for the binary bit pattern of the second opcode byte (reg-mod-r/m) of many instructions. Figure 4–3 shows the location of the MOD (mode), REG (register), and R/M (register/memory) fields.

FIGURE 4–3 Byte 2 of many machine language instructions, showing the position of the MOD, REG, and R/M fields. From left to right, the byte holds MOD (2 bits), REG (3 bits), and R/M (3 bits).
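The field layouts of Figures 4–2 and 4–3 can be sketched in a few lines of Python. This is only an illustration, not part of any assembler: the function names split_opcode_byte and split_modrm_byte are hypothetical, and the sample bytes 8BH and ECH are arbitrary test values.

```python
def split_opcode_byte(byte1):
    """Split byte 1 into its 6-bit opcode, D-bit, and W-bit (Figure 4-2 layout)."""
    opcode = byte1 >> 2           # upper 6 bits
    d_bit = (byte1 >> 1) & 1      # 1 = data flow to REG, 0 = data flow from REG
    w_bit = byte1 & 1             # 1 = word or doubleword, 0 = byte
    return opcode, d_bit, w_bit


def split_modrm_byte(byte2):
    """Split byte 2 into MOD (2 bits), REG (3 bits), and R/M (3 bits)."""
    mod = byte2 >> 6
    reg = (byte2 >> 3) & 0b111
    rm = byte2 & 0b111
    return mod, reg, rm


if __name__ == "__main__":
    opcode, d_bit, w_bit = split_opcode_byte(0x8B)
    print(f"opcode={opcode:06b} D={d_bit} W={w_bit}")
    mod, reg, rm = split_modrm_byte(0xEC)
    print(f"MOD={mod:02b} REG={reg:03b} R/M={rm:03b}")
```

For the sample bytes, the first call yields opcode 100010 with D = 1 and W = 1, and the second yields MOD = 11, REG = 101, and R/M = 100.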

MOD Field. The MOD field specifies the addressing mode (MOD) for the selected instruction. The MOD field selects the type of addressing and whether a displacement is present with the selected type. Table 4–1 lists the operand forms available to the MOD field for 16-bit instruction mode, unless the operand address-size override prefix (67H) appears. If the MOD field contains an 11, it selects the register-addressing mode. Register addressing uses the R/M field to specify a register instead of a memory location. If the MOD field contains a 00, 01, or 10, the R/M field selects one of the data memory-addressing modes. When MOD selects a data memory-addressing mode, it indicates that the addressing mode contains no displacement (00), an 8-bit sign-extended displacement (01), or a 16-bit displacement (10). The MOV AL,[DI] instruction is an example that contains no displacement, a MOV AL,[DI+2] instruction uses an 8-bit displacement (+2), and a MOV AL,[DI+1000H] instruction uses a 16-bit displacement (+1000H).

All 8-bit displacements are sign-extended into 16-bit displacements when the microprocessor executes the instruction. If the 8-bit displacement is 00H–7FH (positive), it is sign-extended to 0000H–007FH before adding to the offset address. If the 8-bit displacement is 80H–FFH (negative), it is sign-extended to FF80H–FFFFH. To sign-extend a number, its sign-bit is copied to the next higher-order byte, which generates either a 00H or an FFH in the next higher-order byte. Some assembler programs do not use the 8-bit displacements and instead default to 16-bit displacements.
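The sign-extension rule can be checked with a short Python sketch. The function names sign_extend_8_to_16 and effective_offset are hypothetical, chosen only for this illustration; the wrap at 64K assumes a 16-bit offset address.

```python
def sign_extend_8_to_16(disp8):
    """Sign-extend an 8-bit displacement (00H-FFH) into a 16-bit value."""
    if not 0 <= disp8 <= 0xFF:
        raise ValueError("displacement must fit in 8 bits")
    if disp8 & 0x80:              # sign bit set: upper byte becomes FFH
        return disp8 | 0xFF00
    return disp8                  # sign bit clear: upper byte becomes 00H


def effective_offset(base, disp8):
    """Add a sign-extended 8-bit displacement to a 16-bit base offset."""
    return (base + sign_extend_8_to_16(disp8)) & 0xFFFF


if __name__ == "__main__":
    for value in (0x00, 0x7F, 0x80, 0xFF):
        print(f"{value:02X}H -> {sign_extend_8_to_16(value):04X}H")
    print(f"{effective_offset(0x1000, 0xFE):04X}H")
```

With a base of 1000H, a displacement of FEH (that is, -2) gives an offset of 0FFEH, which shows why a negative 8-bit displacement moves the address backward.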

In the 80386 through the Core2 microprocessors, the MOD field may be the same as shown in Table 4–1 for 16-bit instruction mode; if the instruction mode is 32 bits, the MOD field is as it appears in Table 4–2. The MOD field is interpreted as selected by the address-size override prefix or the operating mode of the microprocessor. This change in the interpretation of the MOD field and instruction supports many of the numerous additional addressing modes allowed in the 80386 through the Core2. The main difference is that when the MOD field is a 10, this causes the 16-bit displacement to become a 32-bit displacement, to allow any protected mode memory location (4G bytes) to be accessed. The 80386 and above only allow an 8- or 32-bit displacement when operated in the 32-bit instruction mode, unless the address-size override prefix appears. Note that if an 8-bit displacement is selected, it is sign-extended into a 32-bit displacement by the microprocessor.

Register Assignments. Table 4–3 lists the register assignments for the REG field and the R/M field (MOD = 11). This table contains three lists of register assignments: one is used when the W bit = 0 (bytes), and the other two are used when the W bit = 1 (words or doublewords). Note that doubleword registers are only available to the 80386 through the Core2.

TABLE 4–1 MOD field for the 16-bit instruction mode.

MOD   Function
00    No displacement
01    8-bit sign-extended displacement
10    16-bit signed displacement
11    R/M is a register

TABLE 4–2 MOD field for the 32-bit instruction mode (80386–Core2 only).

MOD   Function
00    No displacement
01    8-bit sign-extended displacement
10    32-bit signed displacement
11    R/M is a register

Suppose that a 2-byte instruction, 8BECH, appears in a machine language program. Because neither a 67H (operand address-size override prefix) nor a 66H (register-size override prefix) appears as the first byte, the first byte is the opcode. If the microprocessor is operated in the 16-bit instruction mode, this instruction is converted to binary and placed in the instruction format of bytes 1 and 2, as illustrated in Figure 4–4. The opcode is 100010. If you refer to Appendix B, which lists the machine language instructions, you will find that this is the opcode for a MOV instruction. Notice that both the D and W bits are a logic 1, which means that a word moves into the destination register specified in the REG field. The REG field contains a 101, indicating register BP, so the MOV instruction moves data into register BP. Because the MOD field contains a 11, the R/M field also indicates a register. Here, R/M = 100 (SP); therefore, this instruction moves data from SP into BP and is written in symbolic form as a MOV BP,SP instruction.

Suppose that a 668BE8H instruction appears in an 80386 or above, operated in the 16-bit instruction mode. The first byte (66H) is the register-size override prefix that selects 32-bit register operands for the 16-bit instruction mode. The remainder of the instruction indicates that the opcode is a MOV with a source operand of EAX and a destination operand of EBP. This instruction is a MOV EBP,EAX. The same instruction becomes a MOV BP,AX instruction in the 80386 and above if it is operated in the 32-bit instruction mode, because the register-size override prefix selects a 16-bit register. Luckily, the assembler program keeps track of the register- and address-size prefixes and the mode of operation. Recall that if the .386 switch is placed before the .MODEL statement, the 32-bit mode is selected; if it is placed after the .MODEL statement, the 16-bit mode is selected. All programs written using the inline assembler in Visual C++ are always in the 32-bit mode.
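The two worked examples can be reproduced with a small Python decoder. This is a minimal sketch that handles only the register-to-register MOV (opcode 100010 with MOD = 11) and the 66H prefix; the function name decode_reg_mov, its mode_bits parameter, and the three register lists are hypothetical names introduced for the illustration, with the register order taken from Table 4–3.

```python
REG8 = ["AL", "CL", "DL", "BL", "AH", "CH", "DH", "BH"]
REG16 = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]
REG32 = ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"]


def decode_reg_mov(code, mode_bits=16):
    """Decode a register-to-register MOV given its bytes and the instruction mode."""
    if mode_bits not in (16, 32):
        raise ValueError("mode_bits must be 16 or 32")
    code = bytes(code)
    wide = mode_bits                       # register size used when W = 1
    if code[0] == 0x66:                    # register-size prefix toggles that size
        wide = 32 if mode_bits == 16 else 16
        code = code[1:]
    byte1, byte2 = code[0], code[1]
    if byte1 >> 2 != 0b100010 or byte2 >> 6 != 0b11:
        raise ValueError("this sketch handles only register-to-register MOV")
    d_bit = (byte1 >> 1) & 1
    w_bit = byte1 & 1
    if w_bit == 0:
        names = REG8
    else:
        names = REG16 if wide == 16 else REG32
    reg = names[(byte2 >> 3) & 0b111]
    rm = names[byte2 & 0b111]
    dest, src = (reg, rm) if d_bit else (rm, reg)   # D = 1: data flow to REG
    return f"MOV {dest},{src}"


if __name__ == "__main__":
    print(decode_reg_mov(bytes.fromhex("8BEC"), 16))
    print(decode_reg_mov(bytes.fromhex("668BE8"), 16))
    print(decode_reg_mov(bytes.fromhex("668BE8"), 32))
```

The three calls print MOV BP,SP, then MOV EBP,EAX, then MOV BP,AX, matching the text: the same bytes decode differently depending on the instruction mode.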

R/M Memory Addressing. If the MOD field contains a 00, 01, or 10, the R/M field takes on a new meaning. Table 4–4 lists the memory-addressing modes for the R/M field when MOD is a 00, 01, or 10 for the 16-bit instruction mode.

TABLE 4–3 REG and R/M (when MOD = 11) assignments.

Code   W = 0 (Byte)   W = 1 (Word)   W = 1 (Doubleword)
000    AL             AX             EAX
001    CL             CX             ECX
010    DL             DX             EDX
011    BL             BX             EBX
100    AH             SP             ESP
101    CH             BP             EBP
110    DH             SI             ESI
111    BH             DI             EDI

FIGURE 4–4 The 8BEC instruction placed into bytes 1 and 2 formats from Figures 4–2 and 4–3. This instruction is a MOV BP,SP.

Byte 1: 1 0 0 0 1 0 1 1 (Opcode = 100010, D = 1, W = 1)
Byte 2: 1 1 1 0 1 1 0 0 (MOD = 11, REG = 101, R/M = 100)

Opcode = MOV
D = Transfer to register (REG)
W = Word
MOD = R/M is a register
REG = BP
R/M = SP

All of the 16-bit addressing modes presented in Chapter 3 appear in Table 4–4. The displacement, discussed in Chapter 3, is defined by the MOD field. If MOD = 00 and R/M = 101, the addressing mode is [DI]. If MOD = 01 or 10, the addressing mode is [DI+33H], or LIST [DI+22H] for the 16-bit instruction mode. This example uses LIST, 33H, and 22H as arbitrary values for the displacement.

Figure 4–5 illustrates the machine language version of the 16-bit instruction MOV DL,[DI] (8A15H). This instruction is 2 bytes long and has an opcode 100010, D = 1 (to REG from R/M), W = 0 (byte), MOD = 00 (no displacement), REG = 010 (DL), and R/M = 101 ([DI]). If the instruction changes to MOV DL,[DI+1], the MOD field changes to 01 for an 8-bit displacement, but the first 2 bytes of the instruction otherwise remain the same. The instruction now becomes 8A5501H instead of 8A15H. Notice that the 8-bit displacement appends to the first 2 bytes of the instruction to form a 3-byte instruction instead of 2 bytes. If the instruction is again changed to a MOV DL,[DI+1000H], the MOD field changes to 10 and the machine language form becomes 8A950010H. Here, the 16-bit displacement of 1000H (coded low byte first as 0010H) is appended after the first 2 bytes.
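The way MOD follows the size of the displacement can be sketched in Python. This is an illustration only: the function name encode_mov_dl_di is hypothetical, and the rule of picking the shortest displacement that fits is an assumption about assembler behavior (as noted earlier, some assemblers always emit 16-bit displacements). With REG = 010 and R/M = 101 fixed, the second byte works out to 15H for MOD = 00, 55H for MOD = 01, and 95H for MOD = 10.

```python
def encode_mov_dl_di(disp=0):
    """Encode MOV DL,[DI+disp] for the 16-bit instruction mode."""
    if not -0x8000 <= disp <= 0xFFFF:
        raise ValueError("displacement must fit in 16 bits")
    byte1 = (0b100010 << 2) | (1 << 1) | 0    # opcode, D = 1, W = 0 -> 8AH
    reg, rm = 0b010, 0b101                    # DL and [DI]
    if disp == 0:
        mod, tail = 0b00, b""                 # no displacement
    elif -128 <= disp <= 127:
        mod, tail = 0b01, (disp & 0xFF).to_bytes(1, "little")
    else:
        mod, tail = 0b10, (disp & 0xFFFF).to_bytes(2, "little")
    byte2 = (mod << 6) | (reg << 3) | rm
    return bytes([byte1, byte2]) + tail


if __name__ == "__main__":
    for disp in (0, 1, 0x1000):
        print(f"MOV DL,[DI+{disp:X}H] -> {encode_mov_dl_di(disp).hex().upper()}H")
```

Only the MOD bits and the appended displacement bytes change between the three forms; the opcode byte stays at 8AH throughout.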

Special Addressing Mode. There is a special addressing mode that does not appear in Tables 4–2, 4–3, or 4–4. It occurs whenever memory data are referenced by only the displacement mode of addressing for 16-bit instructions. Examples are the MOV [1000H],DL and MOV NUMB,DL instructions. The first instruction moves the contents of register DL into data segment memory location 1000H. The second instruction moves register DL into symbolic data segment memory location NUMB.

Whenever an instruction has only a displacement, the MOD field is always a 00 and the R/M field is always 110. According to the tables, this combination would indicate no displacement with addressing mode [BP]. You cannot actually use addressing mode [BP] without a displacement in machine language. The assembler takes care of this by using an 8-bit displacement (MOD = 01) of 00H whenever the [BP] addressing mode appears in an instruction. This means that the [BP] addressing mode assembles as a [BP+0], even though a [BP] is used in the instruction. The same special addressing mode is also available for the 32-bit mode.

TABLE 4–4 16-bit R/M memory-addressing modes.

R/M Code   Addressing Mode
000        DS:[BX+SI]
001        DS:[BX+DI]
010        SS:[BP+SI]
011        SS:[BP+DI]
100        DS:[SI]
101        DS:[DI]
110        SS:[BP]*
111        DS:[BX]

*Note: See text section, Special Addressing Mode.

FIGURE 4–5 A MOV DL,[DI] instruction converted to its machine language form.

Byte 1: 1 0 0 0 1 0 1 0 (Opcode = 100010, D = 1, W = 0)
Byte 2: 0 0 0 1 0 1 0 1 (MOD = 00, REG = 010, R/M = 101)

Opcode = MOV
D = Transfer to register (REG)
W = Byte
MOD = No displacement
REG = DL
R/M = DS:[DI]

FIGURE 4–6 The MOV [1000H],DL instruction uses the special addressing mode.

Byte 1: 1 0 0 0 1 0 0 0 (Opcode = 100010, D = 0, W = 0)
Byte 2: 0 0 0 1 0 1 1 0 (MOD = 00, REG = 010, R/M = 110)
Byte 3: 0 0 0 0 0 0 0 0 (displacement, low byte)
Byte 4: 0 0 0 1 0 0 0 0 (displacement, high byte)

Opcode = MOV
D = Transfer from register (REG)
W = Byte
MOD = 00 (with R/M = 110, this selects the special displacement-only addressing mode)
REG = DL
R/M = 110 (displacement only, rather than [BP])
Displacement = 1000H

Figure 4–6 shows the binary bit pattern required to encode the MOV [1000H],DL instruction in machine language. If the individual translating this symbolic instruction into machine language does not know about the special addressing mode, the instruction would incorrectly translate to a MOV [BP],DL instruction. Figure 4–7 shows the actual form of the MOV [BP],DL instruction. Notice that this is a 3-byte instruction with a displacement of 00H.

FIGURE 4–7 The MOV [BP],DL instruction converted to binary machine language.

Byte 1: 1 0 0 0 1 0 0 0 (Opcode = 100010, D = 0, W = 0)
Byte 2: 0 1 0 1 0 1 1 0 (MOD = 01, REG = 010, R/M = 110)
Byte 3: 0 0 0 0 0 0 0 0 (8-bit displacement)

Opcode = MOV
D = Transfer from register (REG)
W = Byte
MOD = 8-bit displacement (required because R/M is [BP])
REG = DL
R/M = SS:[BP]
Displacement = 00H
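The difference between the displacement-only form and the [BP] form comes down to the MOD bits, which a short Python sketch makes visible. The helper names modrm, encode_mov_direct_dl, and encode_mov_bp_dl are hypothetical and cover only these two 16-bit instructions.

```python
OPCODE_MOV_FROM_REG_BYTE = 0x88   # opcode 100010, D = 0, W = 0
REG_DL = 0b010
RM_110 = 0b110                    # [BP] in Table 4-4, or displacement only when MOD = 00


def modrm(mod, reg, rm):
    """Pack the MOD, REG, and R/M fields into the second instruction byte."""
    return (mod << 6) | (reg << 3) | rm


def encode_mov_direct_dl(address):
    """MOV [address],DL: MOD = 00 with R/M = 110 means a 16-bit displacement only."""
    head = bytes([OPCODE_MOV_FROM_REG_BYTE, modrm(0b00, REG_DL, RM_110)])
    return head + address.to_bytes(2, "little")


def encode_mov_bp_dl():
    """MOV [BP],DL: assembled as [BP+0], so MOD = 01 with a displacement byte of 00H."""
    return bytes([OPCODE_MOV_FROM_REG_BYTE, modrm(0b01, REG_DL, RM_110), 0x00])


if __name__ == "__main__":
    print(encode_mov_direct_dl(0x1000).hex().upper())
    print(encode_mov_bp_dl().hex().upper())
```

The first call prints 88160010 and the second prints 885600, which correspond to the bit patterns of Figures 4–6 and 4–7.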

R/M Code Function

000 DS:[EAX]

001 DS:[ECX]

010 DS:[EDX]

011 DS:[EBX]

100 Uses scaled-index byte

101 SS:[EBP]*

110 DS:[ESI]

111 DS:[EDI]

*Note: See text section, Special Addressing Mode.

32-Bit Addressing Modes. The 32-bit addressing modes found in the 80386 and above are obtained by either running these machines in the 32-bit instruction mode or in the 16-bit instruction mode by using the address-size prefix 67H. Table 4–5 shows the coding for R/M used to specify the 32-bit addressing modes. Notice that when R/M = 100, an additional byte called a scaled-index byte appears in the instruction. The scaled-index byte indicates the additional forms of scaled-index addressing that do not appear in Table 4–5. The scaled-index byte is mainly used when two registers are added to specify the memory address in an instruction. Because the scaled-index byte is added to the instruction, there are 7 bits in the opcode and 8 bits in the scaled-index byte to define. This means that a scaled-index instruction has 2^15 (32K) possible combinations. There are over 32,000 different variations of the MOV instruction alone in the 80386 through the Core2 microprocessors.

Figure 4–8 shows the format of the scaled-index byte as selected by a value of 100 in the R/M field of an instruction when the 80386 and above use a 32-bit address. The leftmost 2 bits select a scaling factor (multiplier) of 1×, 2×, 4×, or 8×. Note that a scaling factor of 1× is implicit if none is used in an instruction that contains two 32-bit indirect address registers. The index and base fields both contain register numbers, as indicated in Table 4–3 for 32-bit registers.

The MOV EAX,[EBX+4*ECX] instruction is encoded as 67668B048BH. Notice that both the address size (67H) and register size (66H) override prefixes appear in the instruction. This coding (67668B048BH) is used when the 80386 and above microprocessors are operated in the 16-bit instruction mode for this instruction. If the microprocessor operates in the 32-bit instruction mode, both prefixes disappear and the instruction becomes an 8B048BH instruction. The use of the prefixes depends on the mode of operation of the microprocessor.

Scaled-index addressing can also use a single register multiplied by a scaling factor. An example is the MOV AL,[2*ECX] instruction. The contents of the data segment location addressed by two times ECX are copied into AL.
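The scaled-index byte can be packed and unpacked with the following Python sketch, using the last byte (8BH) of the MOV EAX,[EBX+4*ECX] encoding as the sample. The function names encode_sib and decode_sib are hypothetical. The sketch assumes the plain scale, index, and base reading of Figure 4–8 and does not model encodings that the processor treats specially, such as an index field of 100.

```python
REG32 = ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"]
SCALE_BITS = {1: 0b00, 2: 0b01, 4: 0b10, 8: 0b11}


def encode_sib(scale, index, base):
    """Pack a scaling factor, index register, and base register into one byte."""
    ss = SCALE_BITS[scale]
    return (ss << 6) | (REG32.index(index) << 3) | REG32.index(base)


def decode_sib(sib):
    """Unpack a scaled-index byte into (scale, index register, base register)."""
    scale = 1 << (sib >> 6)            # ss = 00, 01, 10, 11 -> 1, 2, 4, 8
    index = REG32[(sib >> 3) & 0b111]
    base = REG32[sib & 0b111]
    return scale, index, base


if __name__ == "__main__":
    scale, index, base = decode_sib(0x8B)
    print(f"[{base}+{scale}*{index}]")
    print(f"{encode_sib(4, 'ECX', 'EBX'):02X}H")
```

Decoding 8BH gives a scale of 4 with index ECX and base EBX, and encoding the same three values returns 8BH, so the two helpers are inverses for these simple cases.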

FIGURE 4–8 The scaled-index byte. From left to right, the byte holds the scale bits ss (2 bits), the index register (3 bits), and the base register (3 bits). ss: 00 = ×1, 01 = ×2, 10 = ×4, 11 = ×8.

TABLE 4–5 32-bit addressing modes selected by R/M.

An Immediate Instruction. Suppose that the MOV WORD PTR [BX+1000H],1234H instruction is chosen as an example of a 16-bit instruction using immediate addressing. This instruction moves a 1234H into the word-sized memory location addressed by the sum of 1000H, BX, and DS × 10H. This 6-byte instruction uses 2 bytes for the opcode, W, MOD, and R/M fields. Two of the 6 bytes are the data of 1234H; 2 of the 6 bytes are the displacement of 1000H. Figure 4–9 shows the binary bit pattern for each byte of this instruction.
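The 6-byte layout can be assembled by hand in a few lines of Python. This is a sketch under stated assumptions: the function name encode_mov_word_bx_disp_imm is hypothetical, the 7-bit opcode 1100011 is read from the byte 1 pattern of Figure 4–9, and the displacement is always emitted as 16 bits (MOD = 10) as in that figure.

```python
def encode_mov_word_bx_disp_imm(disp, imm):
    """Encode MOV WORD PTR [BX+disp],imm for the 16-bit instruction mode."""
    byte1 = (0b1100011 << 1) | 1                  # immediate MOV opcode, W = 1 -> C7H
    byte2 = (0b10 << 6) | (0b000 << 3) | 0b111    # MOD = 10, REG unused, R/M = 111 ([BX])
    return (
        bytes([byte1, byte2])
        + disp.to_bytes(2, "little")              # displacement, low byte first
        + imm.to_bytes(2, "little")               # immediate data, low byte first
    )


if __name__ == "__main__":
    code = encode_mov_word_bx_disp_imm(0x1000, 0x1234)
    print(" ".join(f"{b:02X}" for b in code))
```

The result is C7 87 00 10 34 12: two bytes of opcode and addressing fields, two of displacement, and two of data, each 16-bit value stored low byte first.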

This instruction, in symbolic form, includes WORD PTR. The WORD PTR directive indicates to the assembler that the instruction uses a word-sized memory pointer. If the instruction moves a byte of immediate data, BYTE PTR replaces WORD PTR in the instruction. Likewise, if the instruction uses a doubleword of immediate data, the DWORD PTR directive replaces BYTE PTR. Most instructions that refer to memory through a pointer do not need the BYTE PTR, WORD PTR, or DWORD PTR directives. These directives are necessary only when it is not clear whether the operation is a byte, word, or doubleword. The MOV [BX],AL instruction is clearly a byte move; the MOV [BX],9 instruction is not exact, and could therefore be a byte-, word-, or doubleword-sized move. Here, the instruction must be coded as MOV BYTE PTR [BX],9, MOV WORD PTR [BX],9, or MOV DWORD PTR [BX],9. If not, the assembler flags it as an error because it cannot determine the intent of the instruction.

Segment MOV Instructions. If the contents of a segment register are moved by the MOV, PUSH, or POP instructions, a special set of register bits (REG field) selects the segment register (see Table 4–6).

Figure 4–10 shows a MOV BX,CS instruction converted to binary. The opcode for this type of MOV instruction is different from that of the prior MOV instructions. Segment registers can be moved between any 16-bit register or 16-bit memory location. For example, the MOV [DI],DS instruction stores the contents of DS into the memory location addressed by DI in the data segment. An immediate segment register MOV is not available in the instruction set. To load a segment register with immediate data, first load another register with the data and then move it to a segment register.

FIGURE 4–9 A MOV WORD PTR [BX+1000H],1234H instruction converted to binary machine language.

Byte 1: 1 1 0 0 0 1 1 1 (Opcode = 1100011, W = 1)
Byte 2: 1 0 0 0 0 1 1 1 (MOD = 10, REG = 000, R/M = 111)
Byte 3: 0 0 0 0 0 0 0 0 (displacement, low byte)
Byte 4: 0 0 0 1 0 0 0 0 (displacement, high byte)
Byte 5: 0 0 1 1 0 1 0 0 (data, low byte)
Byte 6: 0 0 0 1 0 0 1 0 (data, high byte)

Opcode = MOV (immediate)
W = Word
MOD = 16-bit displacement
REG = 000 (not used in immediate addressing)
R/M = DS:[BX]
Displacement = 1000H
Data = 1234H

Although this discussion has not been a complete coverage of machine language coding, it provides enough information for machine language programming. Remember that a program written in symbolic assembly language (assembly language) is rarely assembled by hand into binary machine language. An assembler program converts symbolic assembly language into machine language. Given the microprocessor's more than 100,000 instruction variations, let us hope that an assembler is available for the conversion, because the manual process is very time-consuming, although not impossible.

The 64-Bit Mode for the Pentium 4 and Core2

None of the information presented thus far addresses the issue of 64-bit operation of the Pentium 4 or Core2. In the 64-bit mode, an additional prefix called REX (register extension) is added.

The REX prefix, which is encoded as a 40H–4FH, follows other prefixes and is placed immediately before the opcode to modify it for 64-bit operation. The purpose of the REX prefix is to modify the reg and r/m fields in the second byte of the instruction. REX is needed to be able to address registers R8 through R15. Figure 4–11 illustrates the structure of REX and also its
