(1)

Real Arithmetic

Computer Organization and Assembly Languages Yung-Yu Chuang

2006/12/11

(2)

Announcement

• Homework #4 is extended for two days

• Homework #5 will be announced today, due two weeks later.

• Midterm re-grading by this Thursday.

(3)

IA-32 floating point architecture

• The original 8086 supports only integer arithmetic. Real arithmetic can be simulated in software, but it is slow.

• The 8087 floating-point coprocessor (and later the 80287 and 80387) was originally sold as a separate chip.

• Starting with the 80486, the FPU (floating-point unit) is integrated into the CPU.

(4)

FPU data types

• Three floating-point types

(5)

FPU data types

• Four integer types

(6)

FPU registers

• Data register

• Control register

• Status register

• Tag register

(7)

Data registers

• Load: push, TOP--

• Store: pop, TOP++

• Instructions access the stack using ST(i), relative to TOP

• If TOP=0 and push, TOP wraps to R7

• If TOP=7 and pop, TOP wraps to R0

• Pushing onto a register that already holds a value (overwriting) generates an exception

• Inside the FPU, values are held in the 10-byte temporary real format. Loads from memory convert to this format; stores convert back to the destination type (integer, long integer, real, long real).

[Figure: the eight 80-bit data registers R0 to R7 (bits 79 to 0). With TOP = 010 (binary), ST(0) is R2, ST(1) is R3 and ST(2) is R4.]
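The TOP arithmetic above can be sketched in C. This is a toy model, not the hardware: the type `FpuStack`, the `used` array (standing in for the tag register) and all function names are assumptions made for illustration, and `double` stands in for the 80-bit registers.

```c
#include <stdbool.h>

/* Toy model of the eight FPU data registers (hypothetical names). */
typedef struct {
    double   reg[8];   /* R0..R7; the real registers are 80 bits wide */
    bool     used[8];  /* stand-in for the tag register: true = holds a value */
    unsigned top;      /* 3-bit TOP field */
} FpuStack;

void fpu_init(FpuStack *s)
{
    for (int i = 0; i < 8; i++) {
        s->reg[i] = 0.0;
        s->used[i] = false;
    }
    s->top = 0;
}

/* Physical register index that ST(i) currently names. */
unsigned fpu_physical(const FpuStack *s, unsigned i)
{
    return (s->top + i) & 7u;
}

/* Load: decrement TOP (wrapping 0 -> 7), then write the value.
   Returns false where the real FPU would signal an exception
   because the target register is already in use. */
bool fpu_push(FpuStack *s, double value)
{
    unsigned new_top = (s->top + 7u) & 7u;
    if (s->used[new_top])
        return false;
    s->top = new_top;
    s->reg[new_top] = value;
    s->used[new_top] = true;
    return true;
}

/* Store and pop: read ST(0), mark it empty, increment TOP (wrapping 7 -> 0). */
bool fpu_pop(FpuStack *s, double *out)
{
    if (!s->used[s->top])
        return false;
    *out = s->reg[s->top];
    s->used[s->top] = false;
    s->top = (s->top + 1u) & 7u;
    return true;
}
```

Starting from `fpu_init`, the first `fpu_push` moves TOP from 0 to 7, and a ninth consecutive push fails because every register is in use.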

(8)

Postfix expression

• (5*6)-4 → 5 6 * 4 -

token   stack after the step (top on the right)
5       5
6       5 6
*       30
4       30 4
-       26
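The same stack discipline can be written as a small C evaluator. This is only a sketch: the function name `eval_postfix`, the 32-entry stack and the restriction to non-negative integer operands are assumptions chosen to keep the example short.

```c
#include <ctype.h>
#include <stddef.h>

/* Evaluate a space-separated postfix expression made of non-negative
   integers and the operators + - * /.
   Returns 0 on success and -1 on malformed input. */
int eval_postfix(const char *expr, long *result)
{
    long stack[32];
    size_t depth = 0;
    const char *p = expr;

    while (*p != '\0') {
        if (isspace((unsigned char)*p)) {
            p++;
            continue;
        }
        if (isdigit((unsigned char)*p)) {          /* operand: push */
            long value = 0;
            while (isdigit((unsigned char)*p))
                value = value * 10 + (*p++ - '0');
            if (depth == 32)
                return -1;
            stack[depth++] = value;
            continue;
        }
        if (depth < 2)                             /* operator needs two operands */
            return -1;
        long rhs = stack[--depth];
        long lhs = stack[--depth];
        long out;
        switch (*p++) {
        case '+': out = lhs + rhs; break;
        case '-': out = lhs - rhs; break;
        case '*': out = lhs * rhs; break;
        case '/':
            if (rhs == 0)
                return -1;
            out = lhs / rhs;
            break;
        default:
            return -1;
        }
        stack[depth++] = out;                      /* result replaces both operands */
    }
    if (depth != 1)
        return -1;
    *result = stack[0];
    return 0;
}
```

Calling `eval_postfix("5 6 * 4 -", &r)` walks through exactly the five steps in the table and leaves 26 in `r`.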

(9)

Special-purpose registers

(10)

Special-purpose registers

• The last data pointer stores the memory address of the operand of the last non-control instruction.

• The last instruction pointer stores the address of the last non-control instruction.

• Both are 48 bits: 32 for the offset, 16 for the segment selector.

[Figure: opcode register; the fixed leading opcode bits 1 1 0 1 1 are not stored.]

(11)

Control register

Initial value: 037Fh

The FINIT instruction resets the control word to 037Fh.
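To see what 037Fh means, the control word can be split into its fields. The sketch below assumes the standard x87 layout (exception masks in bits 0 to 5, precision control in bits 8 and 9, rounding control in bits 10 and 11); the struct and function names are made up for this example.

```c
#include <stdint.h>

/* Fields of the x87 control word (hypothetical struct for illustration). */
typedef struct {
    unsigned exception_masks; /* bits 0-5: invalid, denormal, zero-divide,
                                 overflow, underflow, precision */
    unsigned precision;       /* bits 8-9: 0 = 24-bit, 2 = 53-bit,
                                 3 = 64-bit significand */
    unsigned rounding;        /* bits 10-11: 0 = nearest even, 1 = down,
                                 2 = up, 3 = truncate */
} ControlFields;

ControlFields decode_control_word(uint16_t cw)
{
    ControlFields f;
    f.exception_masks = cw & 0x3Fu;
    f.precision       = (cw >> 8) & 0x3u;
    f.rounding        = (cw >> 10) & 0x3u;
    return f;
}
```

For 037Fh this yields all six exceptions masked, 64-bit precision, and round to nearest even.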

(12)

Rounding

The FPU attempts to round the infinitely precise result of a floating-point calculation to a representable value

– Round to nearest even: round to the closest representable value; if two are equally close, choose the one whose last bit is even

– Round down: round toward -∞

– Round up: round toward +∞

– Truncate: round toward zero

• Example

– suppose 3 fractional bits can be stored, and a calculated value equals +1.0111 (binary).

– rounding up by adding .0001 produces 1.100

– rounding down by subtracting .0001 produces 1.011

(13)

Rounding

method                  original value   rounded value
Round to nearest even   1.0111           1.100
Round down              1.0111           1.011
Round up                1.0111           1.100
Truncate                1.0111           1.011

method                  original value   rounded value
Round to nearest even   -1.0111          -1.100
Round down              -1.0111          -1.100
Round up                -1.0111          -1.011
Truncate                -1.0111          -1.011
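The two tables can be reproduced with integer arithmetic by treating the value as a fixed-point number and dropping low-order bits. This is a sketch of the four rounding rules, not of the FPU's internal circuitry; `RoundMode` and `round_drop_bits` are hypothetical names.

```c
typedef enum {
    ROUND_NEAREST_EVEN,
    ROUND_DOWN,
    ROUND_UP,
    ROUND_TRUNCATE
} RoundMode;

/* Round a signed fixed-point value by discarding its lowest `drop` bits
   (drop >= 1). The result is expressed in units of 2^drop. */
long round_drop_bits(long scaled, unsigned drop, RoundMode mode)
{
    long unit = 1L << drop;
    long q = scaled / unit;   /* C division truncates toward zero */
    long r = scaled % unit;   /* remainder carries the sign of `scaled` */

    if (r == 0)
        return q;             /* exactly representable, nothing to round */

    switch (mode) {
    case ROUND_TRUNCATE:
        return q;
    case ROUND_DOWN:          /* toward minus infinity */
        return r < 0 ? q - 1 : q;
    case ROUND_UP:            /* toward plus infinity */
        return r > 0 ? q + 1 : q;
    case ROUND_NEAREST_EVEN: {
        long twice = 2 * (r < 0 ? -r : r);
        /* move away from zero when past the halfway point, or exactly
           halfway with an odd quotient */
        if (twice > unit || (twice == unit && q % 2 != 0))
            return scaled < 0 ? q - 1 : q + 1;
        return q;
    }
    }
    return q;
}
```

With four fractional bits, 1.0111 is the integer 23 (binary 10111); dropping one bit gives 12 (1.100) or 11 (1.011) in units of 1/8, matching each row of the tables, and -23 reproduces the negative rows.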

(14)

Floating-Point Exceptions

• Six types of exception conditions

– Detected before execution: #I (invalid operation), #Z (divide by zero), #D (denormalized operand)

– Detected after execution: #O (numeric overflow), #U (numeric underflow), #P (inexact precision)

• Each has a corresponding mask bit

– if set when an exception occurs, the exception is handled automatically by the FPU

– if clear when an exception occurs, a software exception handler is invoked

(15)

Status register

C3-C0: condition bits after comparisons

(16)

FPU data types

.data

bigVal REAL10 1.212342342234234243E+864

.code

fld bigVal

(17)

FPU instruction set

• Instruction mnemonics begin with letter F

• Second letter identifies data type of memory operand

– B = BCD

– I = integer

– no letter: floating point

• Examples

– FBLD load binary coded decimal

– FISTP store integer and pop stack

– FMUL multiply floating-point operands

(18)

FPU instruction set

• Operands

– zero, one, or two

– no immediate operands

– no general-purpose registers (EAX, EBX, ...); the only exception is FSTSW, which can store the FPU status word in AX

– if an instruction has two operands, one must be an FPU register

– integers must be loaded from memory onto the stack and converted to floating-point before being used in calculations

(19)

Instruction format

{…}: implied operands

(20)

Classic stack

• ST(0) as source, ST(1) as destination. The result is stored in ST(1) and ST(0) is popped, leaving the result on the top.

(21)

Real memory and integer memory

• ST(0) is the implied destination. The second operand comes from memory.

(22)

Register and register pop

• Register: operands are FP data registers, one must be ST.

• Register pop: the same as register, with an ST pop afterwards.

(23)

Example: evaluating an expression

(24)
(25)

Load

FLDPI pushes π

FLDL2T pushes log2(10)

FLDL2E pushes log2(e)

FLDLG2 pushes log10(2)

FLDLN2 pushes ln(2)

(26)

load

.data

array REAL8 10 DUP(?)

.code

fld array ; direct

fld [array+16] ; direct-offset

fld REAL8 PTR[esi] ; indirect

fld array[esi] ; indexed

fld array[esi*8] ; indexed, scaled

fld REAL8 PTR[ebx+esi]; base-index

fld array[ebx+esi] ; base-index-displacement

(27)

Store

(28)

Store

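fst dblOne ; copy ST(0) to dblOne, no pop

fst dblTwo ; copy ST(0) to dblTwo, no pop

fstp dblThree ; copy ST(0) to dblThree, then pop

fstp dblFour ; copy the new ST(0) to dblFour, then pop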

(29)

Register

(30)

Arithmetic instructions

FCHS ; change sign of ST

FABS ; ST=|ST|

(31)

Floating-Point add

• FADD

– adds source to destination

– No-operand version pops the FPU stack after addition

• Examples:

(32)

Floating-Point subtract

• FSUB

– subtracts source from destination.

– No-operand version pops the FPU stack after subtracting

• Example:

fsub mySingle ; ST -= mySingle

fsub array[edi*8] ; ST -= array[edi*8]

(33)

Floating-point multiply/divide

• FMUL

– Multiplies source by destination, stores product in destination

• FDIV

– Divides destination by source, stores the quotient in destination; the no-operand version then pops the stack

(34)

Example: compute distance

; compute D=sqrt(x^2+y^2)

fld x ; load x

fld st(0) ; duplicate x

fmul ; x*x

fld y ; load y

fld st(0) ; duplicate y

fmul ; y*y

fadd ; x*x+y*y

fsqrt ; sqrt(x*x+y*y)

fst D ; store to D

(35)

Example: expression

; expression: valD = -valA + (valB * valC)

.data

valA REAL8 1.5

valB REAL8 2.5

valC REAL8 3.0

valD REAL8 ? ; will be +6.0

.code

fld valA ; ST(0) = valA

fchs ; change sign of ST(0)

fld valB ; load valB into ST(0)

fmul valC ; ST(0) *= valC

fadd ; ST(1) += ST(0), then pop: ST(0) = sum

fstp valD ; store ST(0) to valD and pop
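The listing can be traced with a toy stack machine in C. The helper names below only mimic the instruction mnemonics; they are assumptions for this sketch, they do no bounds checking, and `double` stands in for the 80-bit registers.

```c
#include <stddef.h>

/* Toy operand stack: slot[depth - 1] plays the role of ST(0). */
typedef struct {
    double slot[8];
    size_t depth;
} Stack;

static void fld(Stack *s, double v)          /* push a value */
{
    s->slot[s->depth++] = v;
}

static void fchs(Stack *s)                   /* negate ST(0) */
{
    s->slot[s->depth - 1] = -s->slot[s->depth - 1];
}

static void fmul_mem(Stack *s, double m)     /* ST(0) *= memory operand */
{
    s->slot[s->depth - 1] *= m;
}

static void fadd_pop(Stack *s)               /* ST(1) += ST(0), then pop */
{
    s->slot[s->depth - 2] += s->slot[s->depth - 1];
    s->depth--;
}

static double fstp(Stack *s)                 /* return ST(0) and pop */
{
    return s->slot[--s->depth];
}

/* Mirrors the listing: valD = -valA + (valB * valC). */
double eval_valD(double valA, double valB, double valC)
{
    Stack s = { {0.0}, 0 };
    fld(&s, valA);
    fchs(&s);
    fld(&s, valB);
    fmul_mem(&s, valC);
    fadd_pop(&s);
    return fstp(&s);
}
```

`eval_valD(1.5, 2.5, 3.0)` returns 6.0, the value the listing's comment predicts for valD.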

(36)

Example: array sum

.data

N = 20

array REAL8 N DUP(1.0)

sum REAL8 0.0

.code

mov ecx, N

mov esi, OFFSET array

fldz ; ST0 = 0

lp: fadd REAL8 PTR [esi] ; ST0 += *(esi)

add esi, 8 ; move to next double

loop lp

fstp sum ; store result

(37)

Comparisons

(38)

Comparisons

• The comparison instructions set the condition bits in the FPU status register; the following instructions are used to transfer them to the CPU.

• SAHF copies C0 into the carry flag, C2 into the parity flag and C3 into the zero flag. Since the sign and overflow flags are not set, use the conditional jumps for unsigned integers (ja, jae, jb, jbe, je, jz).
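The bit shuffling done by FSTSW AX followed by SAHF can be sketched in C. The sketch assumes the standard status-word positions (C0 in bit 8, C2 in bit 10, C3 in bit 14) and the usual FCOM encoding of greater, less, equal and unordered; the type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool cf;  /* carry flag  <- C0 */
    bool pf;  /* parity flag <- C2 */
    bool zf;  /* zero flag   <- C3 */
} CpuFlags;

/* Status word as FCOM would leave it after comparing ST(0) with src. */
uint16_t status_after_fcom(double st0, double src)
{
    if (st0 != st0 || src != src)
        return 0x4500;   /* unordered (NaN): C3 = C2 = C0 = 1 */
    if (st0 > src)
        return 0x0000;   /* C3 = C2 = C0 = 0 */
    if (st0 < src)
        return 0x0100;   /* C0 = 1 */
    return 0x4000;       /* equal: C3 = 1 */
}

/* FSTSW AX, then SAHF: AH holds status-word bits 8..15. */
CpuFlags flags_from_status_word(uint16_t sw)
{
    uint8_t ah = (uint8_t)(sw >> 8);
    CpuFlags f;
    f.cf = (ah & 0x01) != 0;   /* AH bit 0 = C0 */
    f.pf = (ah & 0x04) != 0;   /* AH bit 2 = C2 */
    f.zf = (ah & 0x40) != 0;   /* AH bit 6 = C3 */
    return f;
}

/* The unsigned conditional jumps test only CF and ZF. */
bool ja_taken(CpuFlags f) { return !f.cf && !f.zf; }
bool jb_taken(CpuFlags f) { return f.cf; }
bool je_taken(CpuFlags f) { return f.zf; }
```

Because "less than" lands in the carry flag and "equal" in the zero flag, the result looks to the CPU like an unsigned integer comparison, which is why ja and jb are the right jumps.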

(39)

Comparisons

(40)

Branching after FCOM

• Required steps:

1. Use the FSTSW instruction to move the FPU status word into AX.

2. Use the SAHF instruction to copy AH into the EFLAGS register.

3. Use JA, JB, etc to do the branching.

• The Pentium Pro added two comparison instructions that modify the CPU's FLAGS directly.

FCOMI ST(0), src ; src=STn

FCOMIP ST(0), src ; same, then pop

Example

fcomi ST(0), ST(1)

jnb Label1

(41)

Example: comparison

.data

x REAL8 1.0

y REAL8 2.0

.code

; if (x>y) return 1 else return 0

fld x ; ST0 = x

fcomp y ; compare ST0 and y

fstsw ax ; move C bits into FLAGS

sahf

jna else_part ; if x not above y, ...

then_part:

mov eax, 1

jmp end_if

else_part:

mov eax, 0

end_if:

(42)

Example: comparison

.data

x REAL8 1.0

y REAL8 2.0

.code

; if (x>y) return 1 else return 0

fld y ; ST0 = y

fld x ; ST0 = x ST1 = y

fcomi ST(0), ST(1)

jna else_part ; if x not above y, ...

then_part:

mov eax, 1

jmp end_if

else_part:

mov eax, 0

end_if:

(43)

Comparing for Equality

• Do not compare floating-point values for exact equality, because their precision is limited. For example, sqrt(2.0)*sqrt(2.0) != 2.0

instruction          FPU stack
fld two              ST(0): +2.0000000E+000
fsqrt                ST(0): +1.4142135E+000
fmul ST(0), ST(0)    ST(0): +2.0000000E+000
fsub two             ST(0): +4.4408921E-016

(44)

Comparing for Equality

• Calculate the absolute value of the difference between two floating-point values

.data

epsilon REAL8 1.0E-12 ; difference value

val2 REAL8 0.0 ; value to compare

val3 REAL8 1.001E-13 ; considered equal to val2

.code

; if( val2 == val3 ), display "Values are equal".

fld epsilon

fld val2

fsub val3

fabs

fcomi ST(0),ST(1)

ja skip

mWrite <"Values are equal",0dh,0ah>

skip:
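The same tolerance test reads as follows in C. This is a sketch with a hypothetical function name; like the listing, it uses an absolute tolerance, which suits values near zero but may be too strict or too loose for very large or very small magnitudes.

```c
#include <stdbool.h>

/* True when |a - b| <= epsilon, the condition under which the
   listing falls through to "Values are equal". */
bool nearly_equal(double a, double b, double epsilon)
{
    double diff = a - b;
    if (diff < 0.0)
        diff = -diff;            /* same role as FABS */
    return diff <= epsilon;      /* "ja skip" skips only when diff > epsilon */
}
```

With the listing's data, `nearly_equal(0.0, 1.001E-13, 1.0E-12)` is true. Likewise `nearly_equal(0.1 + 0.2, 0.3, 1.0E-12)` is true even though `0.1 + 0.2 == 0.3` is false in IEEE double arithmetic.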

(45)

Miscellaneous instructions

.data

x REAL4 2.75

five REAL4 5.2

.code

fld five ; ST0=5.2

fld x ; ST0=2.75, ST1=5.2

fscale ; ST0=2.75*32=88

; ST1=5.2
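What FSCALE computes in this example can be sketched in C: ST(1) is truncated toward zero to an integer n and ST(0) is multiplied by 2^n. The function name is hypothetical, and the loop is used only to keep the sketch free of library calls; it assumes the exponent fits in an `int`.

```c
/* value * 2^trunc(exponent), mirroring ST(0) = ST(0) * 2^trunc(ST(1)). */
double scale_by_power_of_two(double value, double exponent)
{
    int n = (int)exponent;   /* the cast truncates toward zero: 5.2 -> 5 */
    double result = value;

    for (; n > 0; n--)
        result *= 2.0;
    for (; n < 0; n++)
        result /= 2.0;
    return result;
}
```

`scale_by_power_of_two(2.75, 5.2)` gives 2.75 * 32 = 88, the value shown in the listing's comment.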

(46)

Example: quadratic formula

(47)

Example: quadratic formula

(48)

Example: quadratic formula

(49)

Other instructions

• F2XM1 ; ST=2^ST(0)-1; ST in [-1,1]

• FYL2X ; ST=ST(1)*log2(ST(0))

• FYL2XP1 ; ST=ST(1)*log2(ST(0)+1)

• FPTAN ; ST(0)=1;ST(1)=tan(ST)

• FPATAN ; ST=arctan(ST(1)/ST(0))

• FSIN ; ST=sin(ST) in radians

• FCOS ; ST=cos(ST) in radians

• FSINCOS ; ST(0)=cos(ST);ST(1)=sin(ST)

(50)

Exception synchronization

• The main CPU and the FPU can execute instructions concurrently

– if an unmasked exception occurs, the current FPU instruction is interrupted and the FPU signals an exception

– but the main CPU does not check for pending FPU exceptions; it might use a memory value that the interrupted FPU instruction was supposed to set

• Example:

.data

intVal DWORD 25

.code

fild intVal ; load integer into ST(0)

inc intVal ; increment the integer

(51)

Exception synchronization

• The exception is raised and left pending; the next floating-point instruction checks for pending exceptions before it executes.

• For safety, insert an fwait instruction, which tells the CPU to wait for the FPU's exception handler to finish:

.data

intVal DWORD 25

.code

fild intVal ; load integer into ST(0)

fwait ; wait for pending exceptions

inc intVal ; increment the integer

(52)

Mixed-mode arithmetic

• Combining integers and reals.

– Integer arithmetic instructions such as ADD and MUL cannot handle reals

– The FPU has instructions that promote integers to reals and load the values onto the floating-point stack.

• Example: Z = N + X

.data

N SDWORD 20

X REAL8 3.5

Z REAL8 ?

.code

fild N ; load integer into ST(0)

fwait ; wait for exceptions

fadd X ; add mem to ST(0)

fstp Z ; store ST(0) to mem

(53)

Mixed-mode arithmetic

int N=20;

double X=3.5;

int Z=(int)(N+X);

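; default rounding mode (round to nearest even)

fild N

fadd X

fist Z ; Z=24

; switch the rounding mode to truncate, as the C cast does

fstcw ctrlWord

or ctrlWord, 110000000000b ; round mode=truncate

fldcw ctrlWord

fild N

fadd X

fist Z ; Z=23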
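The difference between the two FIST results can be reproduced in C without touching the FPU control word. The sketch below assumes the value fits in an `int`; both function names are hypothetical.

```c
/* C's cast always truncates toward zero, like FIST with RC = truncate. */
int to_int_truncate(double v)
{
    return (int)v;
}

/* Round to nearest, ties to even, like FIST with the default RC. */
int to_int_nearest_even(double v)
{
    int t = (int)v;                  /* integer part, truncated toward zero */
    double frac = v - (double)t;     /* in (-1, 1), same sign as v */

    if (frac > 0.5 || (frac == 0.5 && t % 2 != 0))
        return t + 1;
    if (frac < -0.5 || (frac == -0.5 && t % 2 != 0))
        return t - 1;
    return t;
}
```

For N + X = 23.5, `to_int_nearest_even` returns 24 and `to_int_truncate` returns 23, matching the two values of Z in the listing.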

(54)

Masking and unmasking exceptions

• Exceptions are masked by default

– Divide by zero just generates infinity, without halting the program

• If you unmask an exception

– the processor executes an appropriate exception handler

– Unmask the divide by zero exception by clearing bit 2

.data

ctrlWord WORD ?

.code

fstcw ctrlWord ; get the control word

and ctrlWord,1111111111111011b ; unmask #Z

fldcw ctrlWord ; load it back into FPU
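The AND and OR masks used on the control word can be checked with plain integer arithmetic. This sketch only manipulates a 16-bit value; it does not load anything into a real FPU, and the function and constant names are assumptions.

```c
#include <stdint.h>

/* Rounding-control values for bits 10-11 of the control word. */
enum { RC_NEAREST = 0, RC_DOWN = 1, RC_UP = 2, RC_TRUNCATE = 3 };

/* Clear one exception mask bit (bit 2 is the divide-by-zero mask). */
uint16_t unmask_exception(uint16_t cw, unsigned bit)
{
    return (uint16_t)(cw & ~(1u << bit));
}

/* Replace the rounding-control field, clearing the old value first. */
uint16_t set_rounding(uint16_t cw, unsigned rc)
{
    return (uint16_t)((cw & ~0x0C00u) | ((rc & 3u) << 10));
}
```

Starting from the initial value 037Fh, unmasking #Z gives 037Bh and selecting truncation gives 0F7Fh. A bare OR is enough for truncation because both field bits become 1; the other modes need the field cleared first, which is why `set_rounding` masks before it ORs.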

(55)

Homework #5

for (i=0; i<m; i++)

for (j=0; j<p; j++) {

C[i,j]=0;

for (r=0; r<n; r++)

C[i,j]+=A[i,r]*B[r,j];

}

[Figure: A is m×n, B is n×p, C is m×p]

• Strassen’s algorithm?

• Coppersmith & Winograd: O(n^2.376)

• Memory coherence

(56)

Homework #5

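[Figure: access pattern of the inner loop: row i of A and column j of B, both indexed by k, produce element (i, j) of C]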

/* ijk */

for (i=0; i<n; i++) { for (j=0; j<n; j++) {

sum = 0.0;

for (k=0; k<n; k++)

sum += a[i][k] * b[k][j];

c[i][j] = sum;

} }

/* jik */

for (j=0; j<n; j++) { for (i=0; i<n; i++) {

sum = 0.0;

for (k=0; k<n; k++)

sum += a[i][k] * b[k][j];

c[i][j] = sum;

} }

(57)

Homework #5

/* kij */

for (k=0; k<n; k++) { for (i=0; i<n; i++) {

r = a[i][k];

for (j=0; j<n; j++)

c[i][j] += r * b[k][j];

} }

/* ikj */

for (i=0; i<n; i++) { for (k=0; k<n; k++) {

r = a[i][k];

for (j=0; j<n; j++)

c[i][j] += r * b[k][j];

} }

/* jki */

for (j=0; j<n; j++) { for (k=0; k<n; k++) {

r = b[k][j];

for (i=0; i<n; i++)

c[i][j] += a[i][k] * r;

} }

/* kji */

for (k=0; k<n; k++) { for (j=0; j<n; j++) {

r = b[k][j];

for (i=0; i<n; i++)

c[i][j] += a[i][k] * r;

} }
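All six orderings compute the same product; only the memory access pattern differs. A minimal C sketch comparing two of them follows. The fixed size `N` and the function names are assumptions for illustration, and the kij version zeroes `c` first because that family of loops accumulates into `c` directly.

```c
#define N 3   /* small fixed size, chosen only for this sketch */

/* ijk: the inner loop walks a row of a and a column of b. */
void matmul_ijk(double a[N][N], double b[N][N], double c[N][N])
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            double sum = 0.0;
            for (int k = 0; k < N; k++)
                sum += a[i][k] * b[k][j];
            c[i][j] = sum;
        }
}

/* kij: the inner loop walks a row of b and a row of c. */
void matmul_kij(double a[N][N], double b[N][N], double c[N][N])
{
    for (int i = 0; i < N; i++)          /* c must start at zero */
        for (int j = 0; j < N; j++)
            c[i][j] = 0.0;

    for (int k = 0; k < N; k++)
        for (int i = 0; i < N; i++) {
            double r = a[i][k];
            for (int j = 0; j < N; j++)
                c[i][j] += r * b[k][j];
        }
}
```

Since C stores arrays row by row, the kij inner loop touches consecutive elements of `b` and `c`, while the ijk inner loop strides down a column of `b`.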

(58)

Homework #5

[Figure: cycles per iteration (0 to 60) versus array size n (25 to 400) for the six loop orderings. The curves fall into three groups: kji & jki, kij & ikj, jik & ijk.]

(59)

Blocked array

[Figure: cycles per iteration (0 to 60) versus array size n (25 to 400) for kji, jki, kij, ikj, jik, ijk and the blocked versions bijk and bikj (bsize = 25).]
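The slides do not show the code behind the blocked curves, so the following is only one common way to block the ijk loop, offered as an assumption about what bijk refers to. The sizes `N` and `BSIZE` and the function names are chosen for this sketch; the plot itself uses bsize = 25.

```c
#define N 5      /* matrix size for this sketch */
#define BSIZE 2  /* block size; need not divide N */

static int min_int(int x, int y)
{
    return x < y ? x : y;
}

/* Blocked ijk: work on BSIZE-wide strips of j and k so that the part of b
   being reused stays small enough to remain in cache. */
void matmul_bijk(double a[N][N], double b[N][N], double c[N][N])
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            c[i][j] = 0.0;

    for (int jj = 0; jj < N; jj += BSIZE)
        for (int kk = 0; kk < N; kk += BSIZE)
            for (int i = 0; i < N; i++)
                for (int j = jj; j < min_int(jj + BSIZE, N); j++) {
                    double sum = 0.0;
                    for (int k = kk; k < min_int(kk + BSIZE, N); k++)
                        sum += a[i][k] * b[k][j];
                    c[i][j] += sum;   /* accumulate this block's partial sum */
                }
}
```

Each element of `c` receives one partial sum per k-block, so the result equals the unblocked product while the working set per block is about BSIZE × BSIZE elements of `b`.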
