
(1)

Computer Arithmetic Design

Instructor: Kuan Jen Lin

E-Mail: [email protected]

Web: http://vlsi.ee.fju.edu.tw/teacher/kjlin/kjlin.htm Dept. of EE, FJU, Taiwan

Room: SF 727B

(2)

SW & HW

SW = Algorithm + Data Structure + Programming techniques
HW = Algorithm + Architecture + Design Method

Computing

Communication

Pipeline

Systolic array Low power Interface

Full custom Cell based FPGA

System level

(3)

Course Objectives

• Learn computer algorithms to do arithmetic operations.
• Learn hardware designs for computer arithmetic.
• After completing the course:
  - Students are able to implement computer arithmetic hardware designs using HDL.
  - Students are able to read research papers about computer arithmetic.

(4)

Textbook

• Textbook:
Behrooz Parhami, “Computer Arithmetic: Algorithms and Hardware Designs,” Oxford University Press
• Reference books:
Ercegovac and Lang, “Digital Arithmetic,” MKP.
Stine, “Digital Computer Arithmetic Datapath Design Using Verilog HDL,” CAP

(5)

Syllabus

• Number representation
• Two-operand addition
• Multi-operand addition
• Multiplication
• Division
• Square root
• Paper reading and presentation

(6)

Grading

• Midterm exam (30%)
• Paper reading and presentation (30%)
• Homework (some problems need HDL programming) (30%)
• Attendance and others (10%)

(7)

Number Representation


Most slides are revisions of PowerPoint files obtained from the textbook website.

(8)

Numbers and Arithmetic

Chapter Goals
Define scope and provide motivation
Set the framework for the rest of the book
Review positional fixed-point numbers

Chapter Highlights
What goes on inside your calculator?
Ways of encoding numbers in k bits
Radices and digit sets: conventional, exotic
Conversion from one system to another

(9)

What is Computer Arithmetic?

Pentium Division Bug (1994-95): Pentium’s radix-4 SRT algorithm occasionally gave an incorrect quotient

First noted in 1994 by T. Nicely, who computed sums of reciprocals of twin primes:

1/5 + 1/7 + 1/11 + 1/13 + . . . + 1/p + 1/(p + 2) + . . .

Worst-case example of division error in Pentium:

c = 4 195 835 / 3 145 727
Correct quotient: 1.333 820 44...
Circa 1994 Pentium double FLP value: 1.333 739 06... (accurate to only 14 bits; worse than single!)
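The "14 bits" figure can be checked with a few lines of Python. This is only a sketch: the flawed value is copied from the slide (it is not recomputed), and `matching_bits` is an illustrative name for the negative base-2 logarithm of the relative error.

```python
import math

# Operands of the worst-case example quoted on the slide.
dividend, divisor = 4_195_835, 3_145_727
correct = dividend / divisor
flawed = 1.33373906  # value the slide attributes to the circa-1994 Pentium

rel_error = abs(correct - flawed) / correct
matching_bits = -math.log2(rel_error)  # roughly how many leading bits agree
print(f"correct = {correct:.8f}, matching bits = {matching_bits:.1f}")
```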

(10)

The Scope of Computer Arithmetic

Hardware (our focus in this book)
Design of efficient digital circuits for primitive and other arithmetic operations such as +, –, ×, ÷, √, log, sin, cos
Issues: algorithms; error analysis; speed/cost trade-offs; hardware implementation; testing, verification
General-purpose: flexible data paths; fast primitive operations like +, –, ×, ÷, √; benchmarking
Special-purpose: tailored to applications like digital filtering, image processing, radar tracking

Software
Numerical methods for solving systems of linear equations, partial differential equations, etc.
Issues: algorithms; error analysis; computational complexity; programming; testing, verification

(11)

A Motivating Example

Using a calculator with $\sqrt{x}$, $x^2$, and $x^y$ functions, compute:

$u = \sqrt{\sqrt{\cdots\sqrt{2}}} = 1.000\,677\,131$ (“1024th root of 2”)
$v = 2^{1/1024} = 1.000\,677\,131$

Save u and v; if you can’t save, recompute values when needed

$x = (((u^2)^2)\ldots)^2 = 1.999\,999\,963$
$x' = u^{1024} = 1.999\,999\,973$
$y = (((v^2)^2)\ldots)^2 = 1.999\,999\,983$
$y' = v^{1024} = 1.999\,999\,994$

Perhaps v and u are not really the same value

$w = v - u = 1 \times 10^{-11}$ (nonzero due to hidden digits)
$(u - 1) \times 1000 = 0.677\,130\,680$ [Hidden ... (0) 68]
$(v - 1) \times 1000 = 0.677\,130\,690$ [Hidden ... (0) 69]

(12)

Finite Precision Can Lead to Disaster

• Example: Failure of Patriot Missile (1991 Feb. 25)
Source: http://www.math.psu.edu/dna/455.f96/disasters.html
An American Patriot Missile battery in Dhahran, Saudi Arabia, failed to intercept an incoming Iraqi Scud missile
The Scud struck an American Army barracks, killing 28

• Cause, per GAO/IMTEC-92-26 report: “software problem” (inaccurate calculation of the time since boot)
Problem specifics:
Time in tenths of a second, as measured by the system’s internal clock, was multiplied by 1/10 to get the time in seconds
Internal registers were 24 bits wide
1/10 = 0.0001 1001 1001 1001 1001 100 (chopped to 24 b)
Error ≈ 0.1100 1100 × $2^{-23}$ ≈ 9.5 × $10^{-8}$

• Error in 100-hr operation period
≈ 9.5 × $10^{-8}$ × 100 × 60 × 60 × 10 = 0.34 s
Distance traveled by Scud = (0.34 s) × (1676 m/s) ≈ 570 m
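The numbers above can be reproduced with exact rational arithmetic. The sketch below assumes 23 bits after the binary point (the count shown in the slide's bit string) and uses truncation, not rounding; the variable names are illustrative only.

```python
from fractions import Fraction

FRACTION_BITS = 23  # assumption: bits after the binary point in the slide's bit string

exact = Fraction(1, 10)
# Chop (truncate) 1/10 to FRACTION_BITS fractional bits.
chopped = Fraction(int(exact * 2**FRACTION_BITS), 2**FRACTION_BITS)
error_per_tick = float(exact - chopped)

ticks = 100 * 60 * 60 * 10               # tenths of a second in 100 hours
drift_seconds = error_per_tick * ticks
miss_distance_m = drift_seconds * 1676   # Scud speed from the slide, in m/s
print(error_per_tick, drift_seconds, miss_distance_m)
```

Without intermediate rounding the distance comes out slightly above the slide's 570 m, which was obtained from the rounded 0.34 s.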

(13)

Numbers and Their Encodings

Some 4-bit number representation formats:
Unsigned integer
Signed integer (± sign bit)
Fixed point, 3+1 (radix point before the last bit)
Signed fraction (± sign bit)
2's-compl fraction
Floating point: exponent e in {−2, −1, 0, 1}, significand s in {0, 1, 2, 3}
Logarithmic: base-2 logarithm, log x

(14)

Encoding Numbers in 4 Bits

[Figure: number line from −16 to 16 showing the values representable in 4 bits under each number format]

Number formats shown:
Unsigned integers
Signed-magnitude
3 + 1 fixed-point, xxx.x
Signed fraction, ±.xxx
2’s-compl. fraction, x.xxx
2 + 2 floating-point, $s \times 2^e$, e in [−2, 1], s in [0, 3]
2 + 2 logarithmic (log = xx.xx)

(15)

Fixed-Radix Positional Number Systems

$(x_{k-1} x_{k-2} \ldots x_1 x_0 \,.\, x_{-1} x_{-2} \ldots x_{-l})_r = \sum_{i=-l}^{k-1} x_i r^i$

One can generalize to:
Arbitrary radix (not necessarily integer, positive, constant)
Arbitrary digit set, usually {–α, –α+1, . . . , β–1, β} = [–α, β]

Example 1.1. Balanced ternary number system:
Radix r = 3, digit set = [–1, 1]

Example 1.2. Negative-radix number systems:
Radix –r, r ≥ 2, digit set = [0, r – 1]
The special case with radix –2 and digit set [0, 1] is known as the negabinary number system
Can it represent all integers?

(16)

More Examples of Number Systems

Example 1.3. Digit set [–4, 5] for r = 10:
(3 –1 5)ten represents 295 = 300 – 10 + 5

Example 1.4. Digit set [–7, 7] for r = 10:
(3 –1 5)ten = (3 0 –5)ten = (1 –7 0 –5)ten

Example 1.7. Quater-imaginary number system:
radix r = 2j, digit set [0, 3]
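The positional formula works unchanged for signed digits, negative radices, and even a complex radix. A minimal sketch for integer (whole-part) digit strings follows; `positional_value` is a hypothetical helper name, and the balanced-ternary and negabinary inputs are my own small examples.

```python
def positional_value(digits, radix):
    """Evaluate sum(x_i * radix**i) for whole-part digits listed most significant first."""
    k = len(digits)
    return sum(d * radix ** (k - 1 - pos) for pos, d in enumerate(digits))

print(positional_value([3, -1, 5], 10))      # digit set [-4, 5]
print(positional_value([1, -7, 0, -5], 10))  # digit set [-7, 7], same value
print(positional_value([1, -1, 0], 3))       # balanced ternary
print(positional_value([1, 1, 0, 1], -2))    # negabinary
print(positional_value([1, 0, 3], 2j))       # quater-imaginary, complex result
```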

(17)

Number Radix Conversion

u = w . v (whole part w, fractional part v)
= $(x_{k-1} x_{k-2} \ldots x_1 x_0 \,.\, x_{-1} x_{-2} \ldots x_{-l})_r$ Old
= $(X_{K-1} X_{K-2} \ldots X_1 X_0 \,.\, X_{-1} X_{-2} \ldots X_{-L})_R$ New

Radix conversion, using arithmetic in the old radix r
Convenient when converting from r = 10

Radix conversion, using arithmetic in the new radix R
Convenient when converting to R = 10

Example: (31)eight = (25)ten, so 31 Oct. = 25 Dec. (Halloween = Xmas)

(18)

Radix Conversion: Old-Radix Arithmetic

Converting whole part w: (105)ten = (?)five
Repeatedly divide by five:

Quotient   Remainder
105
21         0
4          1
0          4

Therefore, (105)ten = (410)five

Converting fractional part v: (105.486)ten = (410.?)five
Repeatedly multiply by five:

Whole part   Fraction
             .486
2            .430
2            .150
0            .750
3            .750
3            .750

Therefore, (105.486)ten ≅ (410.22033)five
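Both procedures are short loops. The sketch below uses `Fraction` so the fractional part is handled exactly; the function names are illustrative, not from the textbook.

```python
from fractions import Fraction

def whole_to_radix(w, radix):
    """Repeated division: the remainders are the digits, least significant first."""
    digits = []
    while w > 0:
        w, rem = divmod(w, radix)
        digits.append(rem)
    return digits[::-1] or [0]

def fraction_to_radix(v, radix, count):
    """Repeated multiplication: the whole parts are the digits, most significant first."""
    digits = []
    for _ in range(count):
        v *= radix
        whole = int(v)
        digits.append(whole)
        v -= whole
    return digits

print(whole_to_radix(105, 5))                        # [4, 1, 0]
print(fraction_to_radix(Fraction(486, 1000), 5, 5))  # [2, 2, 0, 3, 3]
```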

(19)

Radix Conversion: New-Radix Arithmetic

Horner’s rule or formula

Converting whole part w: (22033)five = (?)ten

((((2 × 5) + 2) × 5 + 0) × 5 + 3) × 5 + 3

Values of the successively larger parenthesized groups: 10, 12, 60, 303, 1518

Converting fractional part v: (410.22033)five = (105.?)ten
(0.22033)five × $5^5$ = (22033)five = (1518)ten
1518 / $5^5$ = 1518 / 3125 = 0.48576
Therefore, (410.22033)five = (105.48576)ten

(20)

Horner’s Rule for Fractions

Horner’s rule or formula

Converting fractional part v: (0.22033)five = (?)ten

(((((3 / 5) + 3) / 5 + 0) / 5 + 2) / 5 + 2) / 5

Values of the successively larger parenthesized groups: 0.6, 3.6, 0.72, 2.144, 2.4288, 0.48576
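Horner's rule for the whole part and for the fraction differ only in direction and in multiply versus divide. A small sketch, with illustrative function names and exact `Fraction` arithmetic for the fractional case:

```python
from fractions import Fraction

def horner_whole(digits, radix):
    """Whole part: scan from the most significant digit, multiply then add."""
    value = 0
    for d in digits:
        value = value * radix + d
    return value

def horner_fraction(digits, radix):
    """Fraction: scan from the least significant digit, add then divide."""
    value = Fraction(0)
    for d in reversed(digits):
        value = (value + d) / radix
    return value

print(horner_whole([2, 2, 0, 3, 3], 5))            # 1518
print(float(horner_fraction([2, 2, 0, 3, 3], 5)))  # 0.48576
```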

(21)

Classes of Number Representations

• Signed number
• Redundant number system
• Residue number system
• Real number

(22)

2 Representing Signed Numbers

Chapter Goals

Learn different encodings of the sign info
Discuss implications for arithmetic design

Chapter Highlights
Using sign bit, biasing, complementation
Properties of 2’s-complement numbers
Signed vs unsigned arithmetic
Signed numbers, positions, or digits

(23)

[Figure: Four-bit signed-magnitude number representation system for integers. Bit patterns 0000 to 0111 represent +0 to +7; bit patterns 1000 to 1111 represent −0 to −7. Incrementing the bit pattern increments a positive value and decrements a negative value.]

(24)

[Figure: Four-bit biased integer number representation system with a bias of 8. Bit patterns 0000 to 1111 represent −8 to +7 in order; incrementing the bit pattern always increments the value.]

(25)

Arithmetic with Biased Numbers

Addition/subtraction of biased numbers:
x + y + bias = (x + bias) + (y + bias) – bias
x – y + bias = (x + bias) – (y + bias) + bias
A power-of-2 (or $2^a - 1$) bias simplifies addition/subtraction

Comparison of biased numbers:
Compare like ordinary unsigned numbers
Find true difference by ordinary subtraction

We seldom perform arbitrary arithmetic on biased numbers
Main application: Exponent field of floating-point numbers
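The two identities translate directly into code. This sketch assumes the 4-bit, bias-8 system of the previous slide and does no overflow checking; the helper names are illustrative.

```python
BIAS = 8  # 4-bit codes 0..15 stand for the values -8..+7

def encode(x):
    return x + BIAS

def decode(code):
    return code - BIAS

def biased_add(cx, cy):
    return cx + cy - BIAS   # (x + bias) + (y + bias) - bias

def biased_sub(cx, cy):
    return cx - cy + BIAS   # (x + bias) - (y + bias) + bias

print(decode(biased_add(encode(3), encode(-5))))  # -2
print(encode(-6) < encode(2))                     # codes compare like unsigned numbers
```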

(26)

Example and Two Special Cases

Example -- complement system for fixed-point numbers:
Complementation constant M = 12.000
Fixed-point number range [–6.000, +5.999]
Represent –3.258 as 12.000 – 3.258 = 8.742

Auxiliary operations for complement representations:
complementation or change of sign (computing M – x)
computations of residues mod M
Thus, M must be selected to simplify these operations

Two choices allow just this for fixed-point radix-r arithmetic with k whole digits and l fractional digits:
Radix complement: $M = r^k$
Digit complement: $M = r^k - ulp$ (aka diminished radix complement)
ulp (unit in least position) stands for $r^{-l}$
Allows us to forget about l, even for nonintegers

(27)

Two’s-Complement Numbers

[Figure: Four-bit 2’s-complement number representation system. Unsigned representations 0000 to 0111 stand for the signed values +0 to +7; 1000 to 1111 stand for −8 to −1.]

Two’s complement = radix complement system for r = 2
$M = 2^k$
$2^k - x = [(2^k - ulp) - x] + ulp = x^{compl} + ulp$

Range of representable numbers with k whole bits: from $-2^{k-1}$ to $2^{k-1} - ulp$
ulp (unit in least position) stands for $2^{-l}$
Allows us to forget about l, even for nonintegers

(28)

One’s-Complement Number Representation

One’s complement = digit complement (diminished radix complement) system for r = 2
$M = 2^k - ulp$
$(2^k - ulp) - x = x^{compl}$

Range of representable numbers with k whole bits: from $-2^{k-1} + ulp$ to $2^{k-1} - ulp$

[Figure: Four-bit 1’s-complement number representation system. Unsigned representations 0000 to 0111 stand for the signed values +0 to +7; 1000 to 1111 stand for −7 to −0.]

(29)

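Range/Precision Extension for 2’s- and 1’s-Complement Numbers

Range/precision extension for 2’s-complement numbers:
$\ldots x_{k-1}\, x_{k-1}\, x_{k-1} \;\; x_{k-1}\, x_{k-2} \ldots x_1\, x_0 \,.\, x_{-1}\, x_{-2} \ldots x_{-l} \;\; 0\, 0\, 0 \ldots$
(sign extension to the left of the sign bit $x_{k-1}$; extension by 0s to the right of the LSD $x_{-l}$)

Range/precision extension for 1’s-complement numbers:
$\ldots x_{k-1}\, x_{k-1}\, x_{k-1} \;\; x_{k-1}\, x_{k-2} \ldots x_1\, x_0 \,.\, x_{-1}\, x_{-2} \ldots x_{-l} \;\; x_{k-1}\, x_{k-1}\, x_{k-1} \ldots$
(sign extension to the left of the sign bit $x_{k-1}$; extension by copies of the sign bit to the right of the LSD $x_{-l}$)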

(30)

Mod-$2^k$ vs Mod-$(2^k - 1)$

Mod-$2^k$ operation needed in 2’s-complement arithmetic is trivial:
Simply drop the carry-out (subtract $2^k$ if result is $2^k$ or greater)

Mod-$(2^k - ulp)$ operation needed in 1’s-complement arithmetic is done via end-around carry:
$(x + y) - (2^k - ulp)$; connect $c_{out}$ to $c_{in}$

Since the dropped carry is worth $2^k$ units and the inserted carry is worth ulp, the combined effect is to reduce the magnitude by $2^k - ulp$.
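The difference between the two reductions shows up clearly on integer bit patterns. Below is a sketch for k = 4 whole bits and no fractional bits (so ulp = 1); function names are illustrative, and overflow detection is left out.

```python
K = 4
MASK = (1 << K) - 1

def add_twos_complement(a, b):
    """Add two K-bit patterns modulo 2**K: the carry-out is simply dropped."""
    return (a + b) & MASK

def add_ones_complement(a, b):
    """Add two K-bit patterns modulo 2**K - 1 using an end-around carry."""
    total = a + b
    carry_out = total >> K
    return (total & MASK) + carry_out  # the dropped carry re-enters as c_in

print(format(add_twos_complement(0b0101, 0b1101), "04b"))  # 5 + (-3) in 2's complement
print(format(add_ones_complement(0b0101, 0b1100), "04b"))  # 5 + (-3) in 1's complement
```

In the 1's-complement case a result of 1111 is the representation of −0.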

(31)

Why 2’s-Complement Is the Universal Choice

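[Figure: Adder/subtractor architecture for 2’s-complement numbers. A mux controlled by the add/sub signal selects y or its complement as the second adder input; the same signal (0 for addition, 1 for subtraction) drives $c_{in}$. The adder takes x and the selected operand and produces s = x ± y and $c_{out}$.]

Controlled complementation: the mux can be replaced with k XOR gates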

(32)

Interpreting a 2’s-complement number as having a negatively weighted most-significant digit.

x = (1 0 1 0 0 1 1 0)two’s-compl
Weights: $-2^7,\ 2^6,\ 2^5,\ 2^4,\ 2^3,\ 2^2,\ 2^1,\ 2^0$
–128 + 32 + 4 + 2 = –90

Check:
x = (1 0 1 0 0 1 1 0)two’s-compl
–x = (0 1 0 1 1 0 1 0)two
Weights: $2^7,\ 2^6,\ 2^5,\ 2^4,\ 2^3,\ 2^2,\ 2^1,\ 2^0$
64 + 16 + 8 + 2 = 90
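The negative-weight interpretation gives a one-line evaluation rule. A minimal sketch (the function name is illustrative):

```python
def twos_complement_value(bits):
    """Value of a bit list (MSB first) with weight -2**(k-1) on the sign bit."""
    k = len(bits)
    weights = [-(2 ** (k - 1))] + [2 ** i for i in range(k - 2, -1, -1)]
    return sum(b * w for b, w in zip(bits, weights))

print(twos_complement_value([1, 0, 1, 0, 0, 1, 1, 0]))  # -90
print(twos_complement_value([0, 1, 0, 1, 1, 0, 1, 0]))  # 90
```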

(33)

Redundant Number Systems

Chapter Goals
Explore the advantages and drawbacks of using more than r digit values in radix r

Chapter Highlights
Redundancy eliminates long carry chains
Redundancy takes many forms: trade-offs
Conversions between redundant and nonredundant representations
Redundancy used for end values too?

(34)

Coping with the Carry Problem

Ways of dealing with the carry propagation problem:

1. Limit propagation to within a small number of bits (Chapters 3-4)
2. Detect end of propagation; don’t wait for worst case (Chapter 5)
3. Speed up propagation via lookahead etc. (Chapters 6-7)
4. Ideal: Eliminate carry propagation altogether! (Chapter 3)

(35)

Use Redundant Number System (1/2)

    5   7   8   2   4   9
+   6   2   9   3   8   9     Operand digits in [0, 9]
-------------------------
   11   9  17   5  12  18     Position sums in [0, 18]

But how can we extend this beyond a single addition?
Subsequent additions will cause problems.

• The digit values 10 through 18 are redundant.
• A position sum of 10 or more would normally produce a carry, but no position sum exceeds 18.

(36)

Use Redundant Number System (2/2)

      18  18  18  18  18
+      0   0   0   0   1
       8   8   8   8   9     Interim sums
   1   1   1   1   1         Transfer digits
   1   9   9   9   9   9     Sum digits

Is there still a carry propagation problem?

The sum of digits for each position is in [0, 36]; each can be decomposed into an interim sum in [0, 16] and a transfer digit in [0, 2], i.e., a carry.

(37)

Example: Addition of Redundant Numbers

Position sum decomposition: [0, 36] = 10 × [0, 2] + [0, 16]
Absorption of transfer digit: [0, 16] + [0, 2] = [0, 18]

       11   9  17  10  12  18
+       6  12   9  10   8  18     Operand digits in [0, 18]
-----------------------------
       17  21  26  20  20  36     Position sums in [0, 36]
        7  11  16   0  10  16     Interim sums in [0, 16]
    1   1   1   2   1   2         Transfer digits in [0, 2]
-----------------------------
    1   8  12  18   1  12  16     Sum digits in [0, 18]
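The two-stage scheme can be sketched in a few lines. The decomposition of a position sum is not unique (the slide splits 20 once as 2 × 10 + 0 and once as 1 × 10 + 10); the rule used here, transfer = min(p // 10, 2), is just one valid choice, so individual digits may differ from the slide while the represented value is the same. Function names are illustrative.

```python
RADIX = 10

def carry_free_add(x, y):
    """Add two radix-10 numbers with digits in [0, 18], most significant digit first."""
    position_sums = [a + b for a, b in zip(x, y)]            # each in [0, 36]
    transfers = [min(p // RADIX, 2) for p in position_sums]  # each in [0, 2]
    interims = [p - RADIX * t for p, t in zip(position_sums, transfers)]  # each in [0, 16]
    incoming = transfers[1:] + [0]  # transfer arriving from the position to the right
    return [transfers[0]] + [w + t for w, t in zip(interims, incoming)]

def value(digits):
    """Integer represented by a digit list, most significant digit first."""
    v = 0
    for d in digits:
        v = v * RADIX + d
    return v

x = [11, 9, 17, 10, 12, 18]
y = [6, 12, 9, 10, 8, 18]
s = carry_free_add(x, y)
print(s, value(s) == value(x) + value(y))
```

Each sum digit depends only on its own position and the one to its right, so no carry chain forms.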

(38)

Carry-Free Addition Schemes

[Figure: Carry-free addition schemes, showing the operand digits $x_i, y_i$ at position i, the interim sum at position i, the transfer digit $t_i$ into position i, and the sum digits $s_{i+1}, s_i, s_{i-1}$.]

(a) Ideal single-stage carry-free. (Impossible for positional system with fixed digit set)
(b) Two-stage carry-free.
(c) Single-stage with lookahead.

(39)

Redundancy Index

So, redundancy helps us achieve carry-free addition
But how much redundancy is actually needed? Is [0, 11] enough for r = 10?

       11  10   7  11   3   8
+       7   2   9  10   9   8     Operand digits in [0, 11]
-----------------------------
       18  12  16  21  12  16     Position sums in [0, 22]
        8   2   6   1   2   6     Interim sums in [0, 9]
    1   1   1   2   1   1         Transfer digits in [0, 2]
-----------------------------
    1   9   3   8   2   3   6     Sum digits in [0, 11]

Redundancy index ρ = α + β + 1 – r
For example, 0 + 11 + 1 – 10 = 2

(40)

Digit Sets and Digit-Set Conversions

Example 3.1: Convert from digit set [0, 18] to [0, 9] in radix 10

   11   9  17  10  12  18     18 = 10 (carry 1) + 8
   11   9  17  10  13   8     13 = 10 (carry 1) + 3
   11   9  17  11   3   8     11 = 10 (carry 1) + 1
   11   9  18   1   3   8     18 = 10 (carry 1) + 8
   11  10   8   1   3   8     10 = 10 (carry 1) + 0
   12   0   8   1   3   8     12 = 10 (carry 1) + 2
1   2   0   8   1   3   8     Answer; all digits in [0, 9]

Note: Conversion from redundant to nonredundant representation always involves carry propagation
Thus, the process is sequential and slow
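The conversion is an ordinary right-to-left carry ripple, which is why it cannot be done digit-parallel. A minimal sketch with an illustrative function name:

```python
def to_conventional(digits, radix=10):
    """Convert redundant digits (MSD first) to digits in [0, radix - 1]."""
    result, carry = [], 0
    for d in reversed(digits):        # carries ripple from right to left
        carry, digit = divmod(d + carry, radix)
        result.append(digit)
    while carry:                      # emit any remaining carry as leading digits
        carry, digit = divmod(carry, radix)
        result.append(digit)
    return result[::-1]

print(to_conventional([11, 9, 17, 10, 12, 18]))  # [1, 2, 0, 8, 1, 3, 8]
```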

(41)

Generalized Signed-Digit Numbers

[Figure: A taxonomy of redundant and non-redundant positional number systems]

Radix r, digit set [–α, β]
Requirement: α + β + 1 ≥ r
Redundancy index: ρ = α + β + 1 – r

Radix-r positional
  ρ = 0: Non-redundant
    α = 0: Conventional
    α ≥ 1: Non-redundant signed-digit
  ρ ≥ 1: Generalized signed-digit (GSD)
    ρ = 1: Minimal GSD
      α = β (even r): Symmetric minimal GSD
        r = 2: BSD or BSB
      α ≠ β: Asymmetric minimal GSD
        α = 0: Stored-carry (SC); r = 2: BSC
        α = 1 (r ≠ 2): Non-binary SB
    ρ ≥ 2: Non-minimal GSD
      α = β: Symmetric non-minimal GSD
        α < r: Ordinary signed-digit (OSD)
          α = ⌊r/2⌋ + 1: Minimally redundant OSD
          α = r − 1: Maximally redundant OSD
      α ≠ β: Asymmetric non-minimal GSD
        α = 0: Unsigned-digit redundant (UDR)
        α = 1, β = r: SCB; r = 2: BSCB

(42)

Binary Signed Digit (BSD)

$x_i$          1    –1    0    –1    0      BSD representation of +6
〈s, v〉       01    11    00    11    00     Sign and value encoding
2’s-compl     01    11    00    11    00     2-bit 2’s-complement
〈n, p〉       01    10    00    10    00     Negative & positive flags
〈n, z, p〉    001   100   010   100   010    1-out-of-3 encoding

(43)

Carry-Free Addition Algorithms

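Carry-free addition of GSD numbers:
Compute the position sums $p_i = x_i + y_i$
Divide $p_i$ into a transfer $t_{i+1}$ and interim sum $w_i = p_i - r\,t_{i+1}$
Add incoming transfers to get the sum digits $s_i = w_i + t_i$

[Figure: two-stage carry-free adder; operand digits $x_i, y_i$ produce the interim sum $w_i$ and the transfer $t_{i+1}$, and $w_i$ is combined with the incoming transfer $t_i$ to form $s_i$.]

If the transfer digits $t_i$ are in $[-\lambda, \mu]$, we must have:

$-\alpha + \lambda \le p_i - r\,t_{i+1} \le \beta - \mu$

The left bound is the smallest interim sum if a transfer of $-\lambda$ is to be absorbable; the right bound is the largest interim sum if a transfer of $\mu$ is to be absorbable.

These constraints lead to:

$\lambda \ge \alpha / (r - 1)$, $\mu \ge \beta / (r - 1)$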

(44)

Is Carry-Free Addition Always Applicable?

No: It requires one of the following two conditions [Parh 90]
a. r > 2, ρ ≥ 3
b. r > 2, ρ = 2, α ≠ 1, β ≠ 1 (e.g., not [−1, 10] in radix 10)

In other words, it is inapplicable for
r = 2 (perhaps the most useful case)
ρ = 1 (e.g., carry-save)
ρ = 2 with α = 1 or β = 1 (e.g., carry/borrow-save)

BSD is not two-stage carry-free

(45)

Use Carry-Estimate

A position sum –1 is kept intact when the incoming transfer is in [0, 1], whereas it is rewritten as 1 with a carry of –1 for incoming transfer in [–1, 0]. This guarantees that $t_i \ne w_i$ and thus $-1 \le s_i \le 1$.

         1    –1     0    –1     0       $x_i$ in [–1, 1]
+        0    –1    –1     0     1       $y_i$ in [–1, 1]
---------------------------------
         1    –2    –1    –1     1       $p_i$ in [–2, 2]
  high   low   low   low  high  high     $e_i$ in {low: [–1, 0], high: [0, 1]}
         1     0     1    –1    –1       $w_i$ in [–1, 1]
   0    –1    –1     0     1     0       $t_{i+1}$ in [–1, 1]
---------------------------------
   0     0    –1     1     0    –1       $s_i$ in [–1, 1]
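The carry-estimate rule can be sketched as follows. The estimate used here is taken from the sign of the position sum to the right (a negative sum can only emit a transfer in [–1, 0]); that is my reading of the rule stated above, and the function name and list layout are illustrative.

```python
def bsd_add(x, y):
    """Add two BSD numbers (digits in {-1, 0, 1}, MSD first); returns one extra leading digit."""
    p = [a + b for a, b in zip(x, y)]
    n = len(p)
    w, t_out = [0] * n, [0] * n
    for i in range(n):
        # Estimate of the incoming transfer from the position to the right:
        # "low" means it lies in [-1, 0]; otherwise it lies in [0, 1].
        incoming_low = i + 1 < n and p[i + 1] < 0
        if p[i] == 2:
            w[i], t_out[i] = 0, 1
        elif p[i] == -2:
            w[i], t_out[i] = 0, -1
        elif p[i] == 1:
            w[i], t_out[i] = (1, 0) if incoming_low else (-1, 1)
        elif p[i] == -1:
            w[i], t_out[i] = (1, -1) if incoming_low else (-1, 0)
    incoming = t_out[1:] + [0]
    return [t_out[0]] + [wi + ti for wi, ti in zip(w, incoming)]

print(bsd_add([1, -1, 0, -1, 0], [0, -1, -1, 0, 1]))  # [0, 0, -1, 1, 0, -1]
```

The inputs are +6 and –11; the result digits evaluate to –5.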

(46)

Residue Number Systems

Chapter Goals

Study a way of encoding large numbers as a collection of smaller numbers to simplify and speed up some operations

Chapter Highlights
Moduli, range, arithmetic operations
Many sets of moduli possible: tradeoffs
Conversions between RNS and binary
The Chinese remainder theorem
Why are RNS applications limited?

(47)

RNS Representations and Arithmetic

Chinese puzzle, 1500 years ago:
What number has the remainders of 2, 3, and 2 when divided by 7, 5, and 3, respectively?

Residues uniquely identify the number, hence they constitute a representation

Pairwise relatively prime moduli: $m_{k-1} > \ldots > m_1 > m_0$

The residue $x_i$ of x wrt the ith modulus $m_i$ (similar to a digit):
$x_i = x \bmod m_i = \langle x \rangle_{m_i}$

RNS representation contains a list of k residues or digits:
x = (2 | 3 | 2)RNS(7|5|3)

Default RNS for this chapter: RNS(8 | 7 | 5 | 3)

(48)

RNS Dynamic Range

Product M of the k pairwise relatively prime moduli is the dynamic range
$M = m_{k-1} \times \ldots \times m_1 \times m_0$
For RNS(8 | 7 | 5 | 3), M = 8 × 7 × 5 × 3 = 840

Negative numbers: Complement relative to M
$\langle -x \rangle_{m_i} = \langle M - x \rangle_{m_i}$
21 = (5 | 0 | 1 | 0)RNS
–21 = (8 – 5 | 0 | 5 – 1 | 0)RNS = (3 | 0 | 4 | 0)RNS

Here are some example numbers in our default RNS(8 | 7 | 5 | 3):
(0 | 0 | 0 | 0)RNS   Represents 0 or 840 or . . .
(1 | 1 | 1 | 1)RNS   Represents 1 or 841 or . . .
(2 | 2 | 2 | 2)RNS   Represents 2 or 842 or . . .
(0 | 1 | 4 | 1)RNS   Represents 64 or 904 or . . .
(2 | 0 | 0 | 2)RNS   Represents –70 or 770 or . . .
(7 | 6 | 4 | 2)RNS   Represents –1 or 839 or . . .

We can take the range of RNS(8|7|5|3) to be [−420, 419] or any other set of 840 consecutive integers
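Encoding and negation in the default RNS are one-liners. In this sketch the helper names are illustrative; note that Python's `%` already returns a nonnegative residue for negative operands, which matches complementing relative to M.

```python
MODULI = (8, 7, 5, 3)   # default RNS of this chapter
M = 8 * 7 * 5 * 3       # dynamic range, 840

def to_rns(x):
    """Residues of x with respect to each modulus, most significant modulus first."""
    return tuple(x % m for m in MODULI)

def negate(residues):
    """Complement each residue relative to its own modulus."""
    return tuple((m - r) % m for r, m in zip(residues, MODULI))

print(to_rns(21), to_rns(-21), negate(to_rns(21)))
print(to_rns(-70) == to_rns(770))  # numbers M apart share one representation
```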

(49)

RNS as Weighted Representation

For RNS(8 | 7 | 5 | 3), the weights of the 4 positions are:
105   120   336   280

Example: (1 | 2 | 4 | 0)RNS represents the number
$\langle 105 \times 1 + 120 \times 2 + 336 \times 4 + 280 \times 0 \rangle_{840} = \langle 1689 \rangle_{840} = 9$

For RNS(7 | 5 | 3), the weights of the 3 positions are:
15   21   70

Example -- Chinese puzzle: (2 | 3 | 2)RNS(7|5|3) represents the number
$\langle 15 \times 2 + 21 \times 3 + 70 \times 2 \rangle_{105} = \langle 233 \rangle_{105} = 23$

We will see later how the weights can be determined for a given RNS

(50)

RNS Encoding and Arithmetic Operations

Arithmetic in RNS(8 | 7 | 5 | 3)

(5 | 5 | 0 | 2)RNS   Represents x = +5
(7 | 6 | 4 | 2)RNS   Represents y = –1
(4 | 4 | 4 | 1)RNS   x + y: $\langle 5 + 7 \rangle_8 = 4$, $\langle 5 + 6 \rangle_7 = 4$, etc.
(6 | 6 | 1 | 0)RNS   x – y: $\langle 5 - 7 \rangle_8 = 6$, $\langle 5 - 6 \rangle_7 = 6$, etc. (alternatively, find –y and add to x)
(3 | 2 | 0 | 1)RNS   x × y: $\langle 5 \times 7 \rangle_8 = 3$, $\langle 5 \times 6 \rangle_7 = 2$, etc.

[Figure: Binary-coded format for RNS(8 | 7 | 5 | 3), using 3 + 3 + 3 + 2 bits for the mod-8, mod-7, mod-5, and mod-3 residues. Operand 1 and operand 2 feed independent mod-8, mod-7, mod-5, and mod-3 units that produce the result.]
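Because each residue channel is independent, addition, subtraction, and multiplication all share one pattern. A minimal sketch, with `rns_op` as an illustrative helper name:

```python
import operator

MODULI = (8, 7, 5, 3)

def rns_op(op, x, y):
    """Apply op independently in each residue channel; no carries cross channels."""
    return tuple(op(a, b) % m for a, b, m in zip(x, y, MODULI))

x = (5, 5, 0, 2)   # represents +5
y = (7, 6, 4, 2)   # represents -1
print(rns_op(operator.add, x, y))  # (4, 4, 4, 1)
print(rns_op(operator.sub, x, y))  # (6, 6, 1, 0)
print(rns_op(operator.mul, x, y))  # (3, 2, 0, 1)
```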

(51)

Choosing the RNS Moduli

Target range for our RNS: Decimal values [0, 100 000]

Strategy 1: To minimize the largest modulus, and thus ensure high-speed arithmetic, pick prime numbers in sequence

Pick $m_0 = 2$, $m_1 = 3$, $m_2 = 5$, etc. After adding $m_5 = 13$:
RNS(13 | 11 | 7 | 5 | 3 | 2)        M = 30 030     Inadequate
RNS(17 | 13 | 11 | 7 | 5 | 3 | 2)   M = 510 510    Too large
RNS(17 | 13 | 11 | 7 | 3 | 2)       M = 102 102    Just right! 5 + 4 + 4 + 3 + 2 + 1 = 19 bits

Fine tuning: Combine pairs of moduli 2 & 13 (26) and 3 & 7 (21)
RNS(26 | 21 | 17 | 11)              M = 102 102

(52)

An Improved Strategy

Target range for our RNS: Decimal values [0, 100 000]

Strategy 2: Improve strategy 1 by including powers of smaller primes before proceeding to the next larger prime

RNS($2^2$ | 3)                            M = 12
RNS($3^2$ | $2^3$ | 7 | 5)                M = 2520
RNS(11 | $3^2$ | $2^3$ | 7 | 5)           M = 27 720
RNS(13 | 11 | $3^2$ | $2^3$ | 7 | 5)      M = 360 360 (remove one 3, combine 3 & 5)
RNS(15 | 13 | 11 | $2^3$ | 7)             M = 120 120; 4 + 4 + 4 + 3 + 3 = 18 bits

Fine tuning: Maximize the size of the even modulus within the 4-bit limit
RNS($2^4$ | 13 | 11 | $3^2$ | 7 | 5)      M = 720 720 Too large
We can now remove 5 or 7; not an improvement in this example

(53)

Low-Cost RNS Moduli

Target range for our RNS: Decimal values [0, 100 000]

Strategy 3: To simplify the modular reduction (mod $m_i$) operations, choose only moduli of the forms $2^a$ or $2^a - 1$, aka “low-cost moduli”

RNS($2^{a_{k-1}}$ | $2^{a_{k-2}} - 1$ | . . . | $2^{a_1} - 1$ | $2^{a_0} - 1$)
We can have only one even modulus
$2^{a_i} - 1$ and $2^{a_j} - 1$ are relatively prime iff $a_i$ and $a_j$ are relatively prime

RNS($2^3$ | $2^3 - 1$ | $2^2 - 1$)               basis: 3, 2      M = 168
RNS($2^4$ | $2^4 - 1$ | $2^3 - 1$)               basis: 4, 3      M = 1680
RNS($2^5$ | $2^5 - 1$ | $2^3 - 1$ | $2^2 - 1$)   basis: 5, 3, 2   M = 20 832
RNS($2^5$ | $2^5 - 1$ | $2^4 - 1$ | $2^3 - 1$)   basis: 5, 4, 3   M = 104 160

Comparison
RNS(15 | 13 | 11 | $2^3$ | 7)                    18 bits   M = 120 120
RNS($2^5$ | $2^5 - 1$ | $2^4 - 1$ | $2^3 - 1$)   17 bits   M = 104 160

It’s easy to compute mod $2^k$ and mod $2^k - 1$
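Why these moduli are "low-cost" can be seen from the reductions themselves: mod $2^a$ keeps the low a bits, and mod $2^a - 1$ adds a-bit chunks (the software analogue of an end-around carry). The sketch below assumes nonnegative integers; the function names are illustrative.

```python
def mod_pow2(x, a):
    """x mod 2**a: keep the low a bits."""
    return x & ((1 << a) - 1)

def mod_pow2_minus_1(x, a):
    """x mod (2**a - 1) for x >= 0: add a-bit chunks, since 2**a = 1 (mod 2**a - 1)."""
    m = (1 << a) - 1
    while x > m:
        x = (x & m) + (x >> a)
    return 0 if x == m else x   # the all-ones pattern is a second form of zero

print(mod_pow2(164, 5), mod_pow2_minus_1(164, 5))
```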

(54)

Encoding and Decoding of Numbers

Conversion from binary/decimal to RNS

Table 4.1 Residues of the first 10 powers of 2

i    $2^i$    $\langle 2^i \rangle_7$    $\langle 2^i \rangle_5$    $\langle 2^i \rangle_3$
0    1        1        1        1
1    2        2        2        2
2    4        4        4        1
3    8        1        3        2
4    16       2        1        1
5    32       4        2        2
6    64       1        4        1
7    128      2        3        2
8    256      4        1        1
9    512      1        2        2

Example 4.1: Represent the number y = (1010 0100)two = (164)ten in RNS(8 | 7 | 5 | 3)

The mod-8 residue is easy to find:
$x_3 = \langle y \rangle_8$ = (100)two = 4

We have $y = 2^7 + 2^5 + 2^2$; thus
$x_2 = \langle y \rangle_7 = \langle 2 + 4 + 4 \rangle_7 = 3$
$x_1 = \langle y \rangle_5 = \langle 3 + 2 + 4 \rangle_5 = 4$
$x_0 = \langle y \rangle_3 = \langle 2 + 2 + 1 \rangle_3 = 2$
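The table-lookup conversion of Example 4.1 can be sketched as below. The table is regenerated with `pow(2, i, m)` instead of being typed in, and the function is limited to inputs below $2^{10}$ to match Table 4.1; the names are illustrative.

```python
MODULI = (7, 5, 3)
# Residues of 2**i for i = 0..9, as in Table 4.1.
POWER_RESIDUES = [tuple(pow(2, i, m) for m in MODULI) for i in range(10)]

def binary_to_rns(y):
    """RNS(8 | 7 | 5 | 3) digits of a nonnegative integer y < 2**10."""
    residues = [y & 0b111]                 # mod 8: just the low three bits
    for j, m in enumerate(MODULI):
        # Add the table entries for the bit positions that hold a 1, then reduce.
        total = sum(POWER_RESIDUES[i][j] for i in range(10) if (y >> i) & 1)
        residues.append(total % m)
    return tuple(residues)

print(binary_to_rns(0b10100100))  # (4, 3, 4, 2)
```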

(55)

Conversion from RNS to Binary/Decimal

Theorem 4.1 (The Chinese remainder theorem)

$x = (x_{k-1} | \ldots | x_2 | x_1 | x_0)_{RNS} = \left\langle \sum_i M_i \langle \alpha_i x_i \rangle_{m_i} \right\rangle_M$

where $M_i = M/m_i$ and $\alpha_i = \langle M_i^{-1} \rangle_{m_i}$ (multiplicative inverse of $M_i$ wrt $m_i$)

Implementing CRT-based RNS-to-binary conversion:

$x = \left\langle \sum_i M_i \langle \alpha_i x_i \rangle_{m_i} \right\rangle_M = \left\langle \sum_i f_i(x_i) \right\rangle_M$

We can use a table to store the $f_i$ values, with $\sum_i m_i$ entries

Table 4.2 Values needed in applying the Chinese remainder theorem to RNS(8 | 7 | 5 | 3)

i    $m_i$    $x_i$    $\langle M_i \langle \alpha_i x_i \rangle_{m_i} \rangle_M$
3    8        0        0
              1        105
              2        210
              3        315
              .        .
              .        .
              .        .

(56)

Intuitive Justification for CRT

Puzzle: What number has the remainders of 2, 3, and 2 when divided by the numbers 7, 5, and 3, respectively?

x = (2 | 3 | 2)RNS(7|5|3) = (?)ten

(1 | 0 | 0)RNS(7|5|3) = multiple of 15 that is 1 mod 7 = 15
(0 | 1 | 0)RNS(7|5|3) = multiple of 21 that is 1 mod 5 = 21
(0 | 0 | 1)RNS(7|5|3) = multiple of 35 that is 1 mod 3 = 70

(2 | 3 | 2)RNS(7|5|3) = (2 | 0 | 0) + (0 | 3 | 0) + (0 | 0 | 2)
= 2 × (1 | 0 | 0) + 3 × (0 | 1 | 0) + 2 × (0 | 0 | 1)
= 2 × 15 + 3 × 21 + 2 × 70
= 30 + 63 + 140
= 233 = 23 mod 105

Therefore, x = (23)ten
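The CRT formula can be evaluated directly. This sketch relies on the three-argument `pow` with exponent −1 (Python 3.8+) for the multiplicative inverse and on `math.prod`; `crt_decode` is an illustrative name, and a hardware design would use stored tables instead.

```python
from math import prod

def crt_decode(residues, moduli):
    """Decode RNS digits via x = < sum_i M_i * <alpha_i * x_i>_{m_i} >_M."""
    M = prod(moduli)
    total = 0
    for x_i, m_i in zip(residues, moduli):
        M_i = M // m_i
        alpha_i = pow(M_i, -1, m_i)   # multiplicative inverse of M_i modulo m_i
        total += M_i * ((alpha_i * x_i) % m_i)
    return total % M

print(crt_decode((2, 3, 2), (7, 5, 3)))        # 23, the Chinese puzzle
print(crt_decode((1, 2, 4, 0), (8, 7, 5, 3)))  # 9
```

The products $M_i \alpha_i$ reduced mod M reproduce the position weights 105, 120, 336, 280 quoted earlier for RNS(8 | 7 | 5 | 3).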

(57)

Difficult RNS Arithmetic Operations

• Sign test
• Magnitude comparison
• Division
• Could convert back and forth to/from binary.
• Another approach: convert to a mixed-radix system, as numbers in a mixed-radix system are comparable.

(58)

Difficult RNS Arithmetic Operations

Example: Of the following RNS(8 | 7 | 5 | 3) numbers:
Which, if any, are negative?
Which is the largest?
Which is the smallest?
Assume a range of [–420, 419]

a = (0 | 1 | 3 | 2)RNS
b = (0 | 1 | 4 | 1)RNS
c = (0 | 6 | 2 | 1)RNS
d = (2 | 0 | 0 | 2)RNS
e = (5 | 0 | 1 | 0)RNS
f = (7 | 6 | 4 | 2)RNS

Answers:
d < c < f < a < e < b
–70 < –8 < –1 < 8 < 21 < 64

(59)

General RNS Division

General RNS division, as opposed to division by one of the moduli (aka scaling), is difficult; hence, use of RNS is unlikely to be effective when an application requires many divisions

Scheme proposed in 1994 PhD thesis of Ching-Yu Hung (UCSB):
Use an algorithm that has built-in tolerance to imprecision, and apply the approximate CRT decoding to choose quotient digits

Example -- SRT algorithm (s is the partial remainder)
s < 0: quotient digit = –1
s ≅ 0: quotient digit = 0
s > 0: quotient digit = 1

The BSD quotient can be converted to RNS on the fly

(60)

Limits of Fast Arithmetic in RNS

Known results from number theory

Theorem 4.2: The ith prime $p_i$ is asymptotically $i \ln i$
Theorem 4.3: The number of primes in [1, n] is asymptotically $n / \ln n$
Theorem 4.4: The product of all primes in [1, n] is asymptotically $e^n$

Implications for the speed of arithmetic in RNS

Theorem 4.5: It is possible to represent all k-bit binary numbers in RNS with O(k / log k) moduli such that the largest modulus has O(log k) bits
That is, with fast log-time adders, addition needs O(log log k) time

(61)

Hardware Implementation for RNS Representations

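[Figure: RNS(8 | 7 | 5 | 3) datapath. Operand 1 and operand 2, each encoded in 3 + 3 + 3 + 2 bits (mod 8, mod 7, mod 5, mod 3), feed independent mod-8, mod-7, mod-5, and mod-3 units that produce the result.]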

(62)

Addition/Subtraction


Most slides originate from the textbook author’s PowerPoint presentation files.

(63)

II Addition / Subtraction

Chapter 5 Basic Addition and Counting
Chapter 6 Carry-Lookahead Adders
Chapter 7 Variations in Fast Adders
Chapter 8 Multioperand Addition

Topics in This Part

Review addition schemes and various speedup methods

• Addition is a key op (in itself, and as a building block)

• Subtraction = negation + addition

• Carry propagation speedup: lookahead, skip, select, …

• Two-operand versus multioperand addition

(64)

Basic Addition and Counting

Chapter Goals

Study the design of ripple-carry adders, discuss why their latency is unacceptable, and set the foundation for faster adders

Chapter Highlights
Full adders are versatile building blocks
Longest carry chain on average: $\log_2 k$ bits
Fast asynchronous adders are simple

Counting is relatively easy to speed up

(65)

HA and FA Adders

Half-adder (HA): Truth table and block diagram

Inputs    Outputs
x  y      c  s
0  0      0  0
0  1      0  1
1  0      0  1
1  1      1  0

[Block diagram: HA with inputs x, y and outputs c, s]

Full-adder (FA): Truth table and block diagram

Inputs             Outputs
x  y  $c_{in}$     $c_{out}$  s
0  0  0            0  0
0  0  1            0  1
0  1  0            0  1
0  1  1            1  0
1  0  0            0  1
1  0  1            1  0
1  1  0            1  0
1  1  1            1  1

[Block diagram: FA with inputs x, y, $c_{in}$ and outputs $c_{out}$, s]
