
(1)

Course overview

Computer Organization and Assembly Languages

Yung-Yu Chuang

with slides by Kip Irvine

(2)

Logistics

• Meeting time: 2:20pm-5:20pm, Wednesday

• Classroom: CSIE Room 111

• Instructor: 莊永裕 Yung-Yu Chuang

• Teaching assistant: 黃子桓

• Webpage:

http://www.csie.ntu.edu.tw/~cyy/asm

id / password

• Mailing list: [email protected]

Please subscribe via

https://cmlmail.csie.ntu.edu.tw/mailman/listinfo/assembly/

(3)

Caveats

• It is a course from the old curriculum.


• It is not for you if you have taken or are taking computer architecture.

• It is not tested in your graduate school entrance exam, and not listed as a required course anymore.

• It is a fundamental course, not a geek-level one.

• It is more like an advanced introduction to CS, better suited to freshmen or sophomores.

(4)

Prerequisites

• Better to have programming experience with some high-level language such as C, C++, Java, …

(5)

Textbook

• Readings and slides

(6)

References (TOY)

Princeton’s Introduction to CS,

http://www.cs.princeton.edu/introcs/50machine/

http://www.cs.princeton.edu/introcs/60circuits/

(7)

References (ARM)

ARM Assembly Language Programming, Peter Knaggs and Stephen Welsh

ARM System Developer’s Guide, Andrew Sloss, Dominic Symes and Chris Wright

(8)

References (ARM)

Whirlwind Tour of ARM Assembly, TONC, Jasper Vijn.

ARM System-on-chip Architecture, Steve Furber.

(9)

References (IA32)

Assembly Language for Intel-Based Computers, 5th Edition, Kip Irvine

The Art of Assembly Language, Randy Hyde

(10)

References (IA32)

Michael Abrash's Graphics Programming Black Book

Computer Systems: A Programmer's Perspective, Randal E. Bryant and David R. O'Hallaron

(11)

Grading (subject to change)

• Assignments (4 projects, 56%), most graded by performance


• Class participation (4%)

• Midterm exam (16%)

• Final project (24%)

– Examples from previous years

(12)

Computer Organization and Assembly language

• It is not only about assembly but also about “computer organization”.

(13)

Early computers

(14)

Early programming tools

(15)

First popular PCs

(16)

Early PCs

• Intel 8086 processor

• 768KB memory

• 20MB disk

• Dot-matrix printer (9-pin)

(17)

GUI/IDE

(18)

More advanced architectures

• Pipeline

• SIMD

• Multi-core

• Cache

(19)

More advanced software

(20)

More “computers” around us

(21)

My computers

Desktop (Intel Pentium D 3GHz, Nvidia 7900)

VAIO Z46TD (Intel Core 2 Duo P9700 2.8GHz)

iPhone 3GS (ARM Cortex-A8 833MHz)

GBA SP (ARM7 16.78MHz)

(22)

Computer Organization and Assembly language

• It is not only about assembly but also about “computer organization”.

• It will cover

– Basic concept of computer systems and architecture

– ARM architecture and assembly language

– x86 architecture and assembly language

(23)

TOY machine

(24)

TOY machine

• Starting from a simple construct

(25)

TOY machine

• Build several components and connect them together


(26)

TOY machine

• Almost as good as any computer

(27)

TOY machine

C                     TOY assembly              Machine code
int A[32];            A     DUP 32              10: C020
                            lda R1, 1           20: 7101
                            lda RA, A           21: 7A00
i=0;                        lda RC, 0           22: 7C00
do {
  RD=stdin;           read  ld  RD, 0xFF        23: 8DFF
  if (RD==0) break;         bz  RD, exit        24: CD29
                            add R2, RA, RC      25: 12AC
  A[i]=RD;                  sti RD, R2          26: BD02
  i=i+1;                    add RC, RC, R1      27: 1CC1
} while (1);                bz  R0, read        28: C023
printr();             exit  jl  RF, printr      29: FF2B
                            hlt                 2A: 0000

(28)

ARM

• ARM architecture


• ARM assembly programming

(29)

IA32

• IA-32 Processor Architecture


• Data Transfers, Addressing, and Arithmetic

• Procedures

• Conditional Processing

• Integer Arithmetic

• Advanced Procedures

• Strings and Arrays

• High-Level Language Interface

• Real Arithmetic (FPU)

• SIMD

• Code Optimization

(30)

What you will learn

• Basic principle of computer architecture


• How your computer works

• How your C programs work

• Assembly basics

• ARM assembly programming

• IA-32 assembly programming


• Specific components, FPU/MMX

• Code optimization

• Interface between assembly and high-level languages

(31)

Why take this course?

• Does anyone really program in assembly nowadays?

• Yes, at times, you do need to write assembly code.

• It is a foundation for computer architecture and compilers. It is related to electronics, logic design and operating systems.

(32)

CSIE courses

• Hardware: electronics, digital system, architecture


• Software: operating system, compiler

(33)

wikipedia

• Today, assembly language is used primarily for direct hardware manipulation, access to specialized processor instructions, or to address critical performance issues. Typical uses are device drivers, low-level embedded systems, and real-time systems.

(34)

Reasons for not using assembly

• Development time: it takes much longer to develop in assembly. Harder to debug, no type checking, side effects…

• Maintainability: unstructured, dirty tricks

• Portability: platform-dependent

(35)

Reasons for using assembly

• Educational reasons: to understand how CPUs and compilers work. Better understanding of efficiency issues of various constructs.

• Developing compilers, debuggers and other development tools.

• Hardware drivers and system code

• Embedded systems

• Developing libraries.

• Accessing instructions that are not available through high-level languages.

• Optimizing for speed or space

(36)

To sum up

• It is all about lack of smart compilers

• Faster code, compiler is not good enough

• Smaller code, compiler is not good enough, e.g. mobile devices, embedded devices; also smaller code → better cache performance → faster code

• Unusual architecture, there isn’t even a compiler or compiler quality is bad, e.g. GPU, DSP chips, even MMX.

(37)

Overview

• Virtual Machine Concept

• Data Representation

• Boolean Operations

(38)

Translating languages

English: Display the sum of A times B plus C.

C++:

cout << (A * B + C);

Assembly Language:

mov eax,A
mul B
add eax,C
call WriteInt

Intel Machine Language:

A1 00000000
F7 25 00000004
03 05 00000008
E8 00500000

(39)

Virtual machines

Abstractions for computers

Level 5: High-Level Language
Level 4: Assembly Language
Level 3: Operating System
Level 2: Instruction Set Architecture
Level 1: Microarchitecture
Level 0: Digital Logic

(40)

High-level language

• Level 5

• Application-oriented languages

• Programs compile into assembly language (Level 4)

cout << (A * B + C);

(41)

Assembly language

• Level 4

• Instruction mnemonics that have a one-to-one correspondence to machine language

• Calls functions written at the operating system level (Level 3)

• Programs are translated into machine language (Level 2)

mov eax, A
mul B
add eax, C
call WriteInt

(42)

Operating system

• Level 3

• Provides services

• Programs translated and run at the instruction set architecture level (Level 2)

(43)

Instruction set architecture

• Level 2

• Also known as conventional machine language

• Executed by Level 1 program (microarchitecture, Level 1)

A1 00000000
F7 25 00000004
03 05 00000008
E8 00500000

(44)

Microarchitecture

• Level 1

• Interprets conventional machine instructions (Level 2)

• Executed by digital hardware (Level 0)

(45)

Digital logic

• Level 0


• CPU, constructed from digital logic gates

• System bus

• Memory

(46)

Data representation

• A computer is a construction of digital circuits with two states: on and off

• You need to have the ability to translate between different representations to examine the content of the machine

• Common number systems: binary, octal, decimal and hexadecimal

(47)

Binary representations

• Electronic Implementation

– Easy to store with bistable elements

– Reliably transmitted on noisy and inaccurate wires

[Figure: waveform carrying the bits 0 1 0; 0.0V to 0.5V represents 0 and 2.8V to 3.3V represents 1]

(48)

Binary numbers

• Digits are 1 and 0 (a binary digit is called a bit)

1 = true

0 = false

• MSB – most significant bit

• LSB – least significant bit

• Bit numbering:

MSB                           LSB
1 0 1 1 0 0 1 0 1 0 0 1 1 1 0 0
15                            0

• A bit string could have different interpretations

(49)

Unsigned binary integers

• Each digit (bit) is either 1 or 0

• Each bit represents a power of 2:

1  1  1  1  1  1  1  1
2⁷ 2⁶ 2⁵ 2⁴ 2³ 2² 2¹ 2⁰

Every binary number is a sum of powers of 2

(50)

Translating binary to decimal

Weighted positional notation shows how to calculate the decimal value of each binary bit:

dec = (Dₙ₋₁ × 2ⁿ⁻¹) + (Dₙ₋₂ × 2ⁿ⁻²) + ... + (D₁ × 2¹) + (D₀ × 2⁰)

D = binary digit

binary 00001001 = decimal 9:

(1 × 2³) + (1 × 2⁰) = 9
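The weighted sum can be sketched in a few lines of Python. The function name `binary_to_decimal` is ours, chosen for illustration, and is not part of the course material.

```python
def binary_to_decimal(bits: str) -> int:
    """Sum D_i * 2**i over the bits, where bit 0 is the rightmost digit."""
    total = 0
    for position, digit in enumerate(reversed(bits)):
        total += int(digit) * 2 ** position
    return total


print(binary_to_decimal("00001001"))  # 9
```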

(51)

Translating unsigned decimal to binary

• Repeatedly divide the decimal integer by 2. Each remainder is a binary digit in the translated value:

37 = 100101
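A minimal Python sketch of the repeated-division method; `decimal_to_binary` is a hypothetical helper name used only for this illustration.

```python
def decimal_to_binary(value: int) -> str:
    """Repeatedly divide by 2; the remainders, read last to first, are the bits."""
    if value == 0:
        return "0"
    digits = []
    while value > 0:
        value, remainder = divmod(value, 2)
        digits.append(str(remainder))  # the least significant bit comes out first
    return "".join(reversed(digits))


print(decimal_to_binary(37))  # 100101
```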

(52)

Binary addition

• Starting with the LSB, add each pair of digits, include the carry if present.

carry:                  1
                0 0 0 0 0 1 0 0    (4)
              + 0 0 0 0 0 1 1 1    (7)
                0 0 0 0 1 0 1 1    (11)
bit position:   7 6 5 4 3 2 1 0
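The same bit-by-bit procedure, sketched in Python under the assumption of equal-length bit strings and a fixed register width (a carry out of the top bit is dropped). The name `add_binary` is ours.

```python
def add_binary(a: str, b: str) -> str:
    """Add two equal-length bit strings, LSB first, propagating the carry."""
    assert len(a) == len(b)
    carry = 0
    result = []
    for bit_a, bit_b in zip(reversed(a), reversed(b)):
        total = int(bit_a) + int(bit_b) + carry
        result.append(str(total % 2))  # sum bit for this position
        carry = total // 2             # carry into the next position
    # A final carry out of the top bit is dropped, as in a fixed-width register.
    return "".join(reversed(result))


print(add_binary("00000100", "00000111"))  # 00001011
```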

(53)

Integer storage sizes

Standard sizes:

byte: 8 bits
word: 16 bits
doubleword: 32 bits
quadword: 64 bits

Practice: What is the largest unsigned integer that may be stored in 20 bits?

(54)

Large measurements

• Kilobyte (KB), 2¹⁰ bytes

• Megabyte (MB), 2²⁰ bytes

• Gigabyte (GB), 2³⁰ bytes

• Terabyte (TB), 2⁴⁰ bytes

• Petabyte

• Exabyte

• Zettabyte

• Yottabyte

(55)

Hexadecimal integers

All values in memory are stored in binary. Because long binary numbers are hard to read, we use hexadecimal representation.

(56)

Translating binary to hexadecimal

• Each hexadecimal digit corresponds to 4 binary bits.

• Example: Translate the binary integer

000101101010011110010100 to hexadecimal:
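One way to carry out the translation is to split the bits into groups of four and map each group to one hex digit. The Python sketch below does that; `binary_to_hex` is an illustrative name of ours.

```python
def binary_to_hex(bits: str) -> str:
    """Translate a bit string to hexadecimal, four bits per hex digit."""
    # Pad on the left so the length is a multiple of 4.
    padded = bits.zfill((len(bits) + 3) // 4 * 4)
    groups = [padded[i:i + 4] for i in range(0, len(padded), 4)]
    return "".join("0123456789ABCDEF"[int(group, 2)] for group in groups)


print(binary_to_hex("000101101010011110010100"))  # 16A794
```

The groups are 0001 0110 1010 0111 1001 0100, which map to the digits 1, 6, A, 7, 9 and 4.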

(57)

Converting hexadecimal to decimal

• Multiply each digit by its corresponding power of 16:

dec = (D₃ × 16³) + (D₂ × 16²) + (D₁ × 16¹) + (D₀ × 16⁰)

• Hex 1234 equals (1 × 16³) + (2 × 16²) + (3 × 16¹) + (4 × 16⁰), or decimal 4,660.

• Hex 3BA4 equals (3 × 16³) + (11 × 16²) + (10 × 16¹) + (4 × 16⁰), or decimal 15,268.
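The two worked values can be checked with a short Python sketch; `hex_to_decimal` and `HEX_DIGITS` are names we introduce for illustration only.

```python
HEX_DIGITS = "0123456789ABCDEF"


def hex_to_decimal(hex_string: str) -> int:
    """Multiply each hex digit by its power of 16 and add the products."""
    total = 0
    for position, digit in enumerate(reversed(hex_string.upper())):
        total += HEX_DIGITS.index(digit) * 16 ** position
    return total


print(hex_to_decimal("1234"))  # 4660
print(hex_to_decimal("3BA4"))  # 15268
```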

(58)

Powers of 16

Used when calculating hexadecimal values up to 8 digits long:

(59)

Converting decimal to hexadecimal

decimal 422 = 1A6 hexadecimal
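A sketch of the conversion in Python, assuming the usual repeated division by 16 where each remainder becomes one hex digit. The name `decimal_to_hex` is ours.

```python
def decimal_to_hex(value: int) -> str:
    """Repeatedly divide by 16; the remainders, read last to first, are the digits."""
    if value == 0:
        return "0"
    digits = []
    while value > 0:
        value, remainder = divmod(value, 16)
        digits.append("0123456789ABCDEF"[remainder])
    return "".join(reversed(digits))


print(decimal_to_hex(422))  # 1A6
```

For 422 the remainders are 6, 10 (A) and 1, so reading them in reverse gives 1A6.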

(60)

Hexadecimal addition

Divide the sum of two digits by the number base (16). The quotient becomes the carry value, and the remainder is the sum digit.

        1  1       (carries)
  36 28 28 6A
+ 42 45 58 4B
  78 6D 80 B5

Important skill: Programmers frequently add and subtract the addresses of variables and instructions.
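The quotient-and-remainder rule maps directly onto `divmod`. The Python sketch below assumes equal-length hex strings; `add_hex` and `HEX_DIGITS` are illustrative names of ours.

```python
HEX_DIGITS = "0123456789ABCDEF"


def add_hex(a: str, b: str) -> str:
    """Add two equal-length hex strings digit by digit, rightmost digit first."""
    assert len(a) == len(b)
    carry = 0
    result = []
    for digit_a, digit_b in zip(reversed(a.upper()), reversed(b.upper())):
        total = HEX_DIGITS.index(digit_a) + HEX_DIGITS.index(digit_b) + carry
        carry, sum_digit = divmod(total, 16)  # quotient = carry, remainder = digit
        result.append(HEX_DIGITS[sum_digit])
    if carry:
        result.append(HEX_DIGITS[carry])      # keep a carry out of the top digit
    return "".join(reversed(result))


print(add_hex("6A", "4B"))  # B5
print(add_hex("28", "58"))  # 80
```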

(61)

Hexadecimal subtraction

When a borrow is required from the digit to the left, add 10h to the current digit's value:

  C6     75
– A2   – 47
  24     2E

In 75 – 47, the digit 5 borrows from the 7: (10h + 5) – 7 = Eh, then 6 – 4 = 2.

Practice: The address of var1 is 00400020. The address of the next variable after var1 is 0040006A. How many bytes are used by var1?

(62)

Signed integers

The highest bit indicates the sign. 1 = negative, 0 = positive

sign bit
1 1 1 1 0 1 1 0    Negative
0 0 0 0 1 0 1 0    Positive

If the highest digit of a hexadecimal integer is > 7, the value is negative. Examples: 8A, C5, A2, 9D

(63)

Two's complement notation

Steps:

– Complement (reverse) each bit

– Add 1

Note that 00000001 + 11111111 = 00000000
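The two steps can be sketched in Python. Python integers are unbounded, so the sketch assumes a fixed width and masks the result to that many bits; `twos_complement` is a name of ours.

```python
def twos_complement(value: int, bits: int = 8) -> int:
    """Negate `value` within `bits` bits: complement each bit, then add 1."""
    mask = (1 << bits) - 1
    inverted = ~value & mask      # step 1: complement (reverse) each bit
    return (inverted + 1) & mask  # step 2: add 1, keeping only `bits` bits


print(format(twos_complement(0b00000001), "08b"))  # 11111111
```

Adding 00000001 to this result gives 1 0000 0000, and dropping the ninth bit leaves 00000000.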

(64)

Binary subtraction

• When subtracting A – B, convert B to its two's complement

• Add A to (–B)

  0 1 0 1 0          0 1 0 1 0
– 0 1 0 1 1        + 1 0 1 0 1
                     1 1 1 1 1

Advantages for 2’s complement:

• No two 0’s

• Sign bit

• Remove the need for separate circuits for add and sub
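A Python sketch of subtraction done with an adder only, using the 5-bit operands 01010 and 01011. The helper names `subtract` and `to_signed` and the `bits` parameter are assumptions made for this illustration.

```python
def subtract(a: int, b: int, bits: int = 8) -> int:
    """Compute a - b by adding a to the two's complement of b."""
    mask = (1 << bits) - 1
    negative_b = (~b + 1) & mask  # two's complement of b
    return (a + negative_b) & mask


def to_signed(value: int, bits: int = 8) -> int:
    """Interpret a `bits`-wide pattern as a signed two's complement integer."""
    sign_bit = 1 << (bits - 1)
    return value - (1 << bits) if value & sign_bit else value


difference = subtract(0b01010, 0b01011, bits=5)
print(format(difference, "05b"), to_signed(difference, bits=5))  # 11111 -1
```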

(65)

Ranges of signed integers

The highest bit is reserved for the sign. This limits the range:
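For an n-bit two's complement integer the range is –2ⁿ⁻¹ to 2ⁿ⁻¹ – 1. The Python sketch below prints it for the standard storage sizes; `signed_range` is a name of ours.

```python
def signed_range(bits: int) -> tuple:
    """Smallest and largest values of a `bits`-wide two's complement integer."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


for name, bits in [("byte", 8), ("word", 16), ("doubleword", 32), ("quadword", 64)]:
    low, high = signed_range(bits)
    print(f"{name}: {low:,} to {high:,}")
```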

(66)

Character

• Character sets

– Standard ASCII (0 – 127)

– Extended ASCII (0 – 255)

– ANSI (0 – 255)

– Unicode (0 – 65,535)

• Null-terminated String

– Array of characters followed by a null byte

• Using the ASCII table

– back inside cover of book
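To make the null-terminated layout concrete, here is a small Python sketch that builds and reads such a byte array. The helper names `to_null_terminated` and `from_null_terminated` are ours, and plain ASCII text is assumed.

```python
def to_null_terminated(text: str) -> bytes:
    """Encode `text` as ASCII bytes followed by a single null byte."""
    return text.encode("ascii") + b"\x00"


def from_null_terminated(buffer: bytes) -> str:
    """Read characters up to, but not including, the first null byte."""
    end = buffer.index(0)
    return buffer[:end].decode("ascii")


data = to_null_terminated("ABC")
print(list(data))                  # [65, 66, 67, 0]
print(from_null_terminated(data))  # ABC
```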

(67)

Representing Instructions

int sum(int x, int y)
{
    return x+y;
}

– For this example, Alpha & Sun use two 4-byte instructions

• Use differing numbers of instructions in other cases

– PC uses 7 instructions with lengths 1, 2, and 3 bytes

• Same for NT and for Linux

• NT / Linux not fully binary compatible

Alpha sum: 00 00 30 42 01 80 FA 6B
Sun sum: 81 C3 E0 08 90 02 00 09
PC sum: 55 89 E5 8B 45 0C 03 45 08 89 EC 5D C3

Different machines use totally different instructions and encodings

(68)

Boolean algebra

• Boolean expressions created from:

– NOT, AND, OR

(69)

NOT

• Inverts (reverses) a boolean value

• Truth table for Boolean NOT operator:

Digital gate diagram for NOT:

NOT

(70)

AND

• True if both are true

• Truth table for Boolean AND operator:

Digital gate diagram for AND:

AND

(71)

OR

• True if either is true

• Truth table for Boolean OR operator:

Digital gate diagram for OR:

OR
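The truth tables for the three operators can be regenerated with Python's built-in `not`, `and` and `or`. The wrapper names below are ours, introduced only for this sketch.

```python
def boolean_not(x: bool) -> bool:
    return not x


def boolean_and(x: bool, y: bool) -> bool:
    return x and y


def boolean_or(x: bool, y: bool) -> bool:
    return x or y


print("X      NOT X")
for x in (False, True):
    print(f"{x!s:<6} {boolean_not(x)}")

print("X      Y      X AND Y  X OR Y")
for x in (False, True):
    for y in (False, True):
        print(f"{x!s:<6} {y!s:<6} {boolean_and(x, y)!s:<8} {boolean_or(x, y)}")
```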

(72)

Implementation of gates

• Fluid switch (http://www.cs.princeton.edu/introcs/lectures/fluid-computer.swf)

(73)

Implementation of gates

(74)

Implementation of gates

(75)

Truth Tables (1 of 2)

• A Boolean function has one or more Boolean inputs, and returns a single Boolean output.

• A truth table shows all the inputs and outputs of a Boolean function

Example: X  Y

(76)

Truth Tables (2 of 2)

• Example: X ∧ Y
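A truth table for any Boolean function can be generated by enumerating every input combination. The Python sketch below is a generic helper of our own (`truth_table` is a hypothetical name), demonstrated on a two-input OR chosen purely as a stand-in example.

```python
from itertools import product


def truth_table(function, input_count: int) -> list:
    """Return one row per input combination: (inputs..., output)."""
    rows = []
    for inputs in product((False, True), repeat=input_count):
        rows.append((*inputs, function(*inputs)))
    return rows


for row in truth_table(lambda x, y: x or y, 2):
    print(row)
```

A function with n inputs produces 2ⁿ rows, so the two-input example prints four.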