Network Protocols: Design and Analysis

(1)


Polly Huang EE NTU

http://cc.ee.ntu.edu.tw/~phuang phuang@cc.ee.ntu.edu.tw

(2)

What’s important about the web?

• easy access to information

– vs. gopher: also easy to use, but no graphics

– vs. e-mail, ftp

– and today, search engines

• supports rich data formats (text, graphics, hypertext, etc.)

• fast, cheap, ubiquitous

• huge economic impact

(3)

Why was the web successful?

• Multimedia information

– vs. gopher: web had graphics

– vs. ftp: a lot easier to use

• Multi-platform

– Really simple to implement

– My undergrad network course students are implementing one

(4)

Aside: the web and computer science

• the web was invented by a physicist; why not a computer scientist?

– computer scientists had worked on hypertext since the late ’60s

– he had info to exchange

• with non-CS experts

• (vs. CS people who knew FTP)

– web was too simple

• not interesting research, except that it found the right trade-off in complexity vs. benefit

(5)

What is the web?

• protocols

– formatting: HTML (XML, CSS)

– addressing: URLs (builds on DNS, IP, files; also URNs)

(6)

Formatting: HTML

• SGML subset: simple ASCII with tags

– structure (ex. blockquote, headings, em): logical info

– display (ex. fonts, justification, colors, italics): physical layout

– pro: easy to write (no special tools req’d)

• extensible (has grown [a lot!] over time)

– now includes forms, tables, frames, math, style sheets, etc.

– augmented by CSS (Cascading Style Sheets) to define display for HTML markup

(7)

Addressing: URLs

• protocol://host:port/hierarchy?search#part

– multiple protocols (http, ftp, etc.)

– host:port: uses DNS, distributed

– hierarchy: locally defined

• pros:

– location independent (can move servers around, as long as DNS name stays same)

– global

• cons:

– some things don’t have URLs (sometimes data is “hidden” behind search forms and logins)

– some things are location dependent (ex. ~johnh stuff)
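As a rough illustration of the URL pieces named on this slide, the sketch below splits a URL with Python's standard `urllib.parse.urlsplit`. The function name `describe_url`, the dictionary keys (chosen to mirror the slide's labels), and the example URL are assumptions made for illustration only.

```python
from urllib.parse import urlsplit


def describe_url(url: str) -> dict:
    """Split a URL into the pieces named on the slide (hypothetical helper)."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,    # e.g. http, ftp
        "host": parts.hostname,      # resolved through DNS
        "port": parts.port,          # None when the URL omits it
        "hierarchy": parts.path,     # locally defined by the server
        "search": parts.query,       # text after '?'
        "part": parts.fragment,      # text after '#'
    }


# Example URL is made up for illustration.
info = describe_url("http://www.example.com:8080/docs/index.html?q=cache#intro")
for name, value in info.items():
    print(f"{name:10} {value}")
```

Only the host and port involve DNS and the network; the hierarchy and search parts are interpreted by the server, and the part after `#` is handled by the client.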

(8)

Data Transfer: HTTP

• stateless: no inherent notion of client

– so no shopping carts! later work added optional cookies to identify clients

• layers over TCP

• designed as a faster replacement for FTP

• many features tacked on:

– content negotiation (graphics, language)

– caching support

(9)

HTTP/0.9

• original protocol

– client: “GET /path” (close)

– server: “data” (close)

• very very simple

• still valid

(10)

HTTP/1.0

• informational RFC 1945 (60 pages)

• client

– “GET /path HTTP/1.0”, headers, body (close)

– other operations (GET, HEAD, POST)

• can send data (POST)

• can check for changes (HEAD)

• server

– headers, data (close)

– headers allow info (type, change time, etc.)

– uses MIME types, adds content negotiation, etc.

(11)

HTTP/1.1

• rfc2068 (162 pages!)

• basically like 1.0, but adds:

– persistent connections

– identity of host (supports virtual hosts)

– detailed caching models

(12)

HTTP Request—More Detail

• Request line

– Method

• GET – return the resource identified by the URI

• HEAD – return only the headers of the GET response

• POST – send data to the server (forms, etc.)

– URI

• E.g. http://www.isi.edu/~govindan/index.html with a proxy

• E.g. /index.html if no proxy

(13)

HTTP Request

• Request headers

– Authorization – authentication info

– Acceptable document types/encodings

– From – user email

– If-Modified-Since

– Referer – what caused this page to be requested

– User-Agent – client software

• Blank-line

• Body

(14)

HTTP Request Example

GET / HTTP/1.1

Accept: */*

Accept-Language: en-us

Accept-Encoding: gzip, deflate

User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)

Host: www.isi.edu
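A request like the one above is plain text: a request line, header lines, and a blank line, each terminated by CRLF. The following is a minimal sketch of assembling such a message; `build_request` and its parameters are hypothetical names, and the sketch omits a body since GET carries none.

```python
from typing import Optional


def build_request(method: str, target: str, host: str,
                  headers: Optional[dict] = None,
                  version: str = "HTTP/1.1") -> bytes:
    """Assemble request line + headers + blank line (hypothetical helper)."""
    lines = [f"{method} {target} {version}", f"Host: {host}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # Each line ends in CRLF; an empty line marks the end of the headers.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


request = build_request(
    "GET", "/", "www.isi.edu",
    {"Accept": "*/*", "Accept-Language": "en-us"},
)
print(request.decode("ascii"))
```

The bytes returned here could be written to a TCP connection to port 80 of the host; that step is left out to keep the sketch free of network access.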

(15)

HTTP Response

• Status-line

– HTTP version

– 3 digit response code

• 1XX – informational

• 2XX – success

• 3XX – redirection

• 4XX – client error

• 5XX – server error
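The status line and its code classes can be sketched in a few lines. `parse_status_line` and `STATUS_CLASSES` are hypothetical names used only for this illustration; the class labels are the ones listed on the slide.

```python
# First digit of the 3-digit response code selects the class.
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def parse_status_line(line: str):
    """Return (version, code, reason, class) for a status line."""
    parts = line.strip().split(" ", 2)
    version, code = parts[0], int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""   # reason phrase may be absent
    return version, code, reason, STATUS_CLASSES.get(code // 100, "unknown")


print(parse_status_line("HTTP/1.1 200 OK"))
print(parse_status_line("HTTP/1.1 304 Not Modified"))
```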

(16)

HTTP Response

• Headers

– Location – for redirection

– Server – server software

– WWW-Authenticate – request for authentication

– Allow – list of methods supported (get, head, etc.)

– Content-Encoding – e.g. x-gzip

– Content-Length

– Content-Type

– Expires

– Last-Modified

• Blank-line

• Body

(17)

HTTP Response Example

HTTP/1.1 200 OK

Date: Tue, 27 Mar 2001 03:49:38 GMT

Server: Apache/1.3.14 (Unix) (Red-Hat/Linux) mod_ssl/2.7.1 OpenSSL/0.9.5a DAV/1.0.2 PHP/4.0.1pl2 mod_perl/1.24

Last-Modified: Mon, 29 Jan 2001 17:54:18 GMT

ETag: "7a11f-10ed-3a75ae4a"

Accept-Ranges: bytes

Content-Length: 4333

Keep-Alive: timeout=15, max=100

Connection: Keep-Alive

Content-Type: text/html

…..
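To make the status-line / headers / blank-line / body layout concrete, here is a minimal sketch that splits a raw response into those parts. `parse_response` is a hypothetical helper, the sample bytes are a shortened made-up response, and the sketch ignores details such as repeated headers and chunked bodies.

```python
def parse_response(raw: bytes):
    """Split a raw response into status line, header dict, and body."""
    # The first blank line (CRLF CRLF) separates the headers from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    status_line = lines[0]
    headers = {}
    for line in lines[1:]:
        # Split on the first colon only; values such as dates contain colons.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_line, headers, body


raw = (b"HTTP/1.1 200 OK\r\n"
       b"Date: Tue, 27 Mar 2001 03:49:38 GMT\r\n"
       b"Content-Length: 5\r\n"
       b"Content-Type: text/html\r\n"
       b"\r\n"
       b"hello")
status_line, headers, body = parse_response(raw)
print(status_line, headers["content-type"], len(body))
```

Header names are lower-cased on the way in because they are case-insensitive on the wire.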

(18)

Caching

• Why cache?

– can save a lot of bandwidth

• Risk: out-of-date data

– file systems are usually strongly cache coherent, detecting when data is modified and invalidating cached copies

– the web provides only weak coherence, so you can get old data

(19)

Web Cache Coherence

• Assume page is good until time t

– t could be in the past

• First request for http://foo/

– returns data with valid period (“Expires: t”)

– clients may estimate valid period

• Subsequent request:

– if in valid period, replay data out of cache (no request)

– (server can prohibit this with a valid period of 0)

– if beyond valid period, issue a conditional request

• GET with “If-Modified-Since” header

• Response is either “304 Not Modified” or “200 OK” +
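The client-side decision described on this slide can be sketched as follows. `CacheEntry` and `plan_request` are hypothetical names, times are plain numbers of seconds for simplicity, and the sketch covers only the three cases listed: no cached copy, still within the valid period, and beyond it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CacheEntry:
    body: bytes
    expires: float       # time t until which the page is assumed good
    last_modified: str   # value to send back in If-Modified-Since


def plan_request(entry: Optional[CacheEntry], now: float) -> str:
    """Decide what a cache does for a request (hypothetical helper)."""
    if entry is None:
        return "GET"                 # first request: fetch the data
    if now < entry.expires:
        return "USE_CACHE"           # in valid period: no request at all
    return "CONDITIONAL_GET"         # beyond it: GET with If-Modified-Since


entry = CacheEntry(b"<html>...</html>", expires=100.0,
                   last_modified="Mon, 29 Jan 2001 17:54:18 GMT")
print(plan_request(None, now=0.0))
print(plan_request(entry, now=50.0))
print(plan_request(entry, now=150.0))
```

An entry whose `expires` lies in the past (or equals the time it was stored) always takes the conditional path, which is how a server can prohibit replay from the cache.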

(20)

Example Cache Check Request

GET / HTTP/1.1

Accept: */*

Accept-Language: en-us

Accept-Encoding: gzip, deflate

If-Modified-Since: Mon, 29 Jan 2001 17:54:18 GMT

If-None-Match: "7a11f-10ed-3a75ae4a"

User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)

(21)

Example Cache Check Response

HTTP/1.1 304 Not Modified

Date: Tue, 27 Mar 2001 03:50:51 GMT

Server: Apache/1.3.14 (Unix) (Red-Hat/Linux) mod_ssl/2.7.1 OpenSSL/0.9.5a DAV/1.0.2 PHP/4.0.1pl2 mod_perl/1.24

Connection: Keep-Alive
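On the server side, the cache check amounts to comparing the validators in the request with the current state of the resource. The sketch below is a simplified illustration, not how any particular server implements it: `conditional_status` is a hypothetical name, it handles a single entity tag with exact matching, and it lets If-None-Match take precedence when both headers are present.

```python
from email.utils import parsedate_to_datetime


def conditional_status(request_headers: dict, last_modified: str, etag: str) -> int:
    """Return 304 if the client's copy is still current, else 200."""
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # Simplification: one tag, compared as an exact string.
        return 304 if if_none_match == etag else 200
    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        changed_at = parsedate_to_datetime(last_modified)
        client_has = parsedate_to_datetime(if_modified_since)
        if changed_at <= client_has:
            return 304          # not modified since the client's copy
    return 200                  # no usable validator, or resource changed


headers = {
    "If-Modified-Since": "Mon, 29 Jan 2001 17:54:18 GMT",
    "If-None-Match": '"7a11f-10ed-3a75ae4a"',
}
print(conditional_status(headers,
                         "Mon, 29 Jan 2001 17:54:18 GMT",
                         '"7a11f-10ed-3a75ae4a"'))
```

A 304 response carries no body, so the saving is the size of the page minus the small header exchange.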

(22)

Web Workloads

• Users make connections and request pages made up of objects

– a structural model

• user arrival: Poisson

• connection duration, number of objects per page, object size: all often heavy tailed (Pareto)

– P&M give object stats:

• median: 1946 bytes

• mean: 13767 bytes
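A mean far above the median is what a heavy-tailed distribution produces: a few very large objects pull the mean up. The sketch below draws Pareto-distributed sizes with Python's `random.paretovariate` to show the effect. The shape `alpha=1.2`, the minimum size of 1000 bytes, and the helper name `pareto_sizes` are arbitrary assumptions for illustration and are not fitted to the statistics quoted above.

```python
import random
import statistics


def pareto_sizes(n: int, alpha: float, x_min: float, seed: int = 0) -> list:
    """Draw n object sizes from a Pareto distribution (hypothetical helper)."""
    rng = random.Random(seed)   # seeded so repeated runs give the same sample
    # paretovariate(alpha) returns values >= 1; scale by the minimum size.
    return [x_min * rng.paretovariate(alpha) for _ in range(n)]


sizes = pareto_sizes(20_000, alpha=1.2, x_min=1000.0)
median = statistics.median(sizes)
mean = statistics.fmean(sizes)
print(f"median = {median:.0f} bytes, mean = {mean:.0f} bytes")
print(f"largest object = {max(sizes):.0f} bytes")
```

With a shape parameter this close to 1, the sample mean is dominated by the largest few draws and varies noticeably between seeds, which is itself a property of heavy tails.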

(23)

Page Stats

• Popularity

– Zipf distribution

– requests for the i-th most popular document ~ i^{-a}

– (popular are really popular, but long tail)

• Requests are bursty
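The Zipf popularity rule can be turned into request shares by normalizing the weights i^{-a} over all documents. The sketch below assumes a catalogue of 1000 documents and an exponent a = 1.0; both values and the name `zipf_shares` are illustrative assumptions.

```python
def zipf_shares(n_docs: int, a: float = 1.0) -> list:
    """Fraction of requests going to each document, most popular first."""
    weights = [i ** -a for i in range(1, n_docs + 1)]   # weight of rank i is i^-a
    total = sum(weights)
    return [w / total for w in weights]


shares = zipf_shares(1000, a=1.0)
top10 = sum(shares[:10])
bottom_half = sum(shares[500:])
print(f"top 10 documents:  {top10:.1%} of requests")
print(f"least popular 500: {bottom_half:.1%} of requests")
```

This is the pattern that makes caching attractive: a small set of popular documents accounts for a large share of requests, while the long tail is requested too rarely to stay cached.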

(24)